2014-02-05

Text-to-speech audio books with text image videos for youtube


Down and Out in the Magic Kingdom by Cory Doctorow has a very permissive license for reuse, so I've gone through the steps of making an audio book with images of the text and putting it on youtube.



To do this, the first thing was to download the text from the Cory Doctorow site:
http://craphound.com/down/Cory_Doctorow_-_Down_and_Out_in_the_Magic_Kingdom.txt

There were some text encoding issues that I mostly plowed through, though I suspect converting the file to UTF-8 some other way would have worked better.

First thing is to get rid of some literal &#45; sequences (ampersand, hash, forty-five), which I think were dashes, in vim:

:%s/&#45;//g

The U+FFFD unicode replacement characters (see http://en.wikipedia.org/wiki/Specials_(Unicode_block)) also need to be removed:

:%s/\%uFFFD//g

Also replacing tabs with spaces turned out to be necessary.
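
The same cleanup could probably be scripted instead of done by hand in vim. This is only a sketch, not what was actually run: clean_text is a made-up helper name, the tab width is whatever expand defaults to (8 columns), and it assumes the file is already UTF-8 so that U+FFFD shows up as the bytes EF BF BD.

```shell
# Hypothetical helper: clean_text INPUT OUTPUT
# Removes literal "&#45;" sequences and U+FFFD characters, then turns tabs into spaces.
clean_text() {
    # U+FFFD encoded as UTF-8 is the bytes EF BF BD (octal 357 277 275).
    repl=$(printf '\357\277\275')
    # LC_ALL=C makes sed treat the input as plain bytes.
    LC_ALL=C sed -e 's/&#45;//g' -e "s/$repl//g" "$1" | expand > "$2"
}
```

Usage would be something like clean_text book.txt book_clean.txt, where both file names are placeholders.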

ImageMagick wouldn't do automatic line breaks for me later in this process (though pango might have worked), so I had to add line breaks to keep lines under 80 characters:

fmt ../Cory_Doctorow_-_Down_and_Out_in_the_Magic_Kingdom.txt > ../Cory_Doctorow_-_Down_and_Out_in_the_Magic_Kingdom_line_breaks.txt  
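
A quick way to confirm the wrapping worked is to print the length of the longest line in the output. This is a sketch with a made-up helper name (longest_line). Depending on the awk implementation and locale, the length may be counted in bytes rather than characters, and fmt will not break a single word that is longer than the target width.

```shell
# Hypothetical helper: print the length of the longest line in a file.
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}
```

Running longest_line on the line_breaks file should print a number below 80 if fmt did its job.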

There were still some odd question marks generated by convert in the text. I hand-edited the worst one out: the one that would have appeared in the title of the book.

Next thing was to split the book at every blank line into roughly 1500 text files which will probably be short enough to show in a single image:

csplit -f down -b '%05d.txt' ../Cory_Doctorow_-_Down_and_Out_in_the_Magic_Kingdom_line_breaks.txt '/^$/' '{*}'
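
Since "probably short enough" is a guess, it may be worth listing any pieces with too many lines to fit on one image. This is a sketch with a made-up helper name (too_long). The right limit depends on the font and point size; something around 20 lines for a 1080 pixel high image at pointsize 45 is an untested assumption.

```shell
# Hypothetical helper: too_long MAX FILE...
# Prints the name and line count of every file with more than MAX lines.
too_long() {
    max=$1
    shift
    for f in "$@"; do
        # Arithmetic expansion strips any padding some wc versions add.
        lines=$(($(wc -l < "$f")))
        if [ "$lines" -gt "$max" ]; then
            echo "$f $lines"
        fi
    done
}
```

For example, too_long 20 down*.txt would list the pieces that might need to be split further by hand.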

Next is the conversion of each of the split text files into HD png files:

for i in *.txt;
do convert -background black -fill white -size 1920x1080 -pointsize 45 -gravity center label:"$(<"$i")" PNG8:"$i.png";
done

And then generate wave files from each of the 1500 text files:

for i in *.txt;
do pico2wave -w "$i.wav" "$(<"$i")";
done


Videos are then created by putting the png images together with the wave files. This part is very similar to the process in http://binarymillenium.com/2013/07/turn-set-of-mp3s-into-static-image.html

for i in *.txt; 
do avconv -loop 1 -r 1 -i "$i.png" -c:v libx264 -i "$i.wav" -c:a aac -b:a 32k -strict experimental -shortest "$i.mp4"; 
done

Some conversions resulted in 0 length mp4s with this error:
[buffer @ 0x8959e0] Invalid pixel format string '-1'
This turned out to be caused by some of the png images from convert being 16-bit instead of 8-bit (I don't know why it wasn't consistent; most were 8-bit). Putting PNG8: into the convert command line fixed this.
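
To find out which clips came out empty and need regenerating, something like the following could be used. It is a sketch with a made-up helper name (empty_videos) and relies only on the shell's file tests, not on avconv.

```shell
# Hypothetical helper: empty_videos [DIR]
# Lists the .mp4 files in DIR (default: current directory) that have zero length.
empty_videos() {
    for f in "${1:-.}"/*.mp4; do
        # -f: is a regular file; -s: exists and is non-empty
        if [ -f "$f" ] && [ ! -s "$f" ]; then
            echo "$f"
        fi
    done
}
```

Running empty_videos in the directory with the clips prints one file name per empty clip, and nothing at all if every conversion succeeded.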

Create a text file listing all the mp4 files:

rm -f all_videos.txt
for i in *.mp4;
do echo "$i"; echo "file '$i'" >> all_videos.txt
done

And concatenate all the mp4 files together into one giant 6-hour video with no recompression (only 500MB though):

mkdir output
avconv -f concat -i all_videos.txt -c copy output/down_and_out.mp4

For the first few minutes on youtube it looked like the video was all black instead of showing the titles, but a few minutes later this was fixed.
