Saved here for my own reference, and possibly for others who stumble across it: the easiest workflow I’ve found yet for converting DVDs or Blu-rays (if you have a Blu-ray reader, of course) for personal use on OS X. It includes OCR conversion of subtitles in either VOBSUB (DVD) or PGS (Blu-ray) format to text-based .srt files suitable for use as soft subtitles, either as a sidecar file or embedded in the final movie file.
- Use MakeMKV to rip the DVD or Blu-ray disc to an .mkv file (if I run into a stubborn DVD, or one with a lot of multiplexing, I’ll use RipIt first, then run its output through MakeMKV). When doing this, I generally select only the primary audio track for inclusion, though you can select others if you want commentary tracks archived as well. (I played with this for a while, then decided that I’d rather conserve the storage space and just pop the physical media in if and when I’m in the mood to listen to a director’s commentary.) I do select all available English-language subtitle tracks, though, as some discs include both standard subtitles and subtitles for the hearing impaired or closed captions, which add information on who is speaking and on background sounds, or occasionally even a director’s commentary.
- Use Subler to OCR and export the subtitle files. This takes two runs through Subler to complete. First run: drag the .mkv file onto Subler, and select only the subtitle track(s). Pop that into the export queue, and after a few minutes of processing (this is when the OCR happens) Subler will output a tiny .m4v file. Second run: drag that file back onto Subler, click on the subtitle track, and choose File > Export… to save the .srt file(s). The tiny .m4v file can then be deleted.
Now, the OCR process is not perfect, and the resulting .srt file(s) are virtually guaranteed to have some errors. How many there are and how intrusive they are depends on the source. Blu-ray subs seem to come out better than DVD subs, likely because the higher resolution of the format gives the OCR process better-quality text to scan. DVD subs are also affected by the chosen font and by whether or not italics were used. For correction, I use one of two methods. For a quick-and-dirty “good enough for now” run, I use BBEdit (but just about any other text editor would work) to do a quick spellcheck, identifying common errors and using search-and-replace to fix them in batches. For a real quality fix, I use Aegisub to go through line by line, comparing the text to the original audio, adding italics when appropriate, and so on. Of course, these two processes can be combined, done at different times, or skipped entirely. Right now, I’m just living with the OCR errors, because I can always go back and use Subler to extract the .srt files for cleanup later on when I have more time.
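The batch search-and-replace pass can also be scripted. The following is only a minimal sketch in Python, not part of the workflow above: the `OCR_FIXES` table and the `fix_srt_text` helper are hypothetical names, and the patterns are assumptions about typical misreads (a capital "I" coming out as a lowercase "l" or a pipe). Build your own table from whatever your spellcheck run turns up.

```python
import re

# Hypothetical table of common OCR misreads; these patterns are assumptions,
# so adjust them to the errors you actually see in your own files.
OCR_FIXES = [
    (re.compile(r"\bl\b"), "I"),              # lone lowercase L, as in "l think" or "l'm"
    (re.compile(r"\bl([tfns])\b"), r"I\1"),   # "lt", "lf", "ln", "ls" -> "It", "If", "In", "Is"
    (re.compile(r"\|"), "I"),                 # pipe character misread for a capital I
]

# Matches the start of an .srt timing line, e.g. "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> ")


def fix_srt_text(srt: str) -> str:
    """Apply the fixes to subtitle text only, leaving cue numbers and timings alone."""
    fixed_lines = []
    for line in srt.splitlines():
        if line.strip().isdigit() or TIMING.match(line):
            fixed_lines.append(line)  # cue index or timing line: keep untouched
            continue
        for pattern, replacement in OCR_FIXES:
            line = pattern.sub(replacement, line)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)
```

To use it, read the exported .srt file into a string, pass it through `fix_srt_text`, and write the result to a new file so the original stays available for comparison. A pass like this only catches predictable errors; it is no substitute for the line-by-line check in Aegisub.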
- While the subtitle work is going on, use HandBrake to re-encode and convert the .mkv file (which at this point will be fairly large, straight off the source media) to a smaller .m4v file.
- Finally, use Subler to combine the files into one: drag the .m4v file from HandBrake onto Subler, drag the .srt file(s) into the window that opens, and then drop that into the queue for final remuxing (optionally, before adding the files to the queue, use Subler’s metadata search tools to add the description, artwork, and other metadata). Then run the queue to output the file.
And that’s it. You should now have a .m4v file with embedded text-based soft subtitles for programs that support them (VLC, etc.), or you can use the .srt file(s) created by Subler earlier as sidecar files for programs that don’t read the embedded subtitles (Plex Media Server, at least at this point, as far as I can tell, unless I’m doing something wrong).
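The HandBrake re-encode could also be scripted rather than run from the GUI. The sketch below assumes HandBrake's separate command-line build, `HandBrakeCLI`, is installed and on your PATH, and that the "Fast 1080p30" preset exists in your version. Both are assumptions rather than part of the workflow described here, and `handbrake_command` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def handbrake_command(source: Path, preset: str = "Fast 1080p30") -> List[str]:
    """Build a HandBrakeCLI argument list that writes an .m4v next to the source .mkv.

    The preset name is an assumption; substitute one your HandBrake version lists.
    """
    target = source.with_suffix(".m4v")
    return [
        "HandBrakeCLI",
        "-i", str(source),   # input file (the large .mkv from MakeMKV)
        "-o", str(target),   # output file (the smaller .m4v)
        "--preset", preset,  # named encoding preset
    ]


# To actually run the encode (assumes HandBrakeCLI is installed):
#   import subprocess
#   subprocess.run(handbrake_command(Path("Movie.mkv")), check=True)
```

Building the command as a list, rather than one shell string, avoids quoting problems with spaces in movie titles.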