DVD/Blu-Ray conversion with text soft subs on Mac OS X

Saved here for my own reference, and possibly others’ if they should stumble across it: the easiest workflow I’ve found yet for converting DVDs or Blu-Rays (if you have a Blu-Ray reader, of course) for personal use on OS X, including OCR conversion of subtitles in either VOBSUB (DVD) or PGS (Blu-Ray) format to text-based .srt files suitable for use as soft subtitles, either as a sidecar file or included in the final movie file.

Movie Rip Workflow

The flow diagram to the right gives an overview of the process I’ve landed on. Here’s a slightly more detailed breakdown.

  1. Use MakeMKV to rip the DVD or Blu-Ray disc to an .mkv file (if I run into a stubborn DVD, or one with a lot of multiplexing, I’ll use RipIt to create a disk image first, then run that image through MakeMKV). To save space, you can select only the primary audio track for inclusion, or you can select others if you want other languages or commentary tracks archived as well (though this will require more storage space). I also select all available English-language subtitle tracks, as some discs will include both standard subtitles and subtitles for the hearing impaired or closed captions, which include some extra information on who is speaking and background sounds, or occasionally even transcriptions of commentary tracks.
  2. Use Subler to OCR and export the subtitle files. This takes two runs through Subler to complete.
    1. First run; drag the .mkv file onto Subler, and only select the subtitle track(s). Pop that into the export queue, and after a few minutes of processing (this is when the OCR process happens) Subler will output a tiny .m4v file.
    2. Second run; drag that file back onto Subler, click on the subtitle track, and choose File > Export… to save the .srt file(s). The tiny .m4v file can then be deleted.

    Now, the OCR process is not perfect, and the resulting .srt file(s) are virtually guaranteed to have some errors. How many and how intrusive they are depends on the source. Blu-Ray subs seem to come out better than DVD subs (likely due to the higher resolution of the format giving better quality text for the OCR process to scan). DVD subs are also affected by the chosen font and whether or not italics were used. For correction, I use one of two methods.

    1. For a quick-and-dirty “good enough for now” run, I use BBEdit (but just about any other text editor would work) to do a quick spellcheck, identifying common errors and using search-and-replace to fix them in batches.
    2. For a real quality fix, I use Aegisub to go through line-by-line, comparing the text to the original audio, adding italics when appropriate, and so on.

    Of course, these two processes can be combined, done at different times, or skipped entirely; right now, I’m just living with the OCR errors, because I can always go back and use Subler to extract the .srt files for cleanup later on when I have more time.

  3. Use HandBrake to re-encode and convert the .mkv file (which at this point will be fairly large, straight off the source media) to a smaller .m4v file. You can either embed the .srt files at this point, under HandBrake’s ‘Subtitles’ tab, or if you prefer…
  4. …you can use Subler to add the .srt files to the .m4v: Drag the .m4v file from HandBrake onto Subler, drag the .srt file(s) into the window that opens, and then drop that into the queue for final remuxing (optionally, before adding the files to the queue, use Subler’s metadata search tools to add the description, artwork, and other metadata). Then run the queue to output the final file.

And that’s it. Now, you should have a .m4v file with embedded text-based soft subtitles for programs that support that (VLC, Plex, etc.), or you can just use the .srt file(s) created by Subler earlier as a sidecar file for programs that don’t read the embedded .srt.
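The quick-and-dirty search-and-replace pass on the OCR output could also be scripted. What follows is only a minimal sketch in Python, not part of the workflow above: the function name `fix_srt_text` and the replacement rules in `OCR_FIXES` are hypothetical examples, and the real list of substitutions would have to come from whatever errors a spellcheck turns up in a given file.

```python
import re

# Hypothetical examples of common OCR confusions; replace with the errors
# actually found in your own subtitle files.
OCR_FIXES = [
    (re.compile(r"\bl\b"), "I"),  # a lone lowercase "l" read in place of "I"
    (re.compile(r"\|"), "I"),     # a vertical bar read in place of "I"
]

# Matches the start of an .srt timing line, e.g. "00:00:01,000 --> 00:00:02,500"
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> ")


def fix_srt_text(srt_text):
    """Apply the batch replacements to subtitle text lines only."""
    fixed_lines = []
    for line in srt_text.splitlines():
        # Leave digit-only lines (cue numbers) and timing lines untouched.
        if line.strip().isdigit() or TIMING.match(line):
            fixed_lines.append(line)
            continue
        for pattern, replacement in OCR_FIXES:
            line = pattern.sub(replacement, line)
        fixed_lines.append(line)
    return "\n".join(fixed_lines)


sample = (
    "1\n"
    "00:00:01,000 --> 00:00:02,500\n"
    "l think l'm late.\n"
    "\n"
    "2\n"
    "00:00:03,000 --> 00:00:04,000\n"
    "| know.\n"
)
print(fix_srt_text(sample))
```

To run this against a real file, read the .srt into a string, pass it through `fix_srt_text`, and write the result back out. A pass like this only catches predictable errors, so it does not replace a line-by-line check against the audio.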


There has been a tendency to mock people that want to buy products simply because a certain company makes them. Some will say this type of buyer is being guided by marketing, or is just a follower, but in reality it comes down to trust. Many people trust Apple. It is this very important connection with users that will likely get people to at least try the Apple Watch, and for Apple that is the best outcome they can wish for.

There is a better than 50% chance that I’ll be ordering an Apple Watch on the day they’re added to the Apple Store.

My 2015 Resolutions

  • 640×1136 (iPhone 5s)
  • 2,048×1,536 (iPad Air 2)
  • 5120×2880 (iMac with Retina Display)

Yes, I make this joke somewhat annually. But…it amuses me, so I’ll probably continue to do so. One of these days I should dig back through prior years to figure out where I’ve posted this (blog, Facebook, Twitter) and see how my resolutions have changed over time.

No such thing as “just metadata”

With all the recent news concerning the NSA’s surveillance programs (Prism et al.), one of the common defenses has been that for at least some of these programs (though not all), the government is “just” collecting metadata. For example, should the government access your email records, they might not have access to the content of the email, merely the associated data — like who you communicate with, when, how often, who else is included in the messages, and so on.

Techdirt has a good overview of why the “it’s just metadata” argument is a foolish one to make (basically, a lot of information can be derived from “just metadata”), but there’s also an MIT project called “Immersion” (noted in the Techdirt article, though I found it elsewhere) that gives a good visualization of what can be learned from a relatively limited dataset.

Immersion scans your Gmail account (with your explicit permission, of course), and then runs an analysis on the metadata (not the content) of your email history to create a diagram showing who you communicate with and the connections among them.

As an example, here’s my result (with names removed). This is an analysis of almost 52 thousand messages over nearly nine years among 201 separate contacts. Each dot is a single contact, the size of the dot is a measure of how often I’ve communicated with them, and the lines between them show existing relationships between those people (based on messages with multiple recipients).

Immersion Contact Map

In that image, there are two obvious constellations: the blue grouping at the top right are my family and long-time friends; the orange/green/red/brown grouping to the left are my Norwescon contacts. The scattering of purples and yellows are contacts that fall outside of those two primary groups. While there’s not much here of great surprise or import for me, I did already learn one thing of interest — apparently one of my old high school friends has had some amount of contact with one of my Norwescon friends (that’s the single line connecting the two constellations). Now, I have no idea what sort of relationship exists between them — it could be nothing more than my sending a group email that included one and accidentally including the other as part of the group — but some sort of relationship does, and that’s information I didn’t have before.
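To give a sense of how little is needed to draw a map like this, here is a minimal sketch in Python. It is not Immersion’s actual code. The function name `build_contact_graph`, the input format, and the example addresses are all assumptions made for illustration. It uses only sender and recipient headers, counts how often each contact appears (the dot size), and counts how often two contacts share a message (the lines between them).

```python
from collections import Counter
from itertools import combinations


def build_contact_graph(messages, me):
    """Build dot sizes and connecting lines from header metadata only.

    messages: iterable of (sender, [recipients]) pairs (assumed format).
    me: the mailbox owner's address, excluded from the graph.
    """
    contact_weight = Counter()  # messages involving each contact -> dot size
    links = Counter()           # messages shared by two contacts -> line
    for sender, recipients in messages:
        # Everyone on the message except the mailbox owner, de-duplicated.
        others = sorted({sender, *recipients} - {me})
        contact_weight.update(others)
        # Any two contacts on the same message get a link between them.
        links.update(combinations(others, 2))
    return contact_weight, links


# Made-up example data; no message bodies are involved anywhere.
messages = [
    ("me@example.com", ["alice@example.com", "bob@example.com"]),
    ("alice@example.com", ["me@example.com"]),
    ("me@example.com", ["carol@example.com"]),
]
weights, links = build_contact_graph(messages, me="me@example.com")
print(weights.most_common())
print(dict(links))
```

Even in this toy version, one group message is enough to record a connection between two contacts, which is the same kind of inference as the single line joining the two constellations.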

Now, my metadata is fairly innocuous. But for argument’s sake, suppose I was involved not with Norwescon, but with some other group of people that, for whatever reason, I wanted to keep quiet about. Maybe I’m involved in the local kink scene, and could face repercussions at my job or in my personal life if this became known. Maybe I’m having a gender identity crisis that I’m not comfortable publicly discussing, but have a strong internet-based support group. Maybe I’m part of Anonymous or some similar group, discussing ways to cause mischief. Maybe I’m a whistleblower, and these are my contacts. Maybe I’m a news reporter who has guaranteed anonymity for my sources — but suddenly, this metadata exposes not only who I communicate with, but when and how often, and if there’s a sudden ramp in communication between me and certain contacts in the weeks or months before I break a big story with a lot of anonymous sources, suddenly they’re not so anonymous any more. And, yes, of course, because no list like this would be complete without the modern boogeyman that is the government’s excuse for why this surveillance is necessary — maybe I’m a terrorist. (For the record, I’m none of the above-mentioned things.)

However, of that list of possibilities, terrorism (or, less broadly, investigation of known or suspected crimes) is the only one that the government should really have any interest in, and that’s exactly the kind of investigation that they should be getting warrants for. If they suspect someone, get a warrant, analyze their data, and build a case from there. But analyzing everyone’s data, all the time, without specific need, without specific justification, and without warrants? And then holding on to the data indefinitely, allowing them to troll through it at any time for any reason, whether or not a crime is suspected?

There’s a very good reason why terms like “Orwellian”, “Big Brother”, and “1984” keep coming up in these conversations.

Now PGP-enabled

With all the recent concerns about security and privacy in the world of PRISM, I finally decided to carry through on something I’d considered from time to time in the past, and have set myself up to be able to handle PGP encryption for my mail. I’m using GPGTools for the OS X Mail client and Mailvelope for Chrome when I need web access to my Gmail account.

To be honest, I don’t know how often I’ll actually use PGP for anything other than signing my messages — I can’t think of a time when I’ve ever been truly concerned about what someone might find if they snooped through my email (they’d probably be pretty bored) — but as long as the option is there, might as well make sure I’m set up to use it in case I ever feel the need.

My PGP public key follows:

Continue reading

Markdown is the new Word 5.1

From Markdown is the new Word 5.1:

There’s a way out of this loop of bouncing between cluttered word processors and process-centric writing tools, a way to avoid having to cater to Clippy’s every whim while not having to hide your own work from yourself in order to concentrate. People have been saying for years that Word 5.1 needs to be ported to Mac OS X; that having that program running on current hardware would be the ideal solution to all of these problems with writing tools.

The truth is, there’s a solution now that’s most of the way there: Markdown and a good text editor. That’s the new Word 5.1. Think about it: a program like TextMate (I use TextWrangler. –mh) has almost no window chrome, and opens almost instantly. You start typing, and that’s all you have to do. I bring up Gruber because he invented Markdown, which lets you do basic formatting of text without really having to sweat much else. The types of formatting you don’t need aren’t even available to you when writing Markdown in a text editor, so you never have to deal with them.

Markdown will never be unreadable by a program, because it’s just ASCII text. It’s formatted, but if you’re reading the raw text, it’s not obscured the way a raw HTML file is. Any decent editor will give you a word count and can use headings as section and chapter breaks. With MultiMarkdown the options get even crazier: render your text file as a LaTeX document, or straight to PDF, or any number of other things. All from a text file and an editor with a minimal interface.
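Because the file is plain text, the word count and heading-based section breaks mentioned above are easy to reproduce outside any editor. The Python sketch below is an illustration under stated assumptions, not how any particular editor does it: the function name `section_word_counts` is made up, it recognizes only `#`-style headings (not underlined ones), it does not skip fenced code, and it counts every whitespace-separated token as a word, so list markers and similar markup are included.

```python
import re

# A "#"-style heading: one to six "#" characters, a space, then the title.
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def section_word_counts(markdown_text):
    """Return (heading, word_count) pairs, one per section of the text."""
    sections = [["(untitled)", 0]]  # holds any text before the first heading
    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            # A heading starts a new section; it is not counted as body text.
            sections.append([match.group(1).strip(), 0])
        else:
            sections[-1][1] += len(line.split())
    # Drop the leading placeholder if nothing preceded the first heading.
    return [(title, count) for title, count in sections
            if count or title != "(untitled)"]


text = "# Chapter One\nIt was a dark night.\n\n## Scene Two\nRain fell.\n"
for title, count in section_word_counts(text):
    print(f"{title}: {count} words")
```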

Almost all of my writing for many, many years now has been in a text editor using Markdown-formatted text. I’m using Markdown formatting for this blog post (which WordPress then automatically translates into HTML), I’ve written many, many discussion board posts for school in Markdown format before pasting them into BlackBoard, and I use Markdown formatting whenever I’m writing email messages.

I’m in that set of people who fondly remember Word 5.1, and miss the days of having a word processor that was actually a word processor, not an overblown attempt to do absolutely everything ever related to desktop publishing all at once (even Apple’s Pages, while far preferable to any post-5.1 version of Word, is far more than just a simple word processor). My senior year of high school, I booted my Mac Classic into Mac OS 6 with one 1.44 MB floppy; another 1.44 MB floppy held Word 5.1 and every paper I wrote that year.

Those days will never come again, admittedly. But a simple text editor and Markdown formatting is all that’s really needed.