A few weeks ago I downloaded an HD version of the excellent Thai horror movie Shutter. The release was Shutter 2004 (1080p Bluray x265 HEVC 10bit AAC 5.1 Thai Tigole), but I couldn’t find Spanish subtitles synchronized for it. The movie came in a Matroska container that included English and German subtitles, so my only option was to extract the English subtitles and use their timing to synchronize Spanish subtitles taken from other versions.
The Matroska container
According to Wikipedia, the Matroska container is defined as follows:
Matroska is a project to create a container format that can hold an unlimited number of video, audio, picture, or subtitle tracks in one file. The Matroska Multimedia Container is similar in concept to other containers like AVI, MP4, or Advanced Systems Format (ASF), but is an open standard.
With that background on the container, here is how the task is done. My OS is Manjaro x64, a GNU/Linux distribution, but the instructions apply to any GNU/Linux distribution and even to MS Windows; only the way of installing the tool differs.
MKVToolNix, the Swiss Army Knife for Matroska Containers
The main tool is MKVToolNix, a powerful set of tools for manipulating Matroska files. It can be downloaded from its website and is available for different GNU/Linux distributions as well as for Windows. On Manjaro, installing it is as easy as typing this in the terminal:
sudo pacman -S mkvtoolnix-cli
You can also install it from the Manjaro package manager or from Octopi.
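To confirm that the installation put the command-line tools on the PATH, a small check like the following can help. It is only a sketch: `check_tools` is a helper name made up for this example, and the tool names passed to it are the MKVToolNix commands mentioned in this article.

```shell
# Hypothetical helper: report which of the given commands are on the PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# The MKVToolNix commands used in this article.
check_tools mkvinfo mkvextract mkvmerge || echo "Install MKVToolNix before continuing."
```

If any line starts with "missing:", the package was not installed correctly or the terminal needs to be reopened.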
After installing MKVToolNix, open a terminal (on GNU/Linux or MS Windows) and change to the directory that holds the Matroska file from which the subtitles will be extracted; in my case the file is named Shutter.2004.mkv. The first step is to find the track that holds the subtitles we need, which is done with the mkvinfo command:
mkvinfo Shutter.2004.mkv
The output is as follows:
+ EBML head
|+ EBML version: 1
|+ EBML read version: 1
|+ EBML maximum ID length: 4
|+ EBML maximum size length: 8
|+ Doc type: matroska
|+ Doc type version: 4
|+ Doc type read version: 2
+ Segment, size 2048528838
|+ Seek head (subentries will be skipped)
|+ EbmlVoid (size: 4012)
|+ Segment information
| + Timecode scale: 1000000
| + Multiplexing application: libebml v1.3.4 + libmatroska v1.4.5
| + Writing application: mkvmerge v9.4.0 ('Knurl') 64bit
| + Duration: 5804.222s (01:36:44.222)
| + Date: Tue Sep 06 06:57:54 2016 UTC
| + Segment UID: 0xa1 0xc1 0xec 0x85 0xd8 0x36 0x5b 0x84 0x85 0x9b 0xdf 0x6d 0xca 0x60 0x48 0x23
|+ Segment tracks
| + A track
|  + Track number: 1 (track ID for mkvmerge & mkvextract: 0)
|  + Track UID: 10832681874364858
|  + Track type: video
|  + Lacing flag: 0
|  + MinCache: 1
|  + Codec ID: V_MPEGH/ISO/HEVC
|  + CodecPrivate, length 1131 (HEVC profile: Main 10 @L4.0)
|  + Default duration: 41.667ms (24.000 frames/fields per second for a video track)
|  + Language: und
|  + Video track
|   + Pixel width: 1920
|   + Pixel height: 1080
|   + Display width: 1920
|   + Display height: 1080
| + A track
|  + Track number: 2 (track ID for mkvmerge & mkvextract: 1)
|  + Track UID: 16111199640271706300
|  + Track type: audio
|  + Codec ID: A_AAC
|  + CodecPrivate, length 2
|  + Default duration: 21.333ms (46.875 frames/fields per second for a video track)
|  + Language: tha
|  + Audio track
|   + Sampling frequency: 48000
|   + Channels: 6
| + A track
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
|  + Track UID: 4506191980823210926
|  + Track type: subtitles
|  + Lacing flag: 0
|  + Codec ID: S_TEXT/UTF8
| + A track
|  + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
|  + Track UID: 5025865172156633481
|  + Track type: subtitles
|  + Default flag: 0
|  + Lacing flag: 0
|  + Codec ID: S_VOBSUB
|  + CodecPrivate, length 348
|  + Language: ger
|  + Content encodings
|   + Content encoding
|    + Content compression
|+ EbmlVoid (size: 1181)
|+ Chapters
| + EditionEntry
|  + EditionFlagHidden: 0
|  + EditionFlagDefault: 1
|  + EditionUID: 3152495284696511680
|  + ChapterAtom
|   + ChapterUID: 15056966616418343257
|   + ChapterTimeStart: 00:00:00.000000000
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 00:07:57.250000000
|   + ChapterDisplay
|    + ChapterString: Chapter 01
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 15253337529613683601
|   + ChapterTimeStart: 00:07:57.250000000
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 00:15:43.000000000
|   + ChapterDisplay
|    + ChapterString: Chapter 02
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 1994698034217092959
|   + ChapterTimeStart: 00:15:43.000000000
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 00:23:48.208333333
|   + ChapterDisplay
|    + ChapterString: Chapter 03
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 258434961097259531
|   + ChapterTimeStart: 00:23:48.208333333
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 00:31:47.083333333
|   + ChapterDisplay
|    + ChapterString: Chapter 04
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 18288776774064417453
|   + ChapterTimeStart: 00:31:47.083333333
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 00:39:34.666666666
|   + ChapterDisplay
|    + ChapterString: Chapter 05
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 1173979347969436520
|   + ChapterTimeStart: 00:39:34.666666666
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 00:47:19.208333333
|   + ChapterDisplay
|    + ChapterString: Chapter 06
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 11281031232039191669
|   + ChapterTimeStart: 00:47:19.208333333
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 00:56:15.375000000
|   + ChapterDisplay
|    + ChapterString: Chapter 07
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 10815360979304430871
|   + ChapterTimeStart: 00:56:15.375000000
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 01:02:14.208333333
|   + ChapterDisplay
|    + ChapterString: Chapter 08
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 16809010294411897098
|   + ChapterTimeStart: 01:02:14.208333333
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 01:12:48.291666666
|   + ChapterDisplay
|    + ChapterString: Chapter 09
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 8150110444134871576
|   + ChapterTimeStart: 01:12:48.291666666
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 01:18:00.958333333
|   + ChapterDisplay
|    + ChapterString: Chapter 10
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 18439445082546892497
|   + ChapterTimeStart: 01:18:00.958333333
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 01:27:29.416666666
|   + ChapterDisplay
|    + ChapterString: Chapter 11
|    + ChapterLanguage: eng
|  + ChapterAtom
|   + ChapterUID: 18327904455893100144
|   + ChapterTimeStart: 01:27:29.416666666
|   + ChapterFlagHidden: 0
|   + ChapterFlagEnabled: 1
|   + ChapterTimeEnd: 01:36:44.208333333
|   + ChapterDisplay
|    + ChapterString: Chapter 12
|    + ChapterLanguage: eng
|+ EbmlVoid (size: 101)
|+ Cluster
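Most of this listing (UIDs, chapters) is not needed to find the subtitles. As a rough sketch, the lines that describe each track can be filtered out with grep. The here-document below is an abbreviated stand-in for the mkvinfo output above, so the snippet runs without the movie file; `sample_mkvinfo` and `summary` are names made up for this example.

```shell
# Abbreviated stand-in for the output of: mkvinfo Shutter.2004.mkv
sample_mkvinfo() {
    cat <<'EOF'
|+ Segment tracks
| + A track
|  + Track number: 1 (track ID for mkvmerge & mkvextract: 0)
|  + Track type: video
|  + Codec ID: V_MPEGH/ISO/HEVC
|  + Language: und
| + A track
|  + Track number: 2 (track ID for mkvmerge & mkvextract: 1)
|  + Track type: audio
|  + Codec ID: A_AAC
|  + Language: tha
| + A track
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
|  + Track type: subtitles
|  + Codec ID: S_TEXT/UTF8
| + A track
|  + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
|  + Track type: subtitles
|  + Codec ID: S_VOBSUB
|  + Language: ger
|+ Chapters
| + EditionEntry
|    + ChapterLanguage: eng
EOF
}

# Keep only the track number, type, codec and language lines.
# The "[+] " prefix keeps "ChapterLanguage" lines from matching "Language".
# With the real file, replace sample_mkvinfo with: mkvinfo Shutter.2004.mkv
summary=$(sample_mkvinfo | grep -E '[+] (Track number|Track type|Codec ID|Language):')
echo "$summary"
```

The filtered output is short enough to read at a glance, which makes it easier to spot the subtitle tracks.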
Four tracks can be identified. The one we are interested in is number three:
| + A track
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
|  + Track UID: 4506191980823210926
|  + Track type: subtitles
|  + Lacing flag: 0
|  + Codec ID: S_TEXT/UTF8
How do we know it is the one we are looking for? By the track type, which says it is a subtitle track. Track number four, however, is also a subtitle track:
| + A track
|  + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
|  + Track UID: 5025865172156633481
|  + Track type: subtitles
|  + Default flag: 0
|  + Lacing flag: 0
|  + Codec ID: S_VOBSUB
|  + CodecPrivate, length 348
|  + Language: ger
|  + Content encodings
|   + Content encoding
|    + Content compression
But this track contains the German subtitles:
|  + Language: ger
When the track information does not state a language, the track is treated as English, because eng is the default that Matroska assumes when the Language element is missing. So we go back to track number three, the one of interest, and focus on the line that holds the track number:
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
This line shows 2 as the track ID, which is the number to use with tools like mkvmerge and mkvextract; the latter is the one that extracts the content of a track. With this information in hand, the subtitles of interest can be extracted with mkvextract:
mkvextract tracks Shutter.2004.mkv 2:Shutter.2004.srt
Here tracks indicates that a track is going to be extracted. It is followed by the name of the Matroska file, in this case Shutter.2004.mkv, and then by the track ID, which is 2, a colon, and the output file for the subtitles, Shutter.2004.srt. The output name can be anything; what matters is that when the codec ID of the subtitles is S_TEXT/UTF8, the output file is in SubRip (.srt) format. Running the tool displays the following message:
Extracting track 2 with the CodecID 'S_TEXT/UTF8' to the file 'Shutter.2004.srt'. Container format: SRT text subtitles
Progress: 100%
The message reports what is being done along with a progress indicator. When it finishes we have the English subtitle file, which was the objective; it is now ready to be used for synchronizing the Spanish subtitles.
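The manual steps above (find the subtitle tracks, read the track ID, assume English when no language is given, pick the output extension from the codec ID) can be sketched as a small script. This is an illustration under assumptions, not part of MKVToolNix: `sample_mkvinfo`, `list_subtitle_tracks` and `subtitle_extension` are names made up here, the here-document is an abbreviated stand-in for real mkvinfo output, the awk parsing assumes the older mkvinfo text layout shown in this article, and the `bin` fallback and the `name.language.extension` output naming are my own choices. The script only prints the mkvextract commands; it does not run them.

```shell
# Abbreviated stand-in for the output of: mkvinfo Shutter.2004.mkv
sample_mkvinfo() {
    cat <<'EOF'
|+ Segment tracks
| + A track
|  + Track number: 2 (track ID for mkvmerge & mkvextract: 1)
|  + Track type: audio
|  + Codec ID: A_AAC
|  + Language: tha
| + A track
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2)
|  + Track type: subtitles
|  + Codec ID: S_TEXT/UTF8
| + A track
|  + Track number: 4 (track ID for mkvmerge & mkvextract: 3)
|  + Track type: subtitles
|  + Codec ID: S_VOBSUB
|  + Language: ger
|+ Chapters
EOF
}

# Print "TRACK_ID CODEC_ID LANGUAGE" for every subtitle track read from stdin.
# The language starts as "eng" and is overwritten only if the track states one.
list_subtitle_tracks() {
    awk '
        function emit() { if (type == "subtitles") print id, codec, lang }
        /[+] A track/       { if (seen) emit(); seen = 1; id = ""; type = ""; codec = ""; lang = "eng" }
        /[+] Track number:/ { id = $0; sub(/.*mkvextract: /, "", id); sub(/\).*/, "", id) }
        /[+] Track type:/   { type = $NF }
        /[+] Codec ID:/     { codec = $NF }
        /[+] Language:/     { lang = $NF }
        END                 { if (seen) emit() }
    '
}

# Map a subtitle codec ID to the file extension of what mkvextract writes.
subtitle_extension() {
    case "$1" in
        S_TEXT/UTF8) echo srt ;;
        S_TEXT/ASS)  echo ass ;;
        S_TEXT/SSA)  echo ssa ;;
        S_VOBSUB)    echo idx ;;  # VobSub: an .idx index plus a matching .sub file
        S_HDMV/PGS)  echo sup ;;
        *)           echo bin ;;  # assumption: placeholder for codecs not listed here
    esac
}

# Build one mkvextract command per subtitle track (printed, not executed).
# With the real file, replace sample_mkvinfo with: mkvinfo Shutter.2004.mkv
commands=$(sample_mkvinfo | list_subtitle_tracks | while read -r id codec lang; do
    echo "mkvextract tracks Shutter.2004.mkv $id:Shutter.2004.$lang.$(subtitle_extension "$codec")"
done)
echo "$commands"
```

For the sample above it prints two commands, one writing track ID 2 to Shutter.2004.eng.srt and one writing track ID 3 to Shutter.2004.ger.idx. Printing the commands first makes it possible to review them before anything is extracted.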