Reference types
===============

The **reference** is whatever ffsubsync treats as the ground truth for timing.
ffsubsync inspects the reference (mostly its file extension) and picks one of
several strategies for turning it into a speech signal to align against.
Understanding these paths helps you choose the fastest and most accurate option
for what you have on hand.

Video or audio (voice-activity detection)
-----------------------------------------

When the reference is a media file, ffsubsync uses ffmpeg to extract the audio
and then runs a **voice-activity detector (VAD)** to label each 10 ms window as
speech or silence. This is the most general path, since it works for any video
with a dialogue track, but it is also the most expensive, because audio
extraction dominates the runtime. Which detector runs, and how to tune it for
difficult audio, is covered under :ref:`vad-backends`.

The audio is extracted at a sample rate controlled by ``--frame-rate`` (default
48000; this is the *audio* sample rate used for VAD, not the video's frames per
second).

Embedded subtitles first (``subs_then_*``)
------------------------------------------

Many video containers (especially MKV) carry one or more **embedded text
subtitle** streams. Their timings already serve as a speech signal, one that is
far cheaper to obtain and often more accurate than running a VAD over the
audio.

The default detector, ``subs_then_webrtc``, exploits this: it first tries to
use an embedded text-subtitle stream from the reference, and only falls back to
the WebRTC audio VAD if no usable embedded subtitles are found. Every detector
in the ``subs_then_*`` family (``subs_then_webrtc``, ``subs_then_auditok``,
``subs_then_silero``) behaves this way, differing only in which audio VAD it
falls back to. Use a bare detector name (e.g. ``--vad webrtc``) to skip the
embedded-subtitle shortcut and force audio detection.
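
The fallback behaviour described above can be pictured with a short Python
sketch. It is only an illustration of the idea, not ffsubsync's actual code:
``split_detector_name``, ``reference_speech``, and the two callables are
hypothetical names invented for this example, and "no usable subtitles" is
assumed to mean ``None`` or an empty signal.

.. code-block:: python

   from typing import Callable, Optional, Sequence, Tuple

   SUBS_PREFIX = "subs_then_"  # prefix shared by the subs_then_* detectors


   def split_detector_name(name: str) -> Tuple[bool, str]:
       """Split a detector name into (try embedded subs first?, audio VAD)."""
       if name.startswith(SUBS_PREFIX):
           return True, name[len(SUBS_PREFIX):]
       return False, name


   def reference_speech(
       name: str,
       embedded_subs: Callable[[], Optional[Sequence[int]]],  # hypothetical
       audio_vad: Callable[[str], Sequence[int]],  # hypothetical
   ) -> Sequence[int]:
       """Prefer embedded subtitle timings; otherwise run the audio VAD."""
       try_subs, backend = split_detector_name(name)
       if try_subs:
           signal = embedded_subs()
           if signal:  # assumption: None or empty means "nothing usable"
               return signal
       # Bare detector names, and failed subtitle lookups, end up here.
       return audio_vad(backend)

With a bare name such as ``webrtc`` the embedded-subtitle callable is never
invoked, which mirrors how ``--vad webrtc`` forces audio detection.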

Subtitle file
-------------

If the reference itself is a subtitle file (extension ``.srt``, ``.ass``,
``.ssa``, or ``.sub``), ffsubsync derives the speech signal straight from the
reference's on/off subtitle timings. No audio is extracted, so this is the
fastest path (typically under a second). This is the "sync against an
already-correct subtitle" workflow from :doc:`usage`.

When the reference is a subtitle file you can also control its text encoding
with ``--reference-encoding`` (it defaults to auto-detection, just like input
subtitles; see :doc:`encoding`), and merge the reference into the output with
``--merge-with-reference``.

PGS image subtitles
-------------------

Blu-ray rips often ship subtitles as **PGS** (Presentation Graphic Stream)
image-based tracks rather than text. ffsubsync can use a PGS track as the sync
reference without any OCR, deriving speech timing from when each subtitle image
is displayed:

.. code-block:: console

   $ ffs ref.mkv -i in.srt -o out.srt --pgs-ref-stream

Passing ``--pgs-ref-stream`` with no value auto-detects the first
``hdmv_pgs_subtitle`` track. To pick a specific track, give it a stream
specifier (the leading ``0:`` is optional):

.. code-block:: console

   $ ffs ref.mkv -i in.srt -o out.srt --pgs-ref-stream s:2

Serialized speech (``.npy`` / ``.npz``)
---------------------------------------

If you pass a ``.npy`` or ``.npz`` file as the reference, ffsubsync loads a
previously-serialized speech signal instead of computing one. You produce such
a file with ``--serialize-speech`` (see :doc:`advanced`). This is handy when
you want to sync several subtitle files against the same video: extract the
speech signal once, then reuse it repeatedly without re-decoding the audio.

Selecting a stream from the reference
-------------------------------------

A video file can contain several audio or subtitle tracks. Use
``--reference-stream`` to choose which one to use, formatted according to
ffmpeg conventions:

.. code-block:: console

   $ ffs ref.mkv -i in.srt -o out.srt --reference-stream s:2

For example, ``0:s:0`` uses the first subtitle track and ``0:a:3`` uses the
fourth audio track; you may drop the leading ``0:`` and write ``s:0`` or
``a:3``.

Offset-only mode (no reference)
-------------------------------

Finally, ffsubsync doesn't strictly need a reference at all. If you already
know the correction you want, ``--apply-offset-seconds`` shifts every subtitle
by a fixed amount with no alignment step:

.. code-block:: console

   $ ffs -i in.srt -o out.srt --apply-offset-seconds 3.5

This is covered further in :doc:`advanced`.
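
To make "shifts every subtitle by a fixed amount" concrete, here is a minimal
Python sketch of a constant shift applied to SRT timestamps. It is not how
ffsubsync implements the option: ``shift_timestamp`` and ``shift_srt`` are
hypothetical helpers, and clamping negative results to zero is an assumption
made for this example.

.. code-block:: python

   import re

   # SRT timestamps look like HH:MM:SS,mmm
   _TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


   def shift_timestamp(timestamp: str, offset_seconds: float) -> str:
       """Return one SRT timestamp moved by offset_seconds."""
       match = _TIMESTAMP.fullmatch(timestamp)
       if match is None:
           raise ValueError(f"not an SRT timestamp: {timestamp!r}")
       hours, minutes, seconds, millis = map(int, match.groups())
       total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
       total_ms += round(offset_seconds * 1000)
       total_ms = max(total_ms, 0)  # assumption: never go below zero
       hours, rest = divmod(total_ms, 3_600_000)
       minutes, rest = divmod(rest, 60_000)
       seconds, millis = divmod(rest, 1000)
       return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


   def shift_srt(text: str, offset_seconds: float) -> str:
       """Shift every timestamp found in a block of SRT text."""
       return _TIMESTAMP.sub(
           lambda m: shift_timestamp(m.group(0), offset_seconds), text
       )


   cue = "1\n00:00:01,000 --> 00:00:02,500\nHello\n"
   print(shift_srt(cue, 3.5))

Both the start and end time of each cue move by the same amount, so cue
durations are unchanged. Note that this sketch would also rewrite any
timestamp-shaped string that happens to appear inside the subtitle text.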