Advanced options
This page groups the more advanced flags by the problem they solve. For the exhaustive, auto-generated list of every option, see the CLI reference.
Framerate correction
Subtitles authored against a differently-encoded copy of a video (for example, 25 fps PAL subtitles played over a 23.976 fps release) drift progressively: they start roughly aligned but grow more and more out of sync toward the end. A single constant offset can’t fix this — the rate is wrong, not just the start point.
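To put a number on that drift, the sketch below computes how far a cue slips over time. It assumes each cue is tied to a frame number, so a cue authored at 25 fps lands at a different wall-clock time when the same frames play at 23.976 fps. The function name is hypothetical and is not part of ffsubsync.

```python
def drift_seconds(subtitle_time, authored_fps=25.0, playback_fps=23.976):
    """Return how far a cue has drifted at `subtitle_time` seconds.

    Assumption: the cue is tied to a frame number (time * authored_fps),
    and that frame is shown at frame / playback_fps in the other encoding.
    """
    scale = authored_fps / playback_fps
    return subtitle_time * scale - subtitle_time


for minutes in (1, 30, 60):
    print(f"{minutes:>2} min: {drift_seconds(minutes * 60):6.1f} s")
```

Under these assumptions the error is a couple of seconds after the first minute and more than 150 seconds after an hour, which is why no single offset can line up both ends.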
By default ffsubsync tries a handful of common framerate ratios in addition to a straight offset, so ordinary PAL/NTSC-style mismatches are corrected automatically. Three flags adjust this behavior:
--gss uses golden-section search to hunt for the optimal framerate ratio continuously, instead of only evaluating the handful of common discrete ratios. Reach for it when you suspect a framerate mismatch that the default ratios don’t cover.
--no-fix-framerate disables framerate correction entirely and assumes the reference and subtitles share a framerate. This constrains the search to a pure offset, which can help when a spurious framerate “correction” is making a borderline sync worse.
--skip-infer-framerate-ratio leaves the discrete-ratio search in place but skips the heuristic that guesses a ratio from the reference/subtitle duration ratio.
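For readers unfamiliar with golden-section search, here is a minimal, generic sketch of the technique. It is not ffsubsync's implementation: the function names are hypothetical, and `toy_score` is a stand-in for whatever alignment score the tool actually maximizes. The method assumes the score has a single peak on the search interval.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)  # lower probe
    d = a + INV_PHI * (b - a)  # upper probe
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The peak lies in [a, d]; the old lower probe becomes the upper one.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The peak lies in [c, b]; the old upper probe becomes the lower one.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


# Toy objective (an assumption for illustration): peaks at the 25 / 23.976 ratio.
def toy_score(ratio):
    return -(ratio - 25 / 23.976) ** 2


best_ratio = golden_section_max(toy_score, 0.9, 1.1)
print(f"best ratio found: {best_ratio:.5f}")
```

Each iteration reuses one of the two previous probes, so only one new score evaluation is needed per step. That matters when every evaluation means re-scoring an alignment.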
Voice-activity detectors (--vad)
When the reference is video or audio, ffsubsync labels speech with a
voice-activity detector. --vad selects the backend:
webrtc (default fallback): The VAD built into WebRTC. It is fast, dependency-light, and a good default. This is what the default subs_then_webrtc falls back to when no embedded subtitles are present.
auditok: An energy-based detector from auditok. It detects all audio rather than voice specifically, which is usually worse but can outperform a true VAD on low-quality audio where speech detection struggles. (auditok is GPLv3 and is imported lazily only when selected.)
silero: The neural silero VAD. More robust on noisy audio, but requires PyTorch; install it with the torch extra (see Installation).
fused, fused:weighted, fused:intersection, fused:union: Combine the WebRTC and silero detectors. weighted (the default fused strategy) blends them as 0.6 * silero + 0.4 * webrtc; intersection marks speech only where both agree (conservative); union marks speech where either fires (aggressive). These also require the torch extra.
Each detector also has a subs_then_ variant that prefers embedded text
subtitles before falling back to that audio VAD; see Reference types.
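The three fused strategies can be illustrated with a small sketch. This is not ffsubsync's code: the function name is hypothetical, the inputs are assumed to be per-frame speech values in [0, 1], and using min/max for intersection/union is an assumption that reduces to logical AND/OR when the inputs are binary. Only the 0.6/0.4 weights come from the description above.

```python
def fuse(silero, webrtc, strategy="weighted"):
    """Combine two equal-length per-frame speech signals (illustrative only)."""
    if len(silero) != len(webrtc):
        raise ValueError("signals must have the same length")
    pairs = list(zip(silero, webrtc))
    if strategy == "weighted":
        # Blend documented above: 0.6 * silero + 0.4 * webrtc.
        return [0.6 * s + 0.4 * w for s, w in pairs]
    if strategy == "intersection":
        # Assumption: speech only where both detectors fire.
        return [min(s, w) for s, w in pairs]
    if strategy == "union":
        # Assumption: speech where either detector fires.
        return [max(s, w) for s, w in pairs]
    raise ValueError(f"unknown strategy: {strategy!r}")


silero_frames = [1.0, 0.0, 1.0]
webrtc_frames = [1.0, 1.0, 0.0]
for name in ("weighted", "intersection", "union"):
    print(name, fuse(silero_frames, webrtc_frames, name))
```

On frames where the detectors disagree, intersection drops the frame, union keeps it, and weighted produces an intermediate value that leans toward silero.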
The quality gate (bulk syncing)
When syncing many files unattended, a confidently-wrong sync is worse than no
change at all. --skip-sync-on-low-quality leaves the subtitles untouched when
the winning alignment looks untrustworthy, instead of writing a probably-wrong
result. Three thresholds define “untrustworthy”:
--min-score (default 0.0) rejects alignments scoring below the given value. The score’s magnitude isn’t normalized, but its sign is meaningful, so the default of 0.0 rejects only anti-correlated (clearly wrong) alignments.
--quality-max-offset-seconds (default 30.0) rejects an alignment whose offset exceeds this many seconds, on the assumption that huge shifts are usually spurious.
--max-framerate-deviation (default 0.1) rejects an alignment whose framerate scale factor deviates from 1.0 by more than this. The default permits every framerate correction ffsubsync would legitimately make, so it never rejects a real one; tighten it only when you know the framerate should not change.
When an alignment is rejected, ffsubsync writes the original, unshifted subtitles and reports the sync as unsuccessful.
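The logic of the three thresholds can be sketched as a single predicate. This is an illustration under stated assumptions, not ffsubsync's implementation: the function and parameter names are hypothetical, and comparing the absolute value of the offset is an assumption about how "exceeds" is interpreted. The default values are the ones listed above.

```python
def passes_quality_gate(score, offset_seconds, framerate_scale,
                        min_score=0.0,
                        max_offset_seconds=30.0,
                        max_framerate_deviation=0.1):
    """Return True when an alignment would be accepted (illustrative only)."""
    if score < min_score:
        return False  # anti-correlated or otherwise too weak
    if abs(offset_seconds) > max_offset_seconds:
        return False  # assumption: large shifts in either direction are suspect
    if abs(framerate_scale - 1.0) > max_framerate_deviation:
        return False  # scale factor too far from 1.0
    return True


print(passes_quality_gate(12.0, 2.5, 1.0))           # small shift, no rescale
print(passes_quality_gate(12.0, 2.5, 25 / 23.976))   # PAL-style correction
print(passes_quality_gate(12.0, 45.0, 1.0))          # offset above 30 s
```

A 25 / 23.976 correction deviates from 1.0 by about 0.043, so it sits comfortably inside the default 0.1 limit.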
Long and remote references
Extracting audio from a long — or remotely-streamed — reference is the slow part of a sync. Three flags cut that cost:
--max-duration-seconds N processes only the first N seconds of the reference (measured from --start-seconds). Because ffmpeg stops reading, and therefore downloading, once that duration is reached, this is especially effective for remote references.
$ ffs "https://example.com/video.mp4" -i in.srt -o out.srt --max-duration-seconds 600
--extract-audio-first copies the remote audio track to a local temp file (no re-encode) before running detection, instead of holding a network stream open throughout. On flaky connections this is often more stable. It is ignored for local references and composes with --max-duration-seconds.
--multi-segment-sync samples several short segments spread across the whole reference and runs detection on just those. Unlike --max-duration-seconds, it can still catch desync that only appears later in the runtime, because each segment keeps its true timeline position, so the framerate-ratio and offset search is unchanged and a framerate mismatch is still corrected.
$ ffs "https://example.com/video.mp4" -i in.srt -o out.srt --multi-segment-sync
Tune it with --segment-count N (default 8), --skip-intro-outro (skip the first 30 s and last 60 s, which often lack dialogue), and --parallel-workers N (overlap segment downloads, default 4). It applies to video/audio references only.
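To make the idea of segments "spread across the whole reference" concrete, the sketch below picks evenly spaced segment start times. It is only an illustration: the function name, the even spacing, and the `segment_length` parameter are assumptions, since this page does not say how ffsubsync places or sizes its segments. The 30 s and 60 s trims and the default count of 8 come from the flag descriptions above.

```python
def segment_starts(duration, segment_count=8, segment_length=60.0,
                   skip_intro_outro=False):
    """Return start times (seconds) for evenly spaced segments (illustrative)."""
    first = 30.0 if skip_intro_outro else 0.0
    last = duration - (60.0 if skip_intro_outro else 0.0)
    # Room available for moving the start of a segment around.
    usable = last - first - segment_length
    if usable <= 0 or segment_count < 2:
        return [first]
    step = usable / (segment_count - 1)
    return [first + i * step for i in range(segment_count)]


# A one-hour reference with the intro and outro skipped.
for start in segment_starts(3600.0, skip_intro_outro=True):
    print(f"{start:8.1f} s")
```

Because every segment keeps its real position on the timeline, speech detected near the end of the reference still constrains the framerate ratio, which a single early window cannot do.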
Applying a fixed offset
--apply-offset-seconds N adds a constant N-second shift to the computed
offset. Combined with a reference, it nudges the automatic result. With no
reference, it becomes a pure manual shift with no alignment step at all:
$ ffs -i in.srt -o out.srt --apply-offset-seconds 3.5
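What a pure manual shift does to the timestamps can be sketched as follows. This is a deliberately naive illustration, not ffsubsync's code: the helper names are hypothetical, it handles only SRT-style `HH:MM:SS,mmm` timestamps, it would also rewrite anything in the cue text that happens to match that pattern, and clamping negative results to zero is an assumption.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Shift one matched SRT timestamp by offset_ms milliseconds."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(0, total)  # assumption: never produce a negative timestamp
    s_total, ms = divmod(total, 1000)
    m_total, s = divmod(s_total, 60)
    h, m = divmod(m_total, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, offset_seconds):
    """Apply a constant shift to every SRT timestamp found in text."""
    offset_ms = round(offset_seconds * 1000)
    return TIMESTAMP.sub(lambda mt: shift_timestamp(mt, offset_ms), text)


print(shift_srt("00:00:58,700 --> 00:01:01,000", 3.5))
```

Working in integer milliseconds avoids floating-point rounding when the shift carries across a second or minute boundary.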
Reusing a speech signal
--serialize-speech saves the reference’s computed speech signal to a
compressed <reference>.npz array. You can then pass that .npz back as the
reference (see Reference types) to sync additional subtitles against the
same video without re-decoding its audio.
--make-test-case goes further, bundling the serialized speech together with
the input and output subtitles into an archive — useful for filing a reproducible
bug report.
Other useful flags
--overwrite-input rewrites the input subtitle in place instead of writing a separate output file. Required when you pass multiple -i inputs.
--merge-with-reference merges the reference subtitles into the synced output (valid only when the reference is itself a subtitle file).
--extract-subs-from-stream skips syncing altogether and just extracts a subtitle track from the reference via ffmpeg.
--suppress-output-if-offset-less-than N writes nothing when the computed offset is smaller than N, which is handy for skipping no-op rewrites in bulk jobs.
--strict refuses to parse subtitle files with formatting problems instead of doing its best.
--ffmpeg-path points ffsubsync at a specific ffmpeg/ffprobe location (otherwise the system PATH is used).
--log-dir-path saves an ffsubsync.log file to an existing directory for later inspection.
--start-seconds and --max-subtitle-seconds bound, respectively, where processing begins and the longest plausible single-subtitle duration.