CLI reference

The complete list of command-line options, generated directly from ffsubsync’s argument parser. The same options are available under any of the three entry points (ffs, subsync, ffsubsync).

Synchronize subtitles with video.

usage: ffsubsync [-h] [-i [SRTIN ...]] [-o SRTOUT] [--merge-with-reference]
                 [--make-test-case] [--reference-stream REFERENCE_STREAM]
                 [--pgs-ref-stream [PGS_REF_STREAM]] [-v] [--overwrite-input]
                 [--encoding ENCODING]
                 [--max-subtitle-seconds MAX_SUBTITLE_SECONDS]
                 [--start-seconds START_SECONDS]
                 [--max-offset-seconds MAX_OFFSET_SECONDS]
                 [--split-penalty [SPLIT_PENALTY]]
                 [--split-length-penalty SPLIT_LENGTH_PENALTY]
                 [--split-subsample SPLIT_SUBSAMPLE]
                 [--max-duration-seconds MAX_DURATION_SECONDS]
                 [--extract-audio-first] [--multi-segment-sync]
                 [--segment-count SEGMENT_COUNT] [--skip-intro-outro]
                 [--parallel-workers PARALLEL_WORKERS]
                 [--apply-offset-seconds APPLY_OFFSET_SECONDS]
                 [--skip-sync-on-low-quality] [--min-score MIN_SCORE]
                 [--quality-max-offset-seconds QUALITY_MAX_OFFSET_SECONDS]
                 [--max-framerate-deviation MAX_FRAMERATE_DEVIATION]
                 [--frame-rate FRAME_RATE] [--skip-infer-framerate-ratio]
                 [--non-speech-label NON_SPEECH_LABEL]
                 [--output-encoding OUTPUT_ENCODING]
                 [--reference-encoding REFERENCE_ENCODING] [--vad VAD]
                 [--whisper-weights WHISPER_WEIGHTS] [--language LANGUAGE]
                 [--whisper-args WHISPER_ARGS] [--no-fix-framerate]
                 [--serialize-speech]
                 [--extract-subs-from-stream EXTRACT_SUBS_FROM_STREAM]
                 [--suppress-output-if-offset-less-than SUPPRESS_OUTPUT_IF_OFFSET_LESS_THAN]
                 [--ffmpeg-path FFMPEG_PATH] [--log-dir-path LOG_DIR_PATH]
                 [--gss] [--strict]
                 [reference]

Positional Arguments

reference: Reference (video, subtitles, or a numpy array with VAD speech) to which to synchronize input subtitles.

Named Arguments

-i, --srtin

Input subtitles file (default=stdin). If omitted (and nothing is piped in), subtitles sharing the reference’s name in its directory are auto-detected (e.g. movie.srt, movie.en.srt for movie.mkv) and each is synced to a <name>.synced.srt next to it; pass --overwrite-input to overwrite the detected file(s) in place.
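The naming convention above can be sketched in Python. This is an illustration only, not ffsubsync's actual detection code: the helper names find_candidate_subtitles and synced_output_path are hypothetical, only .srt files are considered, and the output name is assumed to be the subtitle's own stem plus .synced.srt.

```python
from pathlib import Path


def find_candidate_subtitles(reference: Path) -> list[Path]:
    """Return .srt files next to `reference` that share its base name."""
    stem = reference.stem  # "movie" for "movie.mkv"
    candidates = []
    for path in sorted(reference.parent.glob("*.srt")):
        # Match "movie.srt" exactly, or "movie.<tag>.srt" such as "movie.en.srt".
        shares_name = path.stem == stem or path.stem.startswith(stem + ".")
        # Assumption: outputs of an earlier run should not be re-synced.
        already_synced = path.name.endswith(".synced.srt")
        if shares_name and not already_synced:
            candidates.append(path)
    return candidates


def synced_output_path(subtitle: Path) -> Path:
    """Assumed output name: "<stem>.synced.srt" beside the input file."""
    return subtitle.with_name(subtitle.stem + ".synced.srt")
```

Under these assumptions, movie.en.srt would be written to movie.en.synced.srt in the same directory.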

-o, --srtout

Output subtitles file (default=stdout).

--merge-with-reference, --merge

Merge reference subtitles with synced output subtitles.

Default: False

--make-test-case, --create-test-case

If specified, serialize reference speech to a numpy array, and create an archive with input/output subtitles and serialized speech.

Default: False

--reference-stream, --refstream, --reference-track, --reftrack

Which stream/track in the video file to use as reference, formatted according to ffmpeg conventions. Stream indices are zero-based: for example, 0:s:0 uses the first subtitle track; 0:a:3 would use the fourth audio track. You can also drop the leading 0:; i.e. use s:0 or a:3, respectively. Example: ffs ref.mkv -i in.srt -o out.srt --reference-stream s:2
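The optional leading 0: can be pictured as a small normalization step. The sketch below is a guess at the idea, not ffsubsync's own parsing; normalize_stream_spec is a hypothetical name and only the documented forms (s:0, a:3, 0:s:0) are handled.

```python
def normalize_stream_spec(spec: str) -> str:
    """Prepend the input-file index "0:" when the caller left it off."""
    if spec.startswith("0:"):
        return spec  # already fully qualified, e.g. "0:s:0"
    return "0:" + spec  # "s:2" becomes "0:s:2"
```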

--pgs-ref-stream, --pgsstream

Use a PGS (Presentation Graphic Stream) image-based subtitle track from the reference MKV as the sync reference instead of audio VAD. Optionally specify the stream (leading 0: is optional, e.g. s:0 or 3). Omit the value to auto-detect the first hdmv_pgs_subtitle track. Example: ffs ref.mkv -i in.srt -o out.srt --pgs-ref-stream (auto) or ffs ref.mkv -i in.srt -o out.srt --pgs-ref-stream s:2 (explicit).

-v, --version

Show program’s version number and exit.

--overwrite-input

If specified, will overwrite the input srt instead of writing the output to a new file.

Default: False

--encoding

What encoding to use for reading input subtitles (default=infer).

Default: 'infer'

--max-subtitle-seconds

Maximum duration for a subtitle to appear on-screen (default=10.000 seconds).

Default: 10

--start-seconds

Start time for processing (default=0 seconds).

Default: 0

--max-offset-seconds

The max allowed offset seconds for any subtitle segment (default=60 seconds).

Default: 60

--split-penalty

Enable alass-style piecewise synchronization: instead of a single global offset, allow the offset to change across the timeline to correct commercial breaks, inserted/removed scenes (director’s cuts), or discs concatenated into one file. An optional value is the cost (in seconds of overlap) of introducing each such split: lower values split more eagerly, higher values stay closer to a single global offset; values around 4-20 are typical. Pass the flag with no value to use a reasonable default (5). If the flag is omitted entirely (the default), a single global offset is used.
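To make the role of the penalty concrete, here is a toy dynamic program in Python. It is not the algorithm used by alass or ffsubsync; it only illustrates the trade-off. The function name best_piecewise_offsets and the input layout (scores[i][k] is the seconds of overlap cue i gets at candidate offset k) are assumptions made for this sketch.

```python
def best_piecewise_offsets(scores, split_penalty):
    """Pick one candidate-offset index per cue, maximizing total overlap
    minus `split_penalty` for every change of offset between cues."""
    n_offsets = len(scores[0])
    best = list(scores[0])  # best[k]: best total so far, ending at offset k
    back = []               # back[i][k]: offset index chosen for the previous cue
    for row in scores[1:]:
        top = max(range(n_offsets), key=lambda k: best[k])
        new_best, pointers = [], []
        for k in range(n_offsets):
            stay = best[k]                    # keep the same offset
            jump = best[top] - split_penalty  # pay to switch from the best other
            if stay >= jump:
                new_best.append(stay + row[k])
                pointers.append(k)
            else:
                new_best.append(jump + row[k])
                pointers.append(top)
        best = new_best
        back.append(pointers)
    # Walk the back-pointers from the best final state.
    k = max(range(n_offsets), key=lambda j: best[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path
```

With a small penalty the sketch switches offsets as soon as a different offset overlaps better; with a large penalty it keeps one offset for every cue, which mirrors the behaviour described for low and high values.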

--split-length-penalty

Only meaningful with --split-penalty. Weight of the edge/length term used when scoring each cue’s placement (‘standard scoring’): a cue is charged this fraction of the reference speech just outside its edges, so it prefers a same-sized speech block over sitting anywhere inside a longer one. 0 falls back to pure overlap scoring (default=0.25).

Default: 0.25

--split-subsample

Only meaningful with --split-penalty. Sub-sample resolution of the offset search: 1 aligns to whole 100 Hz samples (10 ms); higher values refine the offset to 1/N of a sample via exact linear interpolation of the reference. 1 is plenty for perceptual sync (default=1).

Default: 1
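As a rough picture of what 1/N-sample resolution means, the Python sketch below enumerates candidate offsets in steps of 1/N of a sample and reads the reference at a fractional position by linear interpolation. Both helpers (candidate_offsets, sample_at) are hypothetical and only handle non-negative positions; ffsubsync's real search is not shown here.

```python
def candidate_offsets(max_offset_samples, subsample):
    """Offsets in units of 10 ms samples, spaced 1/subsample apart."""
    steps = max_offset_samples * subsample
    return [i / subsample for i in range(-steps, steps + 1)]


def sample_at(reference, position):
    """Linearly interpolate `reference` at a fractional, non-negative position."""
    lower = int(position)
    frac = position - lower
    if frac == 0.0 or lower + 1 >= len(reference):
        return float(reference[lower])  # exact sample, or last sample
    return (1.0 - frac) * reference[lower] + frac * reference[lower + 1]
```

With subsample=1 the candidates are whole samples only; with subsample=2 every half sample (5 ms) is also tried.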

--max-duration-seconds

If specified, only process the first this-many seconds of the reference (measured from --start-seconds). Useful for speeding up long or remote references, since ffmpeg stops reading/downloading once this duration is reached.

--extract-audio-first

For remote URL references, first copy the audio track to a local temp file (no re-encode) and run speech detection on that, instead of streaming the full container over the network during detection. Can be more stable on flaky connections; ignored for local references.

Default: False

--multi-segment-sync

Sample a few short segments spread across the reference and run speech detection only on those, instead of the whole reference. Speeds up long or remote references; the usual framerate and offset search is unchanged. Only applies to video / audio references.

Default: False

--segment-count

Number of segments to sample for --multi-segment-sync (default=8).

Default: 8

--skip-intro-outro

With --multi-segment-sync, skip the first 30s and last 60s of the reference when placing segments (intros/credits often lack dialogue).

Default: False
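One plausible way to spread segments across the reference is even spacing, sketched below in Python. The placement strategy, the function name segment_starts, and the segment_length parameter are all assumptions made for illustration; only the segment count default (8) and the 30s/60s margins come from the option descriptions.

```python
def segment_starts(duration, segment_count=8, segment_length=30.0,
                   skip_intro_outro=False):
    """Evenly spaced segment start times (seconds) within the usable range."""
    # Margins from --skip-intro-outro: first 30 s and last 60 s are excluded.
    lead, tail = (30.0, 60.0) if skip_intro_outro else (0.0, 0.0)
    first_start = lead
    last_start = duration - tail - segment_length
    if last_start <= first_start:
        return [0.0]  # reference too short to spread segments out
    if segment_count < 2:
        return [first_start]
    step = (last_start - first_start) / (segment_count - 1)
    return [first_start + i * step for i in range(segment_count)]
```

For a 1000-second reference with four 10-second segments, this yields starts at 0, 330, 660 and 990 seconds, or 30, 330, 630 and 930 seconds when the intro and outro are skipped.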

--parallel-workers

How many segments to extract in parallel for --multi-segment-sync (default=4); useful for overlapping downloads of remote references.

Default: 4

--apply-offset-seconds

Apply a predefined offset in seconds to all subtitle segments (default=0 seconds).

Default: 0

--skip-sync-on-low-quality

If the alignment looks untrustworthy (see the thresholds below), leave the subtitles unmodified instead of applying a probably-wrong sync. Useful for batch jobs where a bad sync is worse than none.

Default: False

--min-score

With --skip-sync-on-low-quality, reject alignments scoring below this. The score’s magnitude is not normalized, but its sign is meaningful, so the default of 0.0 rejects only anti-correlated (clearly wrong) alignments.

Default: 0.0

--quality-max-offset-seconds

With --skip-sync-on-low-quality, reject alignments whose offset exceeds this many seconds (default=30.0).

Default: 30.0

--max-framerate-deviation

With --skip-sync-on-low-quality, reject alignments whose framerate scale deviates from 1.0 by more than this. The default of 0.10 permits every framerate correction ffsubsync can make (so it never rejects a legitimate one); tighten it only when you know the framerate should not change.

Default: 0.1
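Taken together, the three thresholds amount to a simple accept/reject check. The Python sketch below restates them with their documented defaults; the function name alignment_is_trustworthy is hypothetical, and comparing the absolute value of the offset is an assumption (the description does not say how negative offsets are treated).

```python
def alignment_is_trustworthy(score, offset_seconds, framerate_scale,
                             min_score=0.0,
                             quality_max_offset_seconds=30.0,
                             max_framerate_deviation=0.10):
    """Return False if any documented quality threshold is violated."""
    if score < min_score:
        return False  # --min-score: anti-correlated alignment
    if abs(offset_seconds) > quality_max_offset_seconds:
        return False  # --quality-max-offset-seconds (abs() is an assumption)
    if abs(framerate_scale - 1.0) > max_framerate_deviation:
        return False  # --max-framerate-deviation
    return True
```

When the check fails and --skip-sync-on-low-quality is set, the subtitles would be left unmodified.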

--frame-rate

Frame rate for audio extraction (default=48000).

Default: 48000

--skip-infer-framerate-ratio

If set, do not try to infer framerate ratio based on duration ratio.

Default: False

--non-speech-label

Label to use for frames detected as non-speech (default=0.000000).

Default: 0.0

--output-encoding

What encoding to use for writing output subtitles (default=utf-8). Can indicate “same” to use same encoding as that of the input.

Default: 'utf-8'

--reference-encoding

What encoding to use for reading / writing reference subtitles (if applicable, default=infer).

--vad

Which voice activity detector to use for speech extraction (if using video / audio as a reference, default=subs_then_webrtc). Choices: subs_then_webrtc, webrtc, subs_then_auditok, auditok, subs_then_silero, silero, fused, fused:weighted, fused:intersection, fused:union. The fused options combine webrtc and silero and require the optional silero dependency (torch). With --whisper-weights this instead takes a path to a ggml VAD model for whisper’s optional audio fragmentation.

--whisper-weights, --whisper-model, --ffmpeg-transcription-model-weights

Path to a whisper.cpp ggml model file (e.g. ggml-base.en.bin). If given, the reference audio is transcribed with ffmpeg’s whisper filter and the transcript is used as the sync reference instead of audio VAD (requires ffmpeg >= 8.0 built with --enable-whisper). ‘~’ is expanded for you. Example: ffs video.mp4 -i in.srt -o out.srt --whisper-weights ~/whisper.cpp/models/ggml-base.en.bin.

--language

Language code for --whisper-weights transcription (e.g. en, es, fr), or ‘auto’ to let whisper detect it. Default: inferred as ‘en’ for *.en model files, else ‘auto’. Only used with --whisper-weights.

--whisper-args

Extra options for ffmpeg’s whisper filter as key=value pairs separated by ‘:’ (e.g. queue=12), overriding ffsubsync’s defaults. The model, format, and destination are managed by ffsubsync and cannot be overridden. Only used with --whisper-weights.
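The key=value format can be illustrated with a short Python parser. This is a sketch, not ffsubsync's implementation: the names parse_filter_options, merge_whisper_options and MANAGED_KEYS are hypothetical, the managed key spellings are assumed from the description, and silently ignoring a managed key is a guess (the real tool may report an error instead). Values that themselves contain ‘:’ are not handled.

```python
# Assumed spellings of the options that ffsubsync manages itself.
MANAGED_KEYS = {"model", "format", "destination"}


def parse_filter_options(raw):
    """Split "k1=v1:k2=v2" into a dict of strings."""
    options = {}
    for pair in raw.split(":"):
        if not pair:
            continue  # tolerate empty input or a stray ":"
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = value
    return options


def merge_whisper_options(defaults, raw_overrides):
    """Apply user overrides on top of defaults, skipping managed keys."""
    merged = dict(defaults)
    for key, value in parse_filter_options(raw_overrides).items():
        if key in MANAGED_KEYS:
            continue  # assumption: managed keys are ignored, not fatal
        merged[key] = value
    return merged
```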

--no-fix-framerate

If specified, subsync will not attempt to correct a framerate mismatch between reference and subtitles.

Default: False

--serialize-speech

If specified, serialize reference speech to a numpy array.

Default: False

--extract-subs-from-stream, --extract-subtitles-from-stream

If specified, do not attempt sync; instead, just extract subtitles from the specified stream using the reference.

--suppress-output-if-offset-less-than

If specified, do not produce output if offset below provided threshold.

--ffmpeg-path, --ffmpegpath

Where to look for ffmpeg and ffprobe. Uses the system PATH by default.

--log-dir-path

If provided, will save log file ffsubsync.log to this path (must be an existing directory).

--gss

If specified, use golden-section search to try to find the optimal framerate ratio between video and subtitles.

Default: False
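For readers unfamiliar with the technique, a generic golden-section search for the maximum of a unimodal function looks like the Python sketch below. It is a textbook version, not ffsubsync's code; the objective, the search interval, and the tolerance are placeholders chosen for illustration.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_maximize(f, low, high, tol=1e-4):
    """Locate the maximizer of a unimodal `f` on [low, high]."""
    c = high - INV_PHI * (high - low)
    d = low + INV_PHI * (high - low)
    fc, fd = f(c), f(d)
    while high - low > tol:
        if fc > fd:
            # Maximum lies in [low, d]; the old c becomes the new d.
            high, d, fd = d, c, fc
            c = high - INV_PHI * (high - low)
            fc = f(c)
        else:
            # Maximum lies in [c, high]; the old d becomes the new c.
            low, c, fc = c, d, fd
            d = low + INV_PHI * (high - low)
            fd = f(d)
    return (low + high) / 2
```

Each iteration reuses one of the two previous evaluations, so only one new call to the objective is needed per step. In a sync setting the objective would be the alignment score as a function of the framerate ratio, which is assumed here to have a single peak in the searched interval.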

--strict

If specified, refuse to parse srt files with formatting issues.

Default: False