Extract Audio from Y4M to DSS — Free Online Tool
Extract audio from a Y4M (YUV4MPEG2) uncompressed video file and save it as a DSS (Digital Speech Standard) file encoded with ADPCM IMA OKI — the codec used natively by Olympus, Philips, and Grundig digital dictation devices. This is a niche but precise conversion for workflows that pipe lossless video through Y4M intermediates and need to archive or transfer the speech audio in dictation-compatible format.
FFmpeg Command
To run the same conversion locally with FFmpeg on your desktop, use: ffmpeg -i input.y4m -vn -c:a adpcm_ima_oki output.dss
How It Works
Y4M (YUV4MPEG2) is a raw, uncompressed video format commonly used as an intermediate when piping video between applications such as FFmpeg, AviSynth, or VapourSynth. It carries rawvideo frames with no compression, and the format defines no audio stream, so whether any audio is available depends on the workflow that produced the file. During this conversion, FFmpeg discards the video stream with the -vn flag, encodes the audio track with the ADPCM IMA OKI codec, and writes it into a DSS container. DSS is a proprietary format developed jointly by Olympus, Philips, and Grundig for low-bitrate speech recording on handheld digital voice recorders. The resulting file is lossy and heavily compressed, tuned for voice intelligibility rather than fidelity.
What Each Flag Does
| Flag | What it does |
|---|---|
| `ffmpeg` | Invokes the FFmpeg command-line tool, which handles all demuxing, decoding, encoding, and muxing steps in this conversion pipeline. |
| `-i input.y4m` | Specifies the input file in Y4M (YUV4MPEG2) format, an uncompressed rawvideo container commonly used as a lossless intermediate in video processing pipelines. FFmpeg demuxes the input and identifies any available audio streams. |
| `-vn` | Disables video output, telling FFmpeg to ignore the rawvideo stream from the Y4M file. FFmpeg's automatic stream selection normally skips video when the output format accepts only audio, so the flag mainly makes the intent explicit and guards against video being mapped by mistake. |
| `-c:a adpcm_ima_oki` | Selects the ADPCM IMA OKI encoder for the audio stream, the proprietary lossy codec used natively in the DSS format as developed by Olympus, Philips, and Grundig for digital dictation devices. This codec compresses audio aggressively, prioritizing speech intelligibility at very low bitrates. |
| `output.dss` | Defines the output filename and tells FFmpeg to write the result into a DSS (Digital Speech Standard) container. FFmpeg infers the DSS muxer from the .dss extension, which pairs with the ADPCM IMA OKI audio codec to produce a file compatible with digital dictation software and hardware. |
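FFmpeg builds differ in which encoders and muxers they include, and some formats may be available for reading only. Before relying on the flags above, it may be worth confirming that the local build lists both components. The following is a minimal sketch, assuming the usual column layout of `ffmpeg -encoders` and `ffmpeg -muxers` (capability flags first, component name second); `has_component` is a hypothetical helper name, not part of FFmpeg.

```shell
# Succeed if the listing on stdin has a line whose second column equals $1.
# Assumption: each listing line looks like "<flags> <name> <description>".
has_component() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Only query FFmpeg when it is actually installed.
if command -v ffmpeg >/dev/null 2>&1; then
  ffmpeg -hide_banner -encoders 2>/dev/null | has_component adpcm_ima_oki \
    || echo "this build does not list the adpcm_ima_oki encoder" >&2
  ffmpeg -hide_banner -muxers 2>/dev/null | has_component dss \
    || echo "this build does not list a dss muxer" >&2
fi
```

If either message appears, the conversion command cannot work with that build as written.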
Common Use Cases
- A transcriptionist receives a Y4M file from a video processing pipeline that captured a dictation session, and needs to convert the audio into DSS format to load it into professional transcription software that natively supports Olympus/Philips dictation files.
- A legal or medical office uses FFmpeg-based tooling to process video recordings of consultations stored as Y4M intermediates, and must extract audio into DSS format for archiving in a digital dictation management system.
- A developer building a pipeline between video capture tools and a dictation workflow needs to verify that Y4M audio can be successfully extracted and encoded into DSS with ADPCM IMA OKI for compatibility testing.
- An archivist working with legacy digital voice recorder ecosystems needs to produce DSS files from uncompressed video sources to ensure compatibility with older Olympus or Philips playback hardware and software.
- A quality assurance engineer testing FFmpeg's DSS muxer and ADPCM IMA OKI encoder uses Y4M as a known-good uncompressed source to isolate audio encoding behavior without video codec interference.
Frequently Asked Questions
Y4M is a video-only format and in practical use carries no audio at all: the specification covers only the transport of raw YUV video frames. If you're piping Y4M through FFmpeg or a similar tool, the audio is typically handled separately or simply absent. If your input has no audio track, FFmpeg has nothing left to encode once -vn drops the video, and the conversion will fail rather than produce a usable DSS file. Verify that your source has an audio stream before converting.
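One way to check for an audio stream before converting is to ask ffprobe for the audio stream indexes and count them. This is a sketch under stated assumptions rather than a definitive implementation: `count_streams`, `has_audio`, and `extract_dss` are hypothetical helper names, and the ffprobe options used are its standard stream selection and CSV output options.

```shell
# Count lines on stdin that contain a stream index (ffprobe prints one per line).
count_streams() {
  grep -c '[0-9]' || true
}

# Succeed only if ffprobe reports at least one audio stream in file $1.
has_audio() {
  n=$(ffprobe -v error -select_streams a -show_entries stream=index \
        -of csv=p=0 "$1" | count_streams)
  [ "$n" -gt 0 ]
}

# Convert $1 to DSS only when there is audio to extract.
extract_dss() {
  if has_audio "$1"; then
    ffmpeg -i "$1" -vn -c:a adpcm_ima_oki "${1%.*}.dss"
  else
    echo "no audio stream in $1; nothing to extract" >&2
    return 1
  fi
}

# Usage: extract_dss input.y4m
```

Keeping the counting step separate from the ffprobe call makes the decision logic easy to check on its own.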
DSS with ADPCM IMA OKI encoding is heavily lossy and low-bitrate, designed exclusively for speech intelligibility on digital dictation devices, not for music or high-fidelity audio. It typically encodes at very low sample rates (around 8 kHz), which limits the frequency response to below 4 kHz, so the output will sound noticeably degraded compared to any wideband audio in the source. If your source audio is speech recorded in a quiet environment, the result will be intelligible. If it is music or broadband audio, expect significant quality loss.
DSS is a proprietary format originally developed for Olympus, Philips, and Grundig hardware and software ecosystems. Most general-purpose media players do not support DSS natively — VLC has partial support and may play FFmpeg-encoded DSS files, but Windows Media Player typically does not. The intended playback environment is dedicated dictation software such as Olympus DSS Player, Philips SpeechExec, or compatible transcription platforms.
Y4M carries very little metadata beyond basic video parameters such as frame rate, pixel format, and interlacing, and DSS has extremely limited metadata support as a dictation-oriented format. Essentially no meaningful metadata will transfer between these two formats. If you need to attach author, date, or recording information to the DSS file, you would need to handle that through DSS-specific tooling or dictation software after conversion.
You can add -ar 8000 before the output filename to set the sample rate explicitly to 8000 Hz, which is standard for DSS dictation files, and -ac 1 to force mono output; DSS is a mono format, so this is strongly recommended. For example: ffmpeg -i input.y4m -vn -ac 1 -ar 8000 -c:a adpcm_ima_oki output.dss. Note that DSS and the ADPCM IMA OKI codec have very limited configuration options: there is no variable bitrate or quality parameter available for this codec in FFmpeg.
You can batch process multiple files using a simple shell loop around the FFmpeg command. On Linux or macOS, run: for f in *.y4m; do ffmpeg -i "$f" -vn -c:a adpcm_ima_oki "${f%.y4m}.dss"; done. On Windows Command Prompt: for %f in (*.y4m) do ffmpeg -i "%f" -vn -c:a adpcm_ima_oki "%~nf.dss" (inside a .bat file, double the percent signs: %%f and %%~nf). This is particularly useful when processing many Y4M files exported from a video pipeline. Batch processing is best done locally on a desktop, where a single loop handles every file instead of converting them one at a time in the browser.
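The one-line loop can be made a little more defensive. The sketch below assumes a POSIX shell; `dss_name` and `convert_all` are hypothetical helper names. It skips the loop body when no .y4m files match, refuses to overwrite existing outputs (-n), keeps FFmpeg from reading the terminal (-nostdin), and reports failures without stopping the batch.

```shell
# Map an input name such as clip.y4m to the output name clip.dss.
dss_name() {
  printf '%s\n' "${1%.y4m}.dss"
}

# Convert every .y4m file in directory $1; report failures and keep going.
convert_all() {
  for f in "$1"/*.y4m; do
    [ -e "$f" ] || continue   # the glob matched nothing
    out=$(dss_name "$f")
    ffmpeg -nostdin -n -i "$f" -vn -ac 1 -ar 8000 -c:a adpcm_ima_oki "$out" \
      || echo "failed: $f" >&2
  done
}

# Usage: convert_all /path/to/folder
```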
Technical Notes
Y4M (YUV4MPEG2) is an uncompressed rawvideo container and in practice is almost never paired with audio; its primary role is as a pipe-friendly intermediate for lossless video frame transport between tools. The DSS format uses the ADPCM IMA OKI codec, a variant of the IMA ADPCM (adaptive differential pulse-code modulation) family, specifically tuned for the Oki Semiconductor implementation found in Olympus and compatible dictation chipsets. FFmpeg supports DSS muxing and ADPCM IMA OKI encoding, but the codec has essentially no tunable quality parameters: channel count and sample rate are the only meaningful knobs, the bitrate follows directly from them, and DSS is expected to be mono at 8 kHz. If the source audio is stereo or at a higher sample rate, set -ac 1 and -ar 8000 explicitly so that FFmpeg downmixes and resamples during encoding rather than relying on defaults. The DSS container also has a proprietary header structure, so FFmpeg-written files may not be fully compatible with all Olympus/Philips hardware without post-processing by native dictation software. DSS output will be far smaller than the same audio stored uncompressed, because the dictation codec is so heavily compressed.
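The size difference can be estimated with simple arithmetic. The sketch below assumes 4 bits per sample for the ADPCM payload and ignores container headers; the 48 kHz, 16-bit stereo PCM figure is only an illustrative source, not something a Y4M file provides. `adpcm_bytes` and `pcm16_bytes` are hypothetical helper names.

```shell
# Payload bytes for $1 seconds of 4-bit ADPCM at $2 Hz with $3 channels.
# Assumption: 4 bits per sample, no container overhead.
adpcm_bytes() {
  echo $(( $1 * $2 * $3 * 4 / 8 ))
}

# Bytes for the same duration of uncompressed 16-bit PCM, for comparison.
pcm16_bytes() {
  echo $(( $1 * $2 * $3 * 16 / 8 ))
}

adpcm_bytes 60 8000 1    # one minute, mono 8 kHz: prints 240000
pcm16_bytes 60 48000 2   # one minute, stereo 48 kHz: prints 11520000
```

Under these assumptions, one minute of dictation audio is about 1/48 the size of the illustrative PCM source.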