Matroska Stem Files

Independent

sam@samwhited.com https://blog.samwhited.com

Area: General
Workgroup: Internet Engineering Task Force
Keywords: audio, matroska, stems, djing

Abstract

This document defines a multi-track profile of the Matroska container format for distributing stems. It is intended to be used by DJ applications and Digital Audio Workstations while remaining backwards compatible with existing media players.

Introduction

Stems are recordings of individual instruments, or clusters of instruments, used by DJs and music producers for live mixing of music. Historically, stems have been stored as individual audio files or in patent-encumbered or vendor-specific proprietary container formats.

A common feature of modern DJ software is "dynamic" or "live" stem separation, in which the software attempts to algorithmically separate the audio signals in a track so that the DJ can mute, solo, or apply effects to individual instruments. The results of such dynamic separation vary but are, generally speaking, noticeably different from the original stems used by the producer, and they frequently contain distortions and other artifacts that sound undesirable. A better model is to have the producer release the original stems along with the original track. This allows the final mix to sound closer to the producer's original vision for the track, even while it is being remixed and re-interpreted by a DJ or another artist.

This specification documents a profile of the Matroska container format that stores the final mix of a track alongside the lossless or lossy stems used to mix it, all in a single file. The target consumers of these stem files are DJ applications meant for live remixing and performance, as well as Digital Audio Workstations (DAWs) used by producers who want their music to be remixed.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Track Layout

Audio Streams

Each stem file may contain an arbitrary number of audio tracks and MUST include at least three (the mixed audio and at least two stems). For stem files meant for live DJ use, it is RECOMMENDED that four or fewer stem tracks be used (as opposed to stem files meant for music production or non-live remixing, where a DAW may utilize a significantly larger number of tracks). For ease of decoding, each track SHOULD be encoded using the same codec with the same parameters, including bitrate and sample rate.

Stems are often recorded with a single channel, and only the final mix is in stereo. For stem files that are meant to be re-mixed in a DAW this is fine, but DJs may want to maintain a similar balance and channel layout to the original track. Stems MAY have a different channel count or layout than the main audio track; however, it is RECOMMENDED that all stem tracks maintain the same channel count and layout as the main track and have the same channel balance as their component parts in the final mix. For example, if the final mix is a stereo track that contains a fiddle that is 75% in the right channel and only 25% in the left channel, the stem track for the fiddle would also be in stereo, with the fiddle appearing mostly in the right channel as in the final mix.

The first track containing audio data MUST be the final post-mix audio in the default language (the mixdown track). All mixdown tracks, regardless of language, MUST have the Matroska "Default" flag set to "1". This helps preserve backwards compatibility with media players that do not support this format, which typically play the first audio stream found or select a stream based on the Default flag. In addition, the "Enabled" flag for any mixdown tracks MUST be set to "1".
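
The fiddle example above can be sketched in code. This is a minimal illustration only: the function name `pan_mono_to_stereo` and the `right_share` parameter are hypothetical, and reading "75% in the right channel" as a linear amplitude split is an assumption, since this document does not define a pan law.

```python
from typing import List, Tuple


def pan_mono_to_stereo(samples: List[float], right_share: float) -> List[Tuple[float, float]]:
    """Spread a mono stem across two channels using a linear amplitude split.

    right_share is the fraction of the signal sent to the right channel
    (hypothetical parameter; a linear split is an assumption, not a rule
    taken from this document).
    """
    if not 0.0 <= right_share <= 1.0:
        raise ValueError("right_share must be between 0 and 1")
    left_share = 1.0 - right_share
    # Each output frame is a (left, right) pair.
    return [(s * left_share, s * right_share) for s in samples]


# A fiddle stem that sits 75% right and 25% left, as in the example.
fiddle_stereo = pan_mono_to_stereo([1.0, 0.5], right_share=0.75)
```

A producer would more likely export the stem directly from the DAW with its existing pan settings; the sketch only shows what "the same channel balance" means numerically.
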

The remaining audio tracks are individual stems and MUST have the same effective length as the mixdown track, such that playing all of the stem tracks together from the beginning would result in roughly the same audio as the final mix present in the mixdown track (excluding mastering and, at the producer's discretion, possibly excluding inter-stem effects). For example, if the original track is three minutes long and the stem file includes a percussion track, but the percussion does not start until minute two, the percussion stem would still be three minutes long: it would either contain a minute of silence at the start of the track or have a block timestamp that sets the effective start time to one minute. Each stem track MUST have the Matroska "Default" flag set to "0" and MUST have the "Enabled" flag set to "0".

When creating the file, the stem tracks SHOULD NOT have any intra-stem gain normalization applied to bring the stems up to the same perceived volume. Instead, they should retain the same levels as they would have in the final mix present in the mixdown track, so that if all stems were played at unity gain the overall level would be equivalent to the level of the final mix. On playback, DJ software MAY choose to normalize the gain on any combination of stems currently being played to make it equivalent to the mixdown track or to any other tracks being mixed in, even if some stems are muted or have their individual gains adjusted. However, the exact mixing behavior of DJ applications is outside the scope of this specification.

Each stem track MUST set the value of the track Name element to a short, human-meaningful name that describes its contents, for example "Percussion" or "Vocals". These names are intended for display in playback applications and therefore should remain concise (generally no more than one word), but no specific format or length requirement is defined.
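
The effective-length and unity-gain rules above can be illustrated with a toy sketch. It assumes stems are plain lists of float samples at a shared sample rate; the helper names `pad_to_start` and `sum_at_unity_gain` are hypothetical and are not part of Matroska or of this profile.

```python
from typing import List


def pad_to_start(samples: List[float], start_offset: int, total_len: int) -> List[float]:
    """Place a stem at start_offset (in samples) inside a silent buffer of total_len."""
    if start_offset < 0 or start_offset + len(samples) > total_len:
        raise ValueError("stem does not fit inside the mixdown length")
    tail = total_len - start_offset - len(samples)
    return [0.0] * start_offset + list(samples) + [0.0] * tail


def sum_at_unity_gain(stems: List[List[float]]) -> List[float]:
    """Sum stems sample by sample with no per-stem gain change."""
    if len({len(s) for s in stems}) != 1:
        raise ValueError("all stems must have the same effective length")
    return [sum(frame) for frame in zip(*stems)]


# The percussion enters halfway through a four-sample "track".
bass = [0.25, 0.25, 0.25, 0.25]
percussion = pad_to_start([0.5, 0.5], start_offset=2, total_len=4)
mix = sum_at_unity_gain([bass, percussion])
```

Padding with silence is only one of the two options described above; a muxer could instead leave the samples untouched and set the block timestamp so that the stem's effective start time is later.
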

The track Name element MAY also be duplicated or overridden as a tag, in which case the applicable order of precedence SHOULD be respected. For each stem track, a tag SHOULD also be set with its target set to the stem track and a tag name of "STEM_COLOR". The tag value must be a string in RGB hex format set to a color representing the stem (e.g., #145374).
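
The layout rules in this section lend themselves to a simple conformance check. The sketch below is illustrative only: `AudioTrack` and `check_layout` are hypothetical names, the track list is assumed to have already been parsed out of the file in order, and the six-digit "#RRGGBB" pattern is an assumption inferred from the "#145374" example rather than a format this document spells out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed form of an "RGB hex" value, inferred from the example "#145374".
STEM_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


@dataclass
class AudioTrack:
    """Hypothetical, already-parsed view of one audio track."""
    name: str
    flag_default: bool
    flag_enabled: bool
    stem_color: Optional[str] = None


def check_layout(tracks: List[AudioTrack]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if len(tracks) < 3:
        problems.append("fewer than three audio tracks")
    if tracks and not (tracks[0].flag_default and tracks[0].flag_enabled):
        problems.append("first audio track is not a mixdown (Default=1, Enabled=1)")
    stem_count = 0
    for index, track in enumerate(tracks):
        if track.flag_default and track.flag_enabled:
            continue  # treated as a mixdown track
        if track.flag_default or track.flag_enabled:
            problems.append(f"track {index}: Default and Enabled flags disagree")
            continue
        stem_count += 1
        if not track.name.strip():
            problems.append(f"track {index}: stem has no Name")
        if track.stem_color is not None and not STEM_COLOR_RE.fullmatch(track.stem_color):
            problems.append(f"track {index}: STEM_COLOR is not in #RRGGBB form")
    if stem_count < 2:
        problems.append("fewer than two stem tracks")
    return problems


good = [
    AudioTrack("Mix", True, True),
    AudioTrack("Percussion", False, False, "#145374"),
    AudioTrack("Vocals", False, False),
]
bad = [AudioTrack("Percussion", False, False, "blue"), AudioTrack("Mix", True, True)]
```

Calling `check_layout(good)` yields an empty list, while `check_layout(bad)` reports the missing third track, the misplaced mixdown, the malformed color, and the missing second stem. A missing STEM_COLOR is not reported because the tag is only a SHOULD.
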

Mastering

Because mastering happens post-mix and the stems are pre-mix audio, the stem tracks SHOULD NOT have any mastering steps applied. This means that a DJ playing the track from the stem tracks instead of the mixdown track will produce different audio from the final mix. This is deemed an acceptable trade-off, since the final sound of the DJ's version of a track is likely to differ significantly from what the original producer created either way. Even without mastering, this method still gives the producer more control over the final sound than if the DJ were to use an automatic stem separation algorithm.

Format Support

The Matroska container format can store many types of audio, not all of which are suitable for DJing or music production. To ensure compatibility between playback and encoding applications, the formats in the following table SHOULD be supported, depending on the use case of the software. Formats with the use case "Live remixing" are intended largely for playback applications meant for live performance (i.e., DJ software). Formats with the use case "Music production" are intended to be distributed for remixing in a non-live setting (i.e., with a DAW or music tracker).

Audio codec support

Codec	Use Case	Codec ID
FLAC	Live remixing, Music production	A_FLAC
Opus	Live remixing	A_OPUS
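
An application might encode the table above as a small lookup. The following is a sketch under stated assumptions: the names `SUPPORTED_CODECS` and `codecs_for` are hypothetical, and only the Codec IDs and use-case labels are taken from the table.

```python
from typing import Dict, FrozenSet, List

# Codec ID -> use cases, copied from the "Audio codec support" table.
SUPPORTED_CODECS: Dict[str, FrozenSet[str]] = {
    "A_FLAC": frozenset({"Live remixing", "Music production"}),
    "A_OPUS": frozenset({"Live remixing"}),
}


def codecs_for(use_case: str) -> List[str]:
    """Return the Codec IDs that software targeting use_case is expected to support."""
    return sorted(cid for cid, cases in SUPPORTED_CODECS.items() if use_case in cases)


live = codecs_for("Live remixing")
production = codecs_for("Music production")
```

Here `live` contains both Codec IDs and `production` contains only `A_FLAC`, matching the table rows.
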

IANA Considerations

This memo adds the following values to the "Matroska Tag Names" registry:

Additions to the "Matroska Tag Names" Registry

Tag Name	Tag Type	Reference
STEM_COLOR	UTF-8	This document,

Security Considerations This document inherits security considerations from both and . It does not have additional security considerations.


Acknowledgements

Thanks to the members of #matroska on the libera.chat IRC network, and to mosu and JanC in particular, for patiently explaining the basics of the format to me and for all their feedback. Thanks also to the members of the Ardour forums for their feedback on DAWs and mastering. Finally, thanks to the members of the IETF CELLAR working group, especially Steve Lhomme, for their feedback.