Provenance and Containers

MQA Encoding

When audio is encoded into MQA, several things happen, the most important of which are:

  • ‘Deblurring’ of the source to remove audible artefacts introduced by analogue-digital converters, mixing and mastering.
  • Identifying the musical information and encapsulating it for the highest quality sound.
  • ‘Origami’ – folding content into a PCM stream for distribution.
  • Embedding instructions for the Decoder and Renderer on how to reconstruct the audio with minimum impact on the clarity of sound, together with metadata about the content.
  • Embedding the Provenance signature.

Origami is always used when the input sample rate is higher than the ‘transmission rate’.

MQA can be sourced from analogue, from PCM, from A/D modulators or from DSD, but the final output is a PCM stream. Currently, all MQA on streaming services has a transmission rate of 1x (either 44.1 kHz or 48 kHz, depending on the ‘family’ of the original).
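As a minimal sketch of the two rules just stated, the snippet below picks the 1x transmission rate from the original's family and decides whether Origami is needed. It assumes that family membership can be read off by divisibility (multiples of 44.1 kHz versus multiples of 48 kHz); the function names are hypothetical and this is not an MQA API.

```python
# Illustrative sketch only; function names are hypothetical, not an MQA API.
# Assumption: a rate belongs to the 44.1 kHz family if it is a multiple of
# 44100 Hz, and to the 48 kHz family if it is a multiple of 48000 Hz.

def rate_family(sample_rate_hz: int) -> int:
    """Return the 1x base rate (44100 or 48000) of a sample rate's family."""
    if sample_rate_hz % 44100 == 0:
        return 44100
    if sample_rate_hz % 48000 == 0:
        return 48000
    raise ValueError(f"{sample_rate_hz} Hz is not in the 44.1 kHz or 48 kHz family")


def uses_origami(input_rate_hz: int, transmission_rate_hz: int) -> bool:
    """Origami folds content only when the input rate exceeds the transmission rate."""
    return input_rate_hz > transmission_rate_hz


for rate in (44100, 48000, 96000, 176400, 352800):
    tx = rate_family(rate)  # streaming services currently use 1x
    print(rate, "->", tx, "origami" if uses_origami(rate, tx) else "no origami")
```

For example, a 96 kHz original maps to a 48 kHz transmission rate and is folded, while a 44.1 kHz original is carried as-is.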

The output stream will have the same bit depth as the input unless either a) Origami is used or b) the input is DSD or floating-point; in these cases, the MQA output stream will always be 24 bit. So an original at 44.1 kHz/24b will create a 24b file, and an original at 44.1 kHz/16b will create a 16b file. However, an original at 96 kHz/16b will generate a 48 kHz/24b MQA file because Origami was used. [1]
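The bit-depth rule can be written as a small decision function. This is a sketch under stated assumptions: the function name and the `source` labels ("pcm", "dsd", "float") are hypothetical, and the transmission rate is passed in rather than derived.

```python
# Hypothetical sketch of the output bit-depth rule described above.
# Assumptions: `source` is one of "pcm", "dsd" or "float"; the transmission
# rate is the 1x rate of the original's family; names are illustrative only.

def mqa_output_format(input_rate_hz: int, input_bits: int, source: str,
                      transmission_rate_hz: int) -> tuple[int, int]:
    """Return (output_rate_hz, output_bits) for an MQA encode."""
    origami = input_rate_hz > transmission_rate_hz
    if origami or source in ("dsd", "float"):
        bits = 24          # Origami, DSD or floating-point input: always 24b
    else:
        bits = input_bits  # plain PCM at the transmission rate keeps its depth
    return transmission_rate_hz, bits


print(mqa_output_format(44100, 24, "pcm", 44100))  # (44100, 24)
print(mqa_output_format(44100, 16, "pcm", 44100))  # (44100, 16)
print(mqa_output_format(96000, 16, "pcm", 48000))  # (48000, 24)
```

The three calls reproduce the three examples in the paragraph above.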

Playback

So, an MQA stream is PCM that contains all the audio, information about the audio, instructions for the different ranks of playback (depending on decoder and renderer) and a signature. When the stream is played back, the MQA decoder accesses all these parts and can display, e.g., MQA or Studio, the original sample rate and so on. A fully-featured MQA decoder can display this information purely from the PCM.

Although the MQA stream knows the original sample rate, we don’t bury information about the original bit depth because it is implicit and irrelevant (given that MQA always encapsulates the full dynamic range of musical information with the highest possible precision).

Containers

So far, we have talked about MQA as a PCM ‘stream’ – because an MQA encoder can operate indefinitely, with no unique start and end, e.g. for a Live or Radio broadcast. Not only that, but a decoder can seamlessly ‘join’ the stream at any point.

Because it is a PCM stream that can be heard without a decoder, the MQA data can be losslessly compressed without any impact. So long as the lossless transmission remains bit-accurate, the MQA stream can be recovered for a decoder: it was simply distributed in a ‘lossless container’, which we do to reduce the data rate in distribution. Typical savings from lossless compression average 8 bits per sample and vary between 0 and 12 bits, depending on the content. [2]
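To illustrate the bit-accuracy point, the sketch below packs a 16-bit PCM test tone into a lossless container and checks that the same bytes come back out. zlib is used purely as a stand-in for FLAC (FLAC uses its own audio-specific scheme); the test signal is invented for illustration.

```python
# Sketch: a lossless container must return the PCM bit-for-bit.
# zlib is a general-purpose stand-in for FLAC here, used only to show the principle.
import array
import hashlib
import math
import zlib

# Hypothetical 16-bit PCM test signal: one second of a quiet 1 kHz tone at 44.1 kHz.
samples = array.array("h", (int(3000 * math.sin(2 * math.pi * 1000 * n / 44100))
                            for n in range(44100)))
pcm = samples.tobytes()

packed = zlib.compress(pcm, 9)       # put the stream into the "container"
unpacked = zlib.decompress(packed)   # take it out again

# Bit-accurate round trip: a decoder downstream sees exactly the same PCM.
assert hashlib.sha256(unpacked).digest() == hashlib.sha256(pcm).digest()
print(f"{len(pcm)} bytes -> {len(packed)} bytes")
```

Any lossless codec that passes this kind of check is, from the decoder's point of view, invisible.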

Tracks (Songs)

Music is often distributed in an album comprising a collection of shorter-length tracks that may be intended to be played alone, in groups (as ‘a work’) or all together. These days, each track is created, stored and distributed as a file because that is most convenient when we listen via music servers, on portable music players or to a streaming service.

When a track is encoded in MQA, the encoding process is enhanced:

  • Each MQA file has additional ‘start’ and ‘stop’ information for the decoder.
  • If requested, the encoder will analyse groups of tracks (work or album) to optimise the encode and facilitate seamless playback.

The MQA encode, as before, is PCM and can be conveyed in any lossless container, e.g. WAV, AIFF, ALAC, FLAC and others. For general distribution, we prefer to use lossless compression to save space, and we select FLAC for its well-supported additional headers, which can carry metadata and album art as well as information about the encoder, the original sample rate and so on.
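A FLAC file begins with the `fLaC` marker followed by metadata blocks, the first of which is STREAMINFO; this is where a player typically reads the container's sample rate and bits per sample, separately from the audio frames. The sketch below parses that block following the FLAC format specification. `make_streaminfo` is a hypothetical helper that builds a minimal header for testing; it is not part of any library.

```python
# Sketch: read the FLAC STREAMINFO block (container sample rate, bit depth).
# Field layout follows the FLAC format specification; error handling is minimal.
import struct

def read_streaminfo(data: bytes) -> dict:
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    block_type = data[4] & 0x7F                # low 7 bits: type (0 = STREAMINFO)
    length = int.from_bytes(data[5:8], "big")  # 24-bit block length
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not STREAMINFO")
    body = data[8:8 + 34]
    min_block, max_block = struct.unpack(">HH", body[0:4])
    packed = int.from_bytes(body[10:18], "big")  # 64 bits after the frame sizes
    return {
        "min_block": min_block,
        "max_block": max_block,
        "sample_rate": packed >> 44,                      # 20 bits
        "channels": ((packed >> 41) & 0x7) + 1,           # 3 bits, stored minus 1
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,   # 5 bits, stored minus 1
        "total_samples": packed & ((1 << 36) - 1),        # 36 bits
    }

def make_streaminfo(sample_rate: int, channels: int, bits: int, total: int) -> bytes:
    """Hypothetical test helper: a minimal fLaC marker plus STREAMINFO block."""
    packed = (sample_rate << 44) | ((channels - 1) << 41) | ((bits - 1) << 36) | total
    body = struct.pack(">HH", 4096, 4096) + bytes(6) + packed.to_bytes(8, "big") + bytes(16)
    return b"fLaC" + bytes([0x80]) + (34).to_bytes(3, "big") + body  # 0x80: last block

info = read_streaminfo(make_streaminfo(44100, 2, 16, 10_000_000))
print(info["sample_rate"], info["bits_per_sample"])  # 44100 16
```

Note that the bit depth a player shows comes from this header field, not from inspecting the audio itself.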

Provenance

An MQA stream in a FLAC file has the bit-precise audio inside a container. That FLAC file, in turn, has a fragile header with information about the contents. The header is fragile because anyone can change it on the journey between the Label and the listener. But if the MQA indicator is to light up, then the audio itself must be unchanged and, in this way, we can be sure that the listeners hear exactly what was signed off by the artist, producer or label.
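The separation between a fragile header and signed audio can be illustrated with a toy example. MQA's actual signing scheme is not described here, so HMAC-SHA256 over the audio bytes is a purely hypothetical stand-in, as is the key; the point is only that the check covers the audio and ignores the header.

```python
# Illustrative sketch only: HMAC-SHA256 is a hypothetical stand-in for the
# MQA Provenance signature, whose real mechanism is not described here.
import hashlib
import hmac

KEY = b"hypothetical-label-key"  # placeholder secret, not a real key

def sign_audio(audio: bytes) -> bytes:
    return hmac.new(KEY, audio, hashlib.sha256).digest()

def verifies(audio: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign_audio(audio), signature)

audio = bytes(range(256)) * 4                      # stand-in PCM payload
header = {"TITLE": "Example", "SEEKTABLE": "v1"}   # fragile, editable metadata
sig = sign_audio(audio)                            # covers the audio only

header["SEEKTABLE"] = "v2"          # distributor edits the header
print(verifies(audio, sig))         # True: the audio is untouched
tampered = bytes([audio[0] ^ 1]) + audio[1:]
print(verifies(tampered, sig))      # False: a single bit of audio changed
```

In this toy model, header edits never affect the check, while any change to the audio does.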

Headers

Why do headers matter? There are several reasons; the header can contain:

  • Information that is useful for a player to access without actually playing the music, for example, artist and song title, genre, cover art, publisher. (Some of this information can differ by geographical region.)
  • Information that is changed or added by the distribution chain, e.g. adding a retailer’s name to a download.
  • Information that may be useful for the user interface of an MQA player before playback starts – good examples here include the MQA fields ‘this is probably MQA’ and ‘Original sample rate’.
  • Information to assist playback that can be added by the label, distributor or streaming service. This includes, e.g., an optional loudness normalisation level and Seek Tables. Loudness information is accessed by the player; Seek Tables are used by the streaming server when the listener scans forward and back in the song. [3]
    Unfortunately, streaming services can have different specifications for loudness and seek tables and so this information can be altered after the file leaves the label.

More on Provenance

So we see that Provenance is secure because the audio in an MQA file will not decode if it is changed.

But there is one harmless way it can be changed, and that is if a 16b MQA stream is extended to 24b by adding zeros as the bottom 8 bits. The zeros contain no information and the MQA decoder will ignore them.

There is no benefit to this word-width extension, but it can happen benignly and automatically if a 16b MQA stream is passed over a 24-bit link such as SPDIF/optical or HDMI. 16-bit MQA goes in and 24 PCM bits come out, but the audio information in the top 16 bits is not changed, and that is all we care about.
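Zero-extension is a left shift by 8 bits, and shifting back recovers the original sample exactly. The sketch below assumes signed two's-complement samples held as Python integers (Python's `>>` is an arithmetic shift, so negative samples round-trip correctly).

```python
# Sketch of benign word-width extension: a 16-bit sample carried in a
# 24-bit word by appending eight zero bits, then recovered by shifting back.

def extend_16_to_24(sample16: int) -> int:
    return sample16 << 8   # bottom 8 bits become zeros

def top_16_of_24(sample24: int) -> int:
    return sample24 >> 8   # drop the (zero) bottom 8 bits

samples = [-32768, -1, 0, 1, 12345, 32767]
extended = [extend_16_to_24(s) for s in samples]

assert all(e & 0xFF == 0 for e in extended)             # padding carries no information
assert [top_16_of_24(e) for e in extended] == samples   # top 16 bits unchanged
print(extended[:3])  # [-8388608, -256, 0]
```

The 16-bit extremes map onto valid 24-bit values, so nothing is clipped or altered in transit.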

Of course, if the MQA starts out as 24b, then there is no useful scenario for altering the word width in this way (except as provided for compatibility) – see the blog on playback and MQA-CD.

It is possible for a 16b FLAC file to be opened (for example, to add a seek table) and then saved (in error) as a 24b container. When that happens, the audio is zero-extended, just as described above for SPDIF. Such a file then signals that it is a 24b container, and normally we would expect it to contain 24 bits of significant data. Still, FLAC will detect the zeros and compress the file to essentially the same size as the 16-bit original, as it should, because the information content is identical (apart from the seek table header).
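One way to see that a 24b container holds only 16 significant bits is to count the low-order bits that are zero in every sample of a block. FLAC's encoder makes a comparable per-block check (its "wasted bits"), so it does not spend bits on padding; the function below is a simplified illustration, not FLAC's implementation, and the function names are hypothetical.

```python
# Simplified sketch of detecting zero-extended audio: count the low-order bits
# that are zero in every sample of a block (similar in spirit to FLAC's
# "wasted bits" check, but not its actual implementation).

def wasted_bits(samples: list[int], max_bits: int = 24) -> int:
    combined = 0
    for s in samples:
        combined |= s              # a bit set in any sample shows up here
    if combined == 0:
        return 0                   # digital silence: nothing to infer
    count = 0
    while count < max_bits and not (combined >> count) & 1:
        count += 1
    return count

def effective_bits(samples: list[int], container_bits: int = 24) -> int:
    return container_bits - wasted_bits(samples, container_bits)

padded = [s << 8 for s in (-32768, -1, 0, 1, 12345, 32767)]  # 16b saved as 24b
print(effective_bits(padded))      # 16
print(effective_bits([1, 2, 3]))   # 24
```

A file flagged as 24b in its header can therefore still carry exactly the same 16 bits of information.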

Streaming 16b MQA

Recently, as we have rolled out many more MQA files sourced from 44.1 kHz 16b masters – where we expect the MQA file to be 16b – this FLAC mishandling occasionally happened while seek tables were being added, and some listeners have been confused by the bit-depth indication on their players. However, in this case, whether the streamed FLAC is 16b or 24b, the audio is identical, as we know from the MQA indicator.

Because there was a process to tidy up that error, some content might appear to change between 16b and 24b, but the change has no significance.

What is 16b MQA? Is it MQA-CD?

This is a topic for the next blog: 16-bit MQA.

————————————

[1] Yes, it does exist: 96 kHz/16b is a format supported on DVD.

[2] An audio stream in a lossless container has a variable output data rate (in technical terms, the instantaneous data rate reflects the underlying entropy or information in the content). Notice here that we are separating ‘quantity of data’ from ‘quantity of information’ – a distinction that people often miss when worrying about sample rate and bit depth.
A saving of 8 bits per sample means that a 16b file can be halved in size, whereas a 24b file can only be reduced by one third. Another way of looking at this is that the 24b signal contains, in its lowest 8 bits, a signal that, from the information point of view, closely resembles random noise and, by corollary, may not convey useful information – even though it may change the behaviour of a DAC. Once again we see the distinction between ‘data’ (bits in the file) and ‘information’ (sounds that convey meaning).
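The arithmetic behind "halved" and "one third" is (16 − 8)/16 = 1/2 and (24 − 8)/24 = 2/3 of the original size. The sketch below computes this and adds a rough illustration of data versus information, using zlib as a general-purpose stand-in (not FLAC): random bytes barely compress, while zero padding compresses almost completely.

```python
# Worked arithmetic for an 8-bits-per-sample saving, plus a rough data-vs-
# information illustration using zlib as a general-purpose stand-in for FLAC.
import os
import zlib

def compressed_fraction(word_bits: int, saving_bits: int = 8) -> float:
    """Fraction of the original size left after saving `saving_bits` per sample."""
    return (word_bits - saving_bits) / word_bits

print(compressed_fraction(16))  # 0.5: a 16b file is halved
print(compressed_fraction(24))  # about 0.667: a 24b file shrinks by one third

noise = os.urandom(65536)   # noise-like data: many bits, no structure to exploit
zeros = bytes(65536)        # zero padding: many bits, no information
print(len(zlib.compress(noise)) / len(noise))  # close to 1.0
print(len(zlib.compress(zeros)) / len(zeros))  # far below 1.0
```

The size of a file measures data; how far it can be compressed hints at how much information it actually holds.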

[3] Why is a Seek Table needed? Put simply, the degree of compression in a lossless file is not constant; it depends on the music, and we get more compression in quieter passages. So the middle of a song is not necessarily in the middle of the file, and the server can use the seek table to be precise about a jump.
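A seek table is essentially a sorted list of (sample number, byte offset) pairs, as in FLAC's SEEKTABLE block. The sketch below finds the nearest seek point at or before a target sample with a binary search; decoding then proceeds forward from that point to the exact sample. The seek-point values are invented for illustration, chosen so that bytes per second vary between loud and quiet passages.

```python
# Sketch of a seek-table lookup. The (sample_number, byte_offset) pairs mirror
# those in a FLAC SEEKTABLE block; the values here are invented for illustration.
from bisect import bisect_right

SEEK_POINTS = [
    (0, 0),
    (441_000, 1_200_000),     # 10 s: loud intro, less compression
    (882_000, 1_700_000),     # 20 s: quiet passage, more compression
    (1_323_000, 2_100_000),   # 30 s
    (1_764_000, 3_300_000),   # 40 s: loud again
]

def seek(target_sample: int) -> tuple[int, int]:
    """Return the nearest seek point at or before target_sample."""
    sample_numbers = [s for s, _ in SEEK_POINTS]
    i = bisect_right(sample_numbers, target_sample) - 1
    if i < 0:
        raise ValueError("target precedes the first seek point")
    return SEEK_POINTS[i]

print(seek(900_000))  # (882000, 1700000)
```

Because the byte offsets are not proportional to time, a jump computed from file size alone would land in the wrong place; the table removes that guesswork.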