Appendix 4: MQA Encoding of Those Files

MQA is not a codec in the conventional sense. It takes account of the source (A/D and mastering) and playback (D/A converter). The conceptual target is analogue to analogue, with a temporal blur equivalent to a few meters of air and a noisefloor target of atmospheric ‘absolute zero’. [1]

In the diagram, ‘Encapsulation’ includes deblurring. The lossless core process can use ‘Origami’ if the input sample rate exceeds the transmission rate. The decoder is driven by the encoder and includes a Core decoder followed by matched rendering which is customised for each D/A converter to ensure a consistent end-to-end response. [2]

If we try to measure the MQA process, the ‘deblurring’ that manages end-to-end impulse response and modulation noise will reveal a difference intended by design. However, if used correctly, that difference never includes overload, distortion or any detectable downward aliasing. Also, the sound will always be clearer.

There are different classes of MQA encoder according to application. Extensive tools and facilities are available to mastering engineers and labels. Even more facilities to control the deblurring and encapsulation processes are used for ‘white-glove’ projects. In many scenarios, an encoder can estimate optimum deblurring based on analysis and training over a large corpus of music.

Hierarchy

MQA is a hierarchical system that supports multiple channels and a hierarchy of sample rates. There are two variants of encoding: ‘MQA’ where the input and transmission sample rates are the same and ‘MQL’ where the input rate exceeds the transmission rate by a factor of 2 or more. [3]
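As a minimal sketch of the rule just stated (illustrative only, not part of any MQA tool), the choice of variant depends only on the ratio of input rate to transmission rate. The function name `choose_variant` and the error raised for ratios the text does not cover are assumptions made for this example.

```python
def choose_variant(input_rate_hz: int, transmission_rate_hz: int) -> str:
    """Return 'MQA' or 'MQL' following the rule described in the text.

    Hypothetical helper: the name and the error handling are assumptions.
    """
    if input_rate_hz == transmission_rate_hz:
        # Input and transmission rates are the same.
        return "MQA"
    if input_rate_hz >= 2 * transmission_rate_hz:
        # Input rate exceeds the transmission rate by a factor of 2 or more.
        return "MQL"
    # The text does not describe other ratios, so this sketch refuses them.
    raise ValueError("rate combination not covered by the stated rule")


print(choose_variant(44100, 44100))  # MQA
print(choose_variant(88200, 44100))  # MQL
```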

‘MQA’ can encode a ‘rectangular information space’. ‘MQL’ encodes information within a triangle, with a generous additional guard-band, using the ‘Origami’ process.

When ‘MQA’ encoding is used, the encoder can include the complete gamut of the input signal and it is not necessary to find a triangle.

When ‘MQL’ encoding is used, the signal should be music, speech or natural sounds and have an identifiable noisefloor to successfully encode. The transmission rate is chosen to be high enough that the signal peak and noise converge (i.e., a triangle can be determined). [4]
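The idea of the signal peak and the noise ‘converging’ can be illustrated with a simplified sketch. This is not the encoder’s analysis: the function name, the 3 dB margin and the example spectra are all assumptions chosen for illustration. It scans down from the highest frequency and reports the lowest frequency above which the peak spectrum stays within the margin of the noise floor, or `None` if the two never meet.

```python
from typing import Optional, Sequence


def convergence_frequency(
    freqs_hz: Sequence[float],
    peak_db: Sequence[float],
    noise_db: Sequence[float],
    margin_db: float = 3.0,  # assumed tolerance, not an MQA parameter
) -> Optional[float]:
    """Lowest frequency above which peak level stays within margin_db of noise.

    Inputs are parallel sequences ordered by ascending frequency.
    Returns None when the peak is still above the noise at the top frequency.
    """
    result = None
    # Walk from the top of the band downwards while peak and noise coincide.
    for f, p, n in zip(reversed(freqs_hz), reversed(peak_db), reversed(noise_db)):
        if p - n <= margin_db:
            result = f
        else:
            break
    return result


freqs = [1_000, 10_000, 20_000, 30_000, 40_000]
noise = [-130.0] * 5

# Made-up, music-like spectrum: the peak level falls towards the noise floor.
music_peak = [-20.0, -60.0, -100.0, -128.0, -130.0]
print(convergence_frequency(freqs, music_peak, noise))  # 30000

# Made-up test-signal spectrum: high level right up to the band edge.
test_peak = [-6.0] * 5
print(convergence_frequency(freqs, test_peak, noise))  # None
```

In the second case no apex can be located, which corresponds to the situation described in the text where a triangle cannot be determined.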

Although higher transmission rates may in theory be needed for extreme archive work, so far, across millions of songs, the convergence criterion has been met for transmission at 1x (44.1 or 48 kHz). This is explained in the AES paper. [5]

What about the two files encoded by the blogger?

In both cases, the automatic ‘deblurring’ was thoroughly confused by the series of different signals and inconsistent noise floor in the file.

The 44.1 kHz file was encoded as ‘MQA’ because the information was ‘rectangular’, but the encoder issued warning messages. [6]

The 88.2 kHz encode failed to find a triangle (not surprising when we look at the spectrum) and gave multiple error messages, as described earlier.

The MQA process is for music and, by design, it focuses on natural sounds, speech and music, and in particular on preserving the very fine details that maintain ‘resolution’ for the human listener.

It is not logical to use ‘deblurring’ on computer-generated test signals – we don’t use that when making test vectors!

Does that mean MQA is defective?

Not at all!

Most of the confusion in the blog arose because (a) the 44.1 kHz file was intercepted in the playback process where hierarchical upsampling had begun [7], and (b) with the 88.2 kHz file, severe overload resulted in nonsense.

In fact, when we separate out sections of the files, they can be ‘MQA’ encoded perfectly with the ‘white-glove encoder’ (albeit, because of the signals, a somewhat pointless academic exercise). [8]

As described earlier, some of the test signals were far outside of the performance envelope of the ‘MQL’ process.

When operated within its performance envelope, MQA does not introduce distortion or detectable aliasing and, as can be seen in the diagrams below, the Core temporal response is perfect.

Opposite we show the MQA response to a full-scale 1 kHz tone at 44.1 kHz.

Blue is the measured spectrum of the undecoded MQA file.
Red is the decoded output.

Green is a reference for dither at the 23rd bit.

We can see that there is no distortion and the decoder reveals the true dynamic range, lowering the floor from Blue to Red.

Opposite we illustrate the dynamic range of ‘MQL’.

Blue is the spectrum of the undecoded 44.1 kHz ‘MQL’ file.

Yellow is the decoded output, which reveals the original signal: in this case, -140 dBFS at 4 kHz, sampled at 88.2 kHz.

Orange is a reference for dither at the 23rd bit.

We can see that the dynamic range exceeds 23 bits for audio frequencies below 16 kHz. The measured noise floor is -134 dBA.
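For orientation, the textbook relationship between word length and the signal-to-noise ratio of an ideal quantiser can be computed as follows. This is a standard result, not the procedure used to produce the reference curves in these plots; the function name and the optional TPDF-dither term are assumptions for illustration.

```python
import math


def quantization_snr_db(bits: int, tpdf_dither: bool = False) -> float:
    """Wideband SNR of an ideal quantiser for a full-scale sine wave.

    Undithered: 20*log10(2)*bits + 10*log10(1.5), i.e. about 6.02*bits + 1.76 dB.
    With TPDF dither (assumed 2 LSB peak-to-peak) the total noise power
    triples, which lowers the SNR by 10*log10(3), about 4.77 dB.
    """
    snr = 20 * math.log10(2) * bits + 10 * math.log10(1.5)
    if tpdf_dither:
        snr -= 10 * math.log10(3)
    return snr


print(round(quantization_snr_db(23), 1))                    # 140.2
print(round(quantization_snr_db(23, tpdf_dither=True), 1))  # 135.5
```

Note that a spectrum plot shows noise per analysis bin, so a plotted floor sits below these wideband figures by an amount that depends on the FFT length.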

Opposite we see the Core decoded waveform when the 88.2 kHz square-wave is correctly encoded in ‘MQA’.

Opposite we see the Core decoded waveform when the 88.2 kHz Impulse is correctly encoded in ‘MQA’.

————–

[1] The baseline on Earth is thermal noise in the air and can be seen plotted here. https://www.stereophile.com/content/mqa-questions-and-answers-understanding-information-diagrams

[2] The lossless Core delivers exactly the sound approved in the studio.

[3] The hierarchy is described in https://www.stereophile.com/content/mqa-questions-and-answers-mqa-hierarchy

[4] Read more about the ‘Origami’ process here.

[5] Stuart, J. Robert; Craven, Peter G., ‘A Hierarchical Approach for Audio Archive and Distribution’, JAES Volume 67 Issue 5 pp. 258-277; May 2019. Open Access http://www.aes.org/e-lib/browse.cfm?elib=20456

[6] Warnings about ‘brick-wall’, inconsistent noisefloor and ‘overs’.

[7] Almost without exception, audio DACs use internal or external oversampling to match the incoming PCM to a high-speed modulator for conversion to analogue. This is described in detail in the reference in footnote [5] above. The MQA system is based on a hierarchy of modified B-spline kernels. To make the step from 1x (44.1 or 48 kHz) content to the hierarchy we use a very specific low-blur interpolator that precisely matches the encoder. The blog confused this with an aliasing defect.

[8] The encapsulation and dithering process adds noise lower than -144 dB (A-weighted).