Monday 15 December 2014

DSD - Is DST Compression Really Lossless?

The SACD format was built around DSD right from the start.  Since DSD takes up about four times the amount of disk space of a 16/44.1 equivalent this meant that a new physical disc format with more capacity than a CD was going to be required.  Additionally, SACD was specified to deliver multi-channel content, which increases the storage requirement by another factor of 3 or more, depending on how many channels you want to support.  The only high-capacity disc format that was on the horizon at the time was the one eventually used for DVD, and even this was going to be inadequate for the full multi-channel capability envisaged for SACD.

The solution was to adopt a lossless data compression protocol to reduce the size of a multi-channel DSD master file so that it would fit.  The protocol chosen is called DST, and is an elaborate DSP-based method derived from the way MP3 works.  Essentially, you store a bunch of numbers that represent the actual data as a mathematical function which you can later use to try to re-create the original data.  You then store a bunch of additional numbers which represent the differences between the actual data and the attempted recreation.  If you do this properly, the mathematical function numbers, plus the difference data, takes up less space than the original data.  On a SACD the compression achieved is about 50%, which is pretty good, and permits a lot of content to be stored.

Given that DST compression is lossless, it is interesting that the SACD format allows discs to be mastered with your choice of compressed or non-compressed data.  And, taking a good look at a significant sample of SACDs, it appears that a substantial proportion of those discs do not use compression.  Additionally, if you look closely, you will see that almost all of the serious audiophile remasters released on SACD are all uncompressed.  So the question I have been asking is - is there any reason to believe that DST-compressed SACDs might sound worse than uncompressed ones?

First of all, let me be clear on one thing.  The DST compression algorithm is lossless.  This means that the reconstructed bit stream is bit-for-bit identical to the original uncompressed bit stream.  This is not at issue here.  Nor is the notion that compressing and decompressing the bits somehow stresses them so that they don’t sound so relaxed on playback.  I don’t buy mumbo jumbo.  The real answer is both simpler than you would imagine (although technically quite complicated), and at the same time typical of an industry which has been known to upsample CD content and sell it for twice the price on a SACD disc.

To understand this, we need to take a closer look at how the DSD format works.  I have written at length about how DSD makes use of a combination of massive oversampling and noise shaping to encode a complex waveform in a 1-bit format.  In a Sigma-Delta Modulator (SDM) the quantization noise is pushed out of the audio band and up into the vast reaches of the ultrasonic bandwidth which dominates the DSD encoding space.  The audio signal only occupies the frequency space below 20kHz (to choose a number that most people will agree on).  But DSD is sampled at 2,822kHz, so there is a vast amount of bandwidth between 20kHz and 2,822kHz available, into which the quantization noise can be swept.

One of the key attributes of a good clean audio signal is that it have low noise in the audio band.  In general, the higher quality the audio signal, the lower the noise it will exhibit.  The best microphones can capture sounds that cannot be fully encoded using 16-bit PCM.  However, 24-bit PCM can capture anything that the best microphones will put out.  Therefore if DSD is to deliver the very highest in audio performance standards it needs to be able to sustain a noise floor better than that of 16-bit audio, and approaching that of 24-bit audio.

The term “Noise Shaping” is a good one.  Because quantization noise cannot be eliminated, all you can hope to do is to take it from one frequency band where you don’t want it, and move it into another where you don’t mind it - and in the 1-bit world of DSD there is an awful lot of quantization noise.  This is the job of an SDM.  The design of the SDM determines how much noise is removed from the audio frequency band, and where it gets put.  Mathematically, DSD is capable of encoding a staggeringly low noise floor in the audio band.  Something down in the region of -180dB to -200dB has been demonstrated.  What good DSD recordings achieve is nearer to -120dB, and the difference is partly due to that fact that practical real-world SDM designs seriously underperform their theoretical capabilities.  But it also arises because better performance requires a higher-order SDM design, and beyond a certain limit high-order SDMs are simply unstable.  A workmanlike SDM would be a 5th-order design, but the best performance today is achieved with 8th or 9th order SDMs.  Higher than that, and they cannot be made to work.

So how does a higher-order SDM achieve superior performance?  The answer is that it packs more and more of the quantization noise into the upper reaches of the ultrasonic frequency space.  So a higher-performance higher-order SDM will tend to encode progressively more high-frequency noise into the bitstream.  A theoretically perfect SDM will create a Bit Stream whose high frequency content is virtually indistinguishable from full-scale white noise.

This is where DST compression comes in.  Recall that DST compression works by storing a set of numbers that enable you to reconstruct a close approximation of the original data, plus all of the differences between the reconstructed bit stream and the original bit stream.  Obviously the size of the compressed (DST-encoded) file will be governed to a large degree by how much data is needed to store the difference signal.  It turns out that the set of numbers that reconstruct the ‘close approximation’ do a relatively good job of encoding the low frequency data, but a relatively poor job of encoding the high frequency data.  Therefore, the more high frequency data is present, the more additional data will be needed to encode the difference signal.  And the larger the difference signal, the larger the compressed file will be.  In the extreme, the difference signal can be so large that you will not be able to achieve much compression at all.

This is the situation we are in with today’s technology.  We can produce the highest quality DSD signal and be unable to compress it effectively, or we can accept a reduction in quality and achieve a useful degree of (lossless) compression.

So what happens when we have a nice high-resolution DSD recording all ready to be sent to the SACD mastering plant?  What happens if the DSD content is too large to fit onto a SACD, and cannot be compressed enough so that it does?  The answer will disappoint you.  What happens is that the high quality DSD master tape is remodulated using a modest 5th-order SDM, in the process producing a new DSD version which can now be efficiently compressed using DST compression.  Most listeners agree that a 5th order SDM produces audibly inferior sound to a good 8th order SDM, but with real music recordings it is essentially impossible to inspect a DSD data file and determine unambiguously what order of SDM was used to encode it.  So it is easy enough to get away with.

How do you tell if a SACD is compressed or not?  Well, if you have the underground tools necessary, you can rip it and analyze it definitively.  For the rest of us there is no sure method except for one.  You simply add up the total duration of the music on the disc, and calculate 2,822,400 bits of data per second, per channel.  If the answer amounts to more than 4.7GB then the data must be compressed.  If it adds up to less, there is no guarantee that it won’t be DST-compressed, but the chances are pretty good that it is not.  After all, if the record company wants to compress it, they’d have to pay someone to do that, and that probably ain’t gonna happen.  The other simple guideline is that if it is multi-channel it is probably compressed, but if it is stereo it probably is not.

Of course, none of this need apply to downloaded DSD files.  If produced by reputable studios these will have been produced using the best quality modulators they can afford, and since DST encoding is not used on commercial DSF and DFF* files this whole issue need not arise.  However, if the downloaded files are derived from a SACD (as many files are which are not distributed by the original producers), then the possibility does exist that you are receiving an inferior 5th-order remodulated version.  The take-away is that not all DSD is created equal.  Yet another thing for us to have to bear in mind!

[* Actually, the DFF file format does allow for the DSD content to be DST compressed, because the DFF file format is what is used by the mastering house to provide the final distribution-ready content to the SACD disc manufacturing plant.  However, for consumer DSD file distribution, I don’t think anybody employs DST compression.]