2007-08-30

Audio coding, part 1

Lately I've been thinking about lossy audio compression in general, and ambisonic compression in particular. The ambisonic stuff will mostly go to sursound-list, but I thought I should put down some of my less coherent ideas here.

One of the things that strikes me as odd is the sharp line people tend to draw between parametric and wideband coding schemes. I really see no need for such a sharp distinction. For example, think about sinusoidal parametric codecs. What are they besides subband coders with exceedingly thin bands and a very particular parametrization for generating the bitstream?
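To make that concrete, here is a toy sketch in Python of the thin-band view; it is not any real codec's analysis stage, and the frame length, window and -30 dB threshold are arbitrary illustration values. One windowed FFT frame is just a bank of very narrow bands, and a sinusoidal coder keeps only the bands that form spectral peaks, parametrizing each as a (frequency, amplitude, phase) triple.

    # Toy illustration, not a real codec: treat one windowed FFT frame as a
    # bank of very narrow bands and keep only the bands that are local
    # spectral peaks, parametrizing each as (frequency, amplitude, phase).
    import numpy as np

    def sinusoidal_frame_params(frame, sample_rate, threshold_db=-30.0):
        n = len(frame)
        spectrum = np.fft.rfft(frame * np.hanning(n))
        mag = np.abs(spectrum)
        mag_db = 20.0 * np.log10(mag + 1e-12)
        floor_db = mag_db.max() + threshold_db

        params = []
        for k in range(1, len(mag) - 1):
            # Keep a bin only if it is a local maximum above the threshold,
            # i.e. transmit parameters just for the thin "bands" that matter.
            if mag_db[k] > floor_db and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]:
                params.append((k * sample_rate / n, mag[k], np.angle(spectrum[k])))
        return params

    # A frame of a 440 Hz + 1320 Hz tone collapses to a few such triples.
    sr = 44100
    t = np.arange(1024) / sr
    frame = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1320 * t)
    print(sinusoidal_frame_params(frame, sr))

Nothing in that loop is conceptually different from a subband coder deciding how many bits each band deserves; the "parametric" part is only in how the surviving bands get written to the bitstream.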

I think that sort of artificial boundary can actually hinder progress on the coding front. For example, HILN (MPEG-4's Harmonic and Individual Lines plus Noise coder) nowadays achieves extremely high coding gain by encoding harmonic/periodic sounds as a pitch period plus a low-order linear predictive spectral envelope. Why is it that none of the wideband coders do anything of the sort when discrete, impulsive components are present in the spectrum? That sort of thing could prove a real improvement, because much of the coding gain of a typical subband codec comes from approximating the spectral envelope of the sound, or the masking threshold. The efficiency of such an approach rests entirely on a smoothness assumption, which discrete spikes in the spectrum violate. Hence, for instance, Vorbis could quite conceivably achieve smaller residuals and sparser interpolation of the spectral floor by using two separate, and more aggressively interpolated, envelopes: one for the impulsive components, and another for the smoother, continuous part underneath.
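A back-of-the-envelope sketch of the two-envelope idea, again in Python and emphatically not how Vorbis's floor actually works: median-filter the log magnitude spectrum so narrow spikes drop out, interpolate the result from a handful of points to get the smooth floor, and code whatever still pokes well above that floor as discrete (bin, level) components. The window length, median width, number of floor points and 10 dB threshold are all made-up illustration values.

    import numpy as np
    from scipy.ndimage import median_filter

    def two_envelope_split(frame, floor_points=16, peak_threshold_db=10.0):
        n = len(frame)
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(n)))
        log_mag = 20.0 * np.log10(spectrum + 1e-12)

        # Median filtering knocks out narrow spikes, leaving a smooth envelope
        # that survives very aggressive interpolation (here just 16 points).
        smooth = median_filter(log_mag, size=31, mode="nearest")
        anchors = np.linspace(0, len(log_mag) - 1, floor_points).astype(int)
        coarse_floor = np.interp(np.arange(len(log_mag)), anchors, smooth[anchors])

        # Whatever pokes well above the coarse floor is treated as a discrete
        # component and coded separately as (bin, level) pairs.
        peaks = [(int(k), float(log_mag[k]))
                 for k in np.where(log_mag - coarse_floor > peak_threshold_db)[0]]
        return coarse_floor, peaks

    # Noise plus one strong sine: the sine lands in `peaks`, while the
    # 16-point floor tracks the broadband part underneath.
    rng = np.random.default_rng(0)
    sr, n = 44100, 2048
    t = np.arange(n) / sr
    frame = 0.1 * rng.standard_normal(n) + np.sin(2 * np.pi * 1000 * t)
    floor, peaks = two_envelope_split(frame)
    print(len(peaks), "bins flagged as discrete components")

However crude, it shows the division of labour I have in mind: the smooth part can be described with very few numbers precisely because the spikes have been pulled out of it first.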

I think one reason this sort of thing doesn't seem to attract attention in the literature is the notion that ideas born on one side of the fence cannot readily be put to use on the other.