44.1: A Love Story
The sampling theorem - variously named after Shannon, Nyquist, Whittaker, Kotelnikov, and others that Wikipedia isn’t telling me about - ends up being extremely important to my great love, audio. The statement of it I find easiest to think about is this:
A continuous function whose Fourier Transform reveals no frequencies above B Hz is completely determined by sampling at 2B Hz.
The proof and history are both also interesting, so I encourage you to read more about them at your local library.
Next question: what’s the upper range of human hearing? Normally we top out at around 20 kHz, but that’s for young people and there’s also variation. Provided you have good enough sound reproduction equipment, you can test yours with something like this video. (Mine tops out at about 15.5 kHz, it looks like, which is not only good for my age but surprising given how much music I listen to.)
So let’s over-engineer a bit (because we can) for that one kid in our school who took the mosquito ringtone and pitch-shifted it up so that only they could hear it. A reasonable sampling frequency to capture the spectrum of human hearing must therefore be, by Sampling theorem above, at least 40 kHz. And let’s say we pick as our sampling rate 44.1 kHz, which gives us a frequency bound at 22.05 kHz; enough to appease that kid.
This choice isn’t arbitrary at all; it’s in fact the choice that Sony and Phillips made when it came time to encode audio onto VHS tapes. It’s compatible therefore with both PAL and NTSC (so standards from the 1940s and 1950s) for reasons I’m not going to go into because they happened too far before I was born.
Phillips and Sony are also the two companies who created the next major standard for digital audio storage. And so, my beloved audio CDs (standard published in 1980) are all encoded using a sampling rate of - wait for it - 44.1. (Vinyl records and cassette tapes are analog and therefore not invited to this party.)
When I Linuxed my previous laptop Jeska, it was the first time I had been really able to see in a very real way how much of the system worked. (This seems to be something of a common thread among Linux users.) In particular, I could see CPU usage, and I could see how high the CPU usage was when playing audio.
After on-and-off searching the Internet (keeping in mind that this was almost a decade ago), I discovered that there were two problems causing this.
The first was that I had pulseaudio installed, which was easily fixed. (That sounds like a throwaway - and it is - but I will come back to it in a moment).
The second was that, by default, the sample rate for the system was set at 48
kHz. This was nonsense to me: all of my audio was ripped from CDs, so why
would I bother with the (computationally) expensive process of resampling all
of it to 48? A handful of lines in ~/.asoundrc
and the problem was fixed:
audio was sent directly to the card at 44.1, and if it was resampling, it was
doing it in hardware and I didn’t see the effects of it anymore.
The way pulse made it worse is fascinating, though. In the default configuration, it seems to have been resampling again, for a total of two resamples, for reasons that are beyond me. While this isn’t harmful to the signal itself - nothing is lost by upsampling and then downsampling to the original sample rate - it’s a surprising waste to find in a system.
Anyway, I’ve kept those configurations, and most of the music too (though I rip in Vorbis now, which has probably saved me the cost of a larger hard drive). And the vast body of audio available to me not from CDs (i.e., on the Internet) has been as 44.1 as the day is long. I have a more modern machine now, with actual sound acceleration and a real CPU, so I wouldn’t notice the resampling overhead, but I like not having the waste there.
And here’s what I’ve been leading up to. Opus is starting to see serious use now (four years is a reasonable lag time from standardization to adoption, I suppose) on the great audio sites of the web (Youtube), and of course carries with it claims and data that it beats the pants off every other format ever at all bandwidth constraints. But it doesn’t have a very wide selection of sample rates. The 44.1 is gone (why?) in favor of a 48 with an RFC-mandated cap at 20 kHz (what are you DOING).
I actually do know the reason they’re doing it this way (and so do you if you read the next couple paragraphs of the RFC, but I blame you not at all if you didn’t), and I don’t like it - they wanted the sample rates to evenly divide the fullband rate, but that doesn’t justify the choice of fullband rate to me. 44.1, 22.05, 14.7, 11.025, 7.35 is no less appealing a rate collection than the 48, 24, 16, 12, 8 that Opus provides - it’s just prettier. And for some reason, we care a lot about that. Enough to upsample everything, despite clamping to a lower maximum frequency.