Alright, Emmett, I need you to clear something up for me. All that sample rate and bitrate stuff. I don’t get it and everyone says something different. I trust you more than most.
Thanks
Scott
Hey Scott,
I get the confusion. It isn’t just you. Confusion over this kind of thing extends well beyond home studios, all the way up to very skilled audio engineers. You may be wondering how that’s possible. The simplest answer is that it just doesn’t matter that much. As long as you’re able to deliver audio the way a client has requested, you know all you really need to know. A sample rate of 44,100Hz and a bit-depth of 16 is all our ears need to be happy. Every interface and DAW will operate at least to those specifications, and virtually all will go higher. Any professional can make fantastic-sounding recordings without ever exceeding those basic numbers. I’m not saying you should work at those specs; I’m saying nearly all discussions about sample rates and bit-depths involve specs higher than the minimum necessary to satisfy our ears, which is why I say it isn’t all that important.
Most discussions aren’t even really about the sound quality. For instance, no one argues a lower sampling rate sounds better than a higher sampling rate. They’re much more focused on what’s actually necessary, and the point where it becomes a waste of resources. To be perfectly clear, if a client has a request, it is never a waste of your resources to meet that request. I may silently judge them for their choices, but it isn’t my job to provide my opinion; it’s my job to honor their requests. Doing anything else would be remarkably unprofessional.
Before jumping into each of the options, let’s talk about where all this incorrect information originates.
Those Who Don’t Know. Plain ol’ bad information is the most common culprit. It stems from people passing around information they just don’t understand well enough. It gets convoluted with impressive-sounding pseudoscience and easily disprovable hypotheses. This is quite often a symptom of people creating content solely for likes and shares.
Then there’s The Dinosaurs. They pass around outdated information. While the nature of digital audio hasn’t changed, the way we work with it certainly has. Analog-to-digital converters, word clocks, and DAWs have all made significant improvements over the years. There will probably always be a gap between the theoretical limits of digital audio and the practical limits of our equipment, but that gap has shrunk considerably. From the now-very-common practice of oversampling in DAWs, to improvements in the design and implementation of anti-aliasing filters, to general improvements in the mathematics of data conversion and the hardware that supports it, things are very different than they were just a decade ago. Because of that, a lot of what you’ll see is simply no longer applicable under current conditions.
Next are The Super-Professional Operators. Who is going to challenge a Grammy-winning engineer, or a Platinum-selling recording artist? No one. And that’s the problem. Just as it’s entirely possible to create amazing music without a deep or completely correct understanding of music theory, it’s possible to be exceptional at producing audio without fully understanding the technical details happening under the hood. That circles back to the fact that it just doesn’t matter that much. All of those exceptional audio producers would be just as exceptional working with 44.1k, 16-bit audio. Some will try to give technical explanations of why they don’t, and those explanations are often flawed. It doesn’t matter, except that people will take it as gospel, due to the source. It’s important to remember that it’s 100% possible to be great on the practical side and terrible on the theory side.
Finally, the Super-Geek Purists. It isn’t that the Super-Geeks are incorrect; it’s that they allow theory to dominate practice. Rounding at the 138th decimal place, for instance, will be technically less precise than rounding at the 143rd decimal place. But that type of precision does not translate to an audible difference, and isn’t relevant. The Super-Geek Purists, however, will present those minuscule differences as vitally important to the sound, which can be confusing for those who just need to get to work.
I enjoy getting geeky, and getting into the theory of these things, but like you, I also have a job to do. To dispense advice, you should know a lot. But to do the job, you only need a basic understanding.
Let’s start with the easiest thing to clear up: bitrate. Technically all audio files have a bitrate, but it’s a term we only use when discussing lossy formats, like mp3. That means it’s rarely a consideration for a final delivery product. Rather than a measurement of audio quality, bitrate is a measure of data over time. A common bitrate for mp3, for instance, is 128kbps. Any 60-second file encoded at 128kbps will always be the same size. It will not, however, always be the same quality. The more information the CODEC tries to cram into 128kbps, the lower the audio quality becomes. This is most evident with a stereo file versus a mono file. The stereo file is trying to cram twice as much information into the same amount of space. To do that, more frequency information must be removed. Complex sources, with lots of frequency information, will produce a lower-quality result than simpler sources, with less frequency information.
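To make that concrete, here is a quick back-of-the-envelope sketch in Python (the `mp3_size_bytes` helper is my own, purely for illustration): a constant-bitrate file’s size is fixed by its bitrate and duration, no matter what audio is in it.

```python
def mp3_size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Size of a constant-bitrate (CBR) file: bits per second times
    duration, divided by 8 bits per byte. Note the channel count never
    appears: mono or stereo, the file comes out the same size."""
    return bitrate_kbps * 1000 * seconds / 8

# A 60-second file at 128kbps is always 960,000 bytes (~0.96 MB),
# whether it holds silence, a mono voice track, or a dense stereo mix.
print(mp3_size_bytes(128, 60))  # 960000.0
```

Since the container size is fixed, the only thing the encoder can vary is how much detail it throws away, which is exactly why a stereo file at the same bitrate sounds worse than a mono one.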
Let's do a food analogy, and let's do a sweet food. Imagine a cake pan, filled with rainbow-colored cake, and each layer is a different color. You have to transfer the cake from that pan, to a smaller pan. You know there’s no way to preserve the perfect appearance of the cake, but it should still taste good, right? You cut it up, so it can fit. You lose some crumbs, here and there, but nothing to worry about. But then, you realize you can’t make it all fit. You decide the yellow layer is the least important, so you cut it out to make the rest of the cake fit into the smaller pan. The lower the bitrate, the more layers have to be cut away. Sure, the cake still tastes good, but it isn’t as good as the whole thing. Lossy files with bitrates are used mainly for auditions. Plan on using lossless wav or aiff files, when dealing with delivery of an actual recording for use. Also, it’s a good idea to always save a full-quality backup file of your audio, in case you need to revisit for some reason.
Moving on to bit-depth: it isn’t the same as bitrate. Bit-depth is a measurement of resolution. In practical terms, it’s mainly a measure of dynamic range. With fixed-point files, each bit is equal to 6dB of resolution. 8 bits offers 48dB, 16 bits has 96dB, and 24 bits allows for 144dB. 32-bit floating-point files are basically a 24-bit file with an exponent, which allows them to slide along a scale of around 1500dB without internally clipping.
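Those numbers all come from the same rule of thumb, roughly 6dB of dynamic range per bit (the precise figure is about 6.02dB). A tiny sketch, with a helper name of my own choosing:

```python
def dynamic_range_db(bit_depth: int) -> int:
    """Approximate dynamic range of a fixed-point file: ~6dB per bit."""
    return bit_depth * 6

for bits in (8, 16, 24):
    print(f"{bits}-bit -> {dynamic_range_db(bits)}dB")  # 48, 96, 144
```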
In addition to dynamic range, those extra bits allow more decimal places during processing. Each process is a mathematical calculation. The more available decimal places, the greater the accuracy, which can ultimately translate to better sound quality. Lower bit-depths require rounding to fewer decimal places. Realistically, it isn’t something you’ll notice with a single processor, and on many DAWs, it isn’t something you’ll notice at all. But some DAWs use destructive processing, where each processor will cumulatively do more and more rounding. Apply several subsequent processes at 16-bit, and it becomes easy to hear the degradation. At 24-bit, it’s less noticeable, but still audible. At 32-bit floating-point, however, there is effectively no loss, because the exponent allows for a huge number of decimal places. I always recommend at least 24-bit. The real-world advantages of working with 32-bit floating-point files are debatable. If you’re using a destructive editor, like Audition’s Waveform View, or Twisted Wave, I’d recommend 32-float. For nondestructive DAWs, almost all have a floating-point mix engine, so it’s not as important. I still, however, recommend at least 24, because the greater available dynamic range will allow you to set your gain to a lower level without introducing additional noise.
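That cumulative rounding can be simulated directly. This is a deliberately simplified sketch (real destructive editors dither, and the repeated 1dB gain moves here are arbitrary), but it shows why a 24-bit file drifts far less than a 16-bit one under repeated re-quantization:

```python
def quantize(x: float, bits: int) -> float:
    """Snap a sample (nominally in -1..1) to the nearest value
    representable at the given fixed-point bit depth."""
    steps = 2 ** (bits - 1)
    return round(x * steps) / steps

def destructive_chain(x: float, bits: int, passes: int = 20) -> float:
    """Drop the gain by 1dB and bring it back up, re-quantizing after
    every step, the way a destructive editor rewrites the file."""
    down, up = 10 ** (-1 / 20), 10 ** (1 / 20)
    for _ in range(passes):
        x = quantize(x * down, bits)
        x = quantize(x * up, bits)
    return x

original = 1 / 3  # a value that sits between quantization steps
err16 = abs(destructive_chain(original, 16) - original)
err24 = abs(destructive_chain(original, 24) - original)
print(err16 > err24)  # True: the 16-bit chain drifts much further
```

Each 16-bit rounding step is 256 times coarser than its 24-bit counterpart, so the accumulated error grows correspondingly faster.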
Moving on to sample rates: this is where the debate amongst pros becomes more interesting. The sampling rate is the number of snapshots taken of the audio each second. It takes two samples to represent any given frequency, so a sampling rate of 44,100 can represent frequencies of 0-22,050Hz. The standard range of normal human hearing is 20-20,000Hz. That’s why I said everything 44,100 and above is fine: any difference in sound quality is imperceptible.
So why 44,100? That’s CD quality. When the CD was developed in 1979, there were many more practical considerations when dealing with digital audio than there are today. I’ve heard a number of stories about how that particular number was arrived at, and they may all be varying degrees of correct, but the short answer comes down to practicality at the time. It covered the range of human hearing, and left some margin for aliasing filters. It also worked very well with video tapes, which, oddly enough, were the preferred method for delivering audio to duplication facilities.
So, if 44,100 serves the purpose, why go higher? Well, this is where the debates really start to heat up. 44,100Hz (also known as 44.1kHz, or just 44.1k) is still heavily used as a delivery format. From streaming applications to radio stations, it’s still a common working format. But at some point there was a split, and 48kHz became the digital audio standard. That’s what’s used for video, games, and, frequently, as a general working format. I don’t know exactly why 48,000 was chosen, but I hypothesize it was simply a way to create a larger cushion above the range of human hearing, and technological improvements at the time made it a viable choice.
In the old days, conversion between sample rates was not a straightforward task. 48kHz and 44.1kHz do not divide nicely. The math is complicated enough that early conversions from 48kHz to 44.1kHz degraded the sound quality. That’s where the arguments began. Some argued the extended frequencies helped keep aliasing (I’ll explain in a moment) out of the audible band, and therefore it was best to record at 48kHz and convert if 44.1kHz was to be the delivery format. Others argued the quality loss of the conversion was too significant, and therefore it was better to record directly at 44.1kHz. Today, the argument may still exist, but in a different form. Both problems have been addressed well enough that neither is an issue with modern equipment. The conversion from 48k to 44.1k can be done nearly perfectly. But anti-aliasing filters are also now nearly perfect. The geeks will still debate, but it has become theoretical, rather than the practical problem it once was.
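The awkwardness of that math is easy to see: reduce the two rates to lowest terms and you get a ratio of 160:147, so an exact rational converter has to interpolate by one factor and decimate by the other. A quick check:

```python
from math import gcd

a, b = 48000, 44100
g = gcd(a, b)                # greatest common divisor: 300
print(f"{a // g}:{b // g}")  # 160:147

# Converting 48k -> 44.1k exactly means upsampling by a factor of 147
# and downsampling by 160 (with filtering in between); nothing about
# the relationship divides cleanly, unlike, say, 88.2k -> 44.1k (2:1).
```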
I promised I would explain aliasing. Frequencies that exceed the Nyquist limit (the highest frequency a given sampling rate can represent) create aliasing: a specific kind of distortion that folds back into the representable frequency range. At 44.1k, the Nyquist limit of 22,050Hz leaves a 2.05kHz margin above the defined range of human hearing. Anti-aliasing filters are used to eliminate the problem. But in the earlier days, the filters just weren’t very good, and would eat up a chunk of frequencies. Today, they’re near perfect, and simply not an issue. Well, kind of. I’ll get to that.
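The folding itself is simple arithmetic. For a tone between Nyquist and the sample rate, the alias lands at the sample rate minus the tone’s frequency. A small sketch (the helper name is my own):

```python
def alias_frequency(freq_hz, sample_rate):
    """Where a tone appears after sampling: below Nyquist it is
    represented as-is; between Nyquist and the sample rate it folds
    back down to (sample_rate - freq_hz)."""
    nyquist = sample_rate / 2
    return freq_hz if freq_hz <= nyquist else sample_rate - freq_hz

print(alias_frequency(18000, 44100))  # 18000 (below Nyquist, unchanged)
print(alias_frequency(30000, 44100))  # 14100 (folds into the audible band)
```

This is why unfiltered ultrasonic content isn’t harmless: it doesn’t vanish, it lands right in the middle of the audible range.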
So far, we’ve covered recording at 44.1 and 48, and hopefully you’ve gathered that both sound great. Some people choose to record everything at 48k, and downsample, if needed. I prefer to know the delivery format ahead of time, and record at what will become the final sample rate. Both approaches are acceptable. I should note, though, recording at 44.1k and upsampling to 48k is NOT acceptable, because the quality will still be that of 44.1k. I could argue that it doesn’t actually matter, because the difference will be inaudible. But the fact is, if the client is expecting 48k, that’s what you owe them.
You’ve probably seen sample rates much higher than 48kHz, such as 96kHz. On very rare occasions, a 96k sampling rate, or even 192k, will be requested. Like everything else, it is your job to provide.
Why would someone request 96k? There are a few reasons. For one thing, it weeds out talent with rudimentary gear. It’s a form of gatekeeping. In my opinion, it’s perfectly acceptable to gatekeep in this way. If a voice actor cannot deliver files at a requested sample rate, there’s an increased likelihood of other technical deficiencies. Everyone is there to do a job, and bypassing those who may slow down the process has value.
Another reason is the basic idea that it’s harmless, and more is more. I can’t really argue with that line of thinking either. I could (and do) argue that it’s unnecessary, but if I’m not the one producing, my thoughts on the necessity are irrelevant. Hard drive space is cheap, internet is fast, and DAWs can handle it. So there’s no harm in preserving as much of the signal as possible, even if those extra frequencies are meaningless.
Some people think it sounds better. These people are either delusional, or The Dinosaurs. When gear began to be created for home users, many design compromises needed to be made. Because of cheap components, 44.1 and even 48k did not actually perform to their intended specifications. They had problems with aliasing and overly-aggressive anti-aliasing filters. During that time, 96k actually did sound better. It wasn’t because of the sample rate itself; it was because of limitations of the gear. At some point, The Dinosaurs heard a comparison between 44.1k and 96k, and could easily hear the difference. Not knowing the real reason, they concluded 96k sounded better, and that conclusion has stayed with them. This hasn’t really been much of an issue for about 20 years, so if they’re still working in the industry, they’re obviously doing something right, and don’t have to justify their reasons for doing anything. That circles back to The Super-Professional Operators, who know how to use the gear, but may not actually understand what it’s doing with their audio once the audio becomes numbers (binary code).
As for those I called “delusional,” it might be better to say they’re experiencing the placebo effect. Perception, even if imagined, is reality. If the user believes 96k sounds better, the only way to prove them wrong, is through double-blind testing, to which few will submit, because they just don’t really care.
Finally, some point to processing as a reason for higher sampling rates. These days, proper anti-aliasing and oversampling within plugins has made this largely a non-issue. Linear processes have never really been much of a problem, but certain nonlinear processes did not account for the Nyquist limit. Pitching a sound up a few semitones, for instance, could cause aliasing at the top of the signal. Today, many DAWs and plugins use oversampling and/or filters to avoid these problems. Oversampling is when a processor increases the sample rate to two, four, or sometimes as much as 32 times the original rate, applies the signal processing, filters, and returns to the original sample rate. Frankly, it’s a feature that should have become standard years ago. The practice isn’t new, but it has only recently become commonplace. Some processors still aren’t using it, so working at a higher sample rate avoids the issue.
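Here is a pure-Python sketch of the problem oversampling solves. A 15kHz sine at 44.1k is squared (a crude stand-in for a nonlinear process like distortion), which creates a 30kHz component. 30kHz is above the 22.05kHz Nyquist limit, so instead of disappearing, it folds back: the strongest non-DC component of the processed signal shows up at 14.1kHz.

```python
from math import cos, hypot, pi, sin

RATE = 44100
N = 441  # 10ms of audio, giving DFT bins exactly 100Hz wide

# Square a 15kHz sine: the trig identity sin^2 = (1 - cos(2x))/2 means
# the result contains DC plus a 30kHz cosine, well above Nyquist.
signal = [sin(2 * pi * 15000 * n / RATE) ** 2 for n in range(N)]

def dft_magnitude(x, k):
    """Magnitude of DFT bin k, computed directly (slow, dependency-free)."""
    re = sum(x[n] * cos(2 * pi * k * n / len(x)) for n in range(len(x)))
    im = sum(x[n] * sin(2 * pi * k * n / len(x)) for n in range(len(x)))
    return hypot(re, im)

# Strongest non-DC bin in the 0..Nyquist half of the spectrum:
peak_bin = max(range(1, N // 2), key=lambda k: dft_magnitude(signal, k))
print(peak_bin * RATE // N)  # 14100: the 30kHz product aliased down
```

Had the squaring been done at 88.2k (2x oversampling), the 30kHz product would sit below the new Nyquist limit and could be filtered off before returning to 44.1k, which is exactly the oversample-process-filter-return cycle described above.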
So, where does all of that leave the voice actor? For the most part, simply record and deliver what’s asked. It doesn’t have to be complicated. For everything else, I suggest always working in 24-bit or higher. Sample rate is debatable. I, personally, record most things at 44.1kHz, because more than 95% of my work will be delivered at that sampling rate. Recording at 48k is also fine, and can easily be converted, if needed. I do not recommend recording at sample rates above 48k, except when specifically requested, because it’s so rarely necessary.
You’re probably feeling more confused than before. Sorry about that. If I can clarify any specific points, let me know!
Emmett