Digital Audio Basics
By Ben Blakesley
When working with digital audio there are two opposing factors that must always be considered: file size and sound quality. The following is an explanation of common file types, their attributes, and options.
Lossless Audio
Digital audio formats can be broken up into two categories: compressed audio and full resolution audio. Full resolution files are often called "source files" or "lossless" audio, because audio is typically recorded in the digital domain in this format and no compression is applied to decrease file size. The two most widely used full resolution file types are WAV (.wav, standard for Windows-based systems) and AIFF (.aif, standard for Mac-based systems). Although WAV and AIFF started off as OS specific, today both file types can easily be used on either platform. WAV and AIFF files offer the best sound quality available, but their large file size can be restrictive in certain uses. A typical ratio for WAV or AIFF files is about 10MB for every minute of audio (assuming stereo, 44.1kHz, 16-bit), so a 3 minute piece would be around 30MB. WAV and AIFF files should be used when creating master CDs for duplication, when handing off audio to a third party for editing or mastering, or in any application where quality is of the utmost importance.
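The 10MB-per-minute figure follows directly from the sample rate, bit depth, and channel count. The short Python sketch below works through the arithmetic; the function name is made up for illustration, and the result covers only the audio data, not the small file header.

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size of uncompressed audio data in bytes (file header not included)."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)

one_minute = pcm_size_bytes(60)
three_minutes = pcm_size_bytes(180)

# Report sizes in megabytes (1 MB = 1,000,000 bytes).
print(f"1 minute:  {one_minute / 1_000_000:.1f} MB")
print(f"3 minutes: {three_minutes / 1_000_000:.1f} MB")
```

One minute comes to 10,584,000 bytes, which is where the rounded "10MB per minute" rule of thumb comes from.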
Sample Rate and Bit Depth
Sample rate is the number of times per second that the amplitude of the sound wave is measured. Bit depth is the number of bits used to store each of those measurements, which sets how finely the amplitude can be resolved. Together they form the digital grid that represents the sound wave. It can sound a bit confusing, but for all practical purposes the higher the sample rate and the higher the bit depth, the higher the audio quality. (The more snapshots of the audio wave you have, and the more precise each one is, the closer the digital representation will be to the actual wave.) The most common and widely used sample rate is 44.1kHz (44,100 samples per second). This is the sample rate of a normal audio Compact Disc and the majority of digital music available to consumers. Unless you have significant file size issues, record your audio at this rate and keep it there. Virtually every player supports it, so compatibility problems are unlikely. Other common rates include 11kHz, 22kHz, 48kHz, 88.2kHz, 96kHz, and 192kHz, the highest rate in common use in recording studios today. If you choose to use a sample rate other than 44.1kHz, the file size will change accordingly (lower sample rate = smaller file size and lower quality, and vice versa).
Bit depth follows the same principle: the higher the bit depth, the better the sound quality but the larger the file size. A typical audio CD has a bit depth of 16. Other common bit depths are 8, 20, 24, and 32. There is also a floating-point option, usually offered at 32 bits ("32-bit float"). For most purposes, stick with 16, as this is the bit depth required for CD duplication. Generally it's a good idea to keep audio at its highest quality until it gets to its final format, so if you intend to create a standard CD from your audio, do not go below 44.1kHz and 16-bit.
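To make the "snapshots" idea concrete, here is a minimal Python sketch that samples one second of a sine wave at 44.1kHz, rounds each sample to a 16-bit integer, and packs the result into a WAV file in memory using the standard library's wave module. The 440Hz test tone and the function names are assumptions chosen for illustration, not part of any particular workflow.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100   # samples per second (CD standard)
BIT_DEPTH = 16         # bits per sample (CD standard)
TONE_HZ = 440.0        # hypothetical test tone

def sine_samples(seconds, sample_rate=SAMPLE_RATE, bit_depth=BIT_DEPTH):
    """Sample a sine wave and round each value to the nearest integer level."""
    peak = 2 ** (bit_depth - 1) - 1   # 32767 for 16-bit audio
    count = int(seconds * sample_rate)
    return [round(peak * math.sin(2 * math.pi * TONE_HZ * n / sample_rate))
            for n in range(count)]

def write_mono_wav(samples, sample_rate=SAMPLE_RATE):
    """Pack 16-bit samples into an in-memory mono WAV file and return its bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)           # 2 bytes per sample = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()

samples = sine_samples(1.0)
wav_bytes = write_mono_wav(samples)
print(len(samples), "samples,", 2 ** BIT_DEPTH, "possible levels per sample")
print(len(wav_bytes), "bytes including the WAV header")
```

Halving SAMPLE_RATE halves the number of samples (and the file size), while changing the bit depth changes how many distinct levels each sample can take.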
Compressed Audio
Although using the highest resolution audio possible maintains the best quality, often the size of such files is prohibitive. In cases where files must be transferred over the internet or through email, compressed audio might be the only option for efficient use of time. Common compressed or "lossy" audio types are: MP3, AAC, and WMA. AAC is associated with Mac systems and is the default encoding format for iTunes and other Apple programs (file extension .m4a). WMA is the proprietary format developed by Microsoft for Windows and Windows Media Player. Although both file formats are comparable in size and audio quality, the universal standard for lossy audio is the multi-platform MP3 format. It's a good idea to use MP3 whenever compressed audio is needed, as there will be fewer compatibility issues.
Bit Rate
Compressed audio follows the same guidelines for sample rate, bit depth, and stereo/mono options as lossless audio. Again, it's a good idea to use 44.1kHz, 16-bit, stereo as your default if you're unsure of what should be used. But in addition to those three options, lossy audio has a fourth quality setting called bit rate (different from bit depth, don't get confused!). Bit rate is the amount of data used to store each second of audio, measured in kilobits per second (kbps). There is no universal standard for bit rate, but the most common is 128kbps. As usual, a higher number indicates better quality but larger file size. A stereo, 44.1kHz, 16-bit MP3 file encoded at 128kbps will generally yield a file size of about 1MB per minute of audio, so a 3 minute audio clip would be about 3MB, roughly 90% smaller than its WAV counterpart. Other common bit rates include 32kbps, 64kbps, 96kbps, 160kbps, 192kbps, 256kbps, and 320kbps, the highest the MP3 format allows.
NOTE: The bit rate is the total for the whole file and is shared between its channels. This means that a stereo MP3 at 128kbps is roughly the same quality as a mono MP3 at 64kbps, because the stereo file spends about 64kbps on the left channel and about 64kbps on the right. So the most common bit rate for mono files is 64kbps.
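The 1MB-per-minute figure and the roughly 90% saving can be checked with a little arithmetic. The Python sketch below assumes a constant bit rate and ignores tags and headers; the function names are made up for illustration, and the per-channel figure assumes the bit rate is split evenly between channels, which real encoders do not always do.

```python
def mp3_size_bytes(seconds, bit_rate_kbps=128):
    """Approximate size of a constant-bit-rate file (tags and headers ignored)."""
    return int(seconds * bit_rate_kbps * 1000 / 8)

def wav_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Size of the equivalent uncompressed audio data."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)

def per_channel_kbps(total_kbps, channels):
    """Bit rate available to each channel, assuming an even split."""
    return total_kbps / channels

mp3 = mp3_size_bytes(180)    # 3 minute clip at 128kbps
wav = wav_size_bytes(180)    # same clip as stereo, 44.1kHz, 16-bit
saving = 1 - mp3 / wav

print(f"MP3: {mp3 / 1_000_000:.2f} MB, WAV: {wav / 1_000_000:.2f} MB")
print(f"Saving: {saving:.0%}")
print(f"Per channel at 128kbps stereo: {per_channel_kbps(128, 2):.0f} kbps")
```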
General Guidelines
- Always record source audio as a lossless file (WAV or AIFF). Try to avoid using lossy file formats for recording original material. A lossy recording cannot be returned to lossless quality once it has been converted.
- When transferring files over the internet for editing or mastering purposes, use lossless audio if time and space permit. If this is not possible due to file size, convert the file to an MP3 and use the highest bit rate possible while maintaining a usable file size.
- When in doubt, use 44.1kHz, 16-bit, stereo for all formats.
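The second guideline (send lossless if it fits, otherwise the highest MP3 bit rate that does) can be sketched as a small Python helper. Everything here is hypothetical: the function name, the size budget, and the list of candidate bit rates are illustrative assumptions, and the size estimates ignore headers and tags.

```python
# Candidate MP3 bit rates in kbps, highest quality first (illustrative list).
COMMON_BIT_RATES_KBPS = [320, 256, 192, 160, 128, 96, 64, 32]

def choose_transfer_format(seconds, max_bytes):
    """Return ("WAV", None) if lossless fits the budget, otherwise
    ("MP3", kbps) for the highest bit rate that fits, or None if nothing fits."""
    wav_bytes = seconds * 44_100 * 2 * 2      # stereo, 44.1kHz, 16-bit
    if wav_bytes <= max_bytes:
        return ("WAV", None)
    for kbps in COMMON_BIT_RATES_KBPS:
        if seconds * kbps * 1000 / 8 <= max_bytes:
            return ("MP3", kbps)
    return None

# A 3 minute clip with a hypothetical 5MB attachment limit.
print(choose_transfer_format(180, 5_000_000))
# The same clip with a 40MB limit leaves room for the lossless file.
print(choose_transfer_format(180, 40_000_000))
```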
Ben Blakesley is the Chief Engineer for Philadelphia based Javboy Records, which specializes in creating custom audio solutions. Visit them at www.javboyrecords.com