MP3 ("MPEG Audio Layer 3") is a lossy audio data compression format, standardized by the International Organization for Standardization (ISO). This format is used to compress uncompressed audio formats (WAV or CD audio) at a ratio of about 12:1.
Stored as MP3 files, the equivalent of twelve music CDs takes up the same space as a single CD-ROM. What's more, MP3 only barely alters the perceptible sound quality.
MPEG Layer 3 compression involves removing the data corresponding to frequencies inaudible to the average person under normal listening conditions. The compression analyses the spectral components of an audio signal and applies a psychoacoustic model to them so as to preserve only "audible" sound. The average human ear is able to discern sounds between 20 Hz and 20 kHz, with sensitivity being at its peak for frequencies between 2 and 5 kHz (the human voice is between 0.5 and 2 kHz), following the equal-loudness curves described by Fletcher and Munson.
MPEG compression involves determining which sounds go unheard and can be deleted; it is therefore "lossy compression," with some data being destroyed.
Gabriel Bouvigne explains:
"When you look at the sun and a bird passes in front of it, you don't see it, because the light from the sun is too bright. Acoustics are like that. When there are loud sounds, you don't hear the quiet sounds. Take an organ, for example: When an organist isn't playing, you can hear whistling in the pipes, and when he is playing, you can't hear it anymore, because it's masked.
This is why it isn't necessary to record every sound, and this is the guiding principle used in the MP3 format to save space."
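The masking idea in the quotation can be illustrated with a toy filter. This is only a sketch under loud assumptions: the function name `audible_components`, the 30 dB margin, and the 500 Hz neighbourhood are invented for illustration, and the psychoacoustic model used by real MP3 encoders is considerably more elaborate.

```python
def audible_components(components, mask_drop_db=30.0, neighbourhood_hz=500.0):
    """Keep only components that are not masked by a much louder neighbour.

    components: list of (frequency_hz, level_db) pairs.
    A component counts as masked when another component within
    neighbourhood_hz is louder by more than mask_drop_db.
    Both thresholds are arbitrary illustrative values.
    """
    kept = []
    for freq, level in components:
        masked = any(
            other_level - level > mask_drop_db
            and abs(other_freq - freq) <= neighbourhood_hz
            for other_freq, other_level in components
        )
        if not masked:
            kept.append((freq, level))
    return kept


# A loud tone at 440 Hz, a quiet tone right next to it, and a quiet tone far away.
signal = [(440.0, 80.0), (450.0, 35.0), (5000.0, 40.0)]
print(audible_components(signal))  # [(440.0, 80.0), (5000.0, 40.0)]
```

The quiet 450 Hz tone is dropped because the loud 440 Hz tone sits right beside it, while the equally quiet 5 kHz tone survives because nothing loud is nearby.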
Often, certain passages of a musical recording cannot be encoded at the target bit rate without audibly degrading the sound. MP3 therefore uses a small bit reservoir: passages that can be encoded with fewer bits than the rest of the data leave spare capacity, which is then spent on the more demanding passages.
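The bookkeeping behind a bit reservoir can be sketched as follows. The function name `allocate_with_reservoir`, the per-frame budget, and the reservoir cap are assumptions chosen for illustration; they are not values taken from the MP3 specification.

```python
def allocate_with_reservoir(demands, frame_budget, reservoir_cap):
    """Give each frame the bits it asks for, as far as the budget allows.

    demands: bits each frame would like to use (hypothetical figures).
    frame_budget: bits nominally available per frame.
    reservoir_cap: maximum number of unused bits that can be carried over.
    Returns the number of bits actually granted to each frame.
    """
    reservoir = 0
    allocations = []
    for demand in demands:
        available = frame_budget + reservoir
        used = min(demand, available)
        allocations.append(used)
        # Unused bits are saved for later frames, up to the cap.
        reservoir = min(available - used, reservoir_cap)
    return allocations


# Two easy frames save bits that the demanding third frame then spends.
print(allocate_with_reservoir([300, 300, 600, 400], frame_budget=400, reservoir_cap=250))
# [300, 300, 600, 400]
```

Without the reservoir, the third frame would have been limited to 400 bits even though earlier frames left 200 bits unused.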
Most hi-fi sound systems use a single subwoofer (which produces the bass). However, the bass does not seem to come from the subwoofer, but from the other speakers: below a certain frequency, the human ear cannot tell where a sound is coming from. The MP3 format can (optionally) take advantage of this phenomenon by using the joint stereo method. This means that certain frequencies are recorded in mono, accompanied by additional data so that playback still conveys a stereo image.
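One technique commonly grouped under joint stereo is mid/side coding, where the two channels are stored as their shared part plus their difference. The sketch below is a simplified illustration of that idea only: the function names are hypothetical, the scaling is chosen for readability, and it does not reproduce the MP3 bitstream format.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (shared) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Rebuild left/right samples from mid and side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75]
right = [0.5, 0.125, -0.5]
mid, side = to_mid_side(left, right)
print(side)                      # [0.0, 0.0625, -0.125]
print(from_mid_side(mid, side))  # ([0.5, 0.25, -0.75], [0.5, 0.125, -0.5])
```

When the two channels are nearly identical, the side signal stays close to zero and therefore needs very few bits, which is where the saving comes from.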
The Huffman algorithm is a lossless encoding algorithm (it discards no data), which takes effect at the end of the compression process. It creates variable-length codes, giving the shortest codes to the values that occur most often. No code is the prefix of another, so the codes can be correctly decoded despite their variable length, and this can be done quickly with the use of tables. This type of encoding saves, on average, a little under 20% of the space taken up.
When sounds are "pure" (that is, there is no masking), the Huffman algorithm is very effective, as such digital audio contains a great deal of redundant data.
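The principle of Huffman coding can be shown with a short sketch. Note the assumptions: MP3 relies on predefined code tables, whereas this example builds a code from the sample data itself purely to illustrate how frequent values receive shorter, prefix-free codes. The function names and the sample values are made up.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Build a prefix-free code table {symbol: bit string} from sample data."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Each heap entry: (total count, tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        # Merge the two rarest groups, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Decode by reading bits until they match a code; safe because no code prefixes another."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out


data = [0, 0, 0, 0, 1, 1, -1, 2]  # hypothetical quantized values
codes = huffman_codes(data)
bits = encode(data, codes)
print(len(bits))                    # 14 bits, versus 16 with fixed 2-bit codes
print(decode(bits, codes) == data)  # True
```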
With MP3 compression at 128 kbps, a minute of CD audio (sampled at 44.1 kHz, 16 bits, stereo) takes up only about 1 MB.
An average song, therefore, is 3 or 4 MB, which makes it possible to download it even over a dial-up modem.
| Sampling rate (Hz) | Mode | Bitrate | Quality | Compression ratio |
|---|---|---|---|---|
| 11,025 | Mono | 8 kbps | Very poor | 200:1 |
| 22,050 | Stereo | 64 kbps | Poor | 25:1 |
| 44,100 | Stereo | 96 kbps | Fair | 16:1 |
| 44,100 | Stereo | 128 kbps | Good | 12:1 |
| 44,100 | Stereo | 192 kbps | Very good | 7:1 |
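
The size and ratio figures quoted above can be checked with a little arithmetic. The sketch below assumes that 1 MB means 1,000,000 bytes and that 1 kbps means 1,000 bits per second; the function names are made up for this example.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def size_mb(bit_rate_bps, seconds):
    """Size in megabytes (10**6 bytes) of audio at a constant bit rate."""
    return bit_rate_bps * seconds / 8 / 1_000_000


cd_rate = pcm_bit_rate(44_100, 16, 2)  # 1,411,200 bits per second
print(size_mb(cd_rate, 60))   # 10.584 MB for one minute of CD audio
print(size_mb(128_000, 60))   # 0.96 MB for one minute of 128 kbps MP3
print(cd_rate / 128_000)      # 11.025, i.e. roughly the 12:1 quoted above
```

Computed this way, the ratios in the table appear to be rounded rather than exact values.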