Why compress data?
Nowadays, the computing power of processors is increasing more quickly than storage capacity, and much more quickly than network bandwidth, because increasing bandwidth requires enormous changes to telecommunications infrastructure.
To compensate, it is common to reduce the size of the data by exploiting the computing power of processors, rather than to increase storage and data transmission capacity.
What is data compression?
Compression consists in reducing the physical size of blocks of information. A compressor applies an algorithm that optimizes the data, taking into account the type of data being compressed; a decompressor is therefore needed to reconstruct the original data, using the inverse of the algorithm used for compression.
The compression method depends intrinsically on the type of data to be compressed: an image will not be compressed in the same way as an audio file.
Compression can be characterized by the compression factor, that is, the number of bits in the compressed data divided by the number of bits in the original data.
The compression ratio, which is often used, is the inverse of the compression factor; it is usually expressed as a percentage.
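Finally, the compression gain, also expressed as a percentage, is 1 minus the compression factor, that is, the fraction of the original size that compression saves:
compression gain = 1 - (compressed size / original size)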
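The three measures can be computed directly from the sizes of the original and compressed data. The sketch below is only an illustration: it uses Python's zlib module as an arbitrary lossless compressor (an assumption, since the text names no tool), takes the gain to be the fraction of the original size that is saved, and the function name compression_metrics is hypothetical.

```python
import zlib


def compression_metrics(original: bytes, compressed: bytes) -> dict:
    """Compute factor, ratio and gain from the two sizes, in bits."""
    original_bits = len(original) * 8
    compressed_bits = len(compressed) * 8
    factor = compressed_bits / original_bits  # compressed size / original size
    ratio = 1 / factor                        # inverse of the factor
    gain = 1 - factor                         # fraction of the original saved
    return {"factor": factor, "ratio": ratio, "gain": gain}


# Highly repetitive sample data, chosen so that it compresses well.
original = b"abcd" * 1000
compressed = zlib.compress(original)
metrics = compression_metrics(original, compressed)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

On data that does not compress at all, the factor approaches (or slightly exceeds) 100% and the gain drops to zero or below.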
Types of compression and methods
Physical and logical compression
Physical compression acts directly on the data; it works by looking for redundant data from one bit pattern to the next.
Logical compression, on the other hand, relies on logical reasoning about the content: it substitutes a piece of information with equivalent, shorter information.
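To make the distinction concrete, here is a minimal sketch under stated assumptions. Run-length encoding is used as one possible example of physical compression (it only looks at repeated byte values), and a substitution table is used as an example of logical compression. Both choices, the ABBREVIATIONS table and all function names are illustrative and do not come from the text.

```python
from itertools import groupby


def rle_encode(data: bytes) -> list:
    """Physical: collapse each run of identical bytes into (byte, count)."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def rle_decode(pairs: list) -> bytes:
    """Expand (byte, count) pairs back into the original bytes."""
    return b"".join(bytes([value]) * count for value, count in pairs)


# Logical: replace a phrase with an equivalent, shorter one.
# Hypothetical table; assumes the short forms never occur in the input.
ABBREVIATIONS = {"United States of America": "USA", "kilometres": "km"}


def logical_encode(text: str) -> str:
    for long_form, short_form in ABBREVIATIONS.items():
        text = text.replace(long_form, short_form)
    return text


def logical_decode(text: str) -> str:
    for long_form, short_form in ABBREVIATIONS.items():
        text = text.replace(short_form, long_form)
    return text


pairs = rle_encode(b"aaaabbbcc")
sentence = "The United States of America spans 4500 kilometres."
shorter = logical_encode(sentence)
print(pairs)
print(shorter)
```

The run-length encoder knows nothing about what the bytes mean, whereas the substitution table only works because the two forms are known to be equivalent.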
Symmetrical and asymmetrical compression
In the case of symmetrical compression, the same method is used to compress and to decompress the data.
The same amount of work is thus needed for each of these operations. This type of compression is generally used in data transmission.
Asymmetrical compression requires more work for one of the two operations. It is usual to look for algorithms in which compression is slower than decompression. Algorithms that compress faster than they decompress may be needed when archiving data files that are seldom accessed (for security reasons, for example), because this creates compact files.
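The asymmetry can be observed by timing both operations. The sketch below uses Python's zlib module (DEFLATE) at its highest compression level as one example where compression generally costs more than decompression. The choice of zlib and the sample data are assumptions, and the timings depend on the machine, so no expected figures are given.

```python
import time
import zlib

# Repetitive but non-trivial sample data (hypothetical log-like records).
data = b"".join(
    f"record {i}: value={i * i % 997}\n".encode() for i in range(100_000)
)

start = time.perf_counter()
compressed = zlib.compress(data, 9)  # level 9: most effort spent compressing
compress_seconds = time.perf_counter() - start

start = time.perf_counter()
restored = zlib.decompress(compressed)
decompress_seconds = time.perf_counter() - start

print(f"original size:   {len(data)} bytes")
print(f"compressed size: {len(compressed)} bytes")
print(f"compression:     {compress_seconds:.4f} s")
print(f"decompression:   {decompress_seconds:.4f} s")
```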
Lossy compression
Lossy compression, as opposed to lossless compression, eliminates some information in order to achieve the best possible compression ratio, while keeping the result as close as possible to the original data. This is the case, for example, with certain image and audio formats, such as JPEG, MP3 or Ogg Vorbis.
Since this type of compression removes information contained in the data to be compressed, such methods are often called irreversible compression methods.
Executable files, for example, must not be compressed using this method, because they have to be preserved exactly in order to run: a program cannot be approximately reconstructed by dropping some bits and adding others.
Multimedia data (audio, video), on the other hand, can tolerate a certain level of degradation without the sense organs (eye, eardrum, etc.) perceiving any significant loss.
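The difference between the two families can be sketched with a toy signal. In the example below, the lossy step simply discards the four low-order bits of each 8-bit sample before compressing. This is a deliberately crude stand-in for what real audio or image codecs do, not a description of MP3 or Ogg Vorbis. zlib is again an assumed choice of lossless compressor.

```python
import math
import zlib

# Toy "audio" signal: 4000 samples of a sine wave stored as 8-bit values.
samples = bytes(
    round(127 + 127 * math.sin(2 * math.pi * i / 200)) for i in range(4000)
)

# Lossless: every sample is recovered exactly.
lossless = zlib.compress(samples)
restored_exact = zlib.decompress(lossless)

# Lossy: drop the 4 low-order bits of each sample, then compress.
quantized = bytes(value & 0xF0 for value in samples)
lossy = zlib.compress(quantized)
restored_approx = zlib.decompress(lossy)

# The discarded bits cannot be recovered; the error is bounded by 15.
max_error = max(abs(a - b) for a, b in zip(samples, restored_approx))

print(f"lossless: {len(lossless)} bytes, exact copy: {restored_exact == samples}")
print(f"lossy:    {len(lossy)} bytes, largest sample error: {max_error}")
```

The lossy version can never be turned back into the original samples, which is why this approach is unsuitable for data such as executables.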
Adaptive, semi-adaptive and non-adaptive encoding
Certain compression algorithms are based on dictionaries built for a specific type of data: these are non-adaptive encoders. The frequency of letters in a text file, for example, depends on the language in which it is written.
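A dictionary prepared in advance can be illustrated with the preset-dictionary feature of Python's zlib module (the zdict argument). This is only an approximation of a non-adaptive encoder, since DEFLATE still adapts to the input as well, but it shows how prior knowledge of the data type helps on short messages. The PRESET contents, the sample message and the function names are hypothetical.

```python
import zlib

# Dictionary prepared in advance for one kind of data (hypothetical content).
PRESET = (
    b"compression ratio compression factor compression gain "
    b"original data compressed data"
)


def compress_with_preset(data: bytes, preset: bytes) -> bytes:
    compressor = zlib.compressobj(zdict=preset)
    return compressor.compress(data) + compressor.flush()


def decompress_with_preset(blob: bytes, preset: bytes) -> bytes:
    # The decoder must hold exactly the same dictionary as the encoder.
    decompressor = zlib.decompressobj(zdict=preset)
    return decompressor.decompress(blob) + decompressor.flush()


message = (
    b"the compression gain follows from the compression factor "
    b"of the compressed data"
)
plain = zlib.compress(message)
with_preset = compress_with_preset(message, PRESET)

print(f"without dictionary: {len(plain)} bytes")
print(f"with dictionary:    {len(with_preset)} bytes")
```

The benefit disappears if the data does not resemble what the dictionary was built for, which is the main limitation of this approach.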
An adaptive encoder adapts to the data it has to compress; it does not start out with a dictionary prepared in advance for a given type of data.
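LZW is one well-known algorithm that behaves this way, and is used here purely as an illustration (the text does not name a specific algorithm). The table starts with only the 256 single-byte entries, which are not specific to any type of data, and grows as the input is read. The decoder rebuilds the same table on its own, so no dictionary is transmitted. This is a minimal sketch that emits integer codes rather than a packed bit stream.

```python
def lzw_encode(data: bytes) -> list:
    """Encode bytes as a list of integer codes, growing the table on the fly."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn a new sequence
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list) -> bytes:
    """Rebuild the same table while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(text)
print(len(text), "bytes ->", len(codes), "codes")
```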
A semi-adaptive encoder builds a dictionary from the data to be compressed: it first goes through the file to build the dictionary, and then compresses the file.
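The two-pass idea can be sketched with a simple word dictionary. This is a toy scheme chosen for readability, not a real file format, and the function names are hypothetical. The first pass scans the whole input to build the dictionary, and the second pass replaces each word with its index. The dictionary has to be stored or sent along with the codes so that the decoder can use it.

```python
from collections import Counter


def semi_adaptive_encode(text: str):
    words = text.split(" ")
    # Pass 1: scan the whole input and build the dictionary,
    # most frequent words first so that they get the smallest indices.
    dictionary = [word for word, _ in Counter(words).most_common()]
    index = {word: i for i, word in enumerate(dictionary)}
    # Pass 2: replace every word with its dictionary index.
    codes = [index[word] for word in words]
    return dictionary, codes


def semi_adaptive_decode(dictionary: list, codes: list) -> str:
    return " ".join(dictionary[code] for code in codes)


text = "the cat and the dog and the bird"
dictionary, codes = semi_adaptive_encode(text)
print(dictionary)
print(codes)
```

Because the dictionary is derived from the file itself, it always fits the data, at the cost of reading the input twice and of carrying the dictionary with the output.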