(algorithm)
Definition: A minimal variable-length character coding based on the frequency of each character. First, each character becomes a trivial binary tree, with the character as the only node. The character's frequency is the tree's frequency. Two trees with the least frequencies are joined as the subtrees of a new root that is assigned the sum of their frequencies. This is repeated until all characters are in one tree. One code bit represents each level. Thus more frequent characters are near the root and are coded with few bits, and rare characters are far from the root and are coded with many bits.
Also known as static Huffman coding.
Specialization (... is a kind of me.)
adaptive Huffman coding, k-ary Huffman coding.
Aggregate child (... is a part of or used in me.)
full binary tree, priority queue, greedy algorithm.
See also arithmetic coding, optimal merge, Shannon-Fano coding.
Note: The worst case for Huffman coding (or, equivalently, the longest Huffman coding for a set of characters) is when the distribution of frequencies follows the Fibonacci numbers. For this and other relations see Alex Vinokur's note on Fibonacci numbers, Lucas numbers and Huffman codes.
Joining trees by frequency is the same as merging sequences by length in optimal merge. See the example there. Since a node with only one child is not optimal, any Huffman coding is a full binary tree.
Huffman coding is one of many lossless compression algorithms.
Shannon-Fano is a minimal prefix code. Huffman is optimal for character coding (one character-one code word) and simple to program. Arithmetic coding is a little better still, since it can allocate fractional bits, but is more complicated and has patents.
Author: PEB
A survey on data compression, John Morris' explanation, example, and analysis. Wikipedia's encyclopedic article on Huffman coding.
David A. Huffman, A Method for The Construction of Minimum Redundancy Codes, Proceedings of IRE, 40(9):1098-1101, September 1952.
If you have suggestions, corrections, or comments, please get in touch with Paul E. Black.
Entry modified Thu Oct 27 10:05:21 2005.
HTML page formatted Thu Oct 27 10:13:14 2005.
Cite this as:
Paul E. Black, "Huffman coding", from
Dictionary of Algorithms and Data
Structures, Paul E. Black, ed.,
NIST.
http://www.nist.gov/dads/HTML/huffmanCoding.html