# 1988Image coding using vector quantization_ a review.pdf

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 36, NO. 8, AUGUST 1988 957 Image Coding Using Vector Quantization: A Review NASSER M. NASRABADI, MEMBER, IEEE, AND ROBERT A. KING Abstract-This paper presents a review of vector quantization tech- niques used for encoding digital images. First the concept of vector quantization is introduced, then its application to digital images is explained. Spatial, predictive, transform, hybrid, binary, and subband vector quantizers are reviewed. The emphasis here is on the usefulness of the vector quantization when it is combined with conventional image coding techniques or when it is used in different domains. I. INTRODUCTION MAGE compression is essential for applications such as TV I transmission, video conferencing , facsimile transmission of printed material, graphics images, or transmission of remote sensing images obtained from satellites and reconnaissance aircraft. Another area for the application of efficient coding is where pictures are stored in a database, such as archiviqg medical images, multispectral images, finger prints, and drawings [ 13 and [2]. The vector quantization algorithms for reducing the trans- mission bit rate or the storage have recently been extensively investigated for speech and image signals [8] and [ 1 11. In this paper, only the applications of vector quantization to image coding are discussed; the contents are thus restricted to coding techniques that employ vector quantizers (VQ’s). Recently, there have been several review articles on general technique of image compression [3]-[6]. In particular, vector quantization has been applied to speech coding for a number of years [7]- [lo], and Gray has recently published an excellent review paper on this subject [ 1 11. A fundamental goal of data compression is to reduce the bit rate for transmission or data storage while maintaining an acceptable fidelity or image quality. Numerous bandwidth compression techniques have been developed, such as differ- ential pulse code modulation, transform coding, hybrid coding, and adaptive versions of these techniques in response to the growth of image-processing methods. These techniques usually exploit the psychovisual as well as statistical redundan- cies in the image data to reduce the bit rate [12]. One deficiency with all of the conventional coding techniques is that quantization is performed on individual real-valued samples of waveforms or pixels of images. For example, transform coding does this by first taking block transforms of a vector and then scalar quantizing the transformed samples within the vector. Predictive coding does it by coding an error term formed as the difference between the current sample and its prediction. These techniques are not optimal since the processed samples are still somehow correlated or dependent. Paper approved by the Editor for Signal Processing and Communication Electronics of the IEEE Communications Society. Manuscript received October 12, 1984; revised June 16, 1987. This paper was presented at the IEEE Acoustics, Speech, and Signal Processing Conference 1985, Tampa, FL, March 26-29, 1985. N. M. Nasrabadi is with the Department of Electrical Engineering, Atwater Kent Laboratories, Worcester Polytechnic Institute, MA 01609. R. A. King is with the Department of Electrical Engineering, Imperial College of Science and Technology, London, SW7, England. IEEE Log Number 8822470. According to Shannon’s rate-distortion theory, a better per- formance is always achievable in theory by coding vectors instead of scalars, even though the data source is memoryless. This paper is not intended as a survey of the basic vector quantization design algorithms or the searching techniques, but rather as a demonstration of the application of vector quantizers to image coding. A definition of the vector quantizer is given followed by its applications to images in the spatial domain, the predictive domain, the transform domain, and combinations of these known as hybrid domains. 11. DEFINITION OF VECTOR QUANTIZATION Extensive studies of vector quantizers or multidimensional quantizers has recently been performed by many researchers [13]-[17]. The design of optimal vector quantizers from empirical data were proposed and extensively studied by Linde, Buzo, and Gray [9] using a clustering approach which is discussed in Section 11-A. This algorithm is now commonly referred to as the LBG algorithm. A vector quantizer can be defined as a mapping Q of K-dimensional Euclidean space R into a finite subset Y of RK. Thus, Q: RK+Y where Y = (fi; i = 1, 2, *, N) is the set of reproduction vectors and N the number of vectors in Y. It can also be seen as a combination of two functions: an encoder, which views the input vector x and generates the address of the reproduc- tion vector specified by Q(x), and a decoder, which uses this address to generate the reproduction vector 2. If a distortion measure d(x, 2) which represents the penalty or cost associated with reproducing vectors x by 2 is defined, then the best mapping Q is the one which minimizes d(x, 2) [13]. The LBG algorithm and other variations of this algorithm are based upon this minimization, using a training set as the signal. One simple distortion measure for waveform coding is the square error distortion given by K- I d(x, a)= I(x-a((z= (xj-a;). (2) j=O A weighted mean square error (WMSE) distortion can also be used [37]. Other error distortion measures have also been suggested, but they are too expensive computationally [18] for practical implementation. One problem with vector quantiza- tion is that a large effort is required by the encoder to search the whole codebook in order to identify the nearest matching vector template to an input vector. Recently, Buzo et af. [7] have proposed a tree searched (TSVQ) encoder in order to reduce the search effort of a vector quantizer. A TSVQ encoder can be explained as an encoder that searches a sequence of small codebooks instead of one large codebook. The encoder structure can be depicted as a tree and each search and decision corresponds to advancing one level or stage in the tree, starting from the root of the tree. A detailed description of TSVQ’s and TSVQ design algorithms may be found in [ 191 and [20]. The disadvantages of TSVQ encoders are that their 0090-6778/88/08OO-0957$01 .OO 0 1988 IEEE 958 Table Look up Channel IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 36, NO. 8, AUGUST 1988 Output - codebook storage requirement is greater than full-search VQ’s, and the codewords they select are not, in general, optimum in the sense of minimizing d(x, a). Other vector quantization techniques that reduce memory as well as encoding complexity can be found in [21]-[22] and [109]- [110] where Fischer designed a quantizer (Pyramid Vector Quantizer) based on the implicit geometry of an independent and identically distributed Laplacian source. A. Vector Quantizer Design The goal in designing an optimal vector quantizer is to obtain a quantizer consisting of N reproduction vectors, such that it minimizes the expected distortion. Optimality is said to be achieved if there is no other quantizer that can achieve the minimum expected distortion. Lloyd [ 171 proposed an itera- tive nonvariational technique known as his “Method I” for the design of scalar quantizers. Recently, Linde et al. [9] extended Lloyds’ basic approach to the general case of vector quanti- zers. Let the expected distortion be approximated by the time- averaged square error distortion given by the expression xxxx xxxx xxxx ~ xxxx (3) xxxx xxxx xxxx xxxx The algorithm for an unknown distribution training sequence is given in [9]. 1) Let N = number of levels; distortion threshokd E 2 0. Assume an initial N level reproduction alphabet Ao, and a training sequence (xi; j = 0, 1, * e, n - l), and rn = number of ite_rations, set to zero. 2) Given A, = (yi;-i = I, * * e, N), find the minimum distortion partition P(A,) = (Si; i = 1, -*e, N) of the training sequence: xj E S, if d(xj, yi) 5 d(xj, y/), for all 1. Compute the average distortion n-l D,=D[(A,, ~(A,))l=n- 1 min d(xj, y). (4) 3) If (Dm- - D,)/D, 5 E, stop the iteration with A, as 4) Find the optimal reproduction alphabet f(P(A,)) = YEA, j=O the final reproduction alphabet; otherwise continue. (a($); i = 1, * e, N) for P(A,) where Codebook (a) lm 2(S;)=- xj. IISill j:xjesi Codebook (ROW 5) Set a,, I = a( j, = 1, * . , n] to be classified into the appropriate class, [x:; J = 1, * . * , d, i = 1, . * , MI; each codebook Ci was designed by using the training sequence that belonged to that class xf . The LBG algorithm was used to minimize the class distortion di(X, Y) where x and y were restricted to the corresponding training sequence x and the codebook Ci. Fig. 6 shows the coding procedure. Gradient edge operators were employed on the edge enhanced images for classification of each vector image. By increasing the size of the edge codebook, they noticed the quality of the images was seen to be greatly improved. Finally, Ramamurthi and Gersho extended the above CVQ technique to include different codebooks and appropriate coding processes for each class. For example, the blocks that fall within the midrange class, which represents moderate intensity variation but excluding any definite edges, are two- dimensional transformed by using a discrete cosine transform 960 uw IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 36, NO. 8, AUGUST 1988 Mean A Side Mean Change Information Detection Codebook 7 Label Memory + +I Side Label Change Information Detection Fig. 7. The block diagram of a codebook replenishment system [MI. (DCT). A number of high-frequency coefficients are discarded (zonal sampling) [38] thereby reducing the VQ computational complexity in exchange for a small increase in distortion. The distortion is negligible because most of the energy is carried by the low frequency coefficients, especially for the blocks that belong to the midrange class. For the blocks belonging to the edge class, the separate-mean VQ coder of Baker and Gray [23] is employed because the edge blocks are similar in edge orientation and location and differ only in the average intensity level; these can be represented by codebook vectors that are normalized by the mean. These vector quantization techniques can be extended to multidimensional vector quantizers for coding moving images, but the storage for the codebooks as well as the computational complexity will be exponentially increased. C. Codebook Replenishment VQ Goldberg et al. [39]-[40] introduced an adaptive vector quantizer for the coding of monochromatic and color pictures. In their system, adaptivity was based upon computing and transmitting a small codebook that matched the local statistics of the image to be coded. Fig. 7 shows the block diagram of the proposed adaptive vector quantization system. The image is subdivided into nonoverlapping subimages, and for each subimage, a separate codebook consisting of about 16-64 representative vectors is formed. For each subimage, the locally generated representative vectors of the codebook are transmitted, followed by the codewords or labels for the vectors of the image. Images coded by this technique produce visually acceptable pictures falling between 1- 1.5 bit-pixel. The block size of 2 x 2 was employed in their algorithm. One problem with this method is that the redundancy between different codebooks used in the different subimages is not removed as indicated by the authors [40]. The extension of this technique to color images is discussed in Section VIII. The major problem with this technique is that the new codebooks must be sent to the receiver involving the transformation of a large amount of side information and also needing a large computational effort for designing the new codebooks. Gersho and Yano 1411 developed a similar algorithm in which adaptation was applied to change only a small part of the codebook rather than the entire codebook. The algorithm monitors the distortion for each input vector and if it is larger than a predetermined threshold, that input vector is added to the codebook as a new vector. $H 31 39 M 58 Fig. 8. A typical quad-tree representation of an image [43]. D. Hierarchical VQ An important point that has been ignored in adaptive SVQ is that the block size is constant throughout the encoding process. A better technique which could lower the bit rate significantly is to use variable block length. One such technique was introduced by Nasrabadi [42] called adaptive hierarchical VC (AHVQ). In this coding technique, a quad-tree algorithm [43] is first used to partition the image into blocks of size 2 X 2, 4 x 4, 8 x 8, and 16 x 16. This information about partitioning the image is represented by a quad-tree and is counted as the side information to be transmitted. Fig. 8 shows a typical quad-tree representation. In the coding process, the small blocks, 2 x 2 and 4 x 4, are coded by using a typical SVQ. The larger blocks representing constant patterns are encoded by applying VQ in the transform domain. Many of the high- frequency coefficients can be discarded and thus the effective dimension of the blocks is reduced for computational pur- poses. A similar but one-dimensional algorithm was also proposed for speech 1441. Recently, Vaisey and Gersho 1451 implemented such a variable block-size vector quantizer by using a standard split- merging segmentation technique to partition the image into subblocks. Two approaches were studied for vector quantiza- tion of images. In the first approach, vector quantizer was used as well as a transform VQ, and in the second approach an adaptive vector quantizer was used where the adaptivity was based upon updating or replenishing the codebook by vector patterns that represent the statistics of the current image more efficiently. E. Interframe VQ In an image sequence, successive frames are usually very highly correlated. A simple interframe coding system that only transmits the intensity of the pixels that change between successive frames is known as a frame replenishment coding system. Recently, Goldberg and Sun [39]-[40] introduced several interframe coding systems where the ideas of label replenishment and codebook replenishment were incorporated in their coding system. Fig. 9 shows the schematic diagram for a label replenishment VQ system. In this system, each frame is first divided into two-dimensional blocks, and vectors are then extracted from the resulting blocks either directly in the spatial domain or indirectly through the transform domain. The vectors corresponding to the first frame are taken as the training set for generating the first codebook. The first frame is quantized, and the labels are stored in the label memory and transmitted as indicated by dashed line 2. Each subsequent frame is subdivided into two-dimensional blocks and each vector is quantized. The label is then compared to the one in the labe