
348 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-31, NO. 3, MAY 1985

Finite-State Vector Quantization for Waveform Coding

JOHN FOSTER, MEMBER, IEEE, ROBERT M. GRAY, FELLOW, IEEE, AND MARI OSTENDORF DUNHAM

Abstract: A finite-state vector quantizer is a finite-state machine used for data compression: Each successive source vector is encoded into a codeword using a minimum distortion rule and a code book depending on the encoder state. The current state and the selected codeword then determine the next encoder state. A finite-state vector quantizer is capable of making better use of the memory in a source than is an ordinary memoryless vector quantizer of the same dimension or blocklength. Design techniques are introduced for finite-state vector quantizers that combine ad hoc algorithms with an algorithm for the design of memoryless vector quantizers. Finite-state vector quantizers are designed and simulated for Gauss-Markov sources and sampled speech data, and the resulting performance and storage requirements are compared with ordinary memoryless vector quantization.

I. INTRODUCTION

A FINITE-STATE vector quantizer (FSVQ) is an example of a finite-state machine for data compression. It can be viewed as a finite collection of ordinary vector quantizers (VQ's) or block source codes, where each successive source vector is encoded using a VQ determined by the current encoder state. The current state and the codeword selected then determine the next encoder state. Unlike an ordinary memoryless VQ, an FSVQ can take advantage of the memory or correlation between successive source vectors by choosing the appropriate code or VQ, given the past behavior. Finite-state machines for vector quantization are not new: ordinary VQ's and sliding-block or sliding-window codes are special cases of finite-state machines. Trellis encoders are also examples of finite-state machines where the encoder is permitted to look ahead of the current source vector.
More general finite-state machine models were considered by Ziv [1] and Gaarder and Slepian [2]. Ziv focused on rate-distortion theory for such codes, and Gaarder and Slepian developed several optimality properties and some coding theorems for such codes. Gaarder and Slepian also considered several properties of the example of a sliding-block or sliding-window code for a Gauss-Markov source. Unlike such general finite-state machines, finite-state vector quantizers assume an encoder that uses the minimum distortion rule to select a codeword from a state code book; that is, it operates as a quantizer, but at each time the quantizer may be different. Thus, finite-state vector quantizers are a special case of feedback quantizers, as introduced by Kieffer [3] as a model for predictive quantization schemes. We introduce a class of finite-state machines for data compression called finite-state vector quantizers or FSVQ's. These codes are not as general as the basic models of [1] and [2], but they have an intuitive structure, and their similarity to ordinary VQ's motivates a design algorithm that is the principal contribution of this work.

Manuscript received June 27, 1983; revised June 18, 1984. This work was supported in part by a Bell Laboratories Cooperative Research Fellowship, the National Science Foundation, the U.S. Army Research Office, and the Joint Services Electronics Program at Stanford University. The material in this paper was presented in part at the 1982 IEEE International Symposium on Information Theory, Les Arcs, France, June 1982. J. Foster was with Bell Laboratories, Holmdel, NJ, and the Information Systems Laboratory, Stanford University, Stanford, CA. He is now with the Tuskegee Institute, Tuskegee, AL 36088, USA. R. M. Gray is with the Information Systems Laboratory, Stanford University, Stanford, CA 94305, USA. M. O. Dunham is with BBN Laboratories, Inc., Cambridge, MA 02238, USA.
In the language of [2], our finite-state machine codes are of the "tracking finite-state system" variety; that is, the new encoder state is a function of the previous encoder state and the selected channel codeword rather than a more general function of the previous state and the current input vector. This permits the decoder to track the encoder state if it knows the initial condition (and the channel is noiseless).

General FSVQ's have two principal differences from ordinary memoryless VQ's. First, the code book may vary with time as a function of the encoder state. This resembles a universal source code or a finite-choice adaptive code, but here no side information is transmitted to specify the code book selected; such information must be inferred from the received sequence of channel symbols and an initial state. Second, the minimum distortion selection rule is not necessarily optimum for a given decoder, since a low distortion codeword may lead to a bad state and hence to poor long-term behavior. Thus the encoder mapping is not necessarily optimal for the decoder in any long-run sense. It is, however, a reasonable heuristic. In addition, the problem of selecting a bad state by this encoder can be minimized by good design.

No new Shannon theory is required by FSVQ's. Since FSVQ's contain ordinary block source codes (memoryless VQ's) as a special case, the usual positive source coding theorem relative to a distortion measure applies. Since they are special cases of finite-state machines, the negative coding theorems of [1] and [2] apply. In addition, since asymptotically mean stationary (ams) sources driving finite-state machines yield ams input/output processes [4]-[7], the converse for ams sources [7] applies.
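The tracking property can be illustrated with a minimal sketch. The two-state machine and its next-state table below are hypothetical, chosen only for the example: because the next state depends only on the previous state and the transmitted codeword, a decoder that knows the initial state recovers the entire state sequence from the channel symbols alone.

```python
# Hypothetical two-state machine: f maps (state, codeword) -> next state.
next_state = {
    ("A", 0): "A", ("A", 1): "B",
    ("B", 0): "B", ("B", 1): "A",
}

def state_sequence(codewords, s0):
    """Run the next-state function over a channel codeword sequence."""
    states, s = [s0], s0
    for u in codewords:
        s = next_state[(s, u)]
        states.append(s)
    return states

# Encoder and decoder run this same function on the same channel sequence
# and the same initial state, so their state trajectories agree exactly.
channel = [1, 1, 0, 1]
print(state_sequence(channel, "A"))  # ['A', 'B', 'A', 'A', 'B']
```

If the next-state function instead depended on the raw input vector, the decoder could not reproduce this trajectory without side information, which is exactly the distinction the "tracking" terminology captures.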
Regardless, the optimal performance that is theoretically achievable is given by the Shannon rate-distortion function of the source and distortion measure, given an ams ergodic source and a sufficiently well-behaved distortion measure. In contrast to [1] and [2], the goal of this work is to present the development and simulation of design algorithms for the class of codes considered.

A finite-state vector quantizer is not the same as the finite-state Markov models now popular for use in isolated utterance and continuous speech recognition systems. (See, e.g., [8]-[11].) We pause to discuss these systems since we are aware of some confusion concerning their similarities and differences. In these recognition systems, a memoryless VQ is first used as a front end (or acoustic processor) to encode speech parameters, e.g., selected spectral coefficients or LPC parameter vectors. An algorithm such as the forward-backward algorithm is then used to fit a "hidden Markov model" of a given structure to the observed VQ outputs, that is, to construct a Markov model for the observed compressed data [8]. The estimation algorithm used to find the parameters of the Markov source that best matches the observed training data is approximately a maximum likelihood algorithm if one assumes the given model. In particular, the estimation algorithm tries to match the observed relative frequencies to those that would be produced by the Markov model. Once the model is found (in the design or training stage), it is then used in a linguistic decoder to perform approximate maximum likelihood recognition on the speech data. In the recognition problem the quantization is in the memoryless VQ; the modeling is inherently a continuous estimation problem in which one tries to find transition probabilities that best explain the VQ output observations. No compression is involved in the hidden Markov modeling; the compression is performed prior to the modeling in the front end.
On the other hand, an FSVQ is a compression device and hence a possible alternative to the VQ as a front end of such a recognition system. No Markovian assumptions are made, nor are transition probabilities estimated; the machine is designed to minimize average distortion between its input and output. Waveforms are matched in the sense of minimizing distortion, not in the sense of matching relative frequencies to assumed model distributions. In short, FSVQ codewords are designed given a finite-state quantizer structure, whereas the hidden Markov model structure is designed given a sequence of VQ codewords. It should also be observed that the output of the finite-state machine is not a Markov source in general (unless the process being compressed is memoryless). There is a tempting parallel between modeling a source and building a good FSVQ for the source in the sense that both the model outputs and the possible FSVQ outputs should "look like" the typical source outputs. However, this view has not yet yielded any design methods for FSVQ, probably because the notions of "look like" and the fundamental goals are quite different. In addition, the hidden Markov techniques have so far concentrated on coding LPC or spectral parameter vectors; our goal is to code waveforms. (FSVQ for LPC parameter vectors is developed in [12].)

In the next section we define and discuss tracking finite-state source codes and FSVQ's. Then two different structures are considered for FSVQ, the difference being in how the reproduction codewords relate to the states of the decoder. The decoder finite-state machine can either be a Moore machine [13] with its outputs, the reproduction codewords, associated with the states (or nodes), or it can be a Mealy machine [13] with its outputs associated with the transitions (or branches). We shall refer to these two structures as labeled-state and labeled-transition FSVQ's, respectively.
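The distinction between the two decoder structures can be sketched as follows. The one-dimensional labels and the next-state table are toy values invented for illustration: a labeled-state (Moore) decoder emits the reproduction attached to the state it enters, while a labeled-transition (Mealy) decoder emits the reproduction attached to the branch it traverses.

```python
# Shared next-state function f(u, s) for both toy decoders.
f = {(0, "s0"): "s0", (1, "s0"): "s1",
     (0, "s1"): "s1", (1, "s1"): "s0"}

# Labeled-state (Moore): one reproduction per state (node).
state_label = {"s0": -1.0, "s1": +1.0}

# Labeled-transition (Mealy): one reproduction per (codeword, state) branch.
transition_label = {(0, "s0"): -1.0, (1, "s0"): +0.5,
                    (0, "s1"): +1.0, (1, "s1"): -0.5}

def decode_labeled_state(channel, s0):
    s, out = s0, []
    for u in channel:
        s = f[(u, s)]               # move first ...
        out.append(state_label[s])  # ... then emit the entered state's label
    return out

def decode_labeled_transition(channel, s0):
    s, out = s0, []
    for u in channel:
        out.append(transition_label[(u, s)])  # emit the branch label ...
        s = f[(u, s)]                         # ... then move
    return out
```

With more labels to store per state, the Mealy form trades storage for flexibility, which is one way the complexity of the two equivalent forms can differ in practice.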
The two structures are equivalent in the sense that Mealy and Moore finite-state machines are equivalent [13]. That is, given an FSVQ of one form, one can find an FSVQ of the other form that is equivalent in the sense that a given initial state and input sequence will yield the same output sequences, except possibly for the initial state output. Thus the two forms of FSVQ are capable of the same rate-distortion performance. The complexity and storage requirements of the two forms may be quite different in different applications, however, and one form or the other may be preferable in certain cases. In addition, iterative design algorithms such as those developed here may produce quite different final codes even if run on equivalent initial codes.

In the subsequent section we introduce several design algorithms for FSVQ's. These algorithms combine a basic VQ design algorithm (see, e.g., [14]) with various ad hoc algorithms for selecting the next-state functions or state transition functions. The basic approach was motivated by the preliminary tests of Rebolledo [15], who observed that in voice coding (LPC) VQ systems, each codeword was virtually always followed by one of a small subset of codewords; that is, all codewords were almost certainly followed by a small subcode. Thus, a lower rate should suffice for the same performance if successors to a codeword are restricted to lie in a small subcode. The problem, of course, is that in the rare instances where the source changes in an unusual way, the input vector may not be well reproduced by the available subcode. In this case there may be no transition to an appropriate state and the coder can become derailed. A useful design technique must produce codes that can find their way back to good reproductions if so derailed.

The next section is devoted to simulation studies for two different sources: a Gauss-Markov source and a sampled speech source.
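Rebolledo's observation translates directly into a rate saving. As a back-of-the-envelope illustration, with codebook sizes made up for the example (not taken from the paper's experiments): if a memoryless VQ must search 256 codewords per vector but only 16 successor codewords are ever likely from any one state, an FSVQ restricted to those subcodes spends log2 16 = 4 bits per vector instead of log2 256 = 8.

```python
import math

# Illustrative sizes only; not from the paper's experiments.
supercode_size = 256   # codebook a memoryless VQ would search every vector
subcode_size = 16      # successor codewords reachable from any one state
k = 4                  # vector dimension, in samples

rate_vq = math.log2(supercode_size) / k    # bits per sample: 8 / 4 = 2.0
rate_fsvq = math.log2(subcode_size) / k    # bits per sample: 4 / 4 = 1.0
```

The halved rate is only realized if the subcodes really do contain good successors, which is exactly the derailing risk the paragraph above describes.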
The first is useful because it is a popular test source for data compression systems and because its rate-distortion performance bounds are known. The second is of more practical interest because of the increasing importance of digital voice systems. The FSVQ's are compared to ordinary VQ's on the basis of performance as measured by both signal-to-(quantization)-noise ratio (SNR) and complexity. Further comparisons with other feedback quantizers, such as Cuperman and Gersho's vector predictive quantizer [16], [17], may be found in the general survey paper [18]. The closing section provides some conclusions and some suggestions for future research.

II. TRACKING FINITE-STATE SOURCE CODES

Define a state space $S$ as a finite collection of symbols $S = \{\sigma_i;\ i = 0, 1, 2, \cdots, K - 1\}$ called states. Fix a dimension $k$, called the code block length, and let $R^k$ denote $k$-dimensional Euclidean space. Each input source vector $x \in R^k$ is to be encoded into a channel symbol for communication or storage. For convenience we denote the alphabet of channel symbols as $N = \{0, 1, 2, \cdots, N - 1\}$. The rate of the coding is $\log N$ bits per input vector or $k^{-1} \log N$ bits per source symbol (or sample), where all logarithms are to base 2. Often it is convenient to consider only integer rates in bits per vector and hence to consider $N = \{0, 1\}^R$, where $R$ is the rate in bits per vector. In this case the channel symbols are simply binary vectors.

A finite-state code is used to encode a sequence of vectors or a random process $\{X_n;\ n = 0, 1, 2, \cdots\}$ as follows. Given an initial state $S_0 = \sigma_0 \in S$, the channel symbol sequence $\{U_n;\ n = 0, 1, 2, \cdots\}$, the state sequence $\{S_n;\ n = 0, 1, 2, \cdots\}$, and the reproduction sequence $\{\hat{X}_n;\ n = 0, 1, 2, \cdots\}$ are defined recursively for $n = 0, 1, \cdots$ as

$$U_n = \alpha(X_n, S_n), \qquad \hat{X}_n = \beta(U_n, S_n), \qquad S_{n+1} = f(U_n, S_n),$$

where the encoder $\alpha$, decoder $\beta$, and next-state function $f$ are defined in the next section.
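The encoding recursion can be sketched directly in code. The following is a minimal illustration with made-up state codebooks and an arbitrary next-state table (none of these values come from the paper); the encoder uses minimum squared error over the current state's codebook, as the FSVQ definition below requires.

```python
import numpy as np

# Made-up example: K = 2 states, N = 2 channel symbols, dimension k = 2.
rng = np.random.default_rng(0)
beta = rng.normal(size=(2, 2, 2))   # beta[s][u] is the reproduction vector
f = [[0, 1],                        # f[s][u] is the next state
     [1, 0]]

def alpha(x, s):
    """Minimum-distortion encoder over state s's codebook."""
    return int(np.argmin(((beta[s] - x) ** 2).sum(axis=1)))

def encode(xs, s0=0):
    s, channel = s0, []
    for x in xs:
        u = alpha(x, s)             # U_n = alpha(X_n, S_n)
        channel.append(u)
        s = f[s][u]                 # S_{n+1} = f(U_n, S_n)
    return channel

def decode(channel, s0=0):
    s, xhats = s0, []
    for u in channel:
        xhats.append(beta[s][u])    # Xhat_n = beta(U_n, S_n)
        s = f[s][u]                 # decoder tracks the same state sequence
    return xhats
```

Running `decode` on the encoder's channel output with the same initial state reproduces exactly the sequence of reproduction vectors the encoder saw, which is the tracking property discussed above.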
III. FINITE-STATE VECTOR QUANTIZERS

What makes a quantizer a special case of a finite-state machine is the use of a minimum distortion encoding rule. Hence we next introduce a distortion measure. Given a source vector $x = (x_1, x_2, \cdots, x_k)$ and a reproduction vector $\hat{x} = (\hat{x}_1, \hat{x}_2, \cdots, \hat{x}_k)$, the distortion $d(x, \hat{x})$ between $x$ and $\hat{x}$ is defined as the mean-squared error

$$d(x, \hat{x}) = \|x - \hat{x}\|^2 = \sum_{i=1}^{k} (x_i - \hat{x}_i)^2,$$

where $\|\cdot\|$ denotes the Euclidean norm. As with ordinary VQ's, other distortion measures can be considered, but we shall see that we do need more than the principal requirement of ordinary VQ, the existence of a generalized centroid with respect to the distortion measure.

A finite-state encoder is a mapping $\alpha: R^k \times S \to N$; that is, given an input source vector $x \in R^k$ and a state $s \in S$, the vector $x$ is encoded into the channel symbol or channel codeword $\alpha(x, s) \in N$. The encoder works in conjunction with a next-state function, a mapping $f: N \times S \to S$. If the encoder has a current state $s \in S$, views an input vector $x \in R^k$, and produces a codeword $u = \alpha(x, s)$, then the next encoder state is $f(u, s)$. The encoder and next-state function together describe a finite-state machine, which we will call the encoding machine.

A finite-state decoder is a mapping $\beta: N \times S \to R^k$; that is, given a channel symbol $u \in N$ and a state $s$, the channel symbol $u$ is decoded into a reproduction vector $\hat{x} = \beta(u, s) = \beta(\alpha(x, s), s)$. The decoder and next-state function together form a finite-state machine that we