# 1982MSVQ.pdf

Multiple Stage Vector Quantization for Speech CodingBiing—Hwang JuangA.H. Gray, Jr.Signal Technology Inc., 15 W. De La Guerra, Santa Barbara, CA 93101 ABSTRACTInthispaper,we present a multiplestagevectorquantization technique which allows easyexpansion of the original vector quantizer designto operate at higher bit rates for lower distor-tion. The computation and storage reduction isachieved by the fact that the overall requirements are the sum of the requirenierts of each stage ins-tead of an exponentially increasing function ofthe bit rate as in the original one stage design.In the case of Euclidean distance measures suchas the log area ratio measure, experimental re—suits show that the quantizer performance is veryclose to a theoretically predicted asymptoticallyoptimal rate distortion relationship. I. IntroductionIt has been shown [1,2,3] that vector quanti—zation is a very efficient quantization method,particularly for speech coding. An 800bps LPC vo-coder based upon vector quantization scheme hasbeen successfully implemented [3] and demonstratedto be able to achieve a DRT score of 80 percent. However, practical limitations of the basic vectorquantization scheme do exist when lower distortionis desired.The computation and storage requirements arethe major difficulty in implementing a vector quan—tizer. In the full—search algorithm, the computa-tion and storage complexity is an exponential func-tion of the number of bits used in quantizing each frame of spectral information. For example, in 10-bit vector quantization, the codebook consists of1024 codewords and for each input vector 1024 spec-tral comparisons (distortion evaluations) need tobe carried out in order to find the closest one.For 10th order all pole filters, this requires11,264 multiply/adds for each input vector. For aframe rate of 50 per second, this requires 563,200 multiply/adds per second. The storage for the quan-tizer codebook in this case is 11,264 words. Thesenumbers increase exponentially when a higher bitrate (per frame) or lower distortion is desired.An M—ary search approach has been proposed toeffectively reduce the computation [2,3]. However,these M-ary search procedures ultimately producesuboptimal codebooks with higher distortion and larger transmitter codebook size than the fulrsearch approach. Thus, depending on the capacityof the computer employed, the computation and/orstorage complexity imposes on the scheme an im-plementation barrier which prohibits one from ach- 597 ieveing an arbitrary low average distortion withthe vector quantization technique. In situationsthat require higher speech quality (but have lessconstraint on the bit rate), a modification of thevector quantization technique is necessary to over-come the implementation difficulties. Furthermore,as the MRP (multi-rate processor) has attracted in-terest recently, a technique which can fill in the bit rate gap between 800 and 2400 bps with perform-ance very close to the existing 2400bps LPC-10 sys-tems would be very desirable.In this paper, we briefly review the asympto-tically optimalvector quantization theory [4] andbasedon the theoretical development propose amultiple tage vector quantizer that allows easyexpansion of the vector quantization scheme for lower distortion without incurring prohibitive com-putation and storage complexity. The experimentalresult obtained from a full scale data base test-ifies that such a scheme achieves a rate-distortionrelationship very close to the theoretically deriv-ed performance of an asymptotically optimal vectorquantizer. II.AsymptOticallyOptimal Vector QuantizationConsider an rn—dimensional random vector x withjoint density p(x)=p(xl,xZ,. .,x).An N-point vectorquantizer q mapsi ininto one of N output vectorpoints j,n,, YNbelonging to Rm. The quan-tizer is specifiedThy te output vectors which arecontained in the codebook and by a partition of the space Rm into Ndisj1int and exhaustive regions H1H2,,HN where H1=q ()cR. Thenq() = ifxeH for i=1,2,,N.The performance of such a quantizer is measured bythe distortion dr =Ex-q(x)r (1)where 1?denotesthe usOal L norm, r the power ofthe norm, and E the expectatin.For large N, with the approximations that theoutput point density function x(y) is continuous andthatP() = for xe H1the distortion (1) can be expressed as [4]dr =NC(m,r)fp()x)d (3)where =r/m and C(m,r) is the quantization coeffi— CH 17467I82JO00D -0597 $ 00.75 ? 1982 IEEE (2) N.SIJ H.i=1 = (q+q )(x)= =1 and fl if(1) Thus, if+2?are in H. thenV(H +.)/V(H)N/N for j=1,2, ,n. (8)The same result holds for each H., i=1,2, ,N. Itthen follows that when the numbe of output pointsis increased from N to N , each region H will have an equal number of output points, i.e. n N /N.Also, if n is large, 4, +2 ? y may be ob-tained locally through auniorm quantt2ation on Halone except those points associated with regionstouching the boundary of H. This edge effect whichmay cause degradation in performance (to worse than(5)) is expected to be small for very large n.Based upon the above developments, a class of quantization schemes is thus proposed. Followingthe implementable N—point asymptotically optimalvector quantizer, define a difference vector spaceS as (9)where i.={s; s=x- for x in H}. An (N /N)—point vector iuantizer q is Then designed on S result-ing in output points vj and associated regions .,i=1,2,,(N /N),i.e. 1= q (s) ifoisinThe overall quantizer (q+q ) is then defined as cient depending on m and r only. Note that A(y) isnormalized such that fA(yjdyL overis unity.A lower bound on the integral in (3) can beobtained using Holder s inequality [5] IIp(y)Il1/(].+)where l1p()!When x(y) is proportional to ()1I (1), the equa-lity is attained and the minimum distortion di(N)is thendr N —Cm N 5mm — ,r, py_,In designing such an asymptotically optimalquantizer using Lloyd s algorithm [61, it is clearthat a large amount of computation and storage isrequired to achieve a very low distortion. A quan-tizer design whichrhas performance very close tothe relationship d “N without incurring prohibi- tive computational as well as storage complexity isthus the present focus of interest.rn. Multiple Stage Vector QuantizerSuppose there is an N-point vector quantizerwhere N is sufficiently large, but within the im—pleinentation limit, such that the performance fol- lows (5). If a smaller distortion is desired, saydr(N ), it is obvious that N must be greater thanN. The problem is how to achieve a distortion closEto d. (N ) when N —point optimal vector quantizercan Y be practically implemented with Lloyd smethod.One important fact about the quantizer basedon (1) is that the performance is translation in- variant. Supposerepresents the point where thecoordinate origin Fias been shifted to. The distor-tion (1) will remain the same upon translation ofcoordinate sincedr =Ex-q(Ir =Ej(-0)-fq()-0}jlrNext consider two “asymptotically optimal“ quantizers of N-point and N -point output vectorsrespectively. N N. Each quantizer has its corres-ponding output points, output point density, andpartition, i.e.N—point quantizer:output points: ,,, ,outputdensity: x(L7partition: H1, H2,, HN N —point quantizér:output points: j,output density: x ()partition: Hi, H ,, H.Assuming that {} is proerly ordered such that,,ar in H1, ÷1a+2 “ ;+b are inHf,, and so on. The volume associated with everybunded region H and H can be written as V(H.)1/{NA(m)}for i=1,2,,N(6)and 1V(H)1/{N x (y)} for i=1,2,,N (7)respectively. The assumption of (2) suggests thatifis in H, since fpx d= fAd if x is in H and 5=x-. is inThe scheme is expandable to multiple stages, ifnecessary. -Notethat since each Hj is a uniforly distri-buted region, the intersection Z of all H,Nz =i =1is also uniformly distributed. By neglecting theedge problem, the resultant output pointswhere v. is in Z, can be properly selected o oin-cide wT h some +k obtained by an optimal proce-dure if the latter were implementable, because theyare consistent with uniform quantization. In theextreme case whereZ=H1H2= .=HF the quantizer (q+q ) is an asymptotically optimalN -point vector quantizer. In general, the N JN—point combination outputs are not necessarily op-timal. As will be shown in the next section, how-ever, the combination output points so designedachieve an average distortion that is only slight-ly inferior to the optimal result.Although such a quantizer q+q is suboptimal, the savings in computation and storage it providesis significant. For an overall N —point vector quan—tizer, the 2—stage scheme requires only N+(N /N)codewords storage as opposed to the N codewordsoriginally required. The same number holds for thedistortion evaluation requirements in quantizingeach input vector. The minimum requirement occurs 598 at N=(N )l/2 corresponding to half of the overallnumber of bits.Block diagrams of a two—stage vector quantiza—tion configuration according to the above schemeare shown in Fig. 1. In the training phase (Fig.la)a training vector sequence is provided and a (1st)codebook C of N entries is generated according tothe algorithm presented in 11]. The entire training sequence is then quantized fn q using the nearestneighbor principle and the first codebook C. Foreach input x in the training sequence, a corres-ponding difference vector 6. is obtained by S.=where .isthe nearet point of x1 in ?.Adifference vetor training sequence is thus estab-lished for the second stage training purpose. Asecond codebook C with N /N entries is further obtained through the same procedure as the firststage using the difference vector training se-quence. Note that the same scheme can be expand-ed to several stages with one codebook for eachstage.During quantization (Fig.lb), the two code-booksCand C and the nearest neighbor principledetermine the two quantizersq and q . Each input x is first quantized by q resulting in a quantizedoutput q(x). q(x) has an index of j in C. Thedifference vector 6=x-q(x) is then further quan-tized by q . The result of the second stage quan-tization is the output q () with index j in C .The reproduction x is simply the sum of the twoquantized outputsx= For transmission or storage, indices j and j areused. The bit rate is log2N+log2(N /N)=log2N pervector. Fig. la Block Diagram of CodebookTraining for 2—stage VectorOuantization. Fig. lb Block Diagram of a 2—stageVectorQuantizer. 599 IV. Experimental ResultsAs with any vector quantization scheme report-ed previously, the codebooks need to be generatedbased upon a training sequence. In the study, thetraining speech sequence was gathered from 30 mm.of recording of 7 male and 3 female speech (3 mm.talker). It was then processed to generate 40,196 voiced and 20,167 unvoiced vectors after eliminat-ing over 7 minutes of silent (background noise)segments.Since the quantizer performance will be mea-sured by (1) with r=2, the mean square error cri-terion, we employed the log area ratio measure inthe experiment. The log area ratio measure has itstheoretical background in spectral sensitivity studies [e.g. 7] and is defined asNdAR(1/A,1/A ) =E(v—vfl2i=1where 1/A(z) and 1/A (z) are two Mth order (in theexperiment, M=10) all pole filtersharacterizedby two reflection coefficient vectors k=(k1 k2kM) and k =(kJ k k)respectively,and =1og{(1k1)/(l+k)},=logf(1—k)f(l+k)}the logarea ratio coefficients, Note that the logarearatio measure is not the only mean square errormeasure available. Other measures such as the trun-cated cepstral measure may be equally applied. However, the truncated cepstral measure has poten-tial filter stability problem and may be less de-sirable.For the current study, we conducted the experi-ment with only two stages. The performance of thefirst stage vector quantizer is evaluated up to 10bits. For the second stage, the 10—bit codebook ge-nerated in the first stage was used to obtain the difference training data. The same procedure wasagain applied on the difference training data andthe performance was further evaluated for another10 bits. This provides us one typical performancecurve of the two stage vector quantizer in the rangeof 1 bit to 20 bits. The results, log2d versusthe number of bits, are plotted in Fig. 2a and 2bfor voiced and unvoiced data respectively. In each figure, an approximately linear curvewith slope -0.2 can be seen beyond 13 bits. Notethat (5) can be expressed aslog2d1(N) =Kq—1og2N (10)where K is a function of the quantization coeffi-cient C(m,r) and the joint probability density p(x), independent of the number of output pointsN. Eq.(10) shows an asymptotically linear rela-tionship between the base 2 logarithm of distor-tion and the number of bits. In the experiment, m=N10 and r2 giving =r/m=0.2. Thus the 2-stagequantizer designed for the current training se-quence has a performance very close to the expectedasymptotic result in the sense that they achieve approximately the same incremental gain. A signi-ficant difference is that a discontinuity occurs atthe stage transition, from 10 bits to 11 or 12 bitsfor the 2—stage curve. Such a degradation is due to TRRNjNG CODE000KVECTORS GENERATORA CLU500RTI(PROCEDURE the fact that N /N (=2 or 4) is not large enough,as explained above. However, considering the tre-mendous reduction in computation and storage re-quirements, the 2—stage vector quantizer is anextremely viable alternative to the fully optimalquantizer Cwhichis not implementable for a code-book larger than 12 to 14 bits in size.) V. ConclusionA multiple stage vector quantization schemebasedupon the development of an asymptoticallyoptimal vector quantization theory has been des-cribed. Thescheme allows easy expansion of avector quantizer in order to obtain desired levelsof distortion. The computation and storage is sig- nificantly lower than the one stage approach be-cause the overall complexity is a sum of the two(or multiple) component stages and not an expo-nentially increasing function of the bit rate asin the one stage design. Experimental results in-dicate that the incremental performance gain ofthe scheme is in fact very close to the theoreti-cally derived asymptotically optimal rate-distor- tion performance.REFERENCES1.B.H. Juang, D.Y. Worig, and A.H. Gray, Jr., “Re-cent Development in Vector Ouantization forSpeech Processing,“ ICASSP-81 Proceedings, pp.1-4, Atlanta, March 1981. 2. B.H. Juang, D.Y. Wong, and A.H. Gray, Jr., “Dis—tortion Performance of Vector Duantization forLPC Voice Coding,“ to be published in IEEE ASSP.3. D.Y. Wong, B.H. Juang, and A.H. Gray, WT“An800bpsVector Quantization LPCVocoder,“ to bepublishedin IEEE ASSP.4. A. Gersho, “Asymptotically Optimal Block Ouan—tization,“IEEE Trans. Inform.Theory, vol. IT— 25,pp. 373-380, Jul. 1979.5. M. Abramowitz and l.A. Stegun, Handbook of Mathe-inaticalFunctions, National Burea of Standards,Applied Math. Series 55, p. 11, Mar. 1965.6. B.H. Juang, Vector Quantization for Linear Pre-diction Voice Coding, Ph.D. Dissertation, Univ.of Calif., Santa Barbara, May 1981.7. R. Viswanatha