Study on Speech Compression and Decompression Using the Discrete Wavelet Transform

Speech signals can be compressed and decompressed with the discrete wavelet transform (DWT). DWT-based compression removes the redundancies present in a speech signal and transforms it into a more compact form. The objective of compressing speech is to increase effective transmission and storage capacity. The compression performance is measured in Matlab in terms of Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Normalized Root Mean Square Error (NRMSE), Compression Factor (CF) and Retained Signal Energy (RSE).


INTRODUCTION
Speech compression is the process of representing a voice signal for efficient transmission or storage. Compressed speech can be sent over both band-limited wired and wireless channels. The aim of speech compression is to represent the samples of a speech signal in a compact form, using fewer code symbols, without degrading the quality of the speech signal [1]. Compressed speech is very important in cellular and mobile communication. It is also applied in voice over Internet Protocol (VoIP), videoconferencing, electronic toys, archiving, digital simultaneous voice and data (DSVD), and numerous computer-based gaming and multimedia applications [2]. A speech signal is compressed by converting the signal data into a new format that requires fewer bits to transmit. Compression techniques fall into two basic categories. The first is lossless compression, which achieves completely error-free decompression of the original signal. The second is lossy compression, which introduces inaccuracies into the decompressed signal. Lossy techniques are used when these inaccuracies are small enough to be imperceptible, and their advantage over lossless techniques is that much higher compression ratios can be attained. Wavelet compression is a lossy method: the decompressed signal contains small, ideally imperceptible, inaccuracies [3].
Wavelet analysis has the benefit of a variable window size, so wavelets can efficiently trade time resolution for frequency resolution and vice versa. Wavelets adapt to various time scales and perform local analysis. Furthermore, because wavelets have finite support and therefore describe local features, they can detect the characteristics of non-stationary signals. Wavelets have been widely applied to speech and image denoising and compression [4,5]. Wavelet compression is a form of predictive compression in which the amount of noise in the data set can be estimated relative to the predictive function [6].
Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. Figure 1 shows the block diagram used for compression of the speech signal and reconstruction of the signal. Wavelet analysis is not itself a compression tool; it is a transformation to a domain that gives a different view of the data, one more suitable for compression than the original data. First, the speech signal is decomposed into wavelet transform coefficients. Then a threshold is calculated and applied to the wavelet coefficients: coefficients whose magnitude falls below the threshold are set to zero, which changes the signal imperceptibly. Compression is achieved by encoding the thresholded coefficients.
Many of the wavelet coefficients produced by the wavelet transform have an absolute value close to zero. These small coefficients are likely to represent only small variations of the signal and contain a small percentage of the signal's total energy. They can be discarded without a significant loss in the quality of the signal and, more importantly, of its interesting features. Thus, a threshold is required below which all coefficients are discarded. At the receiver, the compressed signal is decoded and then passed through the inverse wavelet transform to reconstruct an approximation of the original signal.
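The decompose, threshold and reconstruct steps above can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab program: it assumes the Haar wavelet (the simplest of the wavelets compared later), a signal length divisible by 2^levels, and a single global hard threshold applied only to the detail coefficients, leaving the approximation untouched. The encoding and decoding stages are omitted, and Daubechies wavelets would need longer filters and boundary handling.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT; len(x) must be even."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Undo haar_step: rebuild the finer-scale signal from one level."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def haar_dwt(x, levels):
    """Return [cA_L, cD_L, ..., cD_1], coarsest level first."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def haar_idwt(coeffs):
    """Inverse of haar_dwt: combine from the coarsest level down to level 1."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = inverse_haar_step(approx, detail)
    return approx


def hard_threshold(coeffs, t):
    """Zero every detail coefficient with |c| < t; keep the approximation as is."""
    return [coeffs[0]] + [[c if abs(c) >= t else 0.0 for c in d] for d in coeffs[1:]]


# Usage: a short test tone, decomposed to level 3, thresholded and rebuilt.
x = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
reconstructed = haar_idwt(hard_threshold(haar_dwt(x, 3), 0.5))
```

With a threshold of zero the inverse transform returns the input exactly; raising the threshold zeros more detail coefficients at the cost of reconstruction accuracy.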
The rest of this paper is arranged as follows. Section 2 reviews speech compression using the discrete wavelet transform and defines the performance measures used. Section 3 describes the test signal and the methodology and presents the analytical results. Section 4 reports and discusses the compression results, and Section 5 concludes the paper.

Speech Compression Using Discrete Wavelet Transform
Speech compression using wavelets relies primarily on the sparseness of the wavelet-domain representation of the signal. Wavelets concentrate speech information (energy and perception) into a few neighbouring coefficients, so after the wavelet transform many coefficients are either zero or have negligible magnitudes. Data compression is then achieved by treating small-valued coefficients as insignificant and discarding them. The choice of wavelet, the decomposition level of the discrete wavelet transform, the threshold criterion for truncating coefficients, and the encoding of the coefficients are investigated for compressing the speech signal.
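Discarding the insignificant coefficients, and restoring them as zeros before the inverse transform, can be illustrated with a simple (index, value) encoding. This is a hypothetical Python sketch: the function names are illustrative, the paper does not state how its significant coefficients are encoded, and a practical coder would also entropy-code the positions and values.

```python
def encode_significant(coeffs, threshold):
    """Keep only the (index, value) pairs whose magnitude reaches the threshold."""
    pairs = [(i, c) for i, c in enumerate(coeffs) if abs(c) >= threshold]
    return len(coeffs), pairs


def decode_significant(length, pairs):
    """Rebuild the coefficient vector, assigning zero to every discarded position."""
    coeffs = [0.0] * length
    for i, c in pairs:
        coeffs[i] = c
    return coeffs


# Usage: only the two coefficients at positions 1 and 3 survive the threshold.
length, pairs = encode_significant([0.01, 2.0, -0.02, -3.5, 0.0], 0.1)
restored = decode_significant(length, pairs)
```

Storing indices explicitly only pays off when most coefficients are discarded, which is why the sparseness of the wavelet representation matters.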
In wavelet transform compression, the signal is transformed into the wavelet domain. All transform coefficients whose magnitudes lie below some threshold value are set to zero, and only the significant, non-zero coefficients are transmitted. This should be a much smaller data set than the original signal. At the receiving end, the inverse wavelet transform is applied to the transmitted data, with zeros assigned to the insignificant coefficients that were not transmitted. This decompression produces an approximation of the original signal [3,9]. The compression performance is evaluated in terms of Signal to Noise Ratio (SNR), Peak Signal to Noise Ratio (PSNR), Normalized Root Mean Square Error (NRMSE), Retained Signal Energy (RSE) and Compression Factor (CF), using source code written in Matlab.

Signal to Noise Ratio:
$$\mathrm{SNR} = 10\log_{10}\left(\frac{\sigma_x^2}{\sigma_e^2}\right)$$
where $\sigma_x^2$ is the mean square of the speech signal and $\sigma_e^2$ is the mean square difference between the original and reconstructed signals.
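The SNR definition translates directly into code. A minimal Python sketch, assuming the original and reconstructed signals are equal-length sequences of floats; returning infinity for a perfect reconstruction is a choice made here, not something the paper specifies.

```python
import math


def snr_db(x, r):
    """SNR in dB: 10*log10(sigma_x^2 / sigma_e^2), both computed as mean squares."""
    if len(x) != len(r) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    ms_signal = sum(v * v for v in x) / len(x)                   # sigma_x^2
    ms_error = sum((a - b) ** 2 for a, b in zip(x, r)) / len(x)  # sigma_e^2
    if ms_error == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(ms_signal / ms_error)


# Usage: a 10% amplitude error on every sample gives an SNR of about 20 dB.
snr_example = snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9])
```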

Peak Signal to Noise Ratio:
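$$\mathrm{PSNR} = 10\log_{10}\left(\frac{N X^2}{\lVert x - r\rVert^2}\right)$$
where N is the length of the reconstructed signal, X is the maximum absolute value of the signal x (so $X^2$ is its peak squared value), and $\lVert x - r\rVert^2$ is the energy of the difference between the original signal x and the reconstructed signal r.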
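The PSNR formula can be sketched in the same way. This Python version assumes X is the maximum absolute sample value of the original signal and that both signals have the same length N; it is an illustration, not the paper's Matlab code.

```python
import math


def psnr_db(x, r):
    """PSNR in dB: 10*log10(N * X^2 / ||x - r||^2) with X = max |x(n)|."""
    if len(x) != len(r) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(r)
    peak = max(abs(v) for v in x)
    err_energy = sum((a - b) ** 2 for a, b in zip(x, r))  # ||x - r||^2
    if err_energy == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(n * peak ** 2 / err_energy)


# Usage: one sample off by 0.1 in a 4-sample signal with peak 1.0.
psnr_example = psnr_db([0.5, -1.0, 0.25, 0.0], [0.6, -1.0, 0.25, 0.0])
```

Because the numerator uses the peak rather than the mean square, PSNR is always at least as large as SNR for the same error.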

Normalized Root Mean Square Error:
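$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{n}\left(x(n) - r(n)\right)^2}{\sum_{n}\left(x(n) - \mu_x\right)^2}}$$
where x(n) is the speech signal, r(n) is the reconstructed signal, and $\mu_x$ is the mean of the speech signal.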
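A minimal Python sketch of NRMSE, assuming equal-length float sequences; it rejects a constant original signal, for which the denominator would be zero.

```python
import math


def nrmse(x, r):
    """sqrt( sum (x(n) - r(n))^2 / sum (x(n) - mu_x)^2 )."""
    if len(x) != len(r) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    mean_x = sum(x) / len(x)                              # mu_x
    num = sum((a - b) ** 2 for a, b in zip(x, r))         # reconstruction error
    den = sum((a - mean_x) ** 2 for a in x)               # spread of the original
    if den == 0:
        raise ValueError("constant original signal: NRMSE is undefined")
    return math.sqrt(num / den)


# Usage: error energy 1, spread about the mean 5, so NRMSE = sqrt(1/5).
nrmse_example = nrmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```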

Retained Signal Energy:
This indicates the amount of energy retained in the compressed signal as a percentage of the energy of the original signal.
$$\mathrm{RSE} = 100 \times \frac{\lVert r\rVert^2}{\lVert x\rVert^2}$$
where $\lVert x\rVert$ is the norm of the original signal and $\lVert r\rVert$ is the norm of the reconstructed signal. For one-dimensional orthogonal wavelets, the retained energy is equal to the L2-norm recovery performance.
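A Python sketch of RSE follows. The second part is an illustrative check, assuming one level of the orthonormal Haar transform, of why the orthogonality remark holds: an orthonormal transform preserves energy, so the percentage of coefficient energy kept after thresholding equals the percentage of signal energy in the reconstruction.

```python
import math


def retained_energy_percent(x, r):
    """RSE = 100 * ||r||^2 / ||x||^2, the share of the original energy that survives."""
    energy_x = sum(v * v for v in x)
    if energy_x == 0:
        raise ValueError("original signal has zero energy")
    return 100.0 * sum(v * v for v in r) / energy_x


# Usage: dropping the second sample of [3, 4] keeps 9 of 25 energy units (36%).
rse_example = retained_energy_percent([3.0, 4.0], [3.0, 0.0])

# Energy preservation of one orthonormal Haar level (approximations, then details).
x = [3.0, 4.0, 1.0, 2.0]
s = math.sqrt(2.0)
haar_coeffs = [(x[0] + x[1]) / s, (x[2] + x[3]) / s, (x[0] - x[1]) / s, (x[2] - x[3]) / s]
parseval_percent = retained_energy_percent(x, haar_coeffs)  # close to 100
```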

Compression Factor:
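It is the ratio of the length of the original signal to the length of the compressed signal:
$$\mathrm{CF} = \frac{\text{length of original signal}}{\text{length of compressed signal}}$$
A compression factor greater than 1 indicates compression, and a value less than 1 indicates expansion. These definitions follow those used in previous work.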
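The compression factor, together with the percentage of zero coefficients reported later, reduces to simple counting. A minimal Python sketch; the usage line assumes that the compressed length is the sample count listed in Table 7 for db10 at level 3 (6701 samples) and divides the 25913 samples of the test signal by it.

```python
def compression_factor(original_length, compressed_length):
    """CF > 1 means compression; CF < 1 means expansion."""
    if compressed_length <= 0:
        raise ValueError("compressed length must be positive")
    return original_length / compressed_length


def percent_zero(coeffs):
    """Percentage of coefficients that thresholding has set to zero."""
    return 100.0 * sum(1 for c in coeffs if c == 0) / len(coeffs)


# Usage (assumed pairing of Table 7 entry and original length): roughly 3.87.
cf_db10_level3 = compression_factor(25913, 6701)
zeros_example = percent_zero([0.0, 1.2, 0.0, 0.0])
```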

Analytical Results
The choice of mother wavelet is important, because some wavelets offer better reconstruction quality and different compression ratios than others. However, no single wavelet gives the best results for all kinds of signals. The test signal is the utterance "Great, now we've got time to party". The test file 'voice38kz.wav' was obtained by converting an MP3 audio file to WAV format with the 'wavesurfer' software. It contains 25913 samples at a sampling frequency of 8 kHz (about 3.24 s of speech).
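Loading the test file is the first step of the experiment. The sketch below uses Python's standard `wave` and `struct` modules in place of Matlab's audio reading, and assumes that 'voice38kz.wav' is stored as 16-bit mono PCM, which the paper does not state. Samples are scaled to the range [-1, 1).

```python
import struct
import wave


def read_wav_mono16(source):
    """Read 16-bit mono PCM; return (samples scaled to [-1, 1), sample rate in Hz).

    `source` may be a file path or a binary file object, as accepted by wave.open.
    """
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    n = len(raw) // 2  # two bytes per 16-bit sample
    # WAV stores PCM little-endian; 'h' is a signed 16-bit integer.
    ints = struct.unpack("<%dh" % n, raw[: 2 * n])
    return [s / 32768.0 for s in ints], rate


# Usage (under the 16-bit mono assumption):
# samples, rate = read_wav_mono16("voice38kz.wav")
# duration_s = len(samples) / rate  # 25913 / 8000 is about 3.24 s
```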
Selecting a mother wavelet is related to the amount of energy a wavelet basis function can concentrate into the first-level approximation coefficients. The signal energy retained in the first N/2 transform coefficients, which equals the energy stored in the first-level approximation coefficients, is shown in Table 1. The more energy the first-level approximation holds, the better the wavelet is for compressing the signal. The Haar and Daubechies (db2, db4, db6, db8, db10) wavelets all concentrate more than 96% of the signal energy, and the db10 wavelet concentrates 99.5949% of the energy into the first-level approximation coefficients. Wavelets with many vanishing moments should be used for better reconstruction quality, because they introduce less distortion and concentrate more signal energy in the approximation coefficients. However, such wavelets are described by many coefficients in their scaling and wavelet functions, which increases the computation of the wavelet transform, the complexity of the algorithm and the output file size.
Figure 3 shows the flow chart of the program for compression of the speech signal. In this work, six wavelets (Haar and Daubechies db2, db4, db6, db8 and db10) are chosen and compared against each other by measuring the compression parameters for the speech signal. The choice of decomposition level for the discrete wavelet transform usually depends on the type of signal being analyzed; for speech signals, no advantage is gained by going beyond level 5 [8]. The signal is therefore decomposed up to level 5. After the wavelet transform of the speech signal is computed, compression involves truncating the wavelet coefficients below a threshold, and level-dependent thresholding is used to truncate the small-valued transform coefficients. Figure 4 shows the flow chart of the program for decompression of the signal.
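Two quantities from this paragraph can be sketched in Python: the percentage of energy in the first N/2 coefficients (the Table 1 measure), computed here only for the Haar wavelet, and level-dependent thresholding, where each decomposition level gets its own threshold. The paper does not say here how its per-level thresholds are chosen, so the sketch takes them as input; `fraction_of_max_thresholds` is a hypothetical rule for illustration only, not the authors' criterion.

```python
import math


def first_level_energy_percent(x):
    """Percent of signal energy in the level-1 Haar approximation (first N/2 coefficients)."""
    n = len(x) - len(x) % 2  # drop a trailing odd sample for simplicity
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(n // 2)]
    total = sum(v * v for v in x[:n])
    if total == 0:
        raise ValueError("signal has zero energy")
    return 100.0 * sum(a * a for a in approx) / total


def level_dependent_threshold(details, thresholds):
    """Hard-threshold each level separately: details[j] uses thresholds[j]."""
    if len(details) != len(thresholds):
        raise ValueError("need exactly one threshold per level")
    return [[c if abs(c) >= t else 0.0 for c in d] for d, t in zip(details, thresholds)]


def fraction_of_max_thresholds(details, fraction):
    """Hypothetical rule: each level's threshold is a fraction of its largest |coefficient|."""
    return [fraction * max(abs(c) for c in d) for d in details]


# Usage: a slowly varying signal keeps almost all energy in the approximation,
# while a rapidly alternating one keeps none.
smooth_pct = first_level_energy_percent([1.0, 1.0, 1.0, 1.0])
alternating_pct = first_level_energy_percent([1.0, -1.0, 1.0, -1.0])
details = [[0.1, -2.0, 0.3], [5.0, -0.05]]
kept = level_dependent_threshold(details, [0.5, 1.0])
```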
The results for the compression parameters are presented starting in Table 2. For decomposition levels 2 to 5, the SNR and PSNR values of the db10 wavelet are only slightly higher than those of the db8 wavelet, but db10 still performs better than db8 and gives the best results among the wavelets tested. RSE, the amount of energy retained in the compressed signal as a percentage of the energy of the original signal, is over 95% for decomposition levels up to 3; it is lower at level 4 and lowest at level 5. The compression factor and the percentage of zero coefficients increase with the decomposition level. Figure 6 shows the original signal. The original and reconstructed signals using the db10 wavelet are compared in Figure 7 for decomposition level 3, in Figure 8 for level 4 and in Figure 9 for level 5.
Figure 9. The comparison between the original signal and reconstructed signal at decomposition level 5

Results and Discussion
The reconstructed signal is written to an audio file with the 'audiowrite' function in Matlab, and a listening test is carried out on the reconstructed signal at each decomposition level. At levels 1 and 2, the quality of the reconstructed signal is very close to the original; at level 3 it is nearly as close; at levels 4 and 5 the quality is poor. From the overall results, level 3 decomposition is suitable for this signal. At higher levels the approximation data is less significant and hence does a poor job of approximating the input signal. The number of samples in the compressed signal for each wavelet is shown in Table 7.
Table 7. Number of samples in the compressed signal at each decomposition level
Wavelet   Level 1   Level 2   Level 3   Level 4   Level 5
Haar      19588     12606     7396      4116      2241
db2       19525     12357     7182      4014      2243
db4       19438     11978     6831      3844      2216
db6       19325     11924     6749      3792      2186
db8       19321     11902     6696      3799      2217
db10      19250     11817     6701      3817      2250
Figure 10 shows the results of the compression parameters using the db10 wavelet at different decomposition levels.
Figure 10. The results of compression parameters using db10 wavelet with different decomposition levels
The original signal and the compressed signal are compared in Figure 11. The variation of SNR and PSNR with the compression factor using the db10 wavelet is shown in Figure 12. The compression parameters were calculated and displayed in Matlab.
Figure 12. SNR and PSNR variation relative to compression factors using db10 wavelet
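For the listening test described above, each reconstructed signal is written to a WAV file with Matlab's 'audiowrite'. A rough Python counterpart using only the standard `wave` and `struct` modules is sketched below; it assumes float samples nominally in [-1, 1], clips anything outside the 16-bit range, and defaults to the 8 kHz rate of the test signal. The output file name in the usage comment is hypothetical.

```python
import struct
import wave


def write_wav_mono16(target, samples, rate=8000):
    """Write float samples (nominally in [-1, 1]) as 16-bit mono PCM.

    Out-of-range values are clipped; `target` may be a path or a binary file object.
    """
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))


# Usage (hypothetical file name for the level-3 reconstruction):
# write_wav_mono16("reconstructed_level3.wav", reconstructed, rate=8000)
```

Clipping matters here because thresholding can push a few reconstructed samples slightly beyond full scale, and 16-bit PCM would otherwise wrap around.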

Conclusion
Speech compression addresses the problems of large storage requirements and band-limited transmission. The discrete wavelet transform performs well in compressing speech signals. Performance was measured with the Haar and Daubechies wavelets by evaluating SNR, PSNR, NRMSE, CF, RSE and the percentage of zero coefficients, using source code written in Matlab. At a suitable decomposition level, the compressed signal can be reconstructed with full audibility. A good reconstructed signal has low MSE and high PSNR and SNR, meaning low error and high signal fidelity. The db10 wavelet gives the highest SNR and PSNR values and the lowest NRMSE compared with the other wavelets. Decomposition level 3 is suitable for this signal. Future work will choose wavelet decomposition levels for different types of speech.