Research Article
01 July 2009
Authors: M. Farrús and J. Hernando
Abstract
Jitter and shimmer measure the cycle-to-cycle variations of fundamental frequency and amplitude, respectively. Both features have been widely used to describe pathological voices, and since they characterise aspects of particular voices, they are expected to carry a certain degree of speaker specificity. In the current work, jitter and shimmer are successfully used in a speaker verification experiment. Moreover, both measures are combined with spectral and prosodic features, using several types of normalisation and fusion techniques, in order to obtain better verification results. The overall speaker verification system is further improved by applying histogram equalisation as a normalisation technique prior to fusing the features with support vector machines.
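As an illustration of the two measures defined in the abstract, the following minimal sketch computes a relative jitter and a relative shimmer from per-cycle measurements. It assumes the commonly used "local" definitions (mean absolute difference between consecutive cycles, divided by the mean value); the abstract does not state which variants the article uses, so these formulas, the function names and the sample values are illustrative assumptions only.

```python
from statistics import mean


def mean_abs_consecutive_diff(values):
    """Average absolute difference between each cycle and the next one."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    return mean(abs(b - a) for a, b in zip(values, values[1:]))


def relative_jitter(periods):
    """Cycle-to-cycle variation of the pitch period, relative to the mean period.

    Assumed definition (common 'local jitter'), not necessarily the article's.
    """
    return mean_abs_consecutive_diff(periods) / mean(periods)


def relative_shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude, relative to the mean amplitude.

    Assumed definition (common 'local shimmer'), not necessarily the article's.
    """
    return mean_abs_consecutive_diff(amplitudes) / mean(amplitudes)


# Hypothetical per-cycle measurements for one voiced segment.
periods_ms = [10.0, 10.2, 9.9, 10.1, 10.0]
peak_amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

jitter = relative_jitter(periods_ms)
shimmer = relative_shimmer(peak_amplitudes)
print(f"relative jitter:  {jitter:.4f}")
print(f"relative shimmer: {shimmer:.4f}")
```

A perfectly periodic signal gives zero for both measures; in practice the pitch periods and peak amplitudes would first have to be extracted from the speech signal by a pitch-marking tool, which this sketch does not cover.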
Information and Authors
Information
Published in
IET Signal Processing
Volume 3 • Issue 4 • 01 July 2009
Pages: 247 - 257
Copyright
© The Institution of Engineering and Technology.
History
Published in print: 01 July 2009
Published online: 31 March 2024
Inspec keywords
- jitter
- speaker recognition
- support vector machines
Keywords
- jitter
- shimmer
- speaker verification
- pathological voices
- spectral feature
- prosodic feature
- normalisation techniques
- fusion techniques
- histogram equalisation
- support vector machines
Authors
Affiliations
M. Farrús
TALP Research Centre, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, C/Jordi-Girona 1-3, Barcelona, 08034, Spain
J. Hernando
TALP Research Centre, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, C/Jordi-Girona 1-3, Barcelona, 08034, Spain
Citations
Using Jitter and Shimmer in speaker verification
M. Farrús and J. Hernando