Using Jitter and Shimmer in speaker verification

Research Article

01 July 2009

Authors: M. Farrús and J. Hernando

Publication: IET Signal Processing

Volume 3, Issue 4

Abstract

Jitter and shimmer measure the cycle-to-cycle variations of the fundamental frequency and of the amplitude, respectively. Both features have been widely used to describe pathological voices, and since they characterise aspects of particular voices, they are expected to carry a certain degree of speaker specificity. In the current work, jitter and shimmer are successfully used in a speaker verification experiment. Moreover, both measures are combined with spectral and prosodic features using several types of normalisation and fusion techniques in order to obtain better verification results. The overall speaker verification system is also improved by using histogram equalisation as a normalisation technique prior to fusing the features by support vector machines.
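
To make the two definitions concrete, the following is a minimal Python sketch of cycle-to-cycle perturbation measures. The abstract does not say which jitter and shimmer variants the authors use, so this assumes the common "local" (relative) form: the mean absolute difference between consecutive cycles divided by the mean value. It also assumes that jitter is computed on pitch periods (the reciprocal of the fundamental frequency) and that the periods and peak amplitudes have already been extracted by some pitch tracker. The function names and sample values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles,
    divided by the mean value over all cycles."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def relative_jitter(periods):
    """Assumed definition: local perturbation of the pitch periods."""
    return local_perturbation(periods)


def relative_shimmer(amplitudes):
    """Assumed definition: local perturbation of the peak amplitudes."""
    return local_perturbation(amplitudes)


if __name__ == "__main__":
    # Hypothetical per-cycle measurements for one voiced segment.
    periods = [0.0100, 0.0102, 0.0099, 0.0101]  # seconds
    amplitudes = [0.80, 0.82, 0.79, 0.81]       # linear peak amplitude
    print(f"relative jitter:  {relative_jitter(periods):.4f}")
    print(f"relative shimmer: {relative_shimmer(amplitudes):.4f}")
```

Both measures share one helper because, under this assumed definition, they differ only in the per-cycle quantity they are applied to. A perfectly periodic signal with constant amplitude yields zero for both.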

Information and Authors

Information

Published in

IET Signal Processing

Volume 3 • Issue 4 • 01 July 2009

Pages: 247 - 257

Copyright

© The Institution of Engineering and Technology.

History

Published in print: 01 July 2009

Published online: 31 March 2024

Inspec keywords

  1. jitter
  2. speaker recognition
  3. support vector machines

Keywords

  1. jitter
  2. shimmer
  3. speaker verification
  4. pathological voices
  5. spectral feature
  6. prosodic feature
  7. normalisation techniques
  8. fusion techniques
  9. histogram equalisation
  10. support vector machines

Authors

Affiliations

M. Farrús

TALP Research Centre, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, C/Jordi-Girona 1-3, Barcelona, 08034, Spain

J. Hernando

TALP Research Centre, Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, C/Jordi-Girona 1-3, Barcelona, 08034, Spain
