QUT ePrints

Optimising figure of merit for phonetic spoken term detection

Wallace, Roy G., Vogt, Robert J., Baker, Brendan J., & Sridharan, Sridha (2010) Optimising figure of merit for phonetic spoken term detection. In Douglas, Scott (Ed.) Proceedings of the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, The Institute of Electrical and Electronics Engineers, Inc, Dallas, Texas, pp. 5298-5301.

Conference Paper (PDF 229kB)
Submitted Version.


    This paper introduces a novel technique to directly optimise the Figure of Merit (FOM) for phonetic spoken term detection (STD). The FOM is a popular measure of STD accuracy, making it an ideal candidate for use as an objective function. A simple linear model is introduced to transform the phone log-posterior probabilities output by a phone classifier, producing enhanced log-posterior features that are more suitable for the STD task. Direct optimisation of the FOM is then performed by training the parameters of this model using a non-linear gradient descent algorithm. A relative FOM improvement of 11% is achieved on held-out evaluation data, demonstrating the generalisability of the approach.
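The linear transform described in the abstract can be pictured with a minimal sketch. It is illustrative only and is not taken from the paper: the function names `identity` and `enhance`, the square weight matrix, and the identity initialisation are all assumptions, and the FOM objective and gradient-descent training are not shown.

```python
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    """Return an n x n identity matrix (an assumed starting point for training)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def enhance(log_posteriors: Matrix, weights: Matrix) -> Matrix:
    """Apply y_t = W x_t to every frame of phone log-posteriors.

    log_posteriors: one row per frame, one column per phone.
    weights: hypothetical square matrix W with one row per output phone.
    """
    return [
        [sum(w * x for w, x in zip(row, frame)) for row in weights]
        for frame in log_posteriors
    ]


# Two frames over three phones (made-up values, for illustration only).
frames = [[-0.1, -2.5, -4.0], [-3.0, -0.2, -2.0]]

# With identity weights the enhanced features equal the input features.
unchanged = enhance(frames, identity(3))

# Doubling the first diagonal entry scales only the first phone's score.
weights = identity(3)
weights[0][0] = 2.0
scaled = enhance(frames, weights)
```

In a scheme like the one the abstract outlines, the entries of `weights` would be the parameters adjusted during training so that the enhanced features yield a higher FOM.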

    Impact and interest:

    6 citations in Scopus
    Search Google Scholar™
    5 citations in Web of Science®

    Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

    These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

    Citation counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

    Full-text downloads:

    243 since deposited on 30 Aug 2010
    60 in the past twelve months

    Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints, as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

    ID Code: 34246
    Item Type: Conference Paper
    Keywords: Spoken Term Detection, Speech Processing, Speech Recognition, Information Retrieval
    DOI: 10.1109/ICASSP.2010.5494969
    ISBN: 9781424442966
    ISSN: 1520-6149
    Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)
    Divisions: Past > QUT Faculties & Divisions > Faculty of Built Environment and Engineering
    Past > Schools > School of Engineering Systems
    Copyright Owner: Copyright 2010 IEEE
    Deposited On: 30 Aug 2010 11:50
    Last Modified: 01 Mar 2012 00:16
