QUT ePrints

Boosting the margin: A new explanation for the effectiveness of voting methods

Schapire, R. E., Freund, Y., Bartlett, P.L., & Lee, W. S. (1998) Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), pp. 1651-1686.


    One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.
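    The margin definition in the abstract can be illustrated with a minimal sketch. It assumes each base classifier casts one unweighted vote per example, as the abstract's wording suggests; the function names `vote_margin` and `margin_distribution` and the optional normalization by the number of voters are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def vote_margin(votes, true_label):
    """Votes for the true label minus the most votes for any single wrong label."""
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    # Largest vote count among incorrect labels; 0 if no wrong label got a vote.
    wrong = max(
        (c for label, c in counts.items() if label != true_label),
        default=0,
    )
    return correct - wrong


def margin_distribution(all_votes, true_labels, normalize=True):
    """Sorted margins over a set of examples.

    Assumption: with normalize=True each margin is divided by the number of
    voters so values fall in [-1, 1]. Every vote list must be non-empty.
    """
    margins = []
    for votes, y in zip(all_votes, true_labels):
        m = vote_margin(votes, y)
        margins.append(m / len(votes) if normalize else m)
    return sorted(margins)


if __name__ == "__main__":
    # Three hypothetical training examples, each voted on by base classifiers.
    votes = [["a", "a", "b", "c"], ["b", "b", "a"], ["a", "a"]]
    labels = ["a", "a", "a"]
    print(margin_distribution(votes, labels, normalize=False))
```

    Under this definition a positive margin means the plurality vote is correct, and a larger margin means the vote was won more decisively. The abstract's claim is that the distribution of these values over the training set, rather than the training error alone, relates to the test error.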

    Impact and interest:

    870 citations in Scopus
    Search Google Scholar™
    521 citations in Web of Science®

    Citation counts are sourced monthly from the Scopus and Web of Science® citation databases.

    These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

    Citation counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

    ID Code: 43935
    Item Type: Journal Article
    Additional Information: Publisher PDF uploaded as it is a pre-2000 publication
    Keywords: Bagging, Boosting, Decision trees, Ensemble methods, Error-correcting, Markov chain, Monte Carlo, Neural networks, Output coding
    ISSN: 0090-5364
    Subjects: Australian and New Zealand Standard Research Classification > MATHEMATICAL SCIENCES (010000) > STATISTICS (010400)
    Australian and New Zealand Standard Research Classification > ECONOMICS (140000) > ECONOMETRICS (140300)
    Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
    Past > Schools > Mathematical Sciences
    Copyright Owner: Institute of Mathematical Statistics
    Deposited On: 12 Aug 2011 12:36
    Last Modified: 12 Aug 2011 12:36