Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks

Collins, Michael, Globerson, Amir, Koo, Terry, Carreras, Xavier, & Bartlett, Peter L. (2008) Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks. Journal of Machine Learning Research, 9(Aug), pp. 1775-1822.



Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently applied to structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function subject to simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be efficiently applied to problems such as sequence learning or natural language parsing. We perform an extensive evaluation of the algorithms, comparing them to L-BFGS and stochastic gradient descent for log-linear models, and to SVM-Struct for max-margin models. The algorithms are applied to a multi-class problem as well as to a more complex large-scale parsing task. In all these settings, the EG algorithms presented here outperform the other methods.
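The core mechanism the abstract refers to — an exponentiated gradient update on a simplex-constrained dual — can be illustrated with a minimal sketch. The objective and step size below are toy assumptions for illustration, not the paper's dual objective: each step multiplies the current point by the exponentiated negative gradient and renormalizes, which keeps the iterate on the probability simplex.

```python
import numpy as np

def eg_step(u, grad, eta):
    """One exponentiated-gradient step: multiplicative update
    followed by renormalization, so u stays on the simplex."""
    w = u * np.exp(-eta * grad)
    return w / w.sum()

# Toy convex objective over the 2-simplex: f(u) = 0.5 * u.T @ A @ u
# (stand-in for the convex dual; A is a hypothetical positive-definite matrix)
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
u = np.full(2, 0.5)  # start at the uniform distribution
for _ in range(200):
    u = eg_step(u, A @ u, eta=0.5)  # gradient of f at u is A @ u
```

Because the update is multiplicative and renormalized, no projection step is needed; non-negativity and the sum-to-one constraint hold automatically at every iterate, which is what makes EG a natural fit for these simplex-constrained duals.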

Impact and interest:

76 citations in Scopus
40 citations in Web of Science®

ID Code: 43998
Item Type: Journal Article
Refereed: Yes
Keywords: exponentiated gradient, conditional random fields, structured prediction, maximum-margin models, log-linear models, OAVJ
ISSN: 1533-7928
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100)
Australian and New Zealand Standard Research Classification > PSYCHOLOGY AND COGNITIVE SCIENCES (170000) > COGNITIVE SCIENCE (170200)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Past > Schools > Mathematical Sciences
Copyright Owner: Copyright 2008 Michael Collins, Amir Globerson, Terry Koo, Xavier Carreras and Peter L. Bartlett
Deposited On: 17 Aug 2011 21:49
Last Modified: 29 Feb 2012 14:34
