QUT ePrints

Leveraging Web 2.0 data for scalable semi-supervised learning of domain-specific sentiment lexicons

Lau, Raymond, Lai, Chun-Lam, Bruza, Peter D., & Wong, Kam-Fai (2011) Leveraging Web 2.0 data for scalable semi-supervised learning of domain-specific sentiment lexicons. In 20th ACM Conference on Information and Knowledge Management, 24-28 October 2011, Crowne Plaza, Glasgow. (In Press)

Abstract

Manually constructing domain-specific sentiment lexicons is extremely time consuming, and it may not even be feasible for domains where linguistic expertise is not available. Research on the automatic construction of domain-specific sentiment lexicons has therefore become a hot topic in recent years. The main contribution of this paper is the illustration of a novel semi-supervised learning method which exploits both term-to-term and document-to-term relations hidden in a corpus for the construction of domain-specific sentiment lexicons.

More specifically, the proposed two-pass pseudo labeling method combines shallow linguistic parsing and corpus-based statistical learning to make domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet these days. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high-quality domain-specific sentiment lexicons as directly assessed by human experts. Moreover, the system-generated domain-specific sentiment lexicons can improve polarity prediction tasks at the document level by 2.18% when compared to other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.
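
The abstract does not give the algorithm itself, so the following is only a rough illustration of the general idea of using user ratings as labels for lexicon learning. It is not the paper's two-pass pseudo labeling method: it swaps in a plain smoothed log-odds score over document frequencies, and the function name, rating thresholds and smoothing value are all assumptions made for this sketch.

```python
import math
from collections import Counter


def learn_lexicon(reviews, pos_min=4, neg_max=2, smoothing=1.0):
    """Score terms from (text, star_rating) pairs.

    Hypothetical helper, not from the paper. Ratings >= pos_min are
    treated as positive labels and ratings <= neg_max as negative ones;
    anything in between is ignored. Positive scores suggest positive
    polarity in this domain, negative scores the opposite.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text, rating in reviews:
        # Count each term once per document (document-to-term relation).
        terms = set(text.lower().split())
        if rating >= pos_min:
            pos_counts.update(terms)
        elif rating <= neg_max:
            neg_counts.update(terms)

    vocab = set(pos_counts) | set(neg_counts)
    # Additive smoothing keeps the logarithms defined for one-sided terms.
    pos_total = sum(pos_counts.values()) + smoothing * len(vocab)
    neg_total = sum(neg_counts.values()) + smoothing * len(vocab)
    return {
        term: math.log((pos_counts[term] + smoothing) / pos_total)
        - math.log((neg_counts[term] + smoothing) / neg_total)
        for term in vocab
    }
```

Called on a list of rated product reviews, this returns a dictionary mapping each term to a signed score. Terms that appear equally often on both sides score near zero, which is one simple way to separate domain-specific sentiment words from neutral product vocabulary.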

Impact and interest:

2 citations in Scopus
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citation counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

Full-text downloads:

186 since deposited on 19 Sep 2011
71 in the past twelve months

Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.

ID Code: 46046
Item Type: Conference Paper
Keywords: Sentiment Lexicon, Sentiment Analysis, Text Mining, Statistical Learning
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600) > Decision Support and Group Support Systems (080605)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Information Retrieval and Web Search (080704)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Past > Institutes > Institute for Creative Industries and Innovation
Past > Schools > Information Systems
Copyright Owner: Copyright 2011 [please consult the author]
Deposited On: 20 Sep 2011 09:39
Last Modified: 22 Sep 2011 05:24
