Text mining and probabilistic language modeling for online review spam detecting

Lau, Raymond Y.K., Liao, S.Y., Kwok, Ron Chi Wai, Xu, Kaiquan, Xia, Yunqing, & Li, Yuefeng (2011) Text mining and probabilistic language modeling for online review spam detecting. ACM Transactions on Management Information Systems, 2(4), pp. 1-30.

View at publisher


In the era of Web 2.0, huge volumes of consumer reviews are posted to the Internet every day. Manual approaches to detecting and analyzing fake reviews (i.e., spam) are not practical due to the problem of information overload. However, the design and development of automated methods of detecting fake reviews is a challenging research problem. The main reason is that fake reviews are specifically composed to mislead readers, so they may appear the same as legitimate reviews (i.e., ham). As a result, discriminatory features that would enable individual reviews to be classified as spam or ham may not be available. Guided by the design science research methodology, the main contribution of this study is the design and instantiation of novel computational models for detecting fake reviews. In particular, a novel text mining model is developed and integrated into a semantic language model for the detection of untruthful reviews. The models are then evaluated based on a real-world dataset collected from amazon.com. The results of our experiments confirm that the proposed models outperform other well-known baseline models in detecting fake reviews. To the best of our knowledge, the work discussed in this article represents the first successful attempt to apply text mining methods and semantic language models to the detection of fake consumer reviews. A managerial implication of our research is that firms can apply our design artifacts to monitor online consumer reviews to develop effective marketing or product design strategies based on genuine consumer feedback posted to the Internet.

Impact and interest:

40 citations in Scopus
Search Google Scholar™

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.

Citations counts from the Google Scholar™ indexing service can be viewed at the linked Google Scholar™ search.

ID Code: 49113
Item Type: Journal Article
Refereed: Yes
Additional URLs:
Keywords: Design, Algorithms, Experimentation, Language models, Text mining, Review spam, Spam detection, Design science
DOI: 10.1145/2070710.2070716
ISSN: 2158-6578 (online) 2158-656x (print)
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING (080100) > Pattern Recognition and Data Mining (080109)
Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > DISTRIBUTED COMPUTING (080500) > Web Technologies (excl. Web Search) (080505)
Divisions: Current > Schools > School of Electrical Engineering & Computer Science
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2011 ACM
Copyright Statement:

Rights Retained by Authors and Original Copyright Holders
- the right to post author-prepared versions of the work covered by ACM copyright in a personal collection on their own Home Page and on a publicly accessible server of their employer, and in a repository legally mandated by the agency funding the research on which the Work is based. Such posting is limited to noncommercial access and personal use by others, and must include this notice both embedded within the full text file and in the accompanying citation display as well:

"© ACM, 2011. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Management Information Systems, {VOL 2, ISS 4, (2011)} http://doi.acm.org/10.1145/nnnnnn.nnnnnn"

Deposited On: 12 Mar 2012 23:18
Last Modified: 21 Dec 2016 02:52

Export: EndNote | Dublin Core | BibTeX

Repository Staff Only: item control page