Web-Based Query Translation for English-Chinese CLIR

Lu, Chengye, Xu, Yue, & Geva, Shlomo (2008) Web-Based Query Translation for English-Chinese CLIR. Computational Linguistics and Chinese Language Processing (CLCLP), 13(1), pp. 61-90.


Dictionary-based translation is a traditional approach used by cross-language information retrieval (CLIR) systems. However, performance often degrades significantly when queries contain words that do not appear in the dictionary; this is known as the Out-of-Vocabulary (OOV) problem. In recent years, Web mining has been shown to be an effective approach to this problem. Two difficulties remain in Web-mining-based automated translation: how to extract Multiword Lexical Units (MLUs) from Web content, and how to select the correct translations from the extracted candidate MLUs. Most statistical approaches to MLU extraction rely on statistical information drawn from huge corpora. When Web mining is used for automated translation, these approaches do not perform well, because the corpus is usually too small for statistics that depend on a large sample to be reliable. In this paper, we present a new Chinese term measurement and a new Chinese MLU extraction process that work well on small corpora. We also present an approach for selecting MLUs more accurately. Our experiments show marked improvement in translation accuracy over other commonly used approaches.

ID Code: 15238
Item Type: Journal Article
Refereed: Yes
Additional Information: The contents of this journal can be freely accessed online via the journal's web page (see hypertext link).
Keywords: Cross-Language Information Retrieval, CLIR, Query Translation, Web Mining, OOV Problem, Term Extraction
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2008 Association for Computational Linguistics and Chinese Language Processing
Deposited On: 17 Oct 2008 00:00
Last Modified: 09 Jun 2010 13:05
