A hybrid Chinese information retrieval model

Xu, Yue & Geva, Shlomo (2010) A hybrid Chinese information retrieval model. In An, Aijun (Ed.) AMT'10 Proceedings of the 6th International Conference on Active Media Technology, Springer-Verlag Berlin, Toronto, Canada, pp. 267-276.

Abstract

A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no spaces or other boundaries between words. This feature makes Chinese information retrieval more difficult: a retrieved document that contains the query term as a sequence of Chinese characters may not be relevant to the query, because that sequence may not form a valid Chinese word in the document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose a hybrid Chinese information retrieval model that incorporates word-based techniques into the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents by their relevance to the query, calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant when segmentation alone is combined with the traditional character-based approach.
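
The false-match problem described in the abstract can be illustrated with a minimal sketch. It is not the ranking method from the paper, which the abstract does not specify. The function names, the weight `alpha`, the linear combination, and the two hand-segmented toy documents are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical, hand-segmented toy documents (each document is a list of words).
# d1 contains the word "和服" (kimono); d2 contains "和" (and) followed by "服务"
# (service), so "和服" appears only as a character sequence across a word boundary.
docs: Dict[str, List[str]] = {
    "d1": ["她", "穿", "和服"],
    "d2": ["产品", "和", "服务"],
}


def char_score(query: str, doc_words: List[str]) -> float:
    """Character-based evidence: count the query as a raw character sequence."""
    text = "".join(doc_words)  # unsegmented text, as it would appear on the page
    return float(text.count(query))


def word_score(query: str, doc_words: List[str]) -> float:
    """Word-based evidence: count the query only where it is a segmented word."""
    return float(doc_words.count(query))


def hybrid_score(query: str, doc_words: List[str], alpha: float = 0.5) -> float:
    """Assumed linear mix of the two scores; alpha weights the character score."""
    return alpha * char_score(query, doc_words) + (1 - alpha) * word_score(query, doc_words)


query = "和服"
ranking = sorted(docs, key=lambda d: hybrid_score(query, docs[d]), reverse=True)
print(ranking)
```

In this toy setting, character matching alone scores both documents equally, while the assumed hybrid score places the document containing the query as a segmented word first.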

Impact and interest:

0 citations in Scopus

Full-text downloads:

84 since deposited on 26 Apr 2011
1 in the past twelve months

ID Code: 41427
Item Type: Conference Paper
Refereed: Yes
Keywords: Chinese Segmentation, Information Retrieval, Document ranking, Chinese characters
ISBN: 9783642154690
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600)
Divisions: Past > Schools > Computer Science
Past > QUT Faculties & Divisions > Faculty of Science and Technology
Past > Institutes > Institute for Creative Industries and Innovation
Copyright Owner: Copyright 2010 Springer
Copyright Statement:

This is the author-version of the work.

The conference proceedings, published by Springer Verlag, will be available via SpringerLink. http://www.springerlink.com

Deposited On: 26 Apr 2011 22:36
Last Modified: 01 Mar 2012 02:11