An English-translated parallel corpus for the CJK Wikipedia collections

Tang, Ling-Xiang, Geva, Shlomo, & Trotman, Andrew (2012) An English-translated parallel corpus for the CJK Wikipedia collections. In 17th Australasian Document Computing Symposium, 5-6 December 2012, Dunedin, New Zealand.

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named the CJK2E Wikipedia XML corpus. The corpus could be used in many ways by the information retrieval research community and for knowledge sharing in Wikipedia; for example, it could support experiments in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval. Furthermore, the translated CJK articles could be used to expand the current coverage of the English Wikipedia.

ID Code: 57835
Item Type: Conference Paper
Refereed: Yes
Keywords: Wikipedia, Corpus, English, Chinese, Japanese, Korean, machine learning, cross-lingual information retrieval, cross-lingual link discovery
DOI: 10.1145/2407085.2407099
ISBN: 9781450314114
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Information Retrieval and Web Search (080704)
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Copyright Owner: Copyright 2012 ACM
Copyright Statement: Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Deposited On: 07 Mar 2013 04:13
Last Modified: 07 Mar 2013 23:55
