An English-translated parallel corpus for the CJK Wikipedia collections
Tang, Ling-Xiang, Geva, Shlomo, & Trotman, Andrew (2012) An English-translated parallel corpus for the CJK Wikipedia collections. In 17th Australasian Document Computing Symposium, 5-6 December 2012, Dunedin, New Zealand.
In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.
Impact and interest:
Citation counts are sourced monthly from and citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from theindexing service can be viewed at the linked Google Scholar™ search.
Full-text downloads displays the total number of times this work’s files (e.g., a PDF) have been downloaded from QUT ePrints as well as the number of downloads in the previous 365 days. The count includes downloads for all files if a work has more than one.
|Item Type:||Conference Paper|
|Keywords:||Wikipedia, Corpus, English, Chinese, Japanese, Korean, machine learning, cross-lingual information retrieval, cross-lingual link discovery|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > LIBRARY AND INFORMATION STUDIES (080700) > Information Retrieval and Web Search (080704)|
|Divisions:||Past > QUT Faculties & Divisions > Faculty of Science and Technology|
|Copyright Owner:||Copyright 2012 ACM|
|Copyright Statement:||Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.|
|Deposited On:||07 Mar 2013 04:13|
|Last Modified:||07 Mar 2013 23:55|
Repository Staff Only: item control page