Structuring free-form tagging in online news

Lau, Cher Han (Andy) (2009) Structuring free-form tagging in online news. Masters by Research thesis, Queensland University of Technology.


Tagging has become one of the key activities in next-generation websites, which allow users to select short labels to annotate, manage and share multimedia information such as photos, videos and bookmarks. Tagging does not require any prior training: users can freely choose whichever terms best represent the semantics of the content, without worrying about any formal structure or ontology. However, the practice of free-form tagging can lead to several problems, such as synonymy, polysemy and ambiguity, which potentially increase the complexity of managing tags and retrieving information. To solve these problems, this research aims to construct a lightweight indexing scheme that structures tags by identifying and disambiguating the meaning of terms and by constructing a knowledge base or dictionary. News has been chosen as the primary domain of application to demonstrate the benefits of using structured tags for managing the rapidly changing and dynamic nature of news information. One of the main outcomes of this work is an automatically constructed vocabulary that defines the meaning of each named entity tag that can be extracted from a news article (including persons, locations and organisations), based on expert suggestions from major search engines and knowledge from public databases such as Wikipedia. To demonstrate the potential applications of the vocabulary, we have used it to provide additional functionality in an online news website, including topic-based news reading, intuitive tagging, clipping and sharing of interesting news, and news filtering or searching based on named entity tags. The evaluation results on the impact of disambiguating tags have shown that the vocabulary can help to significantly improve news searching performance.
The preliminary results from our user study have demonstrated that users can benefit from the additional functionality on the news website, as they are able to retrieve more relevant news and to clip and share news with friends and family effectively.
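
As a rough illustration of the idea described in the abstract, the sketch below shows how a vocabulary of named entity tags might map free-form tags to disambiguated entities. It is not the implementation from the thesis: the names Entity, Vocabulary, add, candidates and resolve, the identifier scheme and the sample entries are all hypothetical, and the vocabulary is filled by hand here rather than built automatically from search engine suggestions and Wikipedia.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entity:
    """One disambiguated meaning of a tag (hypothetical structure)."""
    entity_id: str  # made-up identifier scheme, for illustration only
    name: str       # canonical name of the entity
    kind: str       # "person", "location" or "organisation"


@dataclass
class Vocabulary:
    """Maps normalised free-form tags to their candidate entities."""
    senses: Dict[str, List[Entity]] = field(default_factory=dict)

    @staticmethod
    def _normalise(tag: str) -> str:
        # Treat tags that differ only in case or outer whitespace as equal.
        return tag.strip().lower()

    def add(self, tag: str, entity: Entity) -> None:
        self.senses.setdefault(self._normalise(tag), []).append(entity)

    def candidates(self, tag: str) -> List[Entity]:
        return list(self.senses.get(self._normalise(tag), []))

    def resolve(self, tag: str, kind: Optional[str] = None) -> Optional[Entity]:
        """Return the entity only if exactly one candidate matches."""
        found = [e for e in self.candidates(tag)
                 if kind is None or e.kind == kind]
        return found[0] if len(found) == 1 else None


# Hand-written sample data (an assumption of this sketch).
vocab = Vocabulary()
qut = Entity("org:qut", "Queensland University of Technology", "organisation")
washington_person = Entity("person:washington", "George Washington", "person")
washington_place = Entity("loc:washington", "Washington, D.C.", "location")

# Synonymy: two different tags point to the same entity.
vocab.add("QUT", qut)
vocab.add("Queensland University of Technology", qut)

# Polysemy: one tag has more than one possible meaning.
vocab.add("Washington", washington_person)
vocab.add("Washington", washington_place)
```

In this sketch, vocab.resolve("qut") and vocab.resolve("Queensland University of Technology") return the same entity, while vocab.resolve("Washington") returns None until an entity type such as kind="location" narrows the candidates. Returning None for an ambiguous tag, rather than guessing, is a design choice of the sketch and not a claim about the thesis.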

ID Code: 30315
Item Type: QUT Thesis (Masters by Research)
Supervisor: Tjondronegoro, Dian & Nayak, Richi
Keywords: folksonomy, social tagging, online news, digital news
Divisions: Past > QUT Faculties & Divisions > Faculty of Science and Technology
Institution: Queensland University of Technology
Deposited On: 09 Feb 2010 02:48
Last Modified: 28 Oct 2011 19:55
