Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text

, , & (2024) Text2EL+: Expert Guided Event Log Enrichment using Unstructured Text. Journal of Data and Information Quality, 16(1), Article number: 8.

Free-to-read version at publisher website

Description

Through the application of process mining, business processes can be improved on the basis of process execution data captured in event logs. Naturally, the quality of this data determines the quality of the improvement recommendations. Improving data quality is non-trivial, and there is great potential to exploit unstructured text, e.g., from notes, reviews, and comments, for this purpose and to enrich event logs. To this end, this article introduces Text2EL+, a three-phase approach to enrich event logs using unstructured text. In its first phase, events and (case and event) attributes are derived from unstructured text linked to organisational processes. In its second phase, these events and attributes undergo a semantic and contextual validation before their incorporation in the event log. In its third and final phase, recognising the importance of human domain expertise, expert guidance is used to further improve data quality by removing redundant and irrelevant events. Expert input is used to train a Named Entity Recognition (NER) model with customised tags to detect event log elements. The approach applies natural language processing techniques, sentence embeddings, training pipelines and models, as well as contextual and expression validation. Various unstructured clinical notes associated with a healthcare case study were analysed, and completeness, concordance, and correctness of the derived event log elements were evaluated through experiments. The results show that the proposed method is feasible and applicable.
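
To make the three phases described above concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: the keyword table stands in for a trained NER model, the duplicate check stands in for the semantic and contextual validation, and every name in it (`Event`, `ACTIVITY_TERMS`, `extract_events`, `validate`, `apply_expert_review`) is a hypothetical chosen for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword-to-activity table; a stand-in for the customised
# NER tags mentioned in the abstract, not the article's actual tag set.
ACTIVITY_TERMS = {
    "admitted": "Admission",
    "triaged": "Triage",
    "discharged": "Discharge",
}

# Matches 24-hour clock times such as 9:40 or 17:05.
TIME_PATTERN = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


@dataclass(frozen=True)
class Event:
    case_id: str
    activity: str
    timestamp: str  # time string copied verbatim from the note


def extract_events(case_id, note):
    """Phase 1 (sketch): derive candidate events from the sentences of a note."""
    events = []
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        time_match = TIME_PATTERN.search(sentence)
        if time_match is None:
            continue  # no timestamp, so no event can be placed in the log
        lowered = sentence.lower()
        for term, activity in ACTIVITY_TERMS.items():
            if term in lowered:
                events.append(Event(case_id, activity, time_match.group(0)))
    return events


def validate(candidates, existing_log):
    """Phase 2 (sketch): keep only candidates not already present in the log.

    Assumption: an exact-duplicate check replaces the richer semantic and
    contextual validation that the abstract describes.
    """
    seen = set(existing_log)
    return [event for event in candidates if event not in seen]


def apply_expert_review(candidates, rejected_activities):
    """Phase 3 (sketch): drop events whose activity an expert marked irrelevant."""
    return [e for e in candidates if e.activity not in rejected_activities]


if __name__ == "__main__":
    note = ("Patient admitted at 09:15. Triaged at 9:40 by nurse. "
            "Family called. Discharged at 17:05.")
    log = [Event("c1", "Admission", "09:15")]
    new_events = apply_expert_review(
        validate(extract_events("c1", note), log), {"Triage"}
    )
    for event in log + new_events:
        print(event)
```

The sketch keeps the phases as separate functions so that each stand-in (keyword lookup, duplicate check, expert rejection list) could be swapped for a learned model or a similarity measure without changing the others.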

Impact and interest:

0 citations in Scopus

ID Code: 245822
Item Type: Contribution to Journal (Journal Article)
Refereed: Yes
ORCID iD:
Wynn, Moe Thandar: orcid.org/0000-0002-7205-8821
ter Hofstede, Arthur: orcid.org/0000-0002-2730-0201
Measurements or Duration: 28 pages
Keywords: Data Quality, Process Mining
DOI: 10.1145/3640018
ISSN: 1936-1955
Pure ID: 156050193
Divisions: Current > Research Centres > Centre for Data Science
Current > QUT Faculties and Divisions > Faculty of Science
Current > Schools > School of Information Systems
Copyright Owner: 2024 Copyright held by the owner/author(s)
Copyright Statement: This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to qut.copyright@qut.edu.au
Deposited On: 24 Jan 2024 04:05
Last Modified: 06 Aug 2024 21:56