Text2EL: Exploiting Unstructured Text for Event Log Enrichment

, , & (2022) Text2EL: Exploiting Unstructured Text for Event Log Enrichment. In Yetongnon, Kokou, Dipanda, Albert, & Gallo, Luigi (Eds.) Proceedings of the 2022 16th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS). Institute of Electrical and Electronics Engineers Inc., United States of America, pp. 1-8.

Description

Process mining provides a range of methods and techniques to analyse business processes through information stored in so-called event logs. The richer and the higher quality these event logs are, the more insights we can obtain. Until now, information in the form of unstructured text, e.g. notes, comments, reviews, and posts, has not been fully and systematically exploited for the purposes of log enrichment. In this paper, we introduce Text2EL, a two-phase event log enrichment approach based on unstructured text. In Phase 1, events, case attributes, and event attributes are extracted from unstructured text associated with organisational processes. In Phase 2, the extracted events and attributes are semantically and contextually validated before they are used to enrich the event log. Our approach applies techniques from natural language processing, sentence embeddings, and contextual and expression validation. We evaluated the completeness, concordance, and correctness of an enriched event log through experiments with a real-life healthcare data set. The experiments demonstrated the feasibility and applicability of our approach.
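The abstract mentions validating extracted events with sentence embeddings. As a minimal sketch of one common way such a check can work (not the Text2EL implementation, whose details are in the paper), assume each extracted event and each known activity label has already been encoded as an embedding vector; a candidate is accepted only if its cosine similarity to the closest known activity reaches a threshold. The function names, activity labels, toy vectors, and the 0.8 threshold are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def validate_event(candidate_vec, known_activity_vecs, threshold=0.8):
    """Map a candidate event embedding to its most similar known activity.

    Hypothetical helper: returns (label, score) if the best score reaches
    `threshold`, else None (the candidate is rejected, not added to the log).
    """
    best_label, best_score = None, -1.0
    for label, vec in known_activity_vecs.items():
        score = cosine(candidate_vec, vec)
        if score > best_score:
            best_label, best_score = label, score
    return (best_label, best_score) if best_score >= threshold else None

# Hypothetical 3-dimensional embeddings; real sentence embeddings are much larger.
KNOWN_ACTIVITIES = {
    "Administer medication": [0.9, 0.1, 0.0],
    "Discharge patient": [0.0, 0.2, 0.95],
}

accepted = validate_event([0.85, 0.15, 0.05], KNOWN_ACTIVITIES)  # close to "Administer medication"
rejected = validate_event([0.0, 1.0, 0.0], KNOWN_ACTIVITIES)      # not close to any known activity
```

In practice the vectors would come from a sentence-embedding model, and the threshold would be tuned on labelled examples; the sketch only shows the accept/reject decision.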

Impact and interest:

1 citation in Scopus

Citation counts are sourced monthly from Scopus and Web of Science® citation databases.

These databases contain citations from different subsets of available publications and different time periods, and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published from 1996 onwards, and Web of Science® generally from 1980 onwards.

ID Code: 235943
Item Type: Chapter in Book, Report or Conference volume (Conference contribution)
Series Name: Proceedings - 16th International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2022
ORCID iD:
Wynn, Moe Thandar: orcid.org/0000-0002-7205-8821
ter Hofstede, Arthur: orcid.org/0000-0002-2730-0201
Measurements or Duration: 8 pages
Keywords: event log, data quality, unstructured text, natural language processing, semantic validation
DOI: 10.1109/SITIS57111.2022.00010
ISBN: 978-1-6654-6496-3
Pure ID: 131960380
Divisions: Current > Research Centres > Centre for Behavioural Economics, Society & Technology
Current > Research Centres > Centre for Data Science
Current > Research Centres > Centre for Biomedical Technologies
Current > QUT Faculties and Divisions > Faculty of Business & Law
Current > QUT Faculties and Divisions > Faculty of Science
Current > Schools > School of Information Systems
Current > QUT Faculties and Divisions > Faculty of Engineering
Copyright Owner: Consult author(s) regarding copyright matters
Copyright Statement: This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons Licence (or other specified licence), then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright, please provide details by email to qut.copyright@qut.edu.au
Deposited On: 03 Nov 2022 02:48
Last Modified: 29 Feb 2024 15:32