Event log imperfection patterns for process mining - Towards a systematic approach to cleaning event logs

Suriadi, Suriadi, Andrews, Robert, ter Hofstede, Arthur H.M., & Wynn, Moe T. (2016) Event log imperfection patterns for process mining - Towards a systematic approach to cleaning event logs. Information Systems. (In Press)

Available under License Creative Commons Attribution Non-commercial No Derivatives 4.0.


Process-oriented data mining (process mining) uses algorithms and data (in the form of event logs) to construct models that aim to provide insights into organisational processes. The quality of the data (both form and content) presented to the modelling algorithms is critical to the success of a process mining exercise. Cleaning event logs to address quality issues before conducting a process mining analysis is a necessary but generally tedious and ad hoc task. In this paper, we describe a set of data quality issues, distilled from our experiences in conducting process mining analyses, that are commonly found in process mining event logs or encountered while preparing event logs from raw data sources. We show that patterns are used in a variety of domains as a means of describing commonly encountered problems and solutions. The main contributions of this article are a demonstration that a patterns-based approach is applicable to documenting commonly encountered event log quality issues, the formulation of a set of components for describing event log quality issues as patterns, and the description of a collection of 11 event log imperfection patterns distilled from our experiences in preparing event logs. We postulate that a systematic approach to using such a pattern repository to identify and repair event log quality issues benefits both the process of preparing an event log and the quality of the resulting event log. The relevance of the pattern-based approach is illustrated through the application of the patterns in a case study and through an evaluation by researchers and practitioners in the field.

ID Code: 97670
Item Type: Journal Article
Refereed: Yes
Keywords: process mining, data mining, data quality, event log quality, patterns
DOI: 10.1016/j.is.2016.07.011
ISSN: 0306-4379
Divisions: Current > Schools > School of Information Systems; Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2016 Elsevier B.V.
Deposited On: 26 Jul 2016 22:55
Last Modified: 19 Oct 2016 04:57
