Controlled automated discovery of collections of business process models

Garcia-Banuelos, Luciano, Dumas, Marlon, La Rosa, Marcello, De Weerdt, Jochen, & Ekanayake, Chathura C. (2014) Controlled automated discovery of collections of business process models. Information Systems, 46, pp. 85-101.



Automated process discovery techniques aim at extracting process models from information system logs. Existing techniques in this space are effective when applied to relatively small or regular logs, but generate spaghetti-like and sometimes inaccurate models when confronted with logs with high variability. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. This leads to a collection of process models (each one representing a variant of the business process) as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity and low fitness. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically using subprocess extraction. Splitting is performed in a controlled manner in order to achieve user-defined complexity or fitness thresholds. Experiments on real-life logs show that the technique produces collections of models substantially smaller than those extracted by applying existing trace clustering techniques, while allowing the user to control the fitness of the resulting models.
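The control loop described in the abstract (split the log, discover one model per cluster, and keep splitting until a user-defined complexity threshold is met) can be illustrated with the following Python sketch. It is not the authors' implementation and rests on simplifying assumptions: a directly-follows graph stands in for a discovered process model, its arc count stands in for model complexity, and `split_log` is a hypothetical presence-based heuristic used in place of the trace clustering applied in the paper. Fitness thresholds and hierarchical subprocess extraction are not modelled.

```python
from collections import Counter

# A trace is a sequence of activity labels; a log is a list of traces.
Trace = tuple[str, ...]


def dfg_edges(log: list[Trace]) -> set[tuple[str, str]]:
    """Directly-follows graph, used here only as a stand-in 'discovered model'."""
    edges = set()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            edges.add((a, b))
    return edges


def complexity(log: list[Trace]) -> int:
    """Toy complexity metric (assumption): number of arcs in the DFG."""
    return len(dfg_edges(log))


def split_log(log: list[Trace]):
    """Hypothetical splitting heuristic: partition the log by the presence of the
    activity that divides it most evenly. Returns None if no activity occurs in
    some but not all traces, i.e. the log cannot be split this way."""
    n = len(log)
    counts = Counter(a for trace in log for a in set(trace))
    candidates = [(abs(2 * c - n), a) for a, c in counts.items() if 0 < c < n]
    if not candidates:
        return None
    _, pivot = min(candidates)  # most balanced split; ties broken by label
    with_pivot = [t for t in log if pivot in t]
    without_pivot = [t for t in log if pivot not in t]
    return with_pivot, without_pivot


def controlled_discovery(log: list[Trace], max_complexity: int) -> list[set]:
    """Recursively split the log until each cluster's model meets the threshold,
    or until a cluster cannot be split any further."""
    if complexity(log) <= max_complexity:
        return [dfg_edges(log)]
    parts = split_log(log)
    if parts is None:
        return [dfg_edges(log)]  # unsplittable: accept the model as it is
    models = []
    for part in parts:  # both parts are non-empty, strictly smaller logs
        models.extend(controlled_discovery(part, max_complexity))
    return models
```

For example, the log `[("a", "b", "c"), ("a", "b", "c"), ("a", "d", "e"), ("a", "d", "e", "f")]` yields one graph with five arcs. With `max_complexity=3`, the sketch splits it on activity `b` into two clusters whose graphs have two and three arcs respectively. Because every split produces two non-empty, strictly smaller logs, the recursion always terminates.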

Impact and interest:

4 citations in Scopus
3 citations in Web of Science®

Full-text downloads:

7 since deposited on 27 Apr 2014
5 in the past twelve months

ID Code: 70567
Item Type: Journal Article
Refereed: Yes
Keywords: process mining, process discovery, trace clustering, clone detection, process model collection
DOI: 10.1016/
ISSN: 0306-4379
Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > INFORMATION SYSTEMS (080600)
Divisions: Current > Schools > School of Information Systems
Current > QUT Faculties and Divisions > Science & Engineering Faculty
Copyright Owner: Copyright 2014 Elsevier Ltd
Copyright Statement: NOTICE: this is the author’s version of a work that was accepted for publication in Information Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Systems, [VOL#, ISSUE#, (DATE)] DOI#
Deposited On: 27 Apr 2014 22:53
Last Modified: 01 Jan 2017 14:00