Estimation of discrete choice models considering simultaneously multiple objectives and complex data characteristics

, , , Pinz, Alban, & (2024) Estimation of discrete choice models considering simultaneously multiple objectives and complex data characteristics. Transportation Research Part C: Emerging Technologies, 160, Article number: 104517.

Open access copy at publisher website

Description

This paper addresses the estimation of discrete choice models, a problem that involves multiple objectives and the testing of a broad range of hypotheses that can affect both interpretability and prediction accuracy. Previous studies have proposed mathematical programming formulations to assist with hypothesis testing and estimation. However, little is known about how in-sample and out-of-sample performance criteria affect the search for parsimonious specifications. To address this gap, a multi-objective optimization framework is proposed that incorporates both in-sample goodness-of-fit and out-of-sample predictive accuracy. The framework generates multiple unique specifications and performs extensive hypothesis testing, simultaneously considering potential explanatory variables, their functional forms, nonlinearities, heterogeneous effects, and correlations. A metaheuristic was designed and implemented to solve the resulting multi-objective nonlinear mixed-integer mathematical programming problem. Experiments involving various datasets and discrete choice problems were used to illustrate the efficacy of the proposed framework. The goal was to find specifications that are either similar to, or dominate, those reported in the literature, considering both interpretability and prediction accuracy. The proposed framework captured important insights regarding potential explanatory factors and heterogeneous preferences that had not been reported in the literature. In addition, for one of the datasets used in this study, the framework enabled the discovery of three distinct clusters with respect to specification type and model performance in terms of interpretability and prediction accuracy. For that dataset, these clusters suggest that the proposed approach allowed extensive exploration of the data across different specification types.
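
The abstract describes candidate specifications competing on several objectives and some of them "dominating" specifications from the literature. The notion of dominance can be illustrated with a simple non-dominated (Pareto) filter over two minimised objectives, assumed here to be in-sample BIC and an out-of-sample prediction error. This is a minimal sketch of the dominance concept only, not the authors' metaheuristic; the function names, specification labels, and scores are hypothetical toy values.

```python
from typing import Dict, List, Tuple

# Each candidate specification is scored on two objectives, both minimised.
# Assumed objectives for illustration: (in-sample BIC, out-of-sample error).
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of specifications not dominated by any other candidate."""
    front = []
    for name, score in candidates.items():
        dominated = any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Hypothetical specifications with toy (BIC, out-of-sample error) scores.
    toy = {
        "spec_A": (5200.0, 0.110),
        "spec_B": (5010.0, 0.118),
        "spec_C": (4950.0, 0.121),
        "spec_D": (5230.0, 0.115),  # worse than spec_A on both objectives
    }
    print(pareto_front(toy))
```

Here spec_D is filtered out because spec_A is better on both objectives, while the remaining three represent different trade-offs between in-sample fit and out-of-sample error.
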
Mixed Logit models with correlated parameters were also found to perform significantly better in terms of in-sample fit than those without correlation, whereas multinomial Logit models showed the worst in-sample performance for that dataset. In contrast, multinomial Logit models provided superior out-of-sample fit relative to more advanced specifications, illustrating the trade-off between in-sample and out-of-sample fit. A comparative analysis including multiple performance measures was also conducted. The results suggest that evaluating models with either of two objective pairs, in-sample Bayesian Information Criterion (BIC) with out-of-sample Mean Absolute Error (MAE), or in-sample BIC with out-of-sample Mean Squared Error (MSE), yields specifications with better in- and out-of-sample performance than those estimated using maximum log-likelihood and minimum number of model parameters. In addition, a mostly linear relationship was observed between in-sample and out-of-sample log-likelihood, indicating that the latter provides little additional information about prediction beyond the in-sample estimates. These results show the value of using an optimization framework to support modelling decisions by enabling extensive hypothesis testing and by including multiple performance criteria as well as complex data characteristics to discover important and reliable insights.
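
The performance measures named above can be made concrete with a short sketch. BIC uses its standard definition, k ln(n) - 2 LL. How the paper computes MAE and MSE for discrete choices is not stated here, so the sketch assumes each observed choice is encoded as a one-hot vector and errors are averaged over all observations and alternatives; the function names and the hold-out data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_likelihood(probs: Sequence[Sequence[float]], chosen: Sequence[int]) -> float:
    """Sum of the log predicted probabilities of the chosen alternatives."""
    return sum(math.log(p_row[j]) for p_row, j in zip(probs, chosen))


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion, k*ln(n) - 2*LL (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def prediction_errors(
    probs: Sequence[Sequence[float]], chosen: Sequence[int]
) -> Tuple[float, float]:
    """MAE and MSE between predicted probabilities and observed choices.

    Assumption: each observed choice is a one-hot vector over the alternatives,
    and errors are averaged over every (observation, alternative) pair.
    """
    abs_sum = 0.0
    sq_sum = 0.0
    count = 0
    for p_row, j in zip(probs, chosen):
        for alt, p in enumerate(p_row):
            err = p - (1.0 if alt == j else 0.0)
            abs_sum += abs(err)
            sq_sum += err * err
            count += 1
    return abs_sum / count, sq_sum / count


if __name__ == "__main__":
    # Hypothetical hold-out sample: 2 observations, 3 alternatives each.
    probs = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]]
    chosen = [0, 1]
    mae, mse = prediction_errors(probs, chosen)
    ll = log_likelihood(probs, chosen)
    print(f"MAE={mae:.4f} MSE={mse:.4f} LL={ll:.4f}")
```

Because BIC adds a penalty of ln(n) per estimated parameter, it favours parsimonious specifications, whereas MAE and MSE here are computed only on held-out observations.
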

ID Code: 246995
Item Type: Contribution to Journal (Journal Article)
Refereed: Yes
ORCID iD:
Beeramoole, Prithvi Bhat (orcid.org/0000-0002-8192-4146)
Haque, Md Mazharul (orcid.org/0000-0003-1016-110X)
Paz, Alexander (orcid.org/0000-0002-1217-9808)
Measurements or Duration: 26 pages
Keywords: Discrete choice, Discrete choice models, Metaheuristic, Multi-objective, Optimization
DOI: 10.1016/j.trc.2024.104517
ISSN: 0968-090X
Pure ID: 163536677
Divisions: Current > Research Centres > Centre for Future Mobility/CARRSQ
Current > QUT Faculties and Divisions > Faculty of Science
Current > Schools > School of Mathematical Sciences
Current > QUT Faculties and Divisions > Faculty of Engineering
Current > Schools > School of Civil & Environmental Engineering
Current > QUT Faculties and Divisions > Faculty of Health
Funding Information: This study was supported by the “Transport Academic Partnership” between the Queensland Department of Transport and Main Roads and the Queensland University of Technology.
Copyright Owner: 2024 The Author(s)
Copyright Statement: This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons Licence (or other specified licence), then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright, please provide details by email to qut.copyright@qut.edu.au
Deposited On: 06 Mar 2024 07:06
Last Modified: 02 Aug 2024 02:51