Hierarchical models for 2D presence/absence data having ambiguous zeroes: With a biogeographical case study on dingo behaviour

Low Choy, Samantha Jane (2001) Hierarchical models for 2D presence/absence data having ambiguous zeroes: With a biogeographical case study on dingo behaviour. PhD thesis, Queensland University of Technology.

Abstract

This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling, and on the computation of normalization constants, arose from pursuing these data-analytic questions.

The essence of the thesis can be described as follows.

Consider binary data observed on a two-dimensional lattice. A common problem with such data is that the recorded zeroes are ambiguous: a zero may represent a zero response even though some threshold has been reached (presence), or it may indicate that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses while taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts; the dingo, cypress and toad case studies described in the motivation chapter are examples.
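To make the two kinds of zero concrete, the following is a minimal simulation sketch, not taken from the thesis. It assumes one common formulation: a latent presence/absence field z, drawn here independently with probability psi (spatial dependence is deliberately omitted), and an observed response y that can equal 1 only where z = 1, with a logistic dependence on a single hypothetical covariate x. The function name and all parameter values are illustrative.

```python
import numpy as np

def simulate_ambiguous_zeros(shape, psi, beta0, beta1, seed=0):
    """Simulate binary lattice data with ambiguous zeroes (hypothetical sketch).

    z: latent presence (1) / absence (0), independent Bernoulli(psi) here.
    x: one standard-normal covariate per lattice cell (assumed).
    y: observed response; y = 1 only where z = 1, with
       logit P(y = 1 | z = 1) = beta0 + beta1 * x.
    """
    rng = np.random.default_rng(seed)
    z = (rng.random(shape) < psi).astype(int)
    x = rng.standard_normal(shape)
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    # An observed success requires presence AND a triggered response.
    y = z * (rng.random(shape) < p).astype(int)
    return z, x, y

z, x, y = simulate_ambiguous_zeros((20, 20), psi=0.6, beta0=0.0, beta1=1.0)
# Zeros of two different origins that look identical in y:
true_absences = int(np.sum((y == 0) & (z == 0)))
missed_presences = int(np.sum((y == 0) & (z == 1)))
print(true_absences, missed_presences)
```

Only y is observed in practice, so the split between true absences and missed presences has to be inferred from the model.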

Two main approaches to modelling and inference are investigated in this thesis.

The first is frequentist and based on generalized linear models, with spatial variation modelled either by a block structure or by spatially smoothing the residuals. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic maximum likelihood theory for standard errors.
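The EM idea can be illustrated on a deliberately stripped-down model, sketched here under stated assumptions rather than as the thesis's GLM with block or smoothed spatial terms. Assume each site i is visited K times (K >= 2 is needed for identifiability), presence z[i] ~ Bernoulli(psi), and given presence the success count is Binomial(K, p). The E-step computes the posterior probability that each site is occupied; the M-step then has closed-form updates. The function name and the simulated values are hypothetical, and covariates are omitted.

```python
import numpy as np

def em_zero_inflated_binomial(y, K, psi=0.5, p=0.5, n_iter=500, tol=1e-10):
    """EM for a covariate-free zero-inflated binomial (illustrative sketch).

    y[i] = number of successes at site i out of K visits.
    Latent z[i] ~ Bernoulli(psi); y[i] | z[i] = 1 ~ Binomial(K, p);
    y[i] = 0 whenever z[i] = 0.
    """
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E-step: a site with any success is certainly occupied;
        # an all-zero site is occupied with probability q0 / (q0 + 1 - psi).
        q0 = psi * (1.0 - p) ** K
        w = np.where(y > 0, 1.0, q0 / (q0 + 1.0 - psi))
        # M-step: maximise the expected complete-data log-likelihood.
        psi_new = w.mean()
        p_new = y.sum() / (K * w.sum())
        converged = abs(psi_new - psi) + abs(p_new - p) < tol
        psi, p = psi_new, p_new
        if converged:
            break
    return psi, p

rng = np.random.default_rng(1)
K, n = 5, 2000
occupied = rng.random(n) < 0.6
counts = np.where(occupied, rng.binomial(K, 0.4, size=n), 0)
psi_hat, p_hat = em_zero_inflated_binomial(counts, K)
print(round(psi_hat, 3), round(p_hat, 3))
```

Standard errors could then be obtained by a nonparametric bootstrap: resample sites with replacement and rerun the function on each resample.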

The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for the parameters of these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics.
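The Gibbs component of such a sampler can be sketched as a single-site update of the latent field given the data. This is a minimal sketch under assumptions, not the thesis's sampler. It assumes the autologistic form P(z) proportional to exp(alpha * sum z + beta_h * sum of horizontal neighbour products + beta_v * sum of vertical neighbour products) with a free boundary, and a data layer y | z ~ Bernoulli(z * p), where p would come from the logistic regression. An observed 1 forces z = 1; for an observed 0, the MRF prior log-odds are shifted by log(1 - p). The Metropolis steps for alpha, beta_h and beta_v are omitted, and all names are hypothetical.

```python
import numpy as np

def neighbour_sums(z, i, j):
    """Horizontal and vertical neighbour sums at cell (i, j), free boundary."""
    nr, nc = z.shape
    h = (z[i, j - 1] if j > 0 else 0) + (z[i, j + 1] if j < nc - 1 else 0)
    v = (z[i - 1, j] if i > 0 else 0) + (z[i + 1, j] if i < nr - 1 else 0)
    return h, v

def gibbs_sweep_latent(z, y, p, alpha, beta_h, beta_v, rng):
    """One single-site Gibbs sweep over the latent presence field z (in place).

    Assumed model: z ~ three-parameter autologistic MRF,
    y[i, j] | z[i, j] ~ Bernoulli(z[i, j] * p[i, j]), with 0 <= p < 1.
    """
    nr, nc = z.shape
    for i in range(nr):
        for j in range(nc):
            if y[i, j] == 1:
                z[i, j] = 1  # an observed success implies presence
                continue
            h, v = neighbour_sums(z, i, j)
            # MRF log-odds + log P(y=0 | z=1) - log P(y=0 | z=0)
            eta = alpha + beta_h * h + beta_v * v + np.log1p(-p[i, j])
            z[i, j] = int(rng.random() < 1.0 / (1.0 + np.exp(-eta)))
    return z

rng = np.random.default_rng(2)
y = (rng.random((15, 15)) < 0.2).astype(int)
p = np.full(y.shape, 0.5)  # hypothetical detection probabilities
z = y.copy()               # start from the observed pattern
for _ in range(50):
    z = gibbs_sweep_latent(z, y, p, alpha=-0.5, beta_h=0.4, beta_v=0.4, rng=rng)
```

Updating the MRF parameters themselves requires the normalization constant of the autologistic model, which is why those steps are not a simple Gibbs draw.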

Model choice can be assessed by incorporating another tier in the modelling hierarchy. This requires evaluation of the normalization constant of the MRF, a notoriously difficult problem. This difficulty can be overcome by using a path integral approach, although the method is highly computationally intensive.
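The difficulty, and the path integral that sidesteps it, can be written out for an exponential-family MRF. The parameterization below is an assumption for illustration: theta = (alpha, beta_h, beta_v) and S(z) = (sum of z_i, sum of z_i z_j over horizontal neighbour pairs, sum of z_i z_j over vertical neighbour pairs) for the three-parameter autologistic model. The normalization constant c(theta) sums over all 2^n configurations of an n-site lattice, which is infeasible for realistic n. Differentiating log c(theta) gives the mean of the canonical statistic, and integrating along a straight path from theta_0 to theta_1 gives the log ratio:

```latex
\begin{align*}
p(z \mid \theta) &= \frac{\exp\{\theta^{\top} S(z)\}}{c(\theta)},
\qquad c(\theta) = \sum_{z \in \{0,1\}^{n}} \exp\{\theta^{\top} S(z)\}, \\
\nabla_{\theta} \log c(\theta) &= \frac{1}{c(\theta)} \sum_{z} S(z) \exp\{\theta^{\top} S(z)\}
  = \mathrm{E}_{\theta}[S(z)], \\
\log \frac{c(\theta_1)}{c(\theta_0)} &= \int_{0}^{1} \mathrm{E}_{\theta(t)}[S(z)]^{\top} (\theta_1 - \theta_0)\, dt,
\qquad \theta(t) = \theta_0 + t\,(\theta_1 - \theta_0).
\end{align*}
```

The expectation at each point on the path can be estimated by MCMC, which is where the computational cost arises.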

Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea that is present, though not fully developed, in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work provides the background computations required for the full implementation of the four-tier model in Chapter 7.
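A minimal path-sampling sketch in the spirit of IMCS follows; it is not the thesis's implementation, and its path, grid and integration rule may differ from those used there. It relies on the identity log c(theta_1) - log c(theta_0) = integral over t in [0, 1] of E_theta(t)[S(z)] . (theta_1 - theta_0), with theta(t) a straight line and the integral taken by the trapezoidal rule. The canonical statistic S(z) = (sum z, horizontal pair sum, vertical pair sum) is an assumed parameterization of the three-parameter autologistic model. On a 3 x 3 lattice (512 configurations) the mean statistic is computed exactly by enumeration so that the estimate can be checked against the exact answer; on realistic lattices `exact_mean_stat` would be replaced by MCMC averages of S(z). All function names are hypothetical.

```python
import itertools
import numpy as np

def canonical_stats(z):
    """S(z) = (sum z, horizontal pair sum, vertical pair sum) for a 0/1 lattice."""
    return np.array([
        z.sum(),
        (z[:, :-1] * z[:, 1:]).sum(),
        (z[:-1, :] * z[1:, :]).sum(),
    ], dtype=float)

def all_stats(shape):
    """Canonical statistics of every configuration (tiny lattices only)."""
    n = shape[0] * shape[1]
    return np.array([canonical_stats(np.array(c).reshape(shape))
                     for c in itertools.product([0, 1], repeat=n)])

def exact_log_nc(theta, stats):
    """log c(theta) by brute-force enumeration (log-sum-exp), used only as a check."""
    a = stats @ theta
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def exact_mean_stat(theta, stats):
    """E_theta[S(z)] by enumeration; an MCMC average would replace this in practice."""
    a = stats @ theta
    w = np.exp(a - a.max())
    return (w[:, None] * stats).sum(axis=0) / w.sum()

def imcs_log_nc_ratio(theta0, theta1, mean_stat, n_grid=21):
    """Path-sampling estimate of log c(theta1) - log c(theta0) on a straight path.

    Integrates E_{theta(t)}[S] . (theta1 - theta0) over t in [0, 1]
    with the trapezoidal rule on n_grid equally spaced points.
    """
    theta0 = np.asarray(theta0, dtype=float)
    theta1 = np.asarray(theta1, dtype=float)
    d = theta1 - theta0
    ts = np.linspace(0.0, 1.0, n_grid)
    g = np.array([mean_stat(theta0 + t * d) @ d for t in ts])
    return np.sum((g[1:] + g[:-1]) / 2.0 * np.diff(ts))

stats = all_stats((3, 3))             # 2**9 = 512 configurations
theta0 = np.zeros(3)                  # independent fair coins: c = 2**9
theta1 = np.array([-0.5, 0.4, 0.3])   # hypothetical (alpha, beta_h, beta_v)
est = imcs_log_nc_ratio(theta0, theta1, lambda th: exact_mean_stat(th, stats))
exact = exact_log_nc(theta1, stats) - exact_log_nc(theta0, stats)
print(est, exact)
```

Starting the path at theta_0 = 0 is convenient because c(0) = 2^n is known exactly, so the log ratio also yields log c(theta_1) itself.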

Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the first development of a fully Bayesian approach to inference for these hierarchical models.
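One way to picture the first extension is a space-time autologistic field, sketched below as a hypothetical formulation that may differ from the thesis's four-tier model. The latent field z has shape (T, rows, cols) and a joint distribution proportional to exp(alpha * sum z + beta_h * horizontal pairs + beta_v * vertical pairs + gamma * sum over t of z[t] * z[t+1] at the same cell). The full conditional log-odds of a cell then add gamma times its states at t - 1 and t + 1 to the spatial terms, so the single-site Gibbs update extends directly. The parameter gamma and the function names are assumptions for illustration.

```python
import numpy as np

def spacetime_logit(z, t, i, j, alpha, beta_h, beta_v, gamma):
    """Full-conditional log-odds of z[t, i, j] = 1 (hypothetical space-time MRF).

    Spatial terms of the three-parameter autologistic plus gamma times the
    same cell's state at t - 1 and t + 1. Free boundaries in space and time.
    """
    T, nr, nc = z.shape
    h = (z[t, i, j - 1] if j > 0 else 0) + (z[t, i, j + 1] if j < nc - 1 else 0)
    v = (z[t, i - 1, j] if i > 0 else 0) + (z[t, i + 1, j] if i < nr - 1 else 0)
    k = (z[t - 1, i, j] if t > 0 else 0) + (z[t + 1, i, j] if t < T - 1 else 0)
    return alpha + beta_h * h + beta_v * v + gamma * k

def gibbs_sweep_spacetime(z, alpha, beta_h, beta_v, gamma, rng):
    """One single-site Gibbs sweep over every cell and time point of z (in place)."""
    T, nr, nc = z.shape
    for t in range(T):
        for i in range(nr):
            for j in range(nc):
                eta = spacetime_logit(z, t, i, j, alpha, beta_h, beta_v, gamma)
                z[t, i, j] = int(rng.random() < 1.0 / (1.0 + np.exp(-eta)))
    return z

rng = np.random.default_rng(3)
z = rng.integers(0, 2, size=(4, 10, 10))
for _ in range(20):
    z = gibbs_sweep_spacetime(z, alpha=-1.0, beta_h=0.5, beta_v=0.5, gamma=0.8, rng=rng)
```

A directed alternative, in which z[t] depends only on z[t - 1], would change the full conditional; the symmetric form is used here only because its Gibbs update is the simplest to state.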

Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.


ID Code: 37098
Item Type: QUT Thesis (PhD)
Additional Information:

Presented to the Centre in Statistical Science and Industrial Mathematics, School of Mathematical Sciences, Queensland University of Technology.

Thesis includes a postprint version of a paper published as:
Pettitt, A.N. and Low Choy, S. (1999) Bivariate Binary Data with Missing Values: Analysis of a Field Experiment to Investigate Chemical Attractants of Wild Dogs. Journal of Agricultural, Biological, and Environmental Statistics. Vol. 4, No. 1 (Mar., 1999), pp. 57-76

Reproduced here with publisher permission.
http://www.jstor.org/stable/1400421

Keywords: Mathematical statistics, Dingo -- Geographical distribution -- Statistical methods, 2D lattice data, ambiguous zeroes, autologistic distribution, 3-parameter autologistic, Bayesian inference, Bernoulli-Autologistic, binary Markov random fields, binary data, biogeography, bootstrapping, dingo behaviour, distribution maps, EM algorithm, environmental management, frequentist inference, Gibbs sampling, hierarchical model, Markov chain Monte Carlo, MCMC, Metropolis-Hastings, falsely inflated zeroes, Ising model, normalization constant, population atlas, presence/absence data, spatial statistics, spatio-temporal data, statistical modelling, underlying spatio-temporal dependence, thesis, doctoral
Divisions: Current > Schools > School of Mathematical Sciences
Institution: Queensland University of Technology
Copyright Owner: Copyright Samantha Jane Low Choy
Deposited On: 22 Sep 2010 13:07
Last Modified: 13 Sep 2016 22:14

