QUT ePrints

Fast content-based file type identification

Ahmed, Irfan, Lhee, Kyung-Suk, Shin, Hyun-Jung, & Hong, Man-Pyo (2011) Fast content-based file type identification. In Shenoi, Sujeet & Peterson, Bert (Eds.) 7th Annual IFIP WG 11.9 International Conference on Digital Forensics, January 30 - February 2, 2011, Orlando, Florida.


    Abstract

    Digital forensic examiners often need to identify the type of a file or file fragment based only on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time-consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by sampling several blocks from the file. Experimental results demonstrate that up to a fifteen-fold reduction in classification time can be achieved with limited impact on accuracy.
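
    The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`sample_blocks`, `byte_frequency`, `top_features`), the uniform random choice of block-aligned chunks, and the ranking of byte values by frequency within a single sample are all assumptions made for illustration. The sketch builds a byte frequency distribution from a few sampled blocks, then keeps only the most frequent byte values as features.

```python
import random
from collections import Counter


def sample_blocks(data: bytes, block_size: int, num_blocks: int, seed: int = 0) -> bytes:
    """Concatenate num_blocks randomly chosen, block-aligned chunks of data.

    Assumption: blocks are picked uniformly at random without replacement.
    """
    total_blocks = max(1, -(-len(data) // block_size))  # ceiling division
    rng = random.Random(seed)
    chosen = rng.sample(range(total_blocks), min(num_blocks, total_blocks))
    # Keep the chosen blocks in file order.
    return b"".join(data[i * block_size:(i + 1) * block_size] for i in sorted(chosen))


def byte_frequency(data: bytes) -> list[float]:
    """Return the normalized 256-bin byte frequency distribution of data."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]


def top_features(distribution: list[float], k: int) -> list[int]:
    """Return the k byte values with the highest frequency, most frequent first.

    Ties are broken by the smaller byte value.
    """
    return sorted(range(256), key=lambda b: (-distribution[b], b))[:k]


# Hypothetical input: repetitive markup followed by one copy of every byte value.
data = b"<html>" * 1000 + bytes(range(256))
sampled = sample_blocks(data, block_size=512, num_blocks=4)
dist = byte_frequency(sampled)
features = top_features(dist, k=8)
reduced_vector = [dist[b] for b in features]  # shortened feature vector for a classifier
```

    In this sketch only 4 blocks of 512 bytes are read instead of the whole input, and the classifier would receive 8 values instead of 256. In practice the feature subset would more plausibly be chosen once from training data and then reused for every file, rather than recomputed per file as here.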

    ID Code: 41535
    Item Type: Conference Paper
    Keywords: File type identification, File content classification, Byte frequency
    ISBN: 9783642242113
    Subjects: Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > COMPUTER SOFTWARE (080300) > Computer System Security (080303)
    Divisions: Past > Institutes > Information Security Institute
    Copyright Owner: Copyright 2011 Springer
    Copyright Statement: This is the author-version of the work. The conference proceedings, published by Springer Verlag, will be available via SpringerLink. http://www.springerlink.com
    Deposited On: 29 Aug 2011 08:26
    Last Modified: 27 Jan 2012 13:24
