Fast content-based file type identification
Ahmed, Irfan, Lhee, Kyung-Suk, Shin, Hyun-Jung, & Hong, Man-Pyo (2011) Fast content-based file type identification. In Sujeet, Shenoi & Peterson, Bert (Eds.) 7th Annual IFIP WG 11.9 International Conference on Digital Forensics, January 30 - February 2, 2011, Orlando, Florida.
Administrators only | Request a copy from author
Digital forensic examiners often need to identify the type of a file or file fragment based only on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds classification by sampling several blocks from the file. Experimental results demonstrate that up to a fifteen-fold reduction in file size analysis time can be achieved with limited impact on accuracy.
Impact and interest:
Citation counts are sourced monthly from and citation databases.
These databases contain citations from different subsets of available publications and different time periods and thus the citation count from each is usually different. Some works are not in either database and no count is displayed. Scopus includes citations from articles published in 1996 onwards, and Web of Science® generally from 1980 onwards.
Citations counts from theindexing service can be viewed at the linked Google Scholar™ search.
|Item Type:||Conference Paper|
|Keywords:||File type identification, File content classification, Byte frequency|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > COMPUTER SOFTWARE (080300) > Computer System Security (080303)|
|Divisions:||Past > Institutes > Information Security Institute|
|Copyright Owner:||Copyright 2011 Springer|
This is the author-version of the work.
Conference proceedings published, by Springer Verlag, will be available via SpringerLink. http://www.springerlink.com
|Deposited On:||28 Aug 2011 22:26|
|Last Modified:||27 Jan 2012 03:24|
Repository Staff Only: item control page