Fast content-based file type identification
Ahmed, Irfan, Lhee, Kyung-Suk, Shin, Hyun-Jung, & Hong, Man-Pyo (2011) Fast content-based file type identification. In Shenoi, Sujeet & Peterson, Bert (Eds.) 7th Annual IFIP WG 11.9 International Conference on Digital Forensics, January 30 - February 2, 2011, Orlando, Florida.
Digital forensic examiners often need to identify the type of a file or file fragment based only on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds classification by sampling several blocks from the file. Experimental results demonstrate that up to a fifteen-fold reduction in file size analysis time can be achieved with limited impact on accuracy.
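The two speed-ups described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact selection criterion, block size, number of blocks, or classifier, so the helper names (`byte_frequency`, `sample_blocks`, `select_top_features`, `project`), the "highest mean frequency" ranking, and the uniform random block sampling are all assumptions made for illustration. The classifier itself is left out.

```python
import random


def byte_frequency(data):
    """Return the normalized 256-bin byte frequency distribution of data."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


def sample_blocks(data, block_size, num_blocks, seed=0):
    """Concatenate num_blocks fixed-size blocks chosen at random from data.

    Assumption: blocks are drawn uniformly without replacement; the paper
    may use a different sampling strategy.
    """
    total_blocks = len(data) // block_size
    if total_blocks <= num_blocks:
        return data  # file is small enough to analyze in full
    rng = random.Random(seed)
    picks = sorted(rng.sample(range(total_blocks), num_blocks))
    return b"".join(data[i * block_size:(i + 1) * block_size] for i in picks)


def select_top_features(distributions, k):
    """Return the k byte values with the highest mean frequency.

    Assumption: "frequency of occurrence" is read here as the mean
    frequency over a set of training distributions.
    """
    n = len(distributions)
    mean = [sum(d[i] for d in distributions) / n for i in range(256)]
    return sorted(range(256), key=lambda i: mean[i], reverse=True)[:k]


def project(distribution, features):
    """Keep only the selected byte values as the feature vector."""
    return [distribution[i] for i in features]


# Usage: analyze 4 blocks of 512 bytes instead of the whole 16 KiB buffer,
# and keep 16 of the 256 byte-value features.
data = bytes(range(256)) * 64
sampled = sample_blocks(data, block_size=512, num_blocks=4)
features = select_top_features([byte_frequency(data)], k=16)
vector = project(byte_frequency(sampled), features)
```

The resulting `vector` is what would be passed to a statistical classifier. Both steps reduce work in the same way: sampling shrinks the number of bytes that must be counted, and feature selection shrinks the vector the classifier has to process.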
|Item Type:||Conference Paper|
|Keywords:||File type identification, File content classification, Byte frequency|
|Subjects:||Australian and New Zealand Standard Research Classification > INFORMATION AND COMPUTING SCIENCES (080000) > COMPUTER SOFTWARE (080300) > Computer System Security (080303)|
|Divisions:||Past > Institutes > Information Security Institute|
|Copyright Owner:||Copyright 2011 Springer|
|Copyright Statement:||This is the author-version of the work. Conference proceedings published by Springer Verlag will be available via SpringerLink. http://www.springerlink.com|
|Deposited On:||28 Aug 2011 22:26|
|Last Modified:||27 Jan 2012 03:24|