FlexAnalytics: A flexible data analytics framework for big data applications with I/O performance improvement
Zou, Hongbo, Yu, Yongen, Tang, Wei, & Chen, Hsuan-Wei Michelle (2014) FlexAnalytics: A flexible data analytics framework for big data applications with I/O performance improvement. Big Data Research, 1, pp. 4-13.
Description
Increasingly large-scale applications are generating unprecedented amounts of data. However, the widening gap between computation and I/O capacity on High End Computing (HEC) machines creates a severe bottleneck for data analysis. Instead of moving data from its source to output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis contends with the simulations for computing resources, and this contention severely degrades simulation performance on HEC platforms. Since different data processing strategies have different impacts on performance and cost, there is a consequent need for flexibility in the location of data analytics. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path. To find the best strategy for reducing data movement in a given situation, we propose a flexible data analytics (FlexAnalytics) framework. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. The FlexAnalytics system enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as for large-scale data transfer. Two use cases, scientific data compression and remote visualization, are applied in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that the FlexAnalytics framework increases data transfer bandwidth and improves application end-to-end transfer performance.
| ID Code: | 88652 |
|---|---|
| Item Type: | Contribution to Journal (Journal Article) |
| Refereed: | Yes |
| Measurements or Duration: | 10 pages |
| Keywords: | Big Data, Data Preparation, High-end Computing, I/O Bottlenecks, In-situ Analytics |
| DOI: | 10.1016/j.bdr.2014.07.001 |
| ISSN: | 2214-5796 |
| Pure ID: | 32743565 |
| Divisions: | Past > QUT Faculties & Divisions > Science & Engineering Faculty |
| Copyright Owner: | Consult author(s) regarding copyright matters |
| Copyright Statement: | This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to qut.copyright@qut.edu.au |
| Deposited On: | 04 Nov 2015 10:00 |
| Last Modified: | 25 Oct 2025 07:01 |