[IPDPS 2016] DataNet: A Data Distribution-aware Method for Sub-dataset Analysis on Distributed File Systems