[ICDCSW 2014] On Balance among Energy, Performance and Recovery in Storage Systems

With the increasing size of clusters and the increasing capacity of each storage node, current storage systems are spending more time on recovery. When a node fails, the system enters degradation mode, in which node reconstruction/block recovery is initiated. This process must wake up a number of disks and consumes a substantial amount of I/O bandwidth, which compromises not only energy efficiency but also performance. This raises a natural problem: how to balance performance, energy, and recovery in degradation mode for an energy-efficient storage system? We find that current energy-proportional solutions, which do not consider the I/O bandwidth contention between recovery and performance, cannot answer this question accurately. This paper presents a mathematical model called Perfect Energy, Reliability, and Performance (PERP), which provides guidelines for provisioning the number of active nodes and the recovery speed at each time slot with respect to the performance and recovery constraints. We apply our model to practical data layouts and test its effectiveness on our 25-node CASS cluster. Experimental results validate that our model helps realize 25% energy savings while meeting both performance and recovery constraints, and the savings are expected to increase with a larger number of nodes.