In HDFS, synchronous (pipelined) replication creates performance bottlenecks and seldom helps application performance
- only 2% of data was read within 5 minutes of being written
So replicate asynchronously instead
Asynchronous replication also needs flow control to manage congestion
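
To make the idea concrete, here is a minimal sketch of asynchronous replication with flow control. It is not the paper's design or the HDFS API: the class `AsyncReplicator`, the `max_pending` parameter, and the use of plain dicts as stand-ins for replica nodes are all assumptions made for illustration. A write returns once the local copy exists, a background thread replicates it, and a bounded queue provides backpressure when replication falls behind.

```python
import queue
import threading


class AsyncReplicator:
    """Hypothetical sketch: ack after the local write, replicate in the background."""

    def __init__(self, replicas, max_pending=4):
        self.local = {}                    # local copy of each block
        self.replicas = replicas           # dicts standing in for remote nodes (assumption)
        # Bounded queue: when it is full, writers block. This is the flow control.
        self.pending = queue.Queue(maxsize=max_pending)
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def write(self, block_id, data):
        """Store locally, queue for replication, and return without waiting for replicas."""
        self.local[block_id] = data
        self.pending.put((block_id, data))  # blocks only if too much is outstanding
        return True

    def _drain(self):
        """Background loop: copy each queued block to every replica."""
        while True:
            block_id, data = self.pending.get()
            for replica in self.replicas:
                replica[block_id] = data
            self.pending.task_done()

    def flush(self):
        """Wait until every queued block has been replicated."""
        self.pending.join()


replicas = [{}, {}]
replicator = AsyncReplicator(replicas, max_pending=2)
for i in range(10):
    replicator.write(i, b"x" * i)
replicator.flush()
```

A smaller `max_pending` throttles writers sooner; a larger one absorbs bursts but leaves more data unreplicated at any moment.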
ManyLogs: Improved CMR/SMR Disk Bandwidth and Faster Durability with Scattered Logs
problem: small durable writes severely impact the bandwidth of other users (e.g., a sequential reader)
in this case, data journaling outperforms ordered journaling!
Ordered journaling: efficient for large writes
Data journaling: efficient for small writes (fewer seeks)
previous work: adaptive journaling (ATC'05)
many logs: each small write goes to the log nearest the current disk head position
where to put the logs on the disk? reserve 10 MB for every platter (?)
checkpointing: lazy for many-logs, instead of every 5 seconds
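
The nearest-log choice can be sketched roughly as follows. This is an illustration only, not the paper's implementation: the function `nearest_log`, the evenly spaced log offsets, and the use of offset distance as a proxy for seek cost (ignoring rotational position) are all assumptions.

```python
from bisect import bisect_left


def nearest_log(log_offsets, head):
    """Return the log start offset closest to the current head position.

    log_offsets must be sorted in ascending order and non-empty.
    """
    i = bisect_left(log_offsets, head)
    # Only the logs just before and just after the head can be the nearest.
    candidates = log_offsets[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda offset: abs(offset - head))


# Hypothetical layout: one reserved log region every 1000 blocks.
logs = [0, 1000, 2000, 3000]
chosen = nearest_log(logs, head=1400)  # picks the log at offset 1000
```

With a single log, every small durable write would seek to the same place; with scattered logs the seek is bounded by roughly half the spacing between neighboring logs.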