Thursday, February 16, 2012

FAST'12 Session 7: Cloud

BlueSky: A Cloud-Backed File System for the Enterprise
Michael Vrable, Stefan Savage, and Geoffrey M. Voelker, University of California, San Diego

Cloud Interface: only supports writing whole objects, but does support random read access
System design: NFS/CIFS interface (which itself has overhead!)
write-back caching, data asynchronously pushed to the cloud
log-structured file system layout (see the sketch after these notes)
cleaner running in the cloud (Amazon EC2)
Performance: good if everything hits the cache; otherwise, not so good. Whole-segment prefetching improves performance.
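To make the design concrete for myself, here is a minimal write-back/log-structured sketch in Python (my own illustration, not BlueSky's actual code; the CloudStore API and all names are hypothetical stand-ins): writes are appended to an in-memory log segment and acknowledged right away, and a background thread later uploads each sealed segment to the cloud as one whole object, matching a cloud interface that only accepts whole-object writes but allows ranged reads.

import threading, queue

SEGMENT_SIZE = 4 * 1024 * 1024  # seal and upload once a segment reaches ~4 MB

class CloudStore:
    """Hypothetical whole-object cloud API (stand-in for S3-style storage)."""
    def __init__(self):
        self.objects = {}
    def put_object(self, name, data):           # whole-object writes only
        self.objects[name] = bytes(data)
    def get_range(self, name, offset, length):  # but random read access is fine
        return self.objects[name][offset:offset + length]

class LogStructuredProxy:
    """Write-back proxy: acknowledge writes locally, push sealed log segments later."""
    def __init__(self, cloud):
        self.cloud = cloud
        self.segment = bytearray()
        self.segment_id = 0
        self.index = {}                  # (path, file_offset) -> (segment, seg_offset, length)
        self.upload_queue = queue.Queue()
        threading.Thread(target=self._uploader, daemon=True).start()

    def write(self, path, file_offset, data):
        # Append to the current in-memory log segment and return immediately.
        seg_offset = len(self.segment)
        self.segment.extend(data)
        self.index[(path, file_offset)] = (self.segment_id, seg_offset, len(data))
        if len(self.segment) >= SEGMENT_SIZE:
            self._seal_segment()

    def _seal_segment(self):
        # Hand the finished segment to the uploader and start a fresh one.
        self.upload_queue.put((self.segment_id, bytes(self.segment)))
        self.segment_id += 1
        self.segment = bytearray()

    def _uploader(self):
        # Background thread: asynchronously push whole segments to the cloud.
        while True:
            seg_id, data = self.upload_queue.get()
            self.cloud.put_object("segment-%d" % seg_id, data)

# Example use: writes return immediately; uploads happen in the background.
proxy = LogStructuredProxy(CloudStore())
proxy.write("/mnt/bluesky/report.txt", 0, b"hello cloud")

The real system also has to handle the read path, crash consistency, and the cloud-side cleaner; this only captures the write-back and log-structured flow.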
Q&A:
Q: multiple proxies accessing the same data store?
A: thought about, not implemented. Need distributed locking if you want strong consistency.
Q: backup?
A: Because we are log-structured, you just read an old checkpoint region and don't do cleaning!
Q: Was cleaning worth it? Why not just send more requests at a time to increase bandwidth?
A: Big reason is cost.
Q: We don't see those bad local numbers at Red Hat!
A: Nothing special in our setup. If you use NetApp, of course it’s better.
Q: Consistency from cloud?
A: Not a big problem for us because we are log structured and don’t overwrite data!







Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads

Osama Khan and Randal Burns, Johns Hopkins University; James Plank and William Pierce, University of Tennessee; Cheng Huang, Microsoft Research


My takeaway: big-data-aware erasure codes are needed.

Replication is too expensive for big data. Erasure coding (fancier parity) comes to the rescue.
Two prominent operations: disk reconstruction
degraded reads
Erasure codes designed to minimize recovery I/O: Rotated Reed-Solomon codes.
Didn't really understand... but basically it is a new code construction that exploits the facts that (1) most failures are single-disk failures, and (2) most failures are transient. (A toy illustration of erasure coding and degraded reads follows these notes.)
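Since I got a bit lost, here is a toy Python illustration (my own, using plain single-parity XOR, not the paper's Rotated Reed-Solomon construction) of why erasure coding beats replication on storage and what a degraded read costs: with k data blocks plus one parity block the overhead is only 1/k (versus 2x-3x for replication), but serving a read for a lost block still means fetching all k surviving blocks, which is exactly the recovery I/O this paper is trying to reduce.

from functools import reduce

def xor_blocks(blocks):
    # Bytewise XOR of equal-length blocks.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def encode(data_blocks):
    # k data blocks + 1 parity block: 1/k storage overhead vs. 2x-3x for replication.
    return data_blocks + [xor_blocks(data_blocks)]

def degraded_read(stripe, lost_index):
    # Serve a read of a lost block by XORing all surviving blocks in the stripe.
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# Example: k = 4 data blocks, one is lost, reconstruct it from the 4 surviving blocks.
data = [bytes([i] * 8) for i in range(4)]
stripe = encode(data)
assert degraded_read(stripe, lost_index=2) == data[2]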








NCCloud: Applying Network Coding for the Storage Repair in a Cloud-of-Clouds

Yuchong Hu, Henry C.H. Chen, and Patrick P.C. Lee, The Chinese University of Hong Kong; Yang Tang, Columbia University

My takeaway:
if you store redundant data in a "cloud array", you want to do it in a traffic-aware way. (But why do you want to do it anyway?!)

Motivation: multiple-cloud storage
They developed a proxy, which distributes data to multiple clouds transparently. The proxy can be mounted as a file system.
Use an MDS code for redundancy, and repair data when one cloud becomes unavailable.
Goal: minimize repair traffic when repairing a failed cloud.
How: transfer one chunk from each surviving cloud instead of the whole file when repairing (using regenerating codes)
System design:
code chunk = linear combination of original data chunks.
repair: one chunk from each surviving node
and some details I don't understand, but mostly coding details and how to minimize traffic (a toy repair-traffic comparison follows these notes)
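To see why one-chunk-per-cloud repair helps, here is a back-of-the-envelope Python comparison based on my understanding of the talk (the F-MSR chunk sizes below are my assumption, not numbers quoted from the paper): a conventional (n, k) MDS repair downloads the equivalent of the whole file, while the regenerating-code repair pulls one small chunk from each of the n-1 surviving clouds.

def conventional_mds_repair_traffic(M, n, k):
    # Conventional MDS repair: download k chunks of size M/k, i.e. the whole file.
    return k * (M / k)

def fmsr_repair_traffic(M, n, k):
    # F-MSR-style repair (as I understood it): the file is split into k*(n-k)
    # chunks, and repair pulls one chunk from each of the n-1 surviving clouds.
    chunk_size = M / (k * (n - k))
    return (n - 1) * chunk_size

M = 1.0  # normalize file size to 1
for n, k in [(4, 2), (6, 4)]:
    print(n, k, conventional_mds_repair_traffic(M, n, k), fmsr_repair_traffic(M, n, k))

If these assumptions are right, (n=4, k=2) needs 0.75 of the file instead of the whole file for one repair, i.e. about 25% less repair traffic.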

Q&A:
Q: What are the odds of losing two clouds?
A: Not all clouds as reliable as Amazon S3
Q: Cost for additional code storage? Not really feasible?
A: Thank you for your comments...
