2014年4月4日星期五

Strata: High-Performance Scalable Storage on Virtualized Non-volatile Memory

Dutch Mayer, Coho Data

(An extension to the FAST'14 talk)
Actually I think it is the most SDS like work I ever heard about. They deal with SDN-storage interaction too


Goal: take high end flash memory, add all the enterprise features and sustain the high end performance as much as possible (not possible to sustain all because the flash is so fast, everything you put on top is problematic)

PCI-based Flash: a first step toward a real use of non-volatile memory
Problem:
1. It is really fast, and if you put it in a system, something will break and you won't get the raw performance.
   -- prioritize device utilization
2. Hardware evolves really quickly, so need to abstract it in some way
    -- virtualize, scale out
3. Have to play well with others (dealing with enterprise, you have to support old protocols, various hardware, different application domains, etc...)

Ideas:
high end flash now look a lot like CPUs: fast, expensive, mostly idle. So we want to virtualize it like we virtualize CPU.
Actually flash is frequently bottlenecked on CPU (if you add more cores, you might get 80% performance boost). So you really need to balance the cpu, the network and the flash


architecture 1. virtualization:
each flash organized as a log structure file system with updated records
records form objects, and objects are organized in a B+ tree style, eventually a big dumb address space

architecture 2. flexibility
inspired by Click: packets (requests) pass through pipeline, (data pahs) at each stage of the pipeline, you mutate it (or pass it...). That's how they handle replication, tripping, etc. (This is kind of SDS like, isn't it...)

load balancing with data paths: just by chaining data paths
from data paths to protocols: a library which binds to different kind of front end (NFS, fuse, mysql). They also allow you to push things down into the data path...

They ship with an SDN switch...(and a lot of other stuff which their box work with...)

Some technical problems:
1. support NFSv3 as a scale-out architecture (they use SDN Scaling, use the switch as a load balancer by pushing some rules. they are limited by the size of rule tables though)
2. remote replication in mutable trees

Future work:
1. they focused on utilization, bu QoS is important too
2. tiering and prefetch
3. next year's flash is 2x better

Q&A:
Q:How do you analyze where you bottleneck. Where do you bottleneck?
A: Btree, uncontested locks (because of cache coherence) and almost everything. Doing analysis is really hard, instrumentation and tracing help a lot. Then we come up with a theory and try test it.









没有评论:

发表评论