from Microsoft Research and Microsoft
SOSP 2013
This is the closest architecture to what I imagine Software-Defined Storage should be like.
They borrowed the whole set of ideas from SDN, especially the separation and programmability of the control plane.
Key Idea:
1. Define IOFlows as {VM, operation, file, share}, basically source, destination, and operation type (read/write/create).
2. A language to describe end-to-end policies over different I/O flows (bandwidth, priority, middleboxes, etc.).
3. Define stages as the different I/O request processing points (storage drivers, NICs, etc.) and expose a standard queuing abstraction at each (the analogy to switches and OpenFlow in SDN).
4. A centralized controller realizes policies by specifying queuing rules at the different stages (the analogy to the SDN controller; the distributed-state abstraction is also realized here). The rules are updated periodically based on runtime statistics.
A lot of challenges can then be solved by directly using SDN-based techniques.
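To make the analogy concrete, here is a minimal Python sketch of how I picture the flow / stage / controller split fitting together. This is not the paper's actual API; all names (IOFlow, Stage, Controller, install_rule, etc.) are hypothetical, and the control loop is deliberately naive.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class IOFlow:
    vm: str          # source VM
    operation: str   # "read" | "write" | "create"
    share: str       # destination file or share

@dataclass
class QueueRule:
    flow: IOFlow
    rate_mbps: float   # service rate enforced for this flow's queue
    priority: int = 0

class Stage:
    """An I/O processing point (hypervisor storage driver, NIC, server driver, ...)
    exposing a standard queuing abstraction: map flows to queues, set their rates."""
    def __init__(self, name):
        self.name = name
        self.rules = {}                    # IOFlow -> QueueRule
        self.stats = defaultdict(int)      # IOFlow -> bytes served, reported upward

    def install_rule(self, rule):
        self.rules[rule.flow] = rule

class Controller:
    """Centralized controller: holds end-to-end policies and periodically
    translates them into per-stage queuing rules using the reported stats."""
    def __init__(self, stages):
        self.stages = stages
        self.policies = []                 # list of (IOFlow, guaranteed MB/s)

    def add_policy(self, flow, bandwidth_mbps):
        self.policies.append((flow, bandwidth_mbps))

    def control_interval(self):
        # Naive realization: install each flow's guaranteed rate at every stage.
        # The real controller would divide the work across stages based on stats.
        for flow, bw in self.policies:
            for stage in self.stages:
                stage.install_rule(QueueRule(flow=flow, rate_mbps=bw))

# Example: guarantee one VM's writes to a share 100 MB/s end to end.
stages = [Stage("hypervisor-driver"), Stage("server-driver")]
controller = Controller(stages)
controller.add_policy(IOFlow("VM1", "write", "//server/share"), bandwidth_mbps=100)
controller.control_interval()
```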
Challenges Specific to Storage:
1. No standard I/O request header (like network packet header)
Solution: the controller maps flow names to low-level identifiers (files, blocks, etc.)
2. I/O requests, unlike network packets, have different types (read/write/create), which requires different treatment
Solution: release tokens based on the end-to-end cost of operations. This cost is estimated by running benchmarks to obtain the cost of an I/O request as a function of its type and size.
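A hedged sketch of what cost-based token release might look like: the cost numbers below are made up and merely stand in for the benchmark-derived cost model, and request_cost / CostTokenQueue are hypothetical names, not the paper's.

```python
from collections import deque

# Hypothetical, benchmark-derived costs (relative service cost per request).
BASE_COST = {"read": 1.0, "write": 1.5, "create": 4.0}
PER_KB_COST = {"read": 0.01, "write": 0.02, "create": 0.0}

def request_cost(op, size_kb):
    """Estimated end-to-end cost of one request, as a function of type and size."""
    return BASE_COST[op] + PER_KB_COST[op] * size_kb

class CostTokenQueue:
    """Releases queued requests only while enough cost tokens remain; the
    controller refills tokens each interval to match the flow's policy."""
    def __init__(self):
        self.tokens = 0.0
        self.pending = deque()             # queued (op, size_kb) requests

    def refill(self, tokens):
        self.tokens += tokens

    def enqueue(self, op, size_kb):
        self.pending.append((op, size_kb))

    def drain(self):
        released = []
        while self.pending:
            op, size_kb = self.pending[0]
            cost = request_cost(op, size_kb)
            if cost > self.tokens:
                break                      # out of budget; wait for the next refill
            self.tokens -= cost
            released.append(self.pending.popleft())
        return released
```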
Unsolved Problems:
1. In order to realize some policies, rules need to be enforced at EVERY I/O processing point (e.g., avoiding head-of-line blocking everywhere in order to enforce priority).
For example, file systems (journals), I/O schedulers, network switches, distributed file systems, etc. How do we enforce policies at these places, and how do they work together? The paper didn't address this.
(The unsupported-layer problem could be alleviated by bounding the number of outstanding I/Os in those layers, though; see the sketch after this list.)
2. Didn't consider the impact of the network.
They assume a good-enough network (40 Gb/s) and don't consider the possibly complicated storage/network interactions.
3. Doesn't consider the location information of I/O requests.
4. Needs to operate at VM granularity; no finer-grained control.
5. This kind of queuing model doesn't work well with caching mechanisms.
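As a rough illustration of the mitigation mentioned in point 1 (this is only my reading of it, not the paper's mechanism): if a layer cannot enforce queuing rules itself, one can at least cap the number of I/Os outstanding inside it, so the layer never accumulates a deep, uncontrolled queue. submit_to_layer is a hypothetical callback.

```python
import threading

class OutstandingIOLimiter:
    """Caps the number of I/Os in flight inside a layer that cannot enforce
    queuing rules itself (journal, I/O scheduler, switch, ...)."""
    def __init__(self, max_outstanding):
        self.slots = threading.Semaphore(max_outstanding)

    def submit(self, request, submit_to_layer):
        # Block until a slot frees up, then hand the request to the layer.
        self.slots.acquire()
        submit_to_layer(request)

    def on_complete(self, request):
        # Hooked into the layer's completion path; frees a slot.
        self.slots.release()
```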
The biggest problem I have with this paper is that they did not use disks in their evaluation of policy enforcement; instead, they used a memory store. The challenge with disk-based storage is that the throughput it provides depends heavily on the workloads running on top of it; random workloads, for example, will bring the bandwidth down significantly. It then becomes (much) more challenging to enforce QoS. In fact, I don't believe their system can enforce bandwidth guarantees for disk-based storage unless more work is done.
Or to put it more generally: even though they claim this is an architecture for storage, the work doesn't address any challenges unique to SDS as opposed to SDN. So it is still unclear whether this architecture will work for storage systems in general (not just SMB and network drivers...), and how we could carry SDN's well-established techniques over into SDS.
Some people I know do not like this paper at all. But I think it points in an interesting direction, and the work this paper leaves out could be our opportunity :)