Microsoft Research
SOSP 2011
GFS-like stream layer. They added another layer on top of it to implement the Blob, Table, Queue (for reliable messaging) and Drive (NTFS volume) abstractions.
Focused on load balancing (by their fancy index techniques?) and consistency protocol.
Scales computing separately from storage: nice for a multi-tenant environment with full bisection bandwidth, but bad for latency/bandwidth from storage. They didn't talk about how Azure Storage stresses their network though, just general load balancing.
Argon: performance insulation for shared storage servers
Matthew Wachs, Michael Abd-El-Malek, Eno Thereska, Gregory R. Ganger
Carnegie Mellon University
FAST 2007 
Performance isolation for storage systems. However, they focus on I/O-bound workloads, assigning each a time quota for using the disk, combined with prefetching etc.
For nonstorage resources like CPU time and network bandwidth, they claim that well-established resource management mechanisms can support time-sharing with minimal inefficiency from interference and context switching.
The Panasas ActiveScale Storage Cluster: Delivering Scalable High Bandwidth Storage
FIXME: read it!
Proportional-Share Scheduling for Distributed Storage Systems
UMich/HP Lab
FAST 2007
Assumes a dedicated storage system (clients vs. data nodes) and uses a variation of fair queuing to serve requests. In a sense similar to Fair cloud, which schedules requests to a key-value store; however, there is no replication, migration, multiple resource types etc., so the flexibility of this scheme is limited. They do it in a distributed way, whereas Fair cloud uses a central scheduler.
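To make the fair queuing idea concrete, here is a minimal single-server sketch of start-time fair queuing with per-client weights. It is not the paper's distributed variant; the class name, the method names and the unit request cost are all assumptions made up for illustration.

```python
import heapq
from collections import defaultdict


class FairQueue:
    """Start-time fair queuing on one server (illustrative sketch only)."""

    def __init__(self, weights):
        self.weights = weights                  # client -> share weight
        self.last_finish = defaultdict(float)   # client -> finish tag of its last request
        self.virtual_time = 0.0                 # start tag of the last dispatched request
        self.heap = []                          # entries: (start_tag, seq, client)
        self.seq = 0                            # tie-breaker so equal tags stay FIFO

    def submit(self, client, cost=1.0):
        # A request starts no earlier than the client's previous one finished.
        start = max(self.virtual_time, self.last_finish[client])
        # A larger weight makes a request "cheaper" in virtual time.
        self.last_finish[client] = start + cost / self.weights[client]
        heapq.heappush(self.heap, (start, self.seq, client))
        self.seq += 1

    def dispatch(self):
        # Serve the pending request with the smallest start tag.
        start, _, client = heapq.heappop(self.heap)
        self.virtual_time = start
        return client


def served_counts(queue, n):
    """Dispatch n requests and count how many each client got."""
    counts = defaultdict(int)
    for _ in range(n):
        counts[queue.dispatch()] += 1
    return dict(counts)
```

With weights 2:1 and both clients permanently backlogged, the client with weight 2 receives about twice as many dispatch slots, which is the proportional-share property the notes refer to.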
They assume network is good enough.
Ursa Minor: versatile cluster-based storage
CMU
FAST 2005
Online change (software defined, late binding, whatever you call it) and adaptive management of data encoding (replication or parity etc.) and fault model (how many fail-stop failures and how many Byzantine failures to tolerate). Not quite sure how they learn it though??
Others:
Storage Virtualization / Software Defined Storage:
SNIA Technical tutorial on Storage Virtualization from 2003 -
http://www.snia.org/sites/
SANRAD white paper about snapshots and Global replication etc with
storage virtualization -
http://www.sanrad.com/uploads/
Cloud database / file systems:
Smoke and mirrors: reflecting files at a geographically remote
location without loss of performance -
http://dl.acm.org/citation.
Cassandra - A Decentralized Structured Storage System -
http://dl.acm.org/citation.
API:
Cassandra offers a key + structured object data model (the object consists of multiple, hierarchical column families) and provides atomic operations per key per replica; there is no transactional guarantee. The main API calls are: insert row, get column of row, and delete column of row.
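A toy in-memory model of this API may help. This is only a sketch of the data model as described in these notes, not Cassandra's actual client interface: `ToyStore` and its method signatures are hypothetical, and a single lock stands in for per-key atomicity on one replica.

```python
import threading
from collections import defaultdict


class ToyStore:
    """key -> column family -> column -> value (hypothetical sketch)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))
        self._lock = threading.Lock()   # coarse lock; real systems lock far more finely

    def insert(self, key, mutation):
        """Apply {column_family: {column: value}} to one row as a single atomic step."""
        with self._lock:
            row = self._rows[key]
            for family, columns in mutation.items():
                row[family].update(columns)

    def get(self, key, family, column):
        """Return one column of one row, or None if it does not exist."""
        with self._lock:
            return self._rows.get(key, {}).get(family, {}).get(column)

    def delete(self, key, family, column):
        """Remove one column of one row; a missing column is ignored."""
        with self._lock:
            self._rows.get(key, {}).get(family, {}).pop(column, None)
```

Note that nothing here spans more than one key, which matches the "no transactional guarantee" point: each call is atomic for its own row only.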
Distributed aspects:
Cassandra is a decentralized (Chord-like) distributed storage system, which uses Chord-style consistent hashing for key partitioning and Chord-style replication management (replicate to the next N-1 nodes in the ring, but possibly rack aware). All standard distributed techniques here. One interesting thing:
They use an accrual failure detector to detect node failures, and observe that the inter-arrival times of gossip messages follow an exponential distribution.
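The ring placement and the accrual failure detector described above can be sketched as follows. This is a simplified illustration, not Cassandra's code: the MD5-based `token` function, the `Ring` class and the `phi` helper are assumptions, rack awareness is left out, and the suspicion level is derived directly from the exponential assumption, P(gap > t) = exp(-t / mean), so phi = -log10 of that probability.

```python
import bisect
import hashlib
import math


def token(name):
    """Map a node name or key to a position on the ring (assumed hash: MD5)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Sorted (token, node) pairs form the ring.
        self.tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, key, n):
        """First node clockwise from the key's token, plus the next n-1 nodes."""
        i = bisect.bisect_left(self.tokens, (token(key), ""))
        count = min(n, len(self.tokens))
        return [self.tokens[(i + k) % len(self.tokens)][1] for k in range(count)]


def phi(elapsed, intervals):
    """Suspicion level after `elapsed` seconds without a gossip message.

    Assumes exponentially distributed gaps, so
    phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10).
    """
    mean = sum(intervals) / len(intervals)
    return elapsed / (mean * math.log(10))
```

Instead of a binary up/down verdict, the caller compares `phi` against a threshold of its choosing; a higher threshold means fewer false suspicions but slower detection.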
Local persistence mechanism (all sequential):
1. Data is first persisted to a commit log (to ensure durability), and only after that are changes applied to the in-memory data structure. This commit log (or journal) is written sequentially to a dedicated disk, with, of course, a lot of fsync calls.
2. They also write the in-memory structure to disk in a sequential-only fashion, i.e., every time memory is full, a bunch of new files are generated instead of modifying and augmenting existing files. More specifically, one file per column family, one file for the primary key, and also an index file for the primary key. (Sort of column-oriented storage, but not entirely.) After this, the corresponding commit log can be deleted.
3. They periodically combine these generated files into larger files.
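The three steps above can be sketched as a tiny key-value store. This is a heavily simplified illustration under stated assumptions: `TinyStore`, the JSON file format and the entry-count memory limit are all made up, and each flush writes one file rather than the per-column-family and index files the paper describes. Crash recovery from the log is not shown.

```python
import json
import os


class TinyStore:
    def __init__(self, directory, memtable_limit=2):
        self.dir = directory
        self.limit = memtable_limit          # flush after this many keys (assumed policy)
        self.memtable = {}
        self.data_files = []                 # flushed files, oldest first
        self.next_id = 0
        self.log_path = os.path.join(directory, "commit.log")
        self.log = open(self.log_path, "a")

    def put(self, key, value):
        # Step 1: append to the commit log and force it to disk first.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # Only then apply the change in memory.
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.flush()

    def _write_file(self, items):
        path = os.path.join(self.dir, f"data-{self.next_id}.json")
        self.next_id += 1
        with open(path, "w") as f:
            json.dump(sorted(items), f)      # written once, sequentially, never modified
        return path

    def flush(self):
        # Step 2: dump the memtable as a brand-new file, then drop the log.
        self.data_files.append(self._write_file(self.memtable.items()))
        self.memtable = {}
        self.log.close()
        self.log = open(self.log_path, "w")  # reopening in "w" mode truncates the log

    def compact(self):
        # Step 3: merge all flushed files into one; newer files win on conflicts.
        merged = {}
        for path in self.data_files:
            with open(path) as f:
                merged.update(dict(json.load(f)))
        new_path = self._write_file(merged.items())
        for path in self.data_files:
            os.remove(path)
        self.data_files = [new_path]

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.data_files):   # newest file first
            with open(path) as f:
                data = dict(json.load(f))
            if key in data:
                return data[key]
        return None

    def close(self):
        self.log.close()
```

Reads get slower as flushed files accumulate, since `get` may have to look through each of them, which is why the periodic merge in step 3 matters.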
The Case for RAMClouds: Scalable High-Performance Storage Entirely in
DRAM - http://dl.acm.org/citation.
 