Zettabyte Reliability with Flexible End-to-end Data Integrity
Data corruption can go undetected, so high-level (end-to-end) integrity is needed
Checksums are used (a strong checksum is needed)
Drawbacks:
Performance is bad (need to compute checksums) – addressed by changing the checksum online
Detection is too late (we want to detect corruption before it reaches durable storage!) – solved by letting every component know about the checksum
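A minimal sketch of that flexible end-to-end idea as I understand it (my own illustration, not the paper's code; the component names and the use of SHA-256 are assumptions):

    import hashlib

    def checksum(block: bytes) -> bytes:
        # Stand-in for a strong checksum; the paper is about trading checksum
        # strength against performance, SHA-256 is just an example.
        return hashlib.sha256(block).digest()

    class ChecksumMismatch(Exception):
        pass

    def verify(block: bytes, expected: bytes, component: str) -> None:
        # Every component re-verifies the application-level checksum,
        # so corruption is caught before the block reaches durable storage.
        if checksum(block) != expected:
            raise ChecksumMismatch(f"corruption detected at {component}")

    def write_path(block: bytes) -> None:
        csum = checksum(block)              # generated once, at the top (end-to-end)
        verify(block, csum, "page cache")   # hypothetical intermediate components
        verify(block, csum, "block layer")
        verify(block, csum, "device driver")
        # only now would the block (plus its checksum) be written to durable storage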
They use a corruption model and a checksum model to derive the probability of undetected corruption (for a b-bit block)
Zettabyte reliability: at most one undetected corruption per zettabyte of data read
For a 4KB block that means an undetected-corruption probability of less than 3.46*10^-18 (a reliability score of about 17.5, i.e. -log10 of that probability)
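Back-of-the-envelope check (my own arithmetic, assuming one zettabyte is counted as 2^70 bytes and the score is -log10 of the per-block undetected-corruption probability):

    P_{ud} \le \frac{4\,\mathrm{KB}}{1\,\mathrm{ZB}} = \frac{2^{12}}{2^{70}} = 2^{-58} \approx 3.47 \times 10^{-18}

    \mathrm{score} = -\log_{10} P_{ud} \approx 17.5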
Improving disk reliability doesn't improve overall reliability that much (maybe because the disk corruption probability is already small???)
How about adding compute overhead as a parameter?
PUE (power usage effectiveness): total facility power over the power delivered to the computing equipment (the overhead goes to transformers, cooling, etc.)
2005: 2-3; 2012: ~1.1
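For reference, the standard definition (not spelled out in my notes):

    \mathrm{PUE} = \frac{\text{total facility power}}{\text{power delivered to the IT equipment}}

So PUE = 2 means one watt of overhead (power conversion, cooling) for every watt of compute, while PUE = 1.1 means only about 10% overhead.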
Energy proportionality: use (nearly) no power when idle and full power at full load; power should track load
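A common first-order way to write this (my formulation, not from the talk), for utilization u between 0 and 1:

    P(u) = P_{\mathrm{idle}} + (P_{\mathrm{peak}} - P_{\mathrm{idle}})\,u

Perfect proportionality means P_idle is near 0, so P(u) is roughly P_peak * u and energy per unit of work stays flat across load; servers of that era were often cited as drawing around half of peak power while idle, which is what this goal attacks.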
Energy efficiency: ~1GHz sweet spot
Increasing speed costs you twice:
1. Once for switching speed
2. Once for the memory wall (caching, prefetching, out-of-order execution)
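One standard first-order model for the switching-speed cost (mine, not from the talk): dynamic CMOS power is

    P_{\mathrm{dyn}} \approx \alpha\, C\, V^2 f

and since the supply voltage has to rise roughly with frequency near the top of the range, power grows roughly like f^3 while throughput grows only like f, so energy per instruction grows roughly like f^2. The memory-wall cost is the extra power spent on big caches, prefetching, and out-of-order machinery just to keep a fast core fed. Both effects push the energy-efficiency sweet spot down to a modest clock rate.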
So, the sweet-spot configuration (wimpy nodes):
1.6GHz dual core
32-160GB flash SSD
Only 1GB RAM
Design the key-value store from the very bottom (hardware) up
Fast front-end cache (cuckoo hashing?)
Backend: log-structured data store + hash table index (instead of using a file system)
Partial-key caching (the complete key is stored along with the data to resolve collisions) to enable memory efficiency
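A minimal sketch of the partial-key idea as I read it (the names, the 16-bit fragment size, and the in-memory "log" are my assumptions, not the system's code): the in-memory index keeps only a short fragment of each key's hash plus an offset into the on-flash log; the full key lives in the log record, so a fragment match is confirmed with one flash read.

    import hashlib
    from typing import Optional

    class PartialKeyStore:
        """In-memory index of (key fragment, log offset); full keys live in the log."""

        def __init__(self) -> None:
            self.log: list[tuple[bytes, bytes]] = []            # append-only (key, value) log; stands in for flash
            self.index: dict[int, list[tuple[int, int]]] = {}   # bucket -> [(fragment, log offset)]

        @staticmethod
        def _hash(key: bytes) -> int:
            return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

        def put(self, key: bytes, value: bytes) -> None:
            h = self._hash(key)
            bucket, fragment = h >> 16, h & 0xFFFF       # only 16 bits of the hash stay in memory
            offset = len(self.log)
            self.log.append((key, value))                # the complete key is stored with the data
            self.index.setdefault(bucket, []).append((fragment, offset))

        def get(self, key: bytes) -> Optional[bytes]:
            h = self._hash(key)
            bucket, fragment = h >> 16, h & 0xFFFF
            for frag, offset in reversed(self.index.get(bucket, [])):   # newest record wins
                if frag != fragment:
                    continue
                stored_key, value = self.log[offset]     # one read from "flash"
                if stored_key == key:                    # fragment collisions resolved here
                    return value
            return None

    store = PartialKeyStore()
    store.put(b"foo", b"bar")
    assert store.get(b"foo") == b"bar"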
Then hardware changes: (CPU 6x, memory 8x, SSD 30-60x) --- so CPU and memory have to keep up
How do you minimize memory per entry???
Static external dictionary (a theory problem)
EPH – 3.8 bits/entry
Entropy-coded tries: 2.5 bits/entry
Think of it as a pipeline!! (how??????)
They shadow writes and then batch-copy them into the main index(?) (sketched below)
So they can only support up to ~10% puts
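A toy sketch of the shadow-write / batch-merge pattern described above (my own construction, not their code): puts land in a small in-memory buffer, which is periodically folded into the big memory-efficient main index; gets check the buffer first. Because the merge step is expensive, only a modest fraction of the workload can be puts.

    from typing import Optional

    class BufferedKV:
        def __init__(self, buffer_limit: int = 1024) -> None:
            self.main: dict[bytes, bytes] = {}    # stand-in for the compact, read-mostly main index
            self.buffer: dict[bytes, bytes] = {}  # shadow writes land here first
            self.buffer_limit = buffer_limit

        def put(self, key: bytes, value: bytes) -> None:
            self.buffer[key] = value
            if len(self.buffer) >= self.buffer_limit:
                self._merge()                     # batch-copy the buffered writes into the main index

        def _merge(self) -> None:
            # The real system rebuilds a memory-efficient static index here;
            # a dict update is just a placeholder for that expensive step.
            self.main.update(self.buffer)
            self.buffer.clear()

        def get(self, key: bytes) -> Optional[bytes]:
            if key in self.buffer:                # freshest data wins
                return self.buffer[key]
            return self.main.get(key)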
Problem: the Linux kernel I/O stack has too much overhead
Load balancing: add a cache at the front end to deal with hot spots
Proof: only n*log(n) cache entries are needed (n = number of back-end nodes) to achieve almost perfect load balancing (toy simulation below)
So the front-end cache can fit in L3!
Intuition: (didn’t understand)
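A rough way to see the claim empirically (my own toy simulation, not theirs; the heavy-tailed workload, the idealized cache of the hottest keys, and the constants are all assumptions):

    import random
    from collections import Counter
    from math import log

    def max_backend_load(n_backends: int, n_queries: int, cache_entries: int) -> int:
        """Max per-backend load under a skewed workload, with the hottest keys cached up front."""
        random.seed(0)
        keys = [int(random.paretovariate(1.2)) for _ in range(n_queries)]   # heavy-tailed popularity
        hot = {k for k, _ in Counter(keys).most_common(cache_entries)}      # idealized front-end cache
        loads = Counter()
        for k in keys:
            if k in hot:
                continue                          # absorbed by the front-end cache
            loads[hash(k) % n_backends] += 1      # otherwise some back end serves it
        return max(loads.values()) if loads else 0

    n = 64
    cache = int(4 * n * log(n))                   # O(n log n) entries; the constant 4 is arbitrary
    print("with cache:", max_backend_load(n, 200_000, cache))
    print("no cache:  ", max_backend_load(n, 200_000, 0))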
Their solution can't deal with attacks that target the hash function (it assumes the hash function is invisible to clients)
Some tradeoff: more reads to avoid some
writes?
Is the bottleneck always in I/O? Also, flash performance is bursty
They want to manage raw flash and treat it as many sequential-write devices (???????)
Now they have a SCSI command to exchange (remap) mappings on the SSD, and they can do cool things with that