Thursday, May 5, 2016

MSST'16 Session 3: Store More, Longer, and for Less: Deduplication and Archival Systems

A Long Term User-Centric Analysis of Deduplication Patterns

studies a dataset spanning 21 months, with one snapshot per user per day
tracer.filesystems.org 

many small files (< 1 MB), but a few large files consume most of the space
in general, small files achieve a higher deduplication ratio than large files

both the per-user deduplication ratio and the redundancy across users differ a lot from user to user
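
To make the two measurements concrete, here is a minimal sketch. It assumes the deduplication ratio is logical bytes divided by bytes stored after duplicate chunks are removed; the notes do not spell out the paper's exact definition. The `users` data and the `dedup_ratio` helper are hypothetical, made up for illustration only.

```python
import hashlib

def dedup_ratio(chunks):
    """Logical bytes divided by bytes left after removing duplicate chunks."""
    logical = sum(len(c) for c in chunks)
    unique = {}
    for c in chunks:
        # Identify each chunk by its SHA-1 fingerprint; keep one size per fingerprint.
        unique[hashlib.sha1(c).digest()] = len(c)
    stored = sum(unique.values())
    return logical / stored if stored else 1.0

# Hypothetical per-user chunk lists (toy data, not from the study).
users = {
    "alice": [b"a" * 4096, b"a" * 4096, b"b" * 4096],
    "bob": [b"b" * 4096, b"c" * 4096],
}

# Ratio when each user is deduplicated in isolation.
per_user = {name: dedup_ratio(chunks) for name, chunks in users.items()}

# Ratio when all users share one chunk store; the gain over the
# per-user numbers comes from redundancy across users.
all_chunks = [c for chunks in users.values() for c in chunks]
global_ratio = dedup_ratio(all_chunks)
```

In this toy data, bob has no duplicates of his own, yet one of his chunks is also held by alice, so the shared store saves space that per-user deduplication would miss.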

Lazy Exact Deduplication

postpone disk lookups (fingerprint lookups) until they can be done in a batch
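
The batching idea can be sketched roughly as below. This is not the paper's algorithm, only an illustration under my own assumptions: the `LazyDeduper` class, its `batch_size` parameter, and the in-memory set standing in for the on-disk fingerprint index are all hypothetical.

```python
import hashlib

class LazyDeduper:
    """Buffers fingerprints and resolves them against the index in batches."""

    def __init__(self, batch_size=4):
        self.batch_size = batch_size
        self.disk_index = set()   # stand-in for the on-disk fingerprint index (assumption)
        self.pending = []         # fingerprints whose lookup has been postponed
        self.batch_lookups = 0    # number of batched index passes performed
        self.unique_chunks = 0
        self.duplicate_chunks = 0

    def add_chunk(self, data):
        # Only compute the fingerprint now; defer the index lookup.
        self.pending.append(hashlib.sha1(data).digest())
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.batch_lookups += 1
        # Sorting lets a real on-disk index be scanned sequentially
        # instead of with one random read per fingerprint.
        for fp in sorted(self.pending):
            if fp in self.disk_index:
                self.duplicate_chunks += 1
            else:
                self.disk_index.add(fp)
                self.unique_chunks += 1
        self.pending.clear()

deduper = LazyDeduper(batch_size=4)
for data in [b"a", b"b", b"a", b"c", b"b", b"d"]:
    deduper.add_chunk(data)
deduper.flush()  # resolve whatever is still buffered
```

Deduplication stays exact because every fingerprint is eventually checked against the full index; only the timing of the lookup changes.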

Sorted Deduplication: How to Process Thousands of Backup Streams

the requirement is changing: from a few large streams to many streams (e.g., cloud backup)

Effects of Prolonged Media Usage and Long-term Planning on Archival Systems

preserving data for ~100 to ~1000 years

questions:
when do you retire/replace media?
how long do you plan for?

Failure scenarios: device failure and economic failure

1. should media be used past their manufacturer-suggested service life or warranty period?
    (for archival data, disks might last longer)

they build a model of the purchase, maintenance, and retirement phases to calculate cost
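
A toy version of such a three-phase cost calculation might look like the sketch below. It is not the authors' model: the `total_cost` function, its parameters, and every number are hypothetical, and it ignores device failure rates, which the real question about extended service life depends on.

```python
import math

def total_cost(horizon_years, service_life_years, purchase, maintain_per_year, retire):
    """Cost of keeping one unit of archival capacity for the whole horizon.

    Each media generation is bought once and retired once; maintenance is
    paid every year regardless of which generation is in service.
    """
    generations = math.ceil(horizon_years / service_life_years)
    return generations * (purchase + retire) + horizon_years * maintain_per_year

# Hypothetical numbers: replace media every 5 years versus every 8 years
# over a 100-year plan.
cost_5 = total_cost(100, 5, purchase=100.0, maintain_per_year=10.0, retire=5.0)
cost_8 = total_cost(100, 8, purchase=100.0, maintain_per_year=10.0, retire=5.0)
```

With these made-up inputs the longer replacement interval is cheaper simply because fewer generations are purchased; a fuller model would weigh that saving against the higher failure risk of older media.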





