study a dataset spanning 21 months, with 1 snapshot per user per day
tracer.filesystems.org
a lot of small files (< 1M), but a few large files consume most of the space
in general, small files achieve a higher deduplication ratio than large files
per-user deduplication ratio and redundancy across users both differ a lot
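To make the per-user vs. cross-user comparison concrete, here is a minimal sketch. It assumes the deduplication ratio is defined as logical bytes divided by bytes stored after removing duplicate chunks (the study may define it differently). The function name, the SHA-1 fingerprinting, and the toy chunk data are all hypothetical and not taken from the dataset.

```python
import hashlib

def dedup_ratio(chunks):
    """Logical bytes divided by bytes kept after dropping duplicate chunks.

    Assumed definition; the study's exact metric may differ.
    """
    logical = sum(len(c) for c in chunks)
    unique = {}
    for c in chunks:
        # Chunks with the same fingerprint are stored only once.
        unique.setdefault(hashlib.sha1(c).digest(), len(c))
    stored = sum(unique.values())
    return logical / stored if stored else 1.0

# Hypothetical toy data: each user has a list of chunk contents.
user_chunks = {
    "alice": [b"a" * 4, b"a" * 4, b"b" * 4],
    "bob": [b"a" * 4, b"c" * 4],
}

# Ratio computed separately per user.
per_user = {user: dedup_ratio(cs) for user, cs in user_chunks.items()}

# Ratio over all users together, which also captures cross-user redundancy.
overall = dedup_ratio([c for cs in user_chunks.values() for c in cs])
```

In this toy case bob has no redundancy on his own, but one of his chunks duplicates one of alice's, so the combined ratio is higher than bob's per-user ratio.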
Lazy Exact Deduplication
postpone disk lookups (fingerprint lookups) until they can be done in a batch
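A rough sketch of the batching idea, not the paper's actual design: fingerprints are buffered in memory and checked against the index only when a batch fills up. The class name, the batch size, and the use of an in-memory set as a stand-in for the on-disk fingerprint index are assumptions made for illustration.

```python
import hashlib

class LazyDeduplicator:
    """Buffers fingerprints and resolves them against the index in batches."""

    def __init__(self, index, batch_size=4):
        self.index = index            # set standing in for the on-disk index (assumption)
        self.batch_size = batch_size  # hypothetical tuning knob
        self.pending = []             # (fingerprint, chunk) pairs awaiting lookup
        self.stored = []              # chunks that turned out to be new
        self.batch_lookups = 0        # number of simulated batched index passes

    def add(self, chunk):
        fp = hashlib.sha1(chunk).hexdigest()
        self.pending.append((fp, chunk))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        self.batch_lookups += 1
        # Sorting by fingerprint lets a real on-disk index be scanned in order.
        for fp, chunk in sorted(self.pending, key=lambda pair: pair[0]):
            if fp not in self.index:
                # Updating the index as we go also catches duplicates
                # inside the same batch, so the result stays exact.
                self.index.add(fp)
                self.stored.append(chunk)
        self.pending.clear()

dedup = LazyDeduplicator(set(), batch_size=4)
for chunk in [b"x", b"y", b"x", b"z", b"y"]:
    dedup.add(chunk)
dedup.flush()  # resolve whatever is still buffered
```

Five chunks trigger only two index passes here instead of five individual lookups, and every duplicate is still detected.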
Sorted Deduplication: How to Process Thousands of Backup Streams
requirements are changing: from a few large streams to many streams (e.g., cloud backup)
Effects of Prolonged Media Usage and Long-term Planning on Archival Systems
preserving data for ~100 to ~1000 years
questions:
when do you retire/replace media?
how long do you plan for?
Failure scenarios: device failures and economic failure
1. should media be used past their manufacturer-suggested service life or warranty period?
(for archival data, disks might last longer)
build a model of the purchase, maintenance, and retirement phases to calculate cost
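A toy version of such a cost model could look like the sketch below. It is not the paper's model: the function, its parameters, the failure-rate curve, and all numbers are hypothetical, chosen only to show how the replacement interval trades planned purchases against age-related failures.

```python
def total_cost(horizon_years, replace_every, purchase_cost,
               annual_maintenance, annual_failure_rate):
    """Expected cost of keeping one media unit in service over the horizon.

    Simplification (assumption): a unit that fails is replaced at purchase
    cost, but the planned replacement schedule is left unchanged.
    """
    cost = 0.0
    for year in range(horizon_years):
        age = year % replace_every
        if age == 0:
            cost += purchase_cost  # planned purchase; the old unit is retired
        cost += annual_maintenance
        # Expected cost of unplanned replacements at this age.
        cost += annual_failure_rate(age) * purchase_cost
    return cost

# Hypothetical failure curve: low during the service life, higher afterwards.
def failure_rate(age):
    return 0.02 if age < 5 else 0.10

# Retire at the suggested 5-year service life vs. run media for 10 years.
cost_5y = total_cost(20, 5, 100.0, 5.0, failure_rate)
cost_10y = total_cost(20, 10, 100.0, 5.0, failure_rate)
```

With these made-up numbers, running media past the suggested service life comes out cheaper, because the saved purchases outweigh the extra expected failures. A different failure curve or purchase price can reverse that result.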