Session 1: Implications of New Storage Technology
De-indirection in SSD with nameless writes
from our group
Q&A:
Q: An SSD is not a hard disk drive. Why not expose SSD internals to file systems?
A: Let vendors keep control of SSD internals.
Q: What about associated data in callbacks?
A: It is stored in the OOB (out-of-band) area.
Q: Why not a richer interface, e.g., hints to the device?
A: That could be useful.
Q: Would this be more interesting with Btrfs?
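A minimal sketch of the nameless-write idea as I understood it from the talk: the file system writes data without an address, the device picks the physical page and returns it, and when the device later moves the page it calls back into the file system, using the associated data kept in OOB to say which pointer to fix. All names here (NamelessSSD, nameless_write, migrate, ToyFS, handle_migration) are hypothetical illustrations, not the paper's interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical associated data kept with each page: (inode number, logical block offset).
Assoc = Tuple[int, int]


@dataclass
class Page:
    data: bytes
    oob: Assoc  # associated data, kept in the page's out-of-band (OOB) area


class NamelessSSD:
    """Toy device that chooses physical locations itself (no FTL mapping table)."""

    def __init__(self, num_pages: int, on_migrate: Callable[[Assoc, int, int], None]):
        self.pages: List[Optional[Page]] = [None] * num_pages
        self.on_migrate = on_migrate  # migration callback into the file system

    def _free_page(self) -> int:
        for ppn, page in enumerate(self.pages):
            if page is None:
                return ppn
        raise RuntimeError("no free pages")

    def nameless_write(self, data: bytes, assoc: Assoc) -> int:
        """Write without an address; the device picks and returns the physical page."""
        ppn = self._free_page()
        self.pages[ppn] = Page(data, assoc)
        return ppn

    def read(self, ppn: int) -> bytes:
        page = self.pages[ppn]
        if page is None:
            raise KeyError(ppn)
        return page.data

    def migrate(self, old_ppn: int) -> int:
        """Relocate a page internally (e.g. wear leveling) and notify the file system."""
        page = self.pages[old_ppn]
        if page is None:
            raise KeyError(old_ppn)
        new_ppn = self._free_page()
        self.pages[new_ppn] = page
        self.pages[old_ppn] = None
        # The OOB associated data tells the file system which pointer to update.
        self.on_migrate(page.oob, old_ppn, new_ppn)
        return new_ppn


class ToyFS:
    """File-system side: stores physical page numbers directly in its metadata."""

    def __init__(self) -> None:
        self.block_map: Dict[Assoc, int] = {}

    def handle_migration(self, assoc: Assoc, old_ppn: int, new_ppn: int) -> None:
        if self.block_map.get(assoc) != old_ppn:
            raise ValueError("stale migration callback")
        self.block_map[assoc] = new_ppn
```

For example, after `fs = ToyFS()`, `ssd = NamelessSSD(4, fs.handle_migration)` and `fs.block_map[(7, 0)] = ssd.nameless_write(b"hello", (7, 0))`, a call to `ssd.migrate(fs.block_map[(7, 0)])` leaves the file system's pointer updated, so `ssd.read(fs.block_map[(7, 0)])` still returns `b"hello"`. The single level of mapping lives in the file system; the device keeps none.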
The Bleak Future of NAND Flash Memory
Laura M. Grupp, University of California, San Diego; John D. Davis, Microsoft Research, Mountain View; Steven Swanson, University of California, San Diego
My takeaway: SSDs will not simply replace HDDs; increasing capacity forces tradeoffs in latency, throughput, and lifetime.
Flash memory case study. They looked at capacity, latency, and throughput.
How to increase density: multi-bit cells and Moore's Law (shrinking feature size).
They use these trends to predict future density: 1.6 TB in 2024 at best?
Latency: SLC-1, MLC-2, TLC-3 (bits per cell): higher capacity means larger latency!
So latency is likely to increase in the future (3 ms for a 1.6 TB TLC-3 device?)
Throughput: for a fixed capacity, throughput for TLC-3/MLC-2 is far worse than for SLC-1 (0.7x)
IOPS: 0.4x (32K, versus 0.2K for an HDD)
Conclusion: not so great compared to HDDs (in some cases!)
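The two density levers above (more bits per cell, and feature-size scaling) compound multiplicatively. A toy projection to show that, with every parameter a caller-supplied assumption; this is not the authors' projection method or data, and it ignores the latency and throughput costs that the talk is really about.

```python
def projected_capacity(base_capacity: float, base_bits_per_cell: int,
                       bits_per_cell: int, years: float,
                       doubling_period_years: float) -> float:
    """Toy model: capacity scales with bits per cell, and doubles once per
    doubling_period_years from feature-size scaling (the "Moore's Law" term).
    All inputs are assumptions, not measured values."""
    multi_bit_gain = bits_per_cell / base_bits_per_cell
    scaling_gain = 2 ** (years / doubling_period_years)
    return base_capacity * multi_bit_gain * scaling_gain
```

For example, going from SLC-1 to TLC-3 while density doubles twice gives 3 x 4 = 12x the base capacity: `projected_capacity(100, 1, 3, 4, 2)` is 1200.0.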
Q&A:
Q: The future doesn't seem so bleak?
A: SSDs don't just "get better"; each improvement comes with tradeoffs rather than a straight gain.
Q: Power characteristics?
A: Didn't study
Q: Lifetime for SLC-1, MLC-2 and TLC-3?
A: It drops from 10,000 to 500!
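A back-of-the-envelope check on what that lifetime drop means for total writes, assuming (my reading, not stated in the answer) that 10,000 and 500 are program/erase cycles for SLC-1 and TLC-3 respectively, and that both chips have the same number of cells, so TLC-3 stores 3x the data. Write amplification and over-provisioning are ignored; `write_budget` is a hypothetical helper.

```python
def write_budget(capacity: float, pe_cycles: int) -> float:
    """Data writable before wear-out: capacity x program/erase cycles
    (ignores write amplification and over-provisioning)."""
    return capacity * pe_cycles


# Same cell count: TLC-3 holds 3 units of data for every 1 unit in SLC-1.
slc1 = write_budget(1.0, 10_000)  # assumed: 10,000 cycles for SLC-1
tlc3 = write_budget(3.0, 500)     # assumed: 500 cycles for TLC-3
ratio = tlc3 / slc1               # 0.15
```

So even after crediting TLC-3 with 3x the capacity, under these assumptions the same silicon absorbs only about 15% of the total writes.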
When Poll Is Better than Interrupt
Jisoo Yang, Dave B. Minturn, and Frank Hady, Intel Corporation
My takeaway: well, everybody knows polling is better when operations are fast... but they talked in detail about how asynchronous I/O falls short with fast devices.
NVM and future SSDs built from NVM: fast enough to use up PCI bus bandwidth.
Traditional approach (asynchronous model):
The I/O request is submitted to the device, and the SSD raises an interrupt on I/O completion (the CPU is free while the I/O is in flight).
Synchronous model:
Bypass the kernel block I/O layer, send the request directly to the device, and poll for completion (the CPU busy-polls during the I/O, which only pays off when the device is fast).
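A toy sketch contrasting the two completion models. A timer thread stands in for the hardware; in the asynchronous model the caller sleeps until the completion callback (playing the interrupt) wakes it, while in the synchronous model the caller spins on a completion flag. `FakeDevice`, `async_io`, and `sync_io` are hypothetical names, and Python threads say nothing about real microsecond-scale costs; the point is only who holds the CPU while the I/O is in flight.

```python
import threading


class FakeDevice:
    """Simulated device that completes each request after a fixed service time."""

    def __init__(self, service_time_s: float):
        self.service_time_s = service_time_s

    def submit(self, on_complete) -> None:
        # A timer thread plays the hardware: it calls on_complete when the I/O "finishes".
        threading.Timer(self.service_time_s, on_complete).start()


def async_io(device: FakeDevice) -> None:
    """Asynchronous model: sleep until the completion 'interrupt' wakes this thread."""
    done = threading.Event()
    device.submit(done.set)  # the callback stands in for the completion interrupt
    done.wait()              # blocked: the scheduler can run other work meanwhile


def sync_io(device: FakeDevice) -> int:
    """Synchronous model: spin on the completion flag; returns how many polls it took."""
    done = threading.Event()
    device.submit(done.set)
    polls = 0
    while not done.is_set():  # busy-wait: this thread never sleeps
        polls += 1
    return polls
```

The poll count grows with the service time, which is exactly the CPU cost the synchronous model pays and why it only makes sense for fast devices.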
Prototype: NVM Express interface (really fast! 4 µs per 4 KB I/O)
Measurements show that the synchronous model is faster!
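A simple cost model (my own, not from the paper) of why polling wins for a fast device but not for a slow one. It tracks latency and CPU time per I/O under hypothetical per-I/O overheads; it also ignores the block-layer bypass, which would only favor polling further. In this model polling always has lower latency, and it also saves CPU exactly when the device time is below one interrupt plus two context switches.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SoftwareCosts:
    """Per-I/O software overheads in microseconds; values are assumptions you supply."""
    submit_us: float          # building and issuing the request
    interrupt_us: float       # interrupt delivery plus handler
    context_switch_us: float  # one switch (away on sleep, or back on wake-up)


def async_model(device_us: float, c: SoftwareCosts) -> Tuple[float, float]:
    """(latency, CPU time) with interrupt-driven completion."""
    # Switching away overlaps with the device; the wake-up switch does not.
    latency = (c.submit_us + max(device_us, c.context_switch_us)
               + c.interrupt_us + c.context_switch_us)
    cpu = c.submit_us + c.interrupt_us + 2 * c.context_switch_us
    return latency, cpu


def sync_model(device_us: float, c: SoftwareCosts) -> Tuple[float, float]:
    """(latency, CPU time) with polling: the CPU spins for the whole device time."""
    return c.submit_us + device_us, c.submit_us + device_us


def polling_saves_cpu(device_us: float, c: SoftwareCosts) -> bool:
    # Equivalent to: device_us < interrupt_us + 2 * context_switch_us
    return sync_model(device_us, c)[1] < async_model(device_us, c)[1]
```

With made-up costs of 1 µs submit, 2 µs interrupt, and 2 µs per switch, a 4 µs device gives (5, 5) for polling versus (9, 7) for interrupts, so polling wins on both counts; a 5 ms request (the case raised in the Q&A) would burn the full 5 ms of CPU while spinning.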
Further issues with async I/O:
1. The device is underutilized under high IOPS pressure (why?)
2. Interrupt overhead: can be reduced by coalescing, but coalescing increases latency
3. Negative effects on caches and the TLB (thrashing)
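A toy model of the coalescing tradeoff in item 2: completions arrive evenly spaced, and the device raises one interrupt per `k` completions, so each completion waits for the rest of its batch. The final partial batch is flushed on its last completion, standing in for the coalescing timeout real hardware uses. `coalescing_tradeoff` and its parameters are hypothetical.

```python
from typing import List, Tuple


def coalescing_tradeoff(n: int, k: int, interarrival_us: float) -> Tuple[int, float, float]:
    """Return (interrupts, mean added delay, max added delay) for n completions
    coalesced k per interrupt, arriving every interarrival_us microseconds."""
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    interrupts = 0
    delays: List[float] = []
    for start in range(0, n, k):
        last = min(start + k, n) - 1  # completion that triggers this batch's interrupt
        interrupts += 1
        # Each completion in the batch waits until the batch's last one arrives.
        delays.extend((last - i) * interarrival_us for i in range(start, last + 1))
    return interrupts, sum(delays) / n, max(delays)
```

With 8 completions 10 µs apart, `k = 1` gives (8, 0.0, 0.0) while `k = 4` gives (2, 15.0, 30.0): 4x fewer interrupts, paid for with an average of 1.5 inter-arrival times of extra latency.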
Implications:
Non-blocking I/O becomes useless
Rethink I/O buffering (esp. I/O prefetching). Why?
Q&A:
Q: What are the implications for multi-threaded workloads?
A: The current implementation uses a dedicated polling loop.
Q: What if a request takes long? The CPU would poll for 5-10 ms?
Q: (question not captured)
Q: According to the previous talk, will we actually get the latency you are assuming?
A: The previous talk was about NAND, not the same thing.
Q: Even with polling, OS overhead is large (50%). Should we take the OS out of the path completely, say by doing I/O in user space or with a GPU?
A: Maintaining the current interface is valuable.
Q: Could we exploit concurrency, e.g., one thread doing the polling, to get the potential benefit?
A: It depends on the application logic. And blah blah blah...
Q: What is the overhead breakdown (context-switch time?)? You are using make_request instead of the request function the kernel provides!
A: Refer to another paper...