Wednesday, February 15, 2012

FAST'12 Session 1: Implications of New Storage Technology



De-indirection for Flash-based SSDs with Nameless Writes

from our group

Q&A:

Q: An SSD is not a hard disk drive. Why not expose SSD internals to file systems?

A: Let vendors control SSD internals

Q: What about associating data with callbacks?

A: Stored in the OOB (out-of-band) area.

Q: Why not richer interface? Hints to device maybe?

A: That could be useful.

Q: Would this be more interesting with Btrfs?




The Bleak Future of NAND Flash Memory

Laura M. Grupp, University of California, San Diego; John D. Davis, Microsoft Research, Mountain View; Steven Swanson, University of California, San Diego

My takeaway:

SSDs are not simply replacing HDDs; tradeoffs must be made to increase capacity and such.

Flash memory case study. They looked at capacity, latency and throughput.

How to increase density: multi-bit cells, Moore's Law.

Use these to predict future density: 1.6TB in 2024 at best?

Latency: SLC-1, MLC-2, TLC-3 (1/2/3 bits per cell): higher capacity, larger latency!

So latency likely to increase in the future (3ms for 1.6TB for TLC-3?)

Throughput: for a fixed capacity, throughput for TLC/MLC-2 is far worse than SLC-3 (0.7x)

IOPS: 0.4x (32k, for HDD it's 0.2k)
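To put the IOPS figures above in perspective, a quick bit of my own arithmetic on the numbers as I noted them (which flash class the 32k belongs to is my reading of the slide, so treat the labels as approximate):

```python
# Figures as noted from the talk (approximate):
dense_flash_iops = 32_000  # ~32k IOPS quoted for the denser flash
hdd_iops = 200             # ~0.2k IOPS quoted for a hard disk

# Denser flash keeps only ~0.4x of the IOPS of less dense flash,
# yet still beats a disk by a large factor:
speedup_over_hdd = dense_flash_iops / hdd_iops
print(speedup_over_hdd)  # 160.0
```

So even the "bleak" dense flash is roughly two orders of magnitude ahead of a disk on IOPS, which is why the conclusion below is hedged with "in some cases!".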

Conclusion: not that much better than HDD (in some cases!)

Q&A:

Q: Future doesn't seem so bleak?

A: SSDs don't just "get better". There are tradeoffs rather than straight improvement.

Q: Power characteristics?

A: We didn't study that.

Q: Lifetime for SLC-1, MLC-2 and TLC-3?

A: It drops from 10,000 to 500 (P/E cycles)!




When Poll Is Better than Interrupt

Jisoo Yang, Dave B. Minturn, and Frank Hady, Intel Corporation

My takeaway:

Well, everybody knows polling is better when operations are fast... But they talked in detail about how the asynchronous I/O overhead breaks down (in their paper, maybe?)

NVM and future SSDs made of NVM: fast enough to use up PCI bus bandwidth.

Traditional approach (asynchronous model):

I/O request is submitted to the device; the SSD interrupts on I/O completion. (CPU is free while doing I/O.)

Synchronous model:

Bypass the kernel block I/O layer, send the request directly to the device and poll. (CPU busy-polls while doing I/O; only beneficial when the device is fast.)
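A toy sketch of the two models, to make the contrast concrete. This is my own illustration, not the authors' NVM Express prototype: the "device" is just a thread that sets a completion flag after a simulated ~4 us latency.

```python
import threading
import time

def device_io(done, latency_s=4e-6):
    """Simulated device: 'completes' a 4 KB request after ~4 us."""
    time.sleep(latency_s)
    done.set()

def synchronous_poll():
    """Synchronous model: submit the request, then busy-poll completion."""
    done = threading.Event()
    threading.Thread(target=device_io, args=(done,)).start()
    while not done.is_set():  # CPU spins; no sleep, no context switch
        pass
    return "done"

def asynchronous_wait():
    """Asynchronous model: submit, then block until the 'interrupt' fires."""
    done = threading.Event()
    threading.Thread(target=device_io, args=(done,)).start()
    done.wait()  # thread sleeps; wakeup costs a scheduler round-trip
    return "done"
```

In the real system the poll reads a device completion-queue status rather than a Python flag; the point is only that the polling path skips the sleep/wakeup (interrupt delivery plus context switch), which dominates when the device itself takes only microseconds.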

Prototype: NVM Express interface (really fast! 4 us per 4K)

Measurements show that the synchronous model is faster!

Further issues with async I/O:

1. Device underutilized when IOPS are pushed (why?)

2. Interrupt overhead: can be reduced by coalescing, but that increases latency.

3. Negative effects on cache and TLB (thrashing).

Implications:

Non-blocking I/O becomes useless.

Rethink I/O buffering (esp. I/O prefetching): why?

Q&A:

Q: Multi-thread implication?

A: A dedicated polling loop in the current implementation.

Q: What if the request is long? CPU polling for 5-10 ms?

Q: ????

Q: According to the last talk, are we going to get the latency you are assuming?

A: The last talk is about NAND; not the same thing?

Q: Even with polling, OS overhead is big (50%). Should we bypass the OS completely, say by doing I/O in user space or with a GPU?

A: Maintaining the current interface is nice.

Q: Make use of concurrency, with one thread doing the polling, to get the potential benefit?

A: Depends on the app logic. And blah blah blah…

Q: Overhead breakdown? (Context switch time? You are using make_request instead of the request function the kernel provides!)

A: Refer to the other paper…
