Monday, November 3, 2014

SOCC'14 Session 1: High Performance Data Center Operating Systems and Networks

Arrakis: An OS for the Data Center

Systems in the data center are generally I/O bound.
Today's I/O devices are fast (NICs, RAID controllers, etc.), but the OS cannot keep up with them.

Kernel: API, naming, ACLs, protection, I/O scheduling, etc.: too heavyweight on the I/O path.

Arrakis: skip the kernel and deliver I/O directly to applications, while keeping classical server OS features.

Hardware can help, because more and more functionality is embedded in the devices themselves (SR-IOV, IOMMU, packet filters, logical disks, NIC rate limiters, etc.).

Approach: move protection, multiplexing, and I/O scheduling into the device; move the API and I/O scheduling into the application; keep naming, ACLs, and resource limiting in the kernel, since they are not on the data path. So: device + application form the data plane; the kernel is the control plane.
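As a rough sketch of what this split could look like from the application's side; the names (vnic_open, struct rx_desc) are hypothetical stand-ins, not the actual Arrakis interface:

/* Sketch of the control-plane / data-plane split: the kernel is asked
 * once for a protected virtual NIC, then the application handles every
 * packet itself. All names here are hypothetical. */
#include <stdint.h>
#include <stddef.h>

struct rx_desc {                 /* one slot of a user-mapped receive ring */
    volatile uint32_t ready;     /* set by the NIC once a packet has landed */
    uint32_t len;
    uint8_t  buf[2048];
};

/* Control plane: ask the kernel once for a protected virtual NIC.
 * Naming, ACL checks, and rate limits are enforced here, off the data path. */
extern struct rx_desc *vnic_open(const char *name, size_t *nslots);

/* Data plane: the application polls the ring directly; packets are
 * DMAed into application memory with no kernel crossing per packet. */
void rx_loop(struct rx_desc *ring, size_t nslots,
             void (*handle)(const uint8_t *pkt, uint32_t len))
{
    size_t i = 0;
    for (;;) {
        while (!ring[i].ready)
            ;                        /* busy-poll the next expected slot */
        handle(ring[i].buf, ring[i].len);
        ring[i].ready = 0;           /* return the slot to the NIC */
        i = (i + 1) % nslots;
    }
}

The point is that the kernel is involved once, at setup time, and never again per packet.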

Kernel: performs ACL checks once, when configuring the data plane; provides a virtual file system for naming.
Redis (application): persistent data structures (log, queue, etc.).
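A minimal sketch of the persistent-data-structure idea, assuming a hypothetical durable-append primitive (plog_append) rather than the interface actually described in the paper:

/* A Redis-style SET becomes one durable append to a log the application
 * owns, instead of a write() that traverses the kernel's VFS and block
 * layers. plog_append is a hypothetical primitive. */
#include <string.h>
#include <stdint.h>

/* Append a record to the application's own virtual storage area and
 * return once it is durable (hypothetical primitive). */
extern int plog_append(const void *rec, size_t len);

int log_set(const char *key, const char *val)
{
    uint8_t rec[512];
    size_t klen = strlen(key) + 1, vlen = strlen(val) + 1;

    if (klen + vlen > sizeof(rec))
        return -1;                   /* record too large for this sketch */
    memcpy(rec, key, klen);          /* key, including its terminating NUL */
    memcpy(rec + klen, val, vlen);   /* value, including its terminating NUL */
    return plog_append(rec, klen + vlen);
}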

Results: in-memory GET latency reduced by 65%, PUT latency by 75%, 1.75x GET throughput, etc.

Implication: we are all OS developers now.
I/O hardware-application co-design
Applications need fine-grained control (à la OpenFlow): where in memory packets go, how packets are routed across cores, etc.
Application-specific storage design

Questions:
Q: How does it compare with a hacked Linux kernel?
A: No specific answer. Some people have worked on hacking the Linux kernel, e.g., user-level networking, or Remzi's work (?)
Q: Limitations? In particular, binding for large-scale applications?
A: The limitations are in the hardware. E.g., you can't have more than a few virtual disks on a real disk, but you can have hundreds for network devices (?)



Network Subways and Rewiring:

Today's data center tension: cost vs. capacity. Above the ToR switches, average link utilization is only 25%.

Why: rack-level traffic is bursty / long-tailed.

Subways: multiple ports per server.
So, what do we do with the extra links?
Today: wire them to multiple core switches.
Proposal: connect them to a neighboring ToR; this reduces ToR uplink traffic and distributes load more evenly.
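As an illustration (not something from the talk), a server could spread load across the two uplinks by hashing flows onto its ports, so each flow stays on one path while the aggregate is balanced; all names below are hypothetical:

/* Spread flows across the local-ToR port and the neighbor-ToR port.
 * Hashing per flow keeps packets of one flow in order while balancing
 * the aggregate load over both links. Illustration only. */
#include <stdint.h>

enum { PORT_LOCAL_TOR = 0, PORT_NEIGHBOR_TOR = 1, NPORTS = 2 };

static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);
    h ^= h >> 16;                    /* simple integer mix; any decent hash works */
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

int pick_port(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
    return flow_hash(saddr, daddr, sport, dport) % NPORTS;
}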

Result: up to 2.8x performance improvement for memcached.

Questions:
Q: Wiring across racks could concern people (data center administrators).
A: We haven't talked with those people, but there is a huge performance benefit.
Q: How does this change failure modes?
A: We don't know about large-scale failure modes yet, but we can do faster local recovery, etc.
Q: Power usage?
Q: How does the rewiring interact with competing jobs?
A: We have more flexibility.

