Monday, November 3, 2014

SOCC'14 Session 1: High Performance Data Center Operating Systems and Networks

Arrakis: An OS for the Data Center

Systems in the data center are generally I/O bound.
Today's I/O devices are fast (NICs, RAID controllers, etc.), but the OS cannot keep up with them.

Kernel: API, naming, ACLs, protection, I/O scheduling, etc.: too heavyweight on the I/O path.

Arrakis: skip the kernel and deliver I/O directly to applications, while keeping classical server OS features.

Hardware can help, because more and more functionality is embedded in the devices themselves (SR-IOV, IOMMU, packet filters, logical disks, NIC rate limiters, etc.).

Approach: move protection, multiplexing, and I/O scheduling into the device; move the API and I/O scheduling into the application; keep naming, ACLs, and resource limiting in the kernel, since they are not on the data path. So: device + application form the data plane; the kernel is the control plane.
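As a rough sketch of what this split could look like from the application's side; the names (vnic_open, struct rx_desc) are hypothetical stand-ins, not the actual Arrakis interface:

/* Sketch of the control-plane / data-plane split: the kernel is asked
 * once for a protected virtual NIC, then the application handles every
 * packet itself. All names here are hypothetical. */
#include <stdint.h>
#include <stddef.h>

struct rx_desc {                 /* one slot of a user-mapped receive ring */
    volatile uint32_t ready;     /* set by the NIC once a packet has landed */
    uint32_t len;
    uint8_t  buf[2048];
};

/* Control plane: ask the kernel once for a protected virtual NIC.
 * Naming, ACL checks, and rate limits are enforced here, off the data path. */
extern struct rx_desc *vnic_open(const char *name, size_t *nslots);

/* Data plane: the application polls the ring directly; packets are
 * DMAed into application memory with no kernel crossing per packet. */
void rx_loop(struct rx_desc *ring, size_t nslots,
             void (*handle)(const uint8_t *pkt, uint32_t len))
{
    size_t i = 0;
    for (;;) {
        while (!ring[i].ready)
            ;                        /* busy-poll the next expected slot */
        handle(ring[i].buf, ring[i].len);
        ring[i].ready = 0;           /* return the slot to the NIC */
        i = (i + 1) % nslots;
    }
}

The point is that the kernel is involved once, at setup time, and never again per packet.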

Kernel: performs ACL checks once, when configuring the data plane; provides a virtual file system for naming.
Redis (application): persistent data structures (log, queue, etc.).
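A minimal sketch of the persistent-data-structure idea, assuming a hypothetical durable-append primitive (plog_append) rather than the interface actually described in the paper:

/* A Redis-style SET becomes one durable append to a log the application
 * owns, instead of a write() that traverses the kernel's VFS and block
 * layers. plog_append is a hypothetical primitive. */
#include <string.h>
#include <stdint.h>

/* Append a record to the application's own virtual storage area and
 * return once it is durable (hypothetical primitive). */
extern int plog_append(const void *rec, size_t len);

int log_set(const char *key, const char *val)
{
    uint8_t rec[512];
    size_t klen = strlen(key) + 1, vlen = strlen(val) + 1;

    if (klen + vlen > sizeof(rec))
        return -1;                   /* record too large for this sketch */
    memcpy(rec, key, klen);          /* key, including its terminating NUL */
    memcpy(rec + klen, val, vlen);   /* value, including its terminating NUL */
    return plog_append(rec, klen + vlen);
}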

Results: in-memory GET latency reduced by 65%, PUT latency by 75%, 1.75x GET throughput, etc.

Implication: we are all OS developers now.
I/O hardware-application co-design
Applications need fine-grained control (à la OpenFlow): where in memory packets go, how packets are routed across cores, etc.
Application-specific storage design

Questions:
Q: How does it compare with a hacked Linux kernel?
A: No specific answer. Some people have worked on hacking the Linux kernel, e.g., user-level networking, or Remzi's work (?)
Q: Limitations? In particular, binding for large-scale applications?
A: The limitations are in the hardware. E.g., you can't have more than a few virtual disks on a real disk, but you can have hundreds for network devices (?)



Network Subways and Rewiring:

Today's data center tension: cost vs. capacity. Above the ToR switches, average link utilization is only 25%.

Why: rack-level traffic is bursty / long-tailed.

Subways: multiple ports per server.
So, what do we do with the extra links?
Today: wire them to multiple core switches.
Proposal: connect them to a neighboring ToR; this reduces ToR uplink traffic and distributes load more evenly.
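As an illustration (not something from the talk), a server could spread load across the two uplinks by hashing flows onto its ports, so each flow stays on one path while the aggregate is balanced; all names below are hypothetical:

/* Spread flows across the local-ToR port and the neighbor-ToR port.
 * Hashing per flow keeps packets of one flow in order while balancing
 * the aggregate load over both links. Illustration only. */
#include <stdint.h>

enum { PORT_LOCAL_TOR = 0, PORT_NEIGHBOR_TOR = 1, NPORTS = 2 };

static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);
    h ^= h >> 16;                    /* simple integer mix; any decent hash works */
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h;
}

int pick_port(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
    return flow_hash(saddr, daddr, sport, dport) % NPORTS;
}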

Result: up to 2.8x performance improvement for memcached.

Questions:
Q: Wiring across racks could concern people (data center administrators).
A: We haven't talked with those people, but there is a huge performance benefit.
Q: How does this change failure modes?
A: We don't know about large-scale failure modes yet, but we can do faster local recovery, etc.
Q: Power usage?
Q: How does the rewiring interact with competing jobs?
A: We have more flexibility.

