Arrakis: An OS for the Data Center
Systems in the data center are generally I/O bound.
Today's I/O devices (NICs, RAID controllers, etc.) are fast, but the OS cannot keep up with them.
Kernel: API, naming, ACL, protection, I/O scheduling, etc.: too heavyweight.
Arrakis: skip the kernel and deliver I/O directly to applications, but keep classical server OS features.
Hardware can help, because more and more functionality is embedded in hardware (SR-IOV, IOMMU, packet filters, logical disks, NIC rate limiters, etc.).
Approach: push protection, multiplexing, and I/O scheduling into the device; put the API and I/O scheduling in the application; keep naming, ACL, and resource limiting in the kernel, since they are not on the data path. So: device + application form the data plane; the kernel is the control plane.
Kernel: does the ACL check once when configuring the data plane; a virtual file system handles naming.
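To make the data-plane/control-plane split concrete, here is a minimal C sketch of what a user-level send path could look like once the kernel has, at setup time, mapped a hardware queue into the application's address space. The names (app_queue_t, qp_send) are hypothetical illustrations, not the actual Arrakis API.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for a hardware send queue that the kernel
 * (control plane) has mapped into the application's address space.
 * Protection comes from SR-IOV/IOMMU mappings set up at bind time,
 * not from per-I/O system calls. */
typedef struct {
    volatile uint32_t head;   /* consumed by the device */
    uint32_t          tail;   /* produced by the application */
    uint8_t          *ring;   /* DMA-able buffer ring */
    size_t            slot_size;
    size_t            nslots;
} app_queue_t;

/* Enqueue a packet directly into the mapped ring: no kernel crossing
 * on the data path. Returns 0 on success, -1 if the ring is full or
 * the packet does not fit in a slot. */
static int qp_send(app_queue_t *q, const void *pkt, size_t len)
{
    uint32_t next = (q->tail + 1) % (uint32_t)q->nslots;
    if (next == q->head || len > q->slot_size)
        return -1;
    memcpy(q->ring + (size_t)q->tail * q->slot_size, pkt, len);
    q->tail = next;   /* a real NIC would also require a doorbell write */
    return 0;
}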
Redis (application): persistent data structures (log, queue, etc.).
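The "persistent log" mentioned for Redis can be pictured as an append-only structure over an application-managed storage region. The fragment below is a rough sketch under that assumption; plog_t and plog_append are made-up names, and the durability/flush step is elided.

#include <stdint.h>
#include <string.h>

/* Illustrative append-only log living in a storage region that the
 * application owns directly, so appends do not go through the kernel
 * file system on the data path. */
typedef struct {
    uint8_t  *base;       /* mapped storage region */
    uint64_t  capacity;   /* bytes available */
    uint64_t  offset;     /* next free byte */
} plog_t;

/* Append one record as [4-byte length][payload]; return its offset in
 * the log, or -1 if there is no room. A real implementation would also
 * issue a persistence barrier before acknowledging the PUT. */
static int64_t plog_append(plog_t *log, const void *rec, uint32_t len)
{
    uint64_t need = sizeof(uint32_t) + (uint64_t)len;
    if (log->offset + need > log->capacity)
        return -1;
    int64_t at = (int64_t)log->offset;
    memcpy(log->base + log->offset, &len, sizeof(uint32_t));
    memcpy(log->base + log->offset + sizeof(uint32_t), rec, len);
    log->offset += need;
    return at;
}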
Results: in-memory GET latency reduced by 65%, PUT latency reduced by 75%, 1.75x GET throughput, etc.
Implication: we are all OS developers now.
I/O hardware-application co-design.
Applications need fine-grained control (OpenFlow-style): where in memory packets go, how to route packets across cores, etc. (see the sketch after this list).
Application-specific storage design.
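As an illustration of the kind of fine-grained control referred to above, the sketch below steers a 5-tuple flow to a specific per-core receive queue, falling back to an RSS-style hash. The structures and function names are invented for illustration and correspond to no particular NIC API.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical 5-tuple match plus action: deliver matching packets to
 * the receive queue owned by a particular core. Real NICs expose this
 * kind of steering through flow tables or RSS indirection. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;               /* e.g. 6 = TCP */
};

struct flow_rule {
    struct flow_key key;
    uint16_t        target_queue; /* per-core RX queue index */
};

/* Pick the RX queue (core) for a packet: use an installed rule if one
 * matches, otherwise spread by hashing the 5-tuple. */
static uint16_t steer(const struct flow_rule *rules, size_t nrules,
                      const struct flow_key *k, uint16_t nqueues)
{
    for (size_t i = 0; i < nrules; i++) {
        const struct flow_key *r = &rules[i].key;
        if (r->src_ip == k->src_ip && r->dst_ip == k->dst_ip &&
            r->src_port == k->src_port && r->dst_port == k->dst_port &&
            r->proto == k->proto)
            return rules[i].target_queue;
    }
    uint32_t h = k->src_ip ^ k->dst_ip ^
                 ((uint32_t)k->src_port << 16 | k->dst_port) ^ k->proto;
    return (uint16_t)(h % nqueues);
}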
Questions:
Q: How does it compare with a hacked Linux kernel?
A: No specific answer. Some people have worked on a "hacked Linux kernel", e.g., user-level networking, or Remzi's work (?)
Q: Limitations? In particular, binding for large-scale applications?
A: Limitations on hardware. E.g., you can't have more than a few virtual disks on a real disk, but you can do hundreds for network devices (?)
Network Subways and Rewiring:
Today's data center tension: cost vs. capacity; above the ToR switches, average link utilization is only 25%.
Why: rack-level traffic is bursty/long-tailed.
Subways: multiple ports per server.
So, what do we do with the extra links?
Today: wire them to multiple core switches.
Proposal: connect them to a neighboring ToR; less ToR traffic, and traffic is distributed more evenly.
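To illustrate how wiring a server's extra port(s) to neighboring ToRs spreads load, the fragment below hashes each flow across all attached uplinks (own ToR plus neighbors) instead of always using the local ToR. The link counts and hash are illustrative only, not taken from the talk.

#include <stdint.h>

/* Illustrative only: with Subways-style wiring a server reaches not
 * just its own ToR but also one or more neighbor ToRs. Spreading flows
 * across all attached links lets a bursty rack borrow idle uplink
 * capacity from its neighbors. */
enum { OWN_TOR_LINKS = 1, NEIGHBOR_TOR_LINKS = 2 };   /* made-up counts */

/* Pick an outgoing link for a flow by hashing its identifier over all
 * attached links; indices >= OWN_TOR_LINKS go via a neighbor ToR. */
static unsigned pick_link(uint64_t flow_id)
{
    unsigned total = OWN_TOR_LINKS + NEIGHBOR_TOR_LINKS;
    /* 64-bit mix (splitmix64 finalizer) to avoid correlated choices */
    flow_id ^= flow_id >> 30; flow_id *= 0xbf58476d1ce4e5b9ULL;
    flow_id ^= flow_id >> 27; flow_id *= 0x94d049bb133111ebULL;
    flow_id ^= flow_id >> 31;
    return (unsigned)(flow_id % total);
}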
Result: memcached up to 2.8x performance improvement.
Questions:
Q: Wiring across racks could concern people (data center administrators).
A: We haven't talked with those people, but there is a huge performance benefit.
Q: How does this change failure modes?
A: Large-scale failure modes we don't know yet, but we can do faster local recovery, etc.
Q: Power usage?
Q: Competing jobs and your rewiring?
A: We have more flexibility.