From UC-Berkeley
Solution for stragglers:
- speculative execution, but it wastes resources and/or time
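Speculative execution itself is simple to sketch: if a task runs past a deadline, launch a backup copy and take whichever finishes first. The thread-pool version below is purely illustrative (real schedulers speculate across machines, not threads), and all names are made up; note how the backup duplicates work, which is exactly the wasted-resources problem noted above.

```python
import concurrent.futures
import time

def run_with_speculation(task, timeout, executor):
    """Run `task`; if it has not finished within `timeout` seconds,
    launch a speculative backup copy and take the first result.
    (Sketch only -- not any particular scheduler's implementation.)"""
    primary = executor.submit(task)
    try:
        return primary.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        backup = executor.submit(task)  # duplicate work: the resource cost
        done, _ = concurrent.futures.wait(
            [primary, backup],
            return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()

def slow_task():
    time.sleep(0.2)  # pretend this task is straggling
    return "done"

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as ex:
    print(run_with_speculation(slow_task, timeout=0.05, executor=ex))
```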
Design spaces:
- LATE (OSDI '08)
- Mantri (OSDI '10)
- Dolly (NSDI '13)
Design Principles:
- Identify stragglers as early as possible (to avoid wasted resources)
- Schedule tasks for improved job finish time (to avoid wasted resources and time)
Architecture of Wrangler:
- Master: model builder, predictive scheduler
- Slaves: workers
Selecting "input features": memory, disk, run-time contention, faulty hardware
Using feature selection methods: the features that matter vary across nodes and across time.
Why: complex task-to-node and task-to-task interactions; heterogeneous clusters and task requirements
Approach: use classification techniques to build the model automatically; they use SVMs
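As a rough stand-in for Wrangler's per-node SVM, the sketch below trains a plain linear perceptron (not a true max-margin SVM) to flag likely stragglers from resource-usage features. The feature names and toy data are invented for illustration only.

```python
# Hedged sketch: a linear classifier over per-node resource features as a
# stand-in for Wrangler's SVM. Feature names and data are made up.
def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """samples: list of feature vectors; labels: +1 (straggler) / -1 (normal)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified -> nudge the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy data: [memory pressure, disk I/O wait] -- purely illustrative.
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print(predict(w, b, [0.85, 0.9]))  # high contention -> predicted straggler (1)
```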
Evaluation:
~80% true-positive and true-negative rates
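To make the ~80% figure concrete: true-positive rate is recall on actual stragglers, true-negative rate is recall on actual non-stragglers, both read off a confusion matrix. The counts below are invented, not from the paper.

```python
# Hedged sketch: computing TP/TN rates from a confusion matrix.
def rates(tp, fn, tn, fp):
    tpr = tp / (tp + fn)  # fraction of actual stragglers caught
    tnr = tn / (tn + fp)  # fraction of normal tasks correctly cleared
    return tpr, tnr

# Made-up counts chosen to land at the ~80% reported in the talk.
tpr, tnr = rates(tp=80, fn=20, tn=80, fp=20)
print(tpr, tnr)  # 0.8 0.8
```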
Question: Is this accuracy good enough?
How to answer: improved job completion time? Reduced resource consumption? The key is better load balancing.
Initial evaluation: no better load balancing
Second Iteration: use a confidence measure
Final Evaluation: reduced job completion time and reduced resource consumption.
Insight: confidence is key!
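The "confidence is key" insight might be sketched as: act on a straggler prediction only when the model is confident enough, and otherwise fall back to normal assignment so a noisy model cannot starve healthy nodes. The threshold, node names, and scores below are illustrative assumptions, not Wrangler's actual mechanism.

```python
# Hedged sketch of confidence-gated task placement (names/values invented).
def assign_task(candidate_nodes, straggler_confidence, threshold=0.7):
    """Pick a node for a task, avoiding nodes the model flags as likely
    stragglers -- but only when that prediction clears the confidence
    threshold. straggler_confidence: node -> P(node causes a straggler)."""
    safe = [n for n in candidate_nodes
            if straggler_confidence.get(n, 0.0) < threshold]
    # Fall back to any candidate rather than blocking the task forever.
    pool = safe or candidate_nodes
    return min(pool, key=lambda n: straggler_confidence.get(n, 0.0))

conf = {"node-a": 0.9, "node-b": 0.55, "node-c": 0.1}
print(assign_task(["node-a", "node-b", "node-c"], conf))  # node-c
```

Only node-a is confidently predicted to straggle, so it is skipped; node-b's mid-range score is below the threshold and is therefore not acted on.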
Another question: sophisticated schedulers exist; why Wrangler?
- Difficult to anticipate the dynamically changing causes
- Difficult to build a generic and unbiased scheduler
Q&A:
Q: How do you differentiate stragglers caused by a poor environment from nodes that actually have more work to do?
A: That is not addressed in this work; we will look into it.
Q: How does Wrangler compare to existing techniques such as LATE and Dolly?
A: I don't have numbers for that, but we provide a mechanism (?) that works on top of everything else.
Q: How much time do you need to train the model?
A: We keep collecting data (in a somewhat online fashion).