Monday, November 3, 2014

Wrangler: Predictable and Faster Jobs using Fewer Resources

From UC Berkeley

Solution for stragglers:
  1. speculative execution, but it wastes resources and/or time
Design space:
  1. LATE (OSDI '08)
  2. Mantri (OSDI '10)
  3. Dolly (NSDI '13)

Design Principles:
Identify stragglers as early as possible (to avoid wasted resources)
Schedule tasks for improved job finish time (to avoid wasted resources and time)

Architecture of Wrangler:
Master: model builder, predictive scheduler
Slaves: workers
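
A minimal, self-contained sketch of this split (hypothetical names and a stand-in predictor, not the actual Wrangler code): workers report resource-usage snapshots, and the master consults a per-node straggler predictor before assigning a task, delaying it if every node looks risky.

```python
# Sketch only: hypothetical names, not Wrangler's real implementation.
from typing import Callable, Dict, List, Optional


class Worker:
    def __init__(self, node_id: str, usage: List[float]):
        self.node_id = node_id
        self._usage = usage

    def snapshot(self) -> List[float]:
        # A real worker would read live counters (memory, disk I/O, CPU, ...).
        return self._usage


class Master:
    def __init__(self, predictors: Dict[str, Callable[[List[float]], bool]]):
        # predictors[node] returns True if a task launched there now would straggle.
        self.predictors = predictors

    def schedule(self, workers: List[Worker]) -> Optional[Worker]:
        """Predictive scheduler: skip nodes predicted to create stragglers."""
        for w in workers:
            predict = self.predictors.get(w.node_id)
            if predict is None or not predict(w.snapshot()):
                return w      # safe node found: assign the task here
        return None           # every node looks risky: delay the task


# Toy usage: "n2" is heavily loaded, so the scheduler falls through to "n1".
busy = lambda s: s[0] > 0.8   # stand-in for the learned per-node model
master = Master({"n1": busy, "n2": busy})
workers = [Worker("n2", [0.95]), Worker("n1", [0.20])]
print(master.schedule(workers).node_id)   # -> n1
```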

Selecting "input features": memory, disk, run-time contention, faulty hardware
Using feature selection methods, they find that the important features vary across nodes and over time.
Why: complex task-to-node and task-to-task interactions, heterogeneous clusters and task requirements
Approach: use classification techniques to build the model automatically; they use SVMs (see the sketch below)
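
A rough sketch of the model-building step under assumed details (hypothetical features and toy data; the authors' actual features and labels may differ): one linear SVM per node, trained on that node's history of resource-usage snapshots labelled by whether the task launched at that moment straggled.

```python
# Sketch only (assumed details, not the authors' code): one SVM per node.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_node_model(snapshots: np.ndarray, straggled: np.ndarray):
    """snapshots: (n, d) node features such as free memory, disk I/O wait,
    co-running tasks; straggled: (n,) 0/1 labels for the launched tasks."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    model.fit(snapshots, straggled)
    return model


# Toy per-node history: heavily loaded snapshots tended to produce stragglers.
history = {
    "node-1": (np.array([[12.0, 0.05, 2], [1.5, 0.60, 9],
                         [10.0, 0.10, 3], [0.8, 0.75, 8]]),
               np.array([0, 1, 0, 1])),
}
models = {node: build_node_model(X, y) for node, (X, y) in history.items()}

# At task-launch time, ask the node's own model about the current snapshot.
now = np.array([[2.0, 0.55, 7]])
print(models["node-1"].predict(now))   # [1] -> predicted to straggle on node-1
```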


Evaluation
~80% true positive and true negative rate
Question: Is this accuracy good enough?
How to answer: does it improve job completion time and reduce resource consumption? The key is better load balancing.
Initial evaluation: no better load balancing
Second iteration: use a confidence measure
Final evaluation: reduced job completion time and reduced resource consumption
Insight: confidence is key!
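
The notes do not record exactly how the confidence measure is used; a minimal sketch, assuming the scheduler delays a task only when the predicted straggler probability clears a threshold (the threshold value, features, and data below are made up):

```python
# Sketch only: confidence-gated straggler predictions on synthetic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic history: lightly loaded snapshots (class 0) vs. heavily loaded
# ones (class 1) that produced stragglers.
X = rng.normal(size=(200, 4)) + np.repeat([[0.0] * 4, [2.0] * 4], 100, axis=0)
y = np.repeat([0, 1], 100)

# probability=True exposes a calibrated confidence for each prediction.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", probability=True))
model.fit(X, y)

CONFIDENCE_THRESHOLD = 0.8   # assumed value; tuned per cluster in practice


def should_delay(snapshot: np.ndarray) -> bool:
    """Delay task assignment only for confident straggler predictions."""
    p_straggler = model.predict_proba(snapshot.reshape(1, -1))[0, 1]
    return p_straggler >= CONFIDENCE_THRESHOLD


print(should_delay(np.array([2.1, 1.9, 2.2, 2.0])))  # confident straggler -> True
print(should_delay(np.array([1.0, 1.0, 1.0, 1.0])))  # low confidence -> False
```

Per the notes above, adding this kind of confidence gate is what turned the initial no-improvement result into reduced completion time and resource use.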

Another question: Sophisticated schedulers exist. Why Wrangler?
  1. Difficult to anticipate the dynamically changing causes of stragglers
  2. Difficult to build a generic and unbiased scheduler

Q&A:
Q: How do you differentiate stragglers caused by a poor environment from nodes that simply have more work to do?
A: This is not addressed in this work; we will look into it.
Q: How does Wrangler compare to existing techniques such as LATE and Dolly?
A: I don't have numbers for that, but we provide a mechanism (?) that sits on top of everything else.
Q: How much time do you need to train the model?
A: We keep collecting data (in a somewhat online fashion)
