Monday, November 3, 2014

Wrangler: Predictable and Faster Jobs using Fewer Resources

From UC Berkeley

Solution for stragglers:
  1. speculative execution, but it wastes resources and/or time
Design space:
  1. LATE (OSDI '08)
  2. Mantri (OSDI '10)
  3. Dolly (NSDI '13)

Design Principles:
Identify stragglers as early as possible (to avoid wasted resources)
Schedule tasks for improved job finish time (to avoid wasted resources and time)

Architecture of Wrangler:
Master: model builder, predictive scheduler
Slaves: workers
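
A minimal, self-contained sketch of this split (hypothetical names and a stand-in predictor, not the actual Wrangler code): workers report resource-usage snapshots, and the master consults a per-node straggler predictor before assigning a task, delaying it if every node looks risky.

```python
# Sketch only: hypothetical names, not Wrangler's real implementation.
from typing import Callable, Dict, List, Optional


class Worker:
    def __init__(self, node_id: str, usage: List[float]):
        self.node_id = node_id
        self._usage = usage

    def snapshot(self) -> List[float]:
        # A real worker would read live counters (memory, disk I/O, CPU, ...).
        return self._usage


class Master:
    def __init__(self, predictors: Dict[str, Callable[[List[float]], bool]]):
        # predictors[node] returns True if a task launched there now would straggle.
        self.predictors = predictors

    def schedule(self, workers: List[Worker]) -> Optional[Worker]:
        """Predictive scheduler: skip nodes predicted to create stragglers."""
        for w in workers:
            predict = self.predictors.get(w.node_id)
            if predict is None or not predict(w.snapshot()):
                return w      # safe node found: assign the task here
        return None           # every node looks risky: delay the task


# Toy usage: "n2" is heavily loaded, so the scheduler falls through to "n1".
busy = lambda s: s[0] > 0.8   # stand-in for the learned per-node model
master = Master({"n1": busy, "n2": busy})
workers = [Worker("n2", [0.95]), Worker("n1", [0.20])]
print(master.schedule(workers).node_id)   # -> n1
```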

Selecting "input features": memory, disk, run-time contention, faulty hardware
Using feature selection methods, they find that the important features vary across nodes and over time.
Why: complex task-to-node and task-to-task interactions, heterogeneous clusters and task requirements
Approach: use classification techniques to build the model automatically; they use SVMs (see the sketch below)
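
A rough sketch of the model-building step under assumed details (hypothetical features and toy data; the authors' actual features and labels may differ): one linear SVM per node, trained on that node's history of resource-usage snapshots labelled by whether the task launched at that moment straggled.

```python
# Sketch only (assumed details, not the authors' code): one SVM per node.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_node_model(snapshots: np.ndarray, straggled: np.ndarray):
    """snapshots: (n, d) node features such as free memory, disk I/O wait,
    co-running tasks; straggled: (n,) 0/1 labels for the launched tasks."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    model.fit(snapshots, straggled)
    return model


# Toy per-node history: heavily loaded snapshots tended to produce stragglers.
history = {
    "node-1": (np.array([[12.0, 0.05, 2], [1.5, 0.60, 9],
                         [10.0, 0.10, 3], [0.8, 0.75, 8]]),
               np.array([0, 1, 0, 1])),
}
models = {node: build_node_model(X, y) for node, (X, y) in history.items()}

# At task-launch time, ask the node's own model about the current snapshot.
now = np.array([[2.0, 0.55, 7]])
print(models["node-1"].predict(now))   # [1] -> predicted to straggle on node-1
```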


Evaluation
~80% true positive and true negative rate
Question: Is this accuracy good enough?
How to answer: does it improve job completion time and reduce resource consumption? The key is better load balancing.
Initial evaluation: no better load balancing
Second iteration: use a confidence measure
Final evaluation: reduced job completion time and reduced resource consumption
Insight: confidence is key!
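
The notes do not record exactly how the confidence measure is used; a minimal sketch, assuming the scheduler delays a task only when the predicted straggler probability clears a threshold (the threshold value, features, and data below are made up):

```python
# Sketch only: confidence-gated straggler predictions on synthetic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic history: lightly loaded snapshots (class 0) vs. heavily loaded
# ones (class 1) that produced stragglers.
X = rng.normal(size=(200, 4)) + np.repeat([[0.0] * 4, [2.0] * 4], 100, axis=0)
y = np.repeat([0, 1], 100)

# probability=True exposes a calibrated confidence for each prediction.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", probability=True))
model.fit(X, y)

CONFIDENCE_THRESHOLD = 0.8   # assumed value; tuned per cluster in practice


def should_delay(snapshot: np.ndarray) -> bool:
    """Delay task assignment only for confident straggler predictions."""
    p_straggler = model.predict_proba(snapshot.reshape(1, -1))[0, 1]
    return p_straggler >= CONFIDENCE_THRESHOLD


print(should_delay(np.array([2.1, 1.9, 2.2, 2.0])))  # confident straggler -> True
print(should_delay(np.array([1.0, 1.0, 1.0, 1.0])))  # low confidence -> False
```

Per the notes above, adding this kind of confidence gate is what turned the initial no-improvement result into reduced completion time and resource use.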

Another question: Sophisticated schedulers exist. Why Wrangler?
  1. Difficult to anticipate the dynamically changing causes of stragglers
  2. Difficult to build a generic and unbiased scheduler

Q&A:
Q: How do you differentiate stragglers caused by a poor environment from nodes that simply have more work to do?
A: This is not addressed in this work; we will look into it.
Q: How does Wrangler compare to existing techniques such as LATE and Dolly?
A: I don't have numbers for that, but we provide a mechanism (?) that sits on top of everything else.
Q: How much time do you need to train the model?
A: We keep collecting data (in a somewhat online fashion)
