2014年3月27日星期四

A new class of bugs

classify bugs, then an ecosystem for a bug class
see also: 
          Toward Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior (SOSP'13 best paper)
           http://yangsuli.blogspot.com/2013/11/sosp13-papers.html


e.g.:
buffer overflow:
tools: Purify, Valgrind
System: ALSR
Languages: Java

A new class: unstable code
state of the art: turn off optimizations if seeing weird bugs

How prevalent?
How to think about it?
How to detect?

undefined behavior leads to unstable code
example: if(buf + off < buf) (gcc translates to if(false) because pointer overflow behavior is undefined
also: signed integer overflow, oversized shit, null pointer dereference, absolute value overflow
code may or may not work

How prevalent?
all major compilers, all major languages (they have a nice graph in paper) silently discard unstable code (in different ways). so changing/upgrading compiler may lead to broken system
more optimization over time (which makes problems worse)

How to think about it?
first approach: consider unstable code as dead code
problem: restricted to one particular compiler, and lists of false warnings because compiler always kills code

second apprach: as bug patterns
problem: aka antivirus software, inherently inomplete

third approach: as undefined hehavior
problem: too many false positives

Final approach:
cause: disagree on spec (undefined behavior)
Formulation overview:
disagreement delta: compiler: program never invokes undefined heavier
what can be done only with delta: kill unstable code 

Two round simulation: first without delta, second with delta, see if output is different
i.e., first time SAT oracle->N, and second time SAT oracle->Y, then unstable code.

Formulate delta:
reach(e, in)
undef(e, in)
delta(in) = for every e: reach(e, in) -> not undef(e, in)

SAT oracle: (booldector solver)
false negative or positive:
1. phase 1 not powerful enough->false errors
2. phase 2 not powerful enought->false negative


How to detect:
STACK: unstable code checker
challenge: previous statements requests inspect the entire programs, would lead to gigantic boolean predicate
solution: per-function and approximation 

Summary:
unstable code lead to subtle bugs
language designer: be cautious about undefined behavior
compiler writers: use our techniques to generate better warnings
programs: check your code using STACK...

Research approach: (I like this!)
identify systems problems
derive general solutions
build practical systems/tools
e.g. GUI not-responding (as graph reachability problem)
       replay (as graph cut problem)
killing bugs: theorem proving (CompCert: formally verified C compiler CACM'09, seL4: formally verified OS kernel but that's just 8,700 LOC)
            identify subcompnones in Linux kernel to verify


Q&A:
Q:Is it because C language just makes some decisions? 
A: It is more like if you could design a standard which doesn't have this problem? C developers actually have incentives to make signed integers overflow undefined. If you want to define buffer overflow, then you have to instrument every memory write, then you need a GC, you just turned C into Java...
Q: Is missing code the only result undefined behavior results?Why do you choose to focus on code discarding?
A: An interesting direction to pursue...We focused on this because that's what got our attention. 
Q: Does higher level language have fewer undefined behavior? Is it because of the nature of the language being high level, or is it because it's just newer thus better?
A: Yes. And unstable code is (kinda) OK in application level, not so in system level code. 
Q: Other undefined behavior like function argument evaluation order, sequence points?
A: I feel like it's less interesting because it can be solved by the frond-end of the compiler. Also, you could statically check that in your source code. There are also C-light without sequence point problems.



2014年3月7日星期五

Greening Datacenters Through Self-Generation of Renewable Energy

(Software defined datacenter power management. And they build a real system: hardware + software on a small size datacenter)

Thu Nguyen,   Rutgers 

Motivation: 
Datacenters uses lots of energy.....Most of it consumed by small datacenters.
Can we use renewables?

Approaches:
1. Buy renewable energy from other power plants off-site. (Google buys wind power from NextEra) --- not stable enough, transmission loss.
2. Self-generation, co-location (Apple built 40MW solar array in NC) --- location may not be ideal for DC or power plant.

Trends in solar energy:
Current PV efficiency (solar energy to electricity) ~15%, expect to grow to ~40% in 10 years.
Capacity efficiency (this largely depends on location) ~15-25%
Cost of solar energy system installation decreasing in time, expect to go down by 40%-60% in 10 years (DOE data)

Main Challenge: power supply is variable, and it may not match your workload consumption

Idea: Match world to energy supply, instead of matching energy supply to workload demands.
         (And you can use net metering to sell the power your battery can't store to your grid electricity provider)

Solution: 
Hardware (Parasol):
Steel structure on the roof + Backup power (battery + grid)  + IT equipment + Cooling
(Actually we don't need cooling that much, even though they are really energy consuming)
Software:
Use Hadoop workload scheduling to predict energy availability (based on weather forecast). Schedule jobs on renewable energy first, then low-price grid electricity, then peak-price electricity, as long as we can meet the job completion deadline.
Also, make sure we don't draw a lot of power from grid at any particular time (high peak usage adds up to bills!)
GreenSwitch: keep keys on jobs, servers on/off, battery lifetime management, peak time power and total power consumption, then output an energy source schedule and a workload schedule.

Results:
Parasol (use polar energy) without software: 60% cost save, amortize in 7 years.
Parasol + GreenSwitch: 75% cost save, but batteries are too expensive to amortize...
Parasol + GreenSwitch + Deferrable Workloads: 96% cost saving, solar + battery amortized in 7.6 years.
(But they did the experiment in summer, so might be too optimistic.....)

Aside:
They have a year of datacenter energy profile data (for every minute, the power efficacy at each inverter, the temperature and the humidity, etc)