Evaluating the Accuracy of Java Profilers
PLDI'10 IBM Research
This paper is not directly related to virtualization. Instead, it shows how popular Java profilers fail to sample the call stack at random points in time, because most of them defer sampling to application yield points.
The proposed fix is to interrupt the application using signals and take samples at truly random points in time.
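To see why deferring samples to yield points matters, here is a minimal simulation sketch. It is not the paper's experiment: the workload (a 9 ms method `hot` with no yield point, followed by a 1 ms method `cold` that begins with one) and all names in it are assumptions made up for illustration.

```python
import random

# Hypothetical workload: each 10 ms iteration runs hot() for 9 ms with no
# yield point, then cold() for 1 ms, which begins with a yield point.
# Times are integer microseconds so the arithmetic is exact.
PERIOD_US = 10_000
HOT_US = 9_000

def method_at(t_us):
    """Return the name of the method executing at time t_us."""
    return "hot" if (t_us % PERIOD_US) < HOT_US else "cold"

def next_yield_point(t_us):
    """Return the time of the first yield point at or after t_us."""
    offset = t_us % PERIOD_US
    if offset <= HOT_US:
        return t_us - offset + HOT_US
    return t_us - offset + PERIOD_US + HOT_US

def profile(n_samples, defer_to_yield_point, seed=0):
    """Return the fraction of samples attributed to each method."""
    rng = random.Random(seed)
    counts = {"hot": 0, "cold": 0}
    for _ in range(n_samples):
        t = rng.randrange(1000 * PERIOD_US)  # sample request at a random time
        if defer_to_yield_point:
            t = next_yield_point(t)          # sample is only taken at a yield point
        counts[method_at(t)] += 1
    return {name: c / n_samples for name, c in counts.items()}

if __name__ == "__main__":
    print("random in time:    ", profile(20_000, defer_to_yield_point=False))
    print("deferred to yields:", profile(20_000, defer_to_yield_point=True))
```

In this toy model the time-based sampler attributes roughly 90% of the samples to `hot`, while the deferred sampler never sees `hot` at all, because every deferred sample lands on the yield point at the start of `cold`.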
However, it might be interesting to think about measurements in general in a cloud/virtualized setting.
1. What would be useful for people to measure in a virtualized/cloud setting? Call stacks? CPU statistics, e.g., branches? Memory usage?
2. What properties do we need to ensure for each measurement, in order for the results to be relevant and useful? E.g., sampling needs to be random in time to give accurate profiles.
3. How are these measurements typically performed? How do they rely on hardware events, e.g., CPU cycles or interrupts?
4. How are these hardware events typically emulated in a virtualized setting? Is it through a trap-and-emulate architecture, or through binary translation?
5. Will the way these events are emulated affect the properties we desire? For example, software-generated interrupts may be batched or deferred before they are delivered to the VM, which affects their timing. An instruction in the VM may trap into the VMM and take thousands of cycles to emulate, which makes the notion of CPU cycles fuzzy.
6. How will that affect our measurement results? How do we work around it to get correct measurements that can shed light on how to optimize our programs?
I don't have a satisfactory answer to any of the above questions, but it might be interesting to read and think about them.
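To make question 5 a bit more concrete, here is a rough sketch of what deferred interrupt delivery does to a periodic timer. The model is an assumption, not a description of any particular hypervisor: the vCPU alternates between running and being descheduled for fixed intervals, and a tick that fires while it is descheduled is held until it runs again. The function name and parameters are hypothetical.

```python
def deliver_ticks(tick_period_us, run_us, desched_us, total_us):
    """Return delivery times of periodic timer ticks for a time-sliced vCPU.

    Assumed model: the vCPU alternates run_us running and desched_us
    descheduled; a tick that fires while it is descheduled is held until
    the vCPU next runs.
    """
    slice_us = run_us + desched_us
    delivered = []
    for fire in range(0, total_us, tick_period_us):
        offset = fire % slice_us
        if offset < run_us:
            delivered.append(fire)                      # vCPU running: on time
        else:
            delivered.append(fire - offset + slice_us)  # held until next slice
    return delivered

if __name__ == "__main__":
    ticks = deliver_ticks(tick_period_us=1_000, run_us=5_000,
                          desched_us=5_000, total_us=100_000)
    gaps = [b - a for a, b in zip(ticks, ticks[1:])]
    print("shortest gap (us):", min(gaps))
    print("longest gap (us): ", max(gaps))
```

Under these assumptions a timer that is perfectly periodic on the host reaches the guest in bursts: some ticks arrive back to back with no gap, and others arrive after a gap several times the nominal period. A profiler in the guest that samples on those ticks would no longer be sampling evenly in time.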
Another example of this is that TCP relies on accurate RTT estimates in order to perform congestion control and adjust window sizes. A VM, however, may be descheduled for tens or even hundreds of milliseconds while a packet is pending. As a result, CPU time-multiplexing can distort a VM's RTT values, causing its congestion windows to grow too slowly, which degrades throughput significantly. To solve this, some have proposed offloading more TCP functionality to the hypervisor, or presenting VMs with virtual NIC hardware that supports an optional TOE (TCP Offload Engine).
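A small sketch of the RTT distortion, using the smoothed RTT update from RFC 6298 (gains of 1/8 and 1/4). The traces are invented for illustration: a 1 ms wire RTT, with every tenth ACK assumed to wait an extra 30 ms for a descheduled VM to run again. The helper `observed_rtts` and its parameters are hypothetical.

```python
ALPHA = 1 / 8   # SRTT gain from RFC 6298
BETA = 1 / 4    # RTTVAR gain from RFC 6298

def smoothed_rtt(samples_ms):
    """Return (SRTT, RTTVAR) after applying the RFC 6298 update to each sample."""
    srtt = rttvar = None
    for r in samples_ms:
        if srtt is None:
            srtt, rttvar = r, r / 2          # first measurement
        else:
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    return srtt, rttvar

def observed_rtts(n, wire_rtt_ms, desched_ms, every):
    """Hypothetical trace: every `every`-th ACK waits desched_ms for the VM to run."""
    return [wire_rtt_ms + (desched_ms if i % every == every - 1 else 0.0)
            for i in range(n)]

if __name__ == "__main__":
    native = smoothed_rtt(observed_rtts(200, 1.0, 0.0, 10))
    guest = smoothed_rtt(observed_rtts(200, 1.0, 30.0, 10))
    print("native (SRTT, RTTVAR) in ms:", native)
    print("guest  (SRTT, RTTVAR) in ms:", guest)
```

Even though the wire RTT is identical in both traces, the guest's smoothed RTT and its variance come out well above the native values, so anything the TCP stack derives from them (retransmission timeout, per-RTT window growth) is computed from a latency the network never had.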