Saturday, June 7, 2014

Generalized File System Dependencies

C Frost et al., SOSP 2007

Some Background:
Three rules to ensure metadata consistency:
1. Never write a pointer before initializing the structure it points to.
2. Never reuse a resource before nullifying all pointers to it.
3. Never clear the last pointer to a live resource before setting the new one.

Soft updates:
Keeps track of dependencies between blocks (B->A means block A must be written before B).
It also needs to keep undo information for changes: tracking at block granularity causes false sharing, and therefore cycles between blocks. Soft updates undoes changes to break these block dependency cycles.
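
The false-sharing problem can be sketched in a few lines. This is only an illustration, not soft updates' real data structures: the change names, block names, and helper functions below are all made up. Two unrelated operations (creating file 1, removing file 2) happen to touch the same directory block and the same inode block, so dependencies that are acyclic at the level of individual changes become a cycle once projected onto blocks.

```python
from collections import defaultdict

# Hypothetical model: name -> (block, [changes that must reach disk first]).
changes = {
    "init_inode_1": ("inode_block", []),
    "add_dirent_1": ("dir_block", ["init_inode_1"]),       # rule 1: initialize before pointing
    "clear_dirent_2": ("dir_block", []),
    "free_inode_2": ("inode_block", ["clear_dirent_2"]),   # rule 2: nullify pointer before reuse
}

def block_graph(changes):
    """Project change-level dependencies onto blocks (block -> blocks it must follow)."""
    graph = defaultdict(set)
    for block, deps in changes.values():
        for dep in deps:
            dep_block = changes[dep][0]
            if dep_block != block:
                graph[block].add(dep_block)
    return graph

def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph given as node -> successors."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

# The same dependencies, viewed per change rather than per block.
change_graph = {name: set(deps) for name, (_, deps) in changes.items()}
```

Here has_cycle(change_graph) is false while has_cycle(block_graph(changes)) is true. Undoing free_inode_2 in the in-memory copy of inode_block before writing it removes the inode_block -> dir_block edge, which is the role undo information plays.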

Key Idea:
Similar to soft updates, but implemented in a file-system-independent way.
The key idea is a new division of labor: the file system explicitly manipulates consistency dependency information, while the kernel write-back mechanism is solely responsible for flushing blocks to disk while respecting those dependencies.
Dependencies are expressed between "patches", where a patch is a single change to one block.
Rules to enforce:
dep[C] ⊂ C
dep[F] ⊂ C  ( dep[B_B] ⊂ (C ∪ F_B) )
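
A rough sketch of how a write-back loop could enforce these rules, under my own reading of them: C is the set of committed patches, and a block may be sent to disk only if every dependency of its patches is either already committed or is another patch on that same block (and so goes in flight with the same write). The Patch class, can_write, and write_back are hypothetical names for illustration, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patch:
    name: str
    block: str
    deps: frozenset = frozenset()  # names of patches that must be on disk first

def can_write(block, patches, committed):
    """True if every dependency of the block's uncommitted patches is committed or on the same block."""
    on_block = {p.name for p in patches if p.block == block and p.name not in committed}
    for p in patches:
        if p.name in on_block and not p.deps <= (committed | on_block):
            return False
    return True

def write_back(patches):
    """Flush all blocks in an order that respects dependencies; return that order."""
    committed, order = set(), []
    pending = {p.block for p in patches}
    while pending:
        ready = sorted(b for b in pending if can_write(b, patches, committed))
        if not ready:
            # A block-level cycle: some patch would have to be undone first.
            raise RuntimeError("no writable block")
        block = ready[0]
        committed |= {p.name for p in patches if p.block == block}
        order.append(block)
        pending.discard(block)
    return order

# Creating a file: the directory entry must follow the inode and bitmap updates.
patches = [
    Patch("init_inode", "inode_block"),
    Patch("set_bitmap", "bitmap_block"),
    Patch("add_dirent", "dir_block", frozenset({"init_inode", "set_bitmap"})),
]
```

The file system only builds the patches list; which ready block goes first is left entirely to write_back, which is the division of labor described above. This toy version treats a write as committed immediately and does not model in-flight blocks separately.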

Benefit and Potential: 
Separating the specification of consistency dependencies from their enforcement would allow different parties to cooperate easily to enforce consistency. Virtualization seems like an obvious fit. In the paper they discuss loopback device consistency.

Application-level consistency is also promising. They propose an application-level "patch group", a mechanism to make one set of file changes depend on another set, in addition to the dependencies the file system enforces. This is implemented by inserting two empty patches at the start and end of a patch group, and making one group's start patch depend on the other group's end patch.
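
The start/end patch trick can be sketched as a toy dependency graph. The class and method names here are hypothetical and are not the paper's API. Each group gets an empty start patch and an empty end patch; every real change in the group sits between them, so a single edge from one group's start to another group's end orders all of their changes.

```python
class PatchGroups:
    """Toy graph: deps[p] is the set of patches that must be written before p."""

    def __init__(self):
        self.deps = {}

    def create_group(self, group):
        # Two empty patches (they change no data) bracket the group.
        self.deps[f"{group}.start"] = set()
        self.deps[f"{group}.end"] = {f"{group}.start"}

    def add_change(self, group, name):
        # A real change follows the group's start patch and precedes its end patch.
        self.deps[name] = {f"{group}.start"}
        self.deps[f"{group}.end"].add(name)

    def depend(self, later, earlier):
        # One edge is enough: later.start waits for earlier.end.
        self.deps[f"{later}.start"].add(f"{earlier}.end")

    def must_precede(self, first, second):
        """True if `first` is a transitive dependency of `second`."""
        stack, seen = list(self.deps[second]), set()
        while stack:
            p = stack.pop()
            if p == first:
                return True
            if p not in seen:
                seen.add(p)
                stack.extend(self.deps[p])
        return False

g = PatchGroups()
g.create_group("copy")
g.create_group("delete")
g.add_change("copy", "write_new_file")
g.add_change("delete", "unlink_old_file")
g.depend("delete", "copy")  # do not delete the old file before the copy is on disk
```

With this setup g.must_precede("write_new_file", "unlink_old_file") holds, while the reverse does not, even though the application never named individual patches.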

One can also imagine that, without the file system dictating when to write blocks to disk, the block-level scheduler now has more freedom to optimize. This would completely separate the decision of "where to write a block" (made by the file system) from "when to write a block" (made by the kernel block scheduler). It is interesting to relate this to split-level scheduling, where the decision of when to write is based on file system information, or to delayed allocation, where we try to combine the decisions of where to write and when to write.

Criticism/Possible improvement:
1. To track the committed set, they need to know when each block actually hits the disk surface, which is generally not available. They do this using a combination of the NCQ and FUA support offered by SATA, but it is still expensive. I am not quite sure how to do better, though. Maybe track commitment for a set of blocks at a time?

2. Every change to every block seems like too much to keep track of. Of course they implement optimizations. But wouldn't a simpler abstraction, say one at the block level, or one that introduces atomicity, be more useful? If it is at the block level, then it is more like soft updates, which is again hard to implement. So maybe one should stare more at the consistency requirements and come up with a better abstraction, one that is easy both for the file system to manipulate and for the buffer/block layer to implement.

3. Even though they implemented soft updates, journaling, and svn application consistency, they didn't use this mechanism to enable more interesting things: virtualization, different consistency guarantees for different clients, file system cooperation, scheduler optimization, etc.

Related Work:
1. Soft updates does things at the block level, but in a file-system-dependent way.
2. CAPFS and Echo considered customizable application-level consistency protocols in the context of distributed, parallel file systems. Echo maintains a partial order on the locally cached updates to the remote file system.
3. Burnett's thesis describes a system that tracks dependencies among system calls, associates dirty blocks with unique IDs returned by those calls, and duplicates dirty blocks when necessary to preserve ordering.
4. Xsyncfs's external synchrony provides users with the same consistency guarantees as synchronous writes, but is implemented by committing blocks in groups using a journaling design.

Other resources:
Notes on this paper:
http://www.scs.stanford.edu/13wi-cs240/notes/featherstitch.txt