Sunday, September 25, 2011

How to run customized ext4 under Linux 2.6.26

There are two difficulties:
1. How to build your own ext4 and jbd2 modules without rebuilding the whole kernel.
2. How to enable ext4 support on the old Linux 2.6.26 kernel.

My platform is CentOS.

Solution:

0. Make a copy of your Linux kernel source tree. Modify the ext4 and jbd2 source code as needed.

1. Modify the .config file in your Linux kernel source directory. Delete all the existing xxx=m lines (so that "make modules" builds only what we need), and add the following lines:
CONFIG_CRC16=m        # needed by ext4
CONFIG_EXT4DEV_FS=m
CONFIG_JBD=m
CONFIG_JBD2=m

2. do "make modules" under this directory. This will produce the needed kernel modules.

3. Check the /etc/modprobe.conf, /etc/modules.conf, and /etc/modprobe.d/blacklist files, and make sure the ext4 and jbd2 modules are not loaded automatically. This prevents the kernel from loading some other version of these modules from a default location.

4. Reboot the kernel. Run insmod on lib/crc16.ko, fs/jbd2/jbd2.ko, and fs/ext4/ext4dev.ko. The latter two are the modules you modified; note that crc16.ko is needed by ext4dev.

5. yum install e4fsprogs. This gives you the userspace utilities needed for ext4.

6. Choose a disk partition to mkfs: mkfs -t ext4dev /dev/sdc

7. Tune the filesystem to mark it as a test filesystem (otherwise the kernel will refuse to mount it): tune4fs -E test_fs /dev/sdc

8. mount -t ext4dev /dev/sdc /dir/you/want/to/mount. Note that you may need to mount with some special flags (e.g., disabling extent support), since ext4 under 2.6.26 has some bugs.

Done.



Wednesday, September 21, 2011

Efficient Locking for Concurrent Operations on B-Trees

Lehman and Yao, 1981

Summary:

This paper proposed a new solution for concurrent access to B-trees that reside on secondary storage. Basically, a link pointer is added to the B-tree data structure to chain all the pages at the same level into a linked list. The intent is that if an insertion splits a node into two, other processes will still see a "logically single node" by following the link pointer, so the consistency of the tree is not lost.

The search of the B-tree proceeds as usual, without grabbing any locks. However, if the search key exceeds the highest value in a node (as indicated by the high key), the tree structure must have changed, and we should follow the link pointer to reach the split-off node. Insertion into the B-tree is divided into two phases: a search phase to locate where to insert, which is identical to a normal search; and the actual insertion phase, where we lock a node, split it as necessary, and add the appropriate link pointers. If a split happens, an insertion into the parent node is also needed, and it works the same way as at the leaf level. A stack of the rightmost node examined at each level during the descent is maintained, to handle the case where we must backtrack up the tree.
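To make the "move right" rule concrete, here is a small sketch in C (my own illustrative rendering, not the paper's code; the node layout and helper names are made up):

struct node {
    int          is_leaf;
    int          nkeys;
    long         keys[16];         /* separator keys */
    long         high_key;         /* highest key this node is responsible for */
    struct node *children[17];
    struct node *link;             /* right-link to the sibling created by a split */
};

static int child_index(const struct node *n, long key)
{
    int i = 0;
    while (i < n->nkeys && key > n->keys[i])
        i++;
    return i;
}

/* Lock-free search: whenever the key exceeds the high key, the node was
 * split after we read its parent, so chase the right-link instead. */
static struct node *blink_search(struct node *n, long key)
{
    while (!n->is_leaf) {
        while (key > n->high_key)
            n = n->link;
        n = n->children[child_index(n, key)];
    }
    while (key > n->high_key)      /* the same move-right rule applies at the leaf level */
        n = n->link;
    return n;
}

In the paper every step also copies the page from disk into the process's private memory; I dropped that here for brevity.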

An informal proof of the correctness of this solution is given: it ensures that each process sees a consistent state of the B-tree, and it prevents deadlock. Livelock, on the other hand, is possible if a process keeps following link pointers that other processes create. This solution is highly concurrent because an insertion process holds at most three locks simultaneously and a search process doesn't need to grab any locks (not even shared locks).

The whole model they developed is based on the assumption that the entire B-tree sits on disk and every process has its own private memory cache. However, I don't think this is very practical. A more common case would be that the first and second levels of the B-tree sit in main memory and are shared by all processes. How to correctly access this kind of B-tree remains a problem.

A Critique of ANSI SQL Isolation Levels

From Microsoft, 1995


Summary


This paper discussed the definitions of the ANSI SQL-92 isolation levels and pointed out their ambiguity and failure to preclude some anomalies. The authors proposed a new set of definitions that map naturally to the standard locking implementation of each level, and also discussed some additional isolation levels, such as snapshot isolation.


The authors first introduced the Dirty Read, Non-repeatable Read, and Phantom phenomena, and pointed out that, due to the ambiguity of their English statements, there is both a broad and a strict interpretation of each phenomenon. They stated each interpretation in terms of execution histories and defined the different ANSI SQL isolation levels by which phenomena they preclude.
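As a concrete example (this is my recollection of the paper's history notation, so the exact form may be slightly off), the dirty-read phenomenon reads roughly as:

A1 (strict):  w1[x] ... r2[x] ... (a1 and c2 in either order)
P1 (broad):   w1[x] ... r2[x] ... ((c1 or a1) and (c2 or a2) in any order)

The strict reading A1 only flags histories where T1 actually aborts after T2 has read its dirty value, while the broad reading P1 forbids T2 from reading x at all while T1 has an uncommitted write on it; locking implementations actually enforce the broad reading.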


The authors then defined another set of isolation levels in terms of different locking implementations. They discussed what anomalies each of these locking levels precludes, and noted the discrepancies between these and the ANSI isolation levels. They defined a new phenomenon, dirty write, and proposed a new set of isolation levels defined by how they preclude the four phenomena (dirty write, dirty read, non-repeatable read, and phantom) under the broad interpretation. They then showed that these levels are equivalent to the corresponding locking isolation levels.


The authors discussed two additional isolation levels: cursor stability and snapshot isolation. They defined each precisely by the anomalies it precludes, and showed where each sits in the isolation-level hierarchy.


Finally, a summary shows which anomalies each isolation level precludes under the improved definitions, and how the levels form a hierarchy.
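From memory, that hierarchy looks roughly like this (check the paper's summary table for the authoritative version): Read Uncommitted excludes P0 (dirty write); Read Committed additionally excludes P1 (dirty read); Repeatable Read additionally excludes P2 (fuzzy/non-repeatable read); Serializable additionally excludes P3 (phantom). Cursor Stability sits between Read Committed and Repeatable Read, and Snapshot Isolation is stronger than Read Committed but incomparable with Repeatable Read.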

On Optimistic Methods for Concurrency Control

paper here:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.114.3052&rep=rep1&type=pdf


Summary:

In this paper the authors analyzed the drawbacks of doing concurrency control with traditional locking techniques: you have to pay the locking overhead even when you don't really need it. They then proposed optimistic concurrency control: let transactions proceed as if there were no conflicts, and back them up (restart them) if a conflict is detected.


The main idea behind this optimistic approach is to divide each transaction into three phases: read, validation, and write. In the read phase, transactions only read database values and all updates take place on local copies; we also record what the transaction reads and updates. In the validation phase, we use some algorithm to detect whether a conflict occurred during the read phase. If there is no conflict, we proceed to the write phase and apply the updates to the global database; otherwise we back up the transaction and do the necessary cleanup.


The authors discussed in detail how to validate a transaction. They laid out three conditions under which transactions are serially equivalent (mainly that the read and write sets of concurrent transactions must not intersect), and discussed how to assign each transaction a serial number to facilitate validation. Two families of validation algorithms are then proposed: one requires the write phases of all transactions to be serial; the other allows concurrent write phases.
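Roughly, the serial-validation check looks like the sketch below (my own C rendering, assuming read/write sets are tracked per transaction; all names are illustrative, not the paper's code, and finish_tn doubles as the transaction number once a transaction commits):

#include <stdbool.h>
#include <stddef.h>

/* Illustrative transaction record: read/write sets as sorted arrays of object ids. */
struct txn {
    long   start_tn;     /* highest transaction number seen when the read phase began */
    long   finish_tn;    /* number assigned when the transaction enters validation */
    long  *read_set;  size_t nread;
    long  *write_set; size_t nwrite;
};

static bool sets_intersect(const long *a, size_t na, const long *b, size_t nb)
{
    size_t i = 0, j = 0;
    while (i < na && j < nb) {          /* merge-style scan over two sorted sets */
        if (a[i] == b[j]) return true;
        if (a[i] < b[j]) i++; else j++;
    }
    return false;
}

/* Serial validation sketch: Tj is valid if, for every committed Ti whose number
 * falls in (start_tn(Tj), finish_tn(Tj)], write_set(Ti) does not intersect
 * read_set(Tj). */
static bool validate(const struct txn *tj,
                     const struct txn *committed, size_t ncommitted)
{
    for (size_t i = 0; i < ncommitted; i++) {
        const struct txn *ti = &committed[i];
        if (ti->finish_tn > tj->start_tn && ti->finish_tn <= tj->finish_tn &&
            sets_intersect(ti->write_set, ti->nwrite, tj->read_set, tj->nread))
            return false;               /* conflict: back up and restart Tj */
    }
    return true;                        /* no conflict: proceed to the write phase */
}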


The authors also discussed what kinds of workload are suitable for optimistic concurrency control: query-dominant workloads, and other workloads where conflicts happen with low probability. An example is given and analyzed (on paper) showing that optimistic concurrency control is promising for maintaining B-tree indexes. However, no experimental results are presented.


Comments:

1. This whole optimistic CC scheme has copy-on-write semantics, so note that it could break the original physical layout; extra caution should be taken.

2. The paper didn't quantify the overhead of the optimistic approach at all: maintaining the read/write sets, computing intersections, making local copies (memory copies are expensive!). The authors probably left that out intentionally, but you shouldn't let them manipulate you....

3. Optimistic concurrency control has not yet been implemented in any commercial system as far as I know, but some techniques developed in this paper, like assigning transaction numbers, are very influential.

Thursday, September 1, 2011

Do NOT block the make_request function!

Today I encountered a bug in my code which I think is interesting enough to be worth documenting.

So I have two block device drivers, journal_bd and checkpoint_bd, which communicate with each other. Basically, when journal_bd serves an I/O request in its journal_bd_make_request function, it looks at the type of bio it is serving; for a special kind of bio, it postpones service of that bio, issues some I/O requests (writes, actually) to checkpoint_bd, waits for those writes to be completed by checkpoint_bd, and finally serves the original bio.

This seems like straightforward functionality, so I started with the following implementation:

static int journal_bd_make_request(struct request_queue *q, struct bio *bio)
{
    /* ... */
    if (bio_is_special(bio)) {       /* pseudocode: detect the special bio type */
        checkpoint();                /* exported by the checkpoint_bd module */
        serve_bio(bio);              /* serve the original bio */
    }
    /* ... */
    return 0;
}

In checkpoint(), which is implemented inside the checkpoint_bd module, we issue writes to disk by calling generic_make_request() and wait for them to complete:

void checkpoint(void)
{
    /* submit all the writes that must complete before the special bio is served */
    for_each_write_bio_we_want_complete(write_bio) {    /* pseudocode loop */
        generic_make_request(write_bio);
    }
    down(&checkpoint_sem);    /* sleep until checkpoint_end_io() raises the semaphore */
}

checkpoint() blocks until all the writes hit disk. At that point we raise the semaphore in the I/O completion function checkpoint_end_io(), which lets checkpoint() return and journal_bd_make_request() proceed to serve the special bio.

static void checkpoint_end_io(struct bio *bio, int err)
{
    /* check if we are done with all the writes;
     * if we are, raise the semaphore so checkpoint() can return */
    if (all_checkpoint_writes_done())    /* pseudocode check */
        up(&checkpoint_sem);
}

The above scheme seems nice, except that it didn't work...

When I tried the above implementation, I noticed that we got stuck waiting on the semaphore, which never got raised.

At first I thought this was a simple synchronization bug: I probably didn't initialize the semaphore correctly, or missed some condition where I should have raised the semaphore but didn't. All of these have happened to me before. However, after more inspection I realized that checkpoint_end_io(), our I/O completion function, was never called! That's why it never had the chance to raise the semaphore. Things were getting interesting....

After making sure that I did assign the right value to the bio->bi_end_io field, I wrote a pseudo device driver which intercepts all the bios and logs them. To my astonishment, I found that the write bios we submitted to disk using generic_make_request() in checkpoint() were never received by the disk!

OK... so apparently the kernel is silently discarding our bio requests. Maybe I didn't set the bio fields right? But then the kernel should at least complain about it... Using a debugger, I confirmed that all the bio requests I submitted were legitimate.

So is it some permission problem? After all, checkpoint() is called from another module. Maybe the kernel doesn't allow that? Or maybe the module makes some implicit assumption about its execution context that we are not satisfying when calling checkpoint()?

Ishani came by at this point and suggested checking that all the global variables in the checkpoint_bd module read their correct values while we execute checkpoint() on behalf of journal_bd. We checked, and it didn't give us much information.

At last we decided to actually step through the bio submission process to understand where the bios we submitted went. We used gdb to step into the checkpoint() function, which calls generic_make_request(write_bio). We found that instead of submitting the bio, the kernel chained the bio onto a list and immediately returned (lines 1417 - 1422 in blk-core.c, if you are using Linux 2.6.26). The comment in this function explains it fairly clearly, so I will copy it here:

/*
 * We only want one ->make_request_fn to be active at a time,
 * else stack usage with stacked devices could be a problem.
 * So use current->bio_{list,tail} to keep a list of requests
 * submited by a make_request_fn function.
 * current->bio_tail is also used as a flag to say if
 * generic_make_request is currently active in this task or not.
 * If it is NULL, then no make_request is active.  If it is non-NULL,
 * then a make_request is active, and new requests should be added
 * at the tail
 */
So what happens is that the kernel only allows one make_request function to be active at a time for a given kernel thread. Since we are already inside journal_bd_make_request, i.e., it is active, all the bios we submit are kept on that list instead of being handed to the block device. And since we then sleep inside journal_bd_make_request, it stays active forever, so the chained bios never get a chance to be served. That is why those bios are not served, and why the I/O completion function never gets to raise the semaphore.
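From memory, the logic in 2.6.26's generic_make_request() looks roughly like the following (paraphrased and simplified from blk-core.c, so don't treat it as the exact source):

void generic_make_request(struct bio *bio)
{
    if (current->bio_tail) {
        /* a make_request_fn is already active in this task:
         * just chain the bio onto current->bio_list and return */
        *(current->bio_tail) = bio;
        bio->bi_next = NULL;
        current->bio_tail = &bio->bi_next;
        return;
    }

    /* otherwise mark ourselves active and drain the list; bios queued by
     * nested calls are picked up by this loop after each ->make_request_fn
     * returns */
    do {
        current->bio_list = bio->bi_next;
        if (bio->bi_next == NULL)
            current->bio_tail = &current->bio_list;
        else
            bio->bi_next = NULL;
        __generic_make_request(bio);
        bio = current->bio_list;
    } while (bio);
    current->bio_tail = NULL;    /* deactivate */
}

In our case, journal_bd_make_request never returned, so the drain loop never got to pick up the writes that checkpoint() had queued.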

Once we understood the problem, the fix was straightforward: make journal_bd_make_request return immediately instead of blocking on the semaphore, and serve the special bio from inside checkpoint_bd once all the write bios have completed. This fix works perfectly.
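Here is a rough sketch of the reworked flow (my simplified reconstruction; names like pending_special_bio, bio_is_special() and all_checkpoint_writes_done() are illustrative placeholders, and the real code needs proper locking and error handling):

/* journal_bd: never sleep in the make_request function */
static int journal_bd_make_request(struct request_queue *q, struct bio *bio)
{
    if (bio_is_special(bio)) {
        checkpoint_async(bio);    /* hand the bio over to checkpoint_bd */
        return 0;                 /* return immediately, no waiting */
    }
    /* ... normal bios handled as usual ... */
    return 0;
}

/* checkpoint_bd: remember the deferred bio and submit the checkpoint writes */
static struct bio *pending_special_bio;

void checkpoint_async(struct bio *special_bio)
{
    pending_special_bio = special_bio;
    for_each_write_bio_we_want_complete(write_bio)    /* pseudocode loop, as before */
        generic_make_request(write_bio);
    /* the writes are still chained on current->bio_list at this point, but
     * they get drained as soon as journal_bd_make_request returns */
}

/* runs from the I/O completion path, long after make_request has returned */
static void checkpoint_end_io(struct bio *bio, int err)
{
    if (all_checkpoint_writes_done())
        serve_bio(pending_special_bio);    /* finally serve the deferred special bio */
}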

So the moral of the story: never block your make_request function; doing so kills the chance for the other bios you submitted to be served.


Saturday, July 23, 2011

Some thoughts on He Xin's speech about June Fourth

In the middle of the night I came across this on Xiaonei and got all worked up.... =,=

http://www.inmediahk.net/node/29225

Of course I don't agree with He Xin's basic position, even though I agree with many of his technical points. Do the United States and other Western countries have strategic conflicts of interest with China? Of course they do, and they will inevitably act in their own best interest. Were the students at the time too naive, and was their anger too diffuse to be constructive, even destructive? Of course; when have students not been naive? But defending the national interest has to rest on a correct understanding of what the national interest is. A national interest inflated into an ideological absolute has always been dangerous and unreliable. And in my view, the moment the tanks rolled out and the blood of innocent students was spilled, the so-called national interest had already suffered the greatest possible damage.

What is equally interesting, though, is the unfair treatment He Xin himself has received, and the attitude of some people, including some Peking University students, toward him. The Cultural Revolution-style language, and the attitude of clubbing someone to death in one blow as if he were a class enemy, are truly shocking. And this verbal or actual persecution comes from groups that fiercely advocate democracy, freedom, and basic rights; how ironic! More than twenty years have passed since 1989, and although distrust of the government has grown to a dangerous level, our skill and maturity in fighting for our rights have not improved much. From Sina and Tianya, full of "the broad masses of netizens", to Weiming and Shuimu, dominated by students, to MITBBS, dominated by overseas students: everywhere I look I see nothing but insults and attacks flying around. The side fighting for rights, the weaker side, disregards other people's dignity and tramples their rights just as badly as the side it attacks, if not worse. That is truly worrying.

Finally, one more aside on the question of national interest. In a finance class, the professor talked about how the British government after World War I insisted on not devaluing the pound, even though the country was beset by troubles at home and abroad and devaluation looked like such a tempting option. One important reason was that there was a strong voice in society that the people who had dug into their own pockets to buy bonds during the war, expressing their firm support for Britain, deserved to be treated fairly. Their interests, and with them the honor of the British Empire, must not be harmed in the name of the national interest. The British government's decision to defend the pound turned out, in hindsight, to be unwise. But what I thought, sitting in that lecture hall, was: what a respectable country.

Thursday, July 14, 2011

Suggested procedure for visiting government offices in China

I stumbled across this on some BBS. A masterpiece~~~
Yeah, I've decided that from now on, before every trip to get official business done, I will study this post first!

======== repost divider ====================


Because my work here gives me a lot of opportunities to deal with officials...
here are some suggestions based on personal experience:

1. First work out which government agencies you need to visit (try to cover all the related units in as short a time as possible), and summarize the goals you need to achieve.

2. Prepare the project concept, supporting materials, and plans; the more concrete the better, e.g. diagrams and a draft project proposal document. It has to look like the project is definitely going ahead.

3. Identify all the relevant bureaus, departments, and offices, and the specific sections within them.

4-1. The most efficient approach is to use personal connections to reach the departments most relevant to the project and the people with decision-making power, such as bureau chiefs, committee chairmen, county heads, etc., officials whose authority outranks the people you will interview in other departments. On the first visit, the primary goal is to get them to issue an administrative document. First, the department should be directly related to your project, for example the Development and Reform Bureau; second, your delegation should be large enough; bring along a few extra people even if they will never show up again, because most officials value harmony and will give you more face when they see a crowd.

4-2. If you have no local connections, go to the section chief of the unit directly related to the project. Make an appointment in advance to explain your purpose, and proactively ask whether you can meet during or outside office hours so they can "give you some guidance". The main procedure for the meeting: use "thoroughly prepared" materials to get them to endorse the project, ideally so much material that their head spins; then tell them where you are having difficulties, especially in areas they know well. Once the conversation is going, steer it toward getting them to "state" their support for the project, giving them a graceful way to step up..., and then start making some "requests".

(Officials here care a lot about who is visiting them: your organization, your title or local status, and whether you have authorization from the relevant departments, i.e. how trustworthy you are. Otherwise they won't give you a single document and will brush off all your questions.)

(Unless absolutely necessary, never hand money to officials; you will end up in a bottomless pit or get burned.)

That is phase one.

After obtaining the support and assistance of the key officials, start requesting materials from, or arranging meetings with, the various related departments in the area:

1. Make a list and ask the senior officials to authorize, support, and help arrange your meetings with each department; otherwise there is a good chance you will be hastily brushed off.

2. Before each meeting, phone at least three days in advance to set a time, and list the topics you want to discuss and the materials you hope to obtain, so the other side has enough time to prepare and is less likely to fob you off. Before the meeting, make sure you have already earned their trust over the phone, or obtained an explicit promise of help, so their attitude in person won't differ too much. Booking ahead is always better than dropping in unannounced and cutting into their lunch break or school pick-up time...

3. At each meeting, one person should lead the questioning, ideally with someone alongside taking notes and filling in. Since the meeting will usually be held in their office, it is best not to bring more than three people, because an office usually only has two spare chairs...

4. You should find out in advance where their office is, but you can still play dumb and ask them how to get there, to create a bit more conversation and build a bit more trust and rapport. The above assumes you have plenty of time; if you don't, be mentally prepared to visit each department twice. Either way, the key point is: "materials must be thorough, complete, and concrete."

5. Check your dress and grooming. In Taiwan, wearing business attire to a first meeting is pretty much understood by everyone, but in China it is taken for granted that white-collar people dress casually... As a Taiwanese... you should all the more stick to the culture, taught from childhood, of keeping your appearance neat... Showing up in business attire makes a good impression, and they are usually more willing to give you face (there really is a lot of class consciousness in China). At minimum wear a white dress shirt; absolutely avoid jeans and over-decorated "trendy" outfits...



Once you are in the meeting:

1. Greet them, restate your purpose, and present the relevant credentials and your business card.

2. Offer a handshake first! Physical contact goes a long way toward closing the distance... (after shaking hands, don't touch things carelessly; though living here has already cured me of any germophobia...)

3. If possible, carry a pack of cigarettes and a lighter and practice lighting one. Even if you don't smoke yourself, you can still offer one... making friends over cigarettes...
If they don't smoke, put the pack away and get down to business; if they do smoke, whether you join them doesn't really matter much...

4. If you have materials, present them; if you have questions, ask them. Bring up your own difficulties as much as possible, to give them a chance to show off.

5. During the initial questions, you must get a feel for their working style. If you are not good at flattery, it is worth practicing a few flattering ways of phrasing your questions.

6. Officials in different departments and positions vary: some are astonishingly inefficient, essentially dead weight, but some are extremely conscientious. Tailor what you say to whoever is in front of you.

7. When you need something from them, the key is to make them feel "you little commoner, just watch me handle this!" Never expect officials here to be eager public servants......... they are the boss...

8. One more reminder... don't expect the government here to know everything. In reality they know very little, and with responsibilities so finely divided... information exchange between departments and sections is limited to what is mentioned in meetings and publicly released, so if they say a document doesn't exist, it really doesn't. ...As for the accuracy of their statistics... treat them as reference only. Think of officials as useful for "stepping in to solve problems", such as signing, stamping, authorizing, or accompanying you on visits...

Wrapping up the conversation:
1. When there is really nothing left to ask, and the person has an important influence on your project... invite them to a meal. A lot of things cannot be asked inside the office. Also, if you hit it off, turn the working relationship into a friendship; to get anything done here, having connections really does make many things much easier.

2. If you are not inviting them to a meal, just excuse yourself with something like "it's three o'clock, we should be heading to our next stop".

3. Before leaving, be sure to confirm and summarize the discussion again, ask for a business card, and remind them of the information they need to provide, so you don't end up waiting around for nothing...

4. When scheduling follow-ups, treat it like haggling: ask "could it be a bit earlier?" or "could you provide part of it first?" It can save a lot of time...



Some other suggestions:
1. The administration here always has a "Nth Five-Year Plan". Try to tie your project to these plans, and the government will take it more seriously...

2. For administrative documents that need government approval, draft them for the officials in advance, so that all they have to do is read, agree, and stamp; asking the government to issue documents this way has a much higher success rate.

3. For questionnaires and the like, it is best NOT to show government staff open-ended questionnaires. Think through the questions and the likely answers yourself and write them as a closed-form questionnaire; have them fill it out on the spot and then follow up based on their choices. That is the best way to collect information. If you purely want an official's advice and guidance, say so directly right after the opening greetings, to avoid wasting everyone's time. The more incisive your questions and the more detailed your materials, the more willing they usually are to respond; otherwise the most common outcome is being brushed off with "don't know, not sure, not familiar"...

Also, a few notes on areas outside the big cities; the officials there are:
1. Very busy, because the working day is only 6 hours.
2. Very idle, because the place is so small; it used to be all farmland, and even with a few new factory buildings there really isn't much to do.
3. Dutiful, because the place is so small that there won't be much development anyway, and all future planning just follows whatever the leaders above decide.
4. Well housed: huge, half-empty offices, one big building per bureau, with the bureaus scattered all over town, which is completely normal.....
5. Very efficient, because the statistics are mostly made up, so they are produced quickly, and they will tell you frankly that the numbers mean nothing.
6. Clueless when asked anything, shirking responsibility, going through the motions, content with the status quo... other forum members can add to the list.

Friday, April 8, 2011

Top 10 Reasons for the NFS Stale File Handle Error While Mounting a FS

1. The kernel tries to read the first block of the device and finds all zeros there! That is, you didn't flush your writes properly to the appropriate place.

Thursday, March 10, 2011

Followup on academic publishing in systems

Reposted from http://www.thegibson.org/blog/archives/305

My personal takeaways from this blog:

1. Getting a faculty job at a decent university is reallllllllly hard. 2 OSDI, 3 SOSP, plus other publications in FAST?!?! It is hard to imagine myself making this kind of achievement in 5 years, even though I am lucky enough to work with his former advisor. (Would it be easier at research labs affiliated with companies? But you wouldn't have grad students for most of the year, and that's like a whole different ecosystem.)

2. Publishing in top systems conferences is kind of subjective (at least to me). You have to guess what will make the PC interested, and it is still a bit mysterious to me :(
My advisor seems to have a quite different perspective from mine on what counts as interesting work in the systems area, and I guess I should try to develop this kind of perspective too if I am seriously thinking about research as my career. (Still can't believe some Linux debugging stuff found its way into OSDI, though...)

3. Congrats to Haryadi! This is like the first faculty hire out of our group? The folks at Microsoft Research or the national labs are also active in publishing papers (EuroSys, SIGOPS, etc.), but I guess being at a university is different.

===========================================================================

I follow Ed Felten’s blog, Freedom To Tinker (which is actually now a blog for many people at Princeton’s Center for Information Technology Policy) — it has good coverage on issues like electronic voting and intellectual property. Dan Wallach, a well known security (among other things) researcher at Rice, published an interesting post titled “Acceptance rates at security conferences” assessing the state of academic CS conferences in the area of security. He points out that the conferences are getting increasingly competitive with an ever growing field of researchers and a relatively fixed number of conference venues; he notes that this will lead to certain “structural problems” in the research community and discusses potential options.

He also points to Matt Welsh's thoughts on similar issues in the systems community:

“Scaling up conferences”
“Scaling up program committees”
This is of particular interest to me since, as a PhD student, I am an academic systems researcher. Dan Wallach summarizes Matt's first post as follows:

He argues that there’s a real disparity between the top programs / labs and everybody else and that it’s worthwhile to take steps to fix this. (I’ll argue that security conferences don’t seem to have this particular problem.) He also points out what I think is the deeper problem, which is that hotshot grad students must get themselves a long list of publications to have a crack at a decent faculty job. This was emphatically not the case ten years ago.

I definitely see what Matt is talking about in the systems community. For example, for a large subset of lower-level systems work*, SOSP and OSDI are a sort of gold standard in publication venues. Each conference is held every two years (alternating years between the two venues), so each year 20-30 papers will be accepted total (for reference, OSDI ’08 accepted 26/193 and SOSP ’07 accepted 25/131). Given the size of the systems community, that doesn’t give much leeway for up-and-coming researchers, but a publication in such a venue is virtually required to be competitive academically — as Matt describes it, a publication in these venues is “a highly prized commodity, and one that is becoming increasingly valued over time.” Matt says:

Several of us on the hiring committee were amazed at how many freshly-minted Ph.D.s were coming out with multiple papers in places like SOSP, OSDI, and NSDI. Clearly there is a lot more weight placed on getting those papers accepted than there used to be. … Somewhere along the way the ante has been upped considerably.

I notice this too. For example, Georgia Tech’s College of Computing (where I am finishing my PhD) was ranked in the top 10 graduate programs in CS (#9) by US News and World Report in 2008. For systems research specifically, we were also ranked in US News and World Report’s top 10 (#10) in 2008. Now, of course US News and World Report’s rankings are contentious and reducing the work of a whole bunch of different researchers in CS to a single dimensional ordinal representing the whole program is very subjective, but the point is only to say that our program can be considered competitive in the universe of CS graduate programs.

But if you look at our publishing track record in these two prized venues, we’re virtually unrepresented. If you look at the OSDI proceedings, you will see that a paper from Georgia Tech has never been accepted there (my 2008 submission was rejected, although it did get decently positive reviews), and we have two SOSP papers — one in 97 which was collaborative with Microsoft Research and involved only students and no Georgia Tech faculty (which makes me wonder if it was related to an internship) and one in 2007 which was a collaboration between a student in the Electrical and Computer Engineering and a professor in the College of Computing. Compare this college-wide record with that of Haryadi Gunawi, an excellent faculty candidate interviewing at Georgia Tech this year. In his career as a PhD student, he had 2 OSDI and 3 SOSP publications (plus publications in top venues in other areas, like PLDI, ISCA and FAST). As a student, he has amassed significantly more publications in these prized venues than our whole College of Computing can claim**. And other students from his advisor(s) have similarly impressive CVs. Look at the students of many other “rockstar” systems researchers and you’ll see the same pattern; we had a parade of great faculty candidates with similarly strong records.

So what am I supposed to make of this? I get a deep sense of cynicism when I see trends like this over many years. Matt says, “I don’t have hard data to back this up, but it feels that it is increasingly rare to see papers from anything other than ‘top ten’ universities and labs in the top venues.” I would go a step further and say that there’s a certain “clique” (or “cabal” if you want something sinister) of key researchers who facilitate virtually all of the publications in these venues. If you are a student of one of these researchers, or a nth-generation student (e.g. a student of a faculty member who once was a student of…), you know how to do work that appeals to the program committee and present it in the proper way — if you don’t have the right perspective on these fine points of taste, your chances are grim.*** As a student, if your advisor is a big name, you can have a paper in these top venues every year. If you don’t, you have very bleak academic job prospects. Now I’m definitely not trying to diminish Haryadi’s impressive accomplishments, and his research is very exciting. But I get the sense that there’s a very strong dis-proportionality in academic publishing in systems that is a lot worse than most other areas in computer science.

A comment to Matt’s first post and the end of Dan’s post also pointed me to another relevant article. In May’s CACM Viewpoints, Ken Birman and Fred Schneider wrote an interesting critique of the state of systems conferences titled “Program Committee Overload in Systems” (here’s a free pdf from Fred Schneider — the same content but without the fancy formatting of the CACM hardcopy). This CACM article seems like a follow-up and expansion of an earlier work of Ken’s I’ve blogged about (titled “Overcoming Challenges of Maturity”).

Anyway, I’m glad that some well-respected systems researchers are being vocal about these issues. It’s definitely good to know I’m not the only one with gripes; I’ve been somewhat cynical about this for a while, but since I have very little clout it helps to find a few senior systems researchers with some common concerns.

* Yes, I understand that “lower-level” is a matter of perspective. To my electrical and computer engineering colleagues, things like hypervisors and operating systems count as “high-level” “end-user” programming.

** If you look at DBLP, you will find a good bit more from current College of Computing faculty, but I’m counting publications where the author is at Georgia Tech when the publication is made (i.e. the author’s affiliation at the time of the publication).

*** Even presented well, good work on certain kinds of systems topics just doesn’t seem to be interesting to the PCs of these top conferences (the Europeans have been irked by this for years — leading to the establishment of EuroSys).