Sunday, November 28, 2010

linux software raid5 code reading notes (2)

A typical write process

1. When the bio for the write request is passed to make_request(), raid5_compute_sector() first remaps the sector number to a disk number and a sector offset within that disk. get_active_stripe() is then called to get the stripe that contains this sector. Finally, add_stripe_bio() adds the bio to that stripe.
2. The STRIPE_HANDLE bit of the current stripe is set.
3. release_stripe() is called, which wakes mddev->thread (i.e., raid5d) to handle this stripe.
4. raid5d calls handle_stripe(), which calls handle_stripe5() for RAID5.
5. handle_stripe5() calls handle_stripe_dirtying5().
6. handle_stripe_dirtying5() first decides whether a reconstruct-write (rcw) or a read-modify-write (rmw) is preferable, based on how many extra blocks each would need to read. If the stripe is PREREAD_ACTIVE, it then sets the R5_LOCKED and R5_Wantread flags on each disk buffer that needs to be read. (The PREREAD_ACTIVE flag is set subject to the delay policy of raid5, described elsewhere.) handle_stripe_dirtying5() then returns with the STRIPE_HANDLE flag cleared for the current stripe.
7. handle_stripe5() then calls ops_run_io(), which registers raid5_end_read_request() as the I/O completion function and submits the read requests through the sh->dev[i].req bio of each disk. Finally, handle_stripe5() calls return_io() and returns.
8. release_stripe() is called to clear the PREREAD_ACTIVE flag. In the meantime, raid5_end_read_request() sets the R5_UPTODATE and STRIPE_HANDLE flags and calls release_stripe().
9. Because STRIPE_HANDLE is set, handle_stripe5() is called again. It calls handle_stripe_dirtying5(), which this time calls schedule_reconstruction() directly, since all the disk buffers are up to date.
10. For a reconstruct-write, schedule_reconstruction() sets the STRIPE_OP_BIODRAIN and STRIPE_OP_RECONSTRUCT flags and the R5_Wantdrain flag, and sets sh->reconstruct_state = reconstruct_state_drain_run. For a read-modify-write, it sets the STRIPE_OP_PREXOR, STRIPE_OP_BIODRAIN and STRIPE_OP_RECONSTRUCT flags and the R5_Wantdrain flag. In both cases it also locks the appropriate buffers.
11. raid_run_ops() is called. For a read-modify-write, it calls ops_run_prexor() to XOR the old data out of the old parity block. Then ops_run_biodrain() is called (for both rcw and rmw) to copy the data from the bio into the per-disk cache. Finally, ops_run_reconstruct5() is called to calculate the contents of the parity block.
12. ops_run_io() is called, which registers raid5_end_write_request() as the I/O completion function and submits the write requests through the sh->dev[i].req bio of each disk. Finally, handle_stripe5() calls return_io() and returns.
13. Upon I/O completion, raid5_end_write_request() clears the R5_LOCKED flag and sets the STRIPE_HANDLE flag.
14. handle_stripe() is called again. This time it does nothing but clear the STRIPE_HANDLE flag.
15. Control returns to make_request(), which returns the user-submitted bio.
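The choice made in step 6 and the parity arithmetic behind steps 10 and 11 can be sketched in a few lines of user-space C. This is a simplified illustration, not the kernel code: the `toy_*` names and the three-field per-disk state are hypothetical, one byte stands in for a whole block, and failed disks and locked buffers are ignored. It counts how many blocks each method would have to read, picks the cheaper one (ties go to reconstruct-write in this sketch), and shows that both methods produce the same parity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-disk state of a stripe. */
struct toy_dev {
    bool towrite;    /* a write bio is queued for this block */
    bool overwrite;  /* the queued write covers the whole block */
    bool uptodate;   /* the cached copy of the block is already valid */
};

enum toy_method { TOY_RMW, TOY_RCW };

/* Count the reads each method needs and pick the cheaper one. */
enum toy_method toy_choose(const struct toy_dev *dev, size_t disks,
                           size_t pd_idx, size_t *rmw, size_t *rcw)
{
    *rmw = 0;
    *rcw = 0;
    for (size_t i = 0; i < disks; i++) {
        /* rmw needs the old contents of every written block and of parity. */
        if ((dev[i].towrite || i == pd_idx) && !dev[i].uptodate)
            (*rmw)++;
        /* rcw needs every data block that is not completely overwritten. */
        if (i != pd_idx && !dev[i].overwrite && !dev[i].uptodate)
            (*rcw)++;
    }
    return (*rmw < *rcw) ? TOY_RMW : TOY_RCW;
}

/* rmw: XOR the old data out of the old parity, then XOR the new data in. */
uint8_t toy_parity_rmw(uint8_t old_parity, uint8_t old_data, uint8_t new_data)
{
    return (uint8_t)(old_parity ^ old_data ^ new_data);
}

/* rcw: recompute parity as the XOR of all data blocks. */
uint8_t toy_parity_rcw(const uint8_t *data, size_t n)
{
    assert(n > 0);
    uint8_t p = 0;
    for (size_t i = 0; i < n; i++)
        p ^= data[i];
    return p;
}
```

With five disks and a full-block write to a single data disk, the sketch counts two reads for rmw (old data and old parity) against three for rcw, so rmw wins. When three of the four data disks are overwritten, rcw needs only one read and is chosen instead.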

linux software raid5 code reading notes (1)

Process of a typical read

1. When the bio for the read request is passed to make_request(), raid5_compute_sector() first remaps the sector number to a disk number and a sector offset within that disk. get_active_stripe() is then called to get the stripe that contains this sector. Finally, add_stripe_bio() adds the bio to that stripe.
2. The STRIPE_HANDLE bit of the current stripe is set.
3. release_stripe() is called, which wakes mddev->thread (i.e., raid5d) to handle this stripe.
4. raid5d calls handle_stripe(), which calls handle_stripe5() for RAID5.
5. handle_stripe5() calls handle_stripe_fill5(), which in turn calls fetch_block5(). fetch_block5() sets the R5_LOCKED and R5_Wantread flags, while handle_stripe_fill5() sets the STRIPE_HANDLE flag (which was previously cleared by handle_stripe5()).
6. handle_stripe5() then calls ops_run_io(), which registers raid5_end_read_request() as the I/O completion function and submits the read requests through the sh->dev[i].req bio of each disk. Finally, handle_stripe5() calls return_io() and returns.
7. raid5d then calls release_stripe(). Note: do the I/O completion function raid5_end_read_request() and release_stripe() run at the same time?
8. raid5_end_read_request() sets the R5_UPTODATE and STRIPE_HANDLE flags and calls release_stripe().
9. release_stripe() again causes handle_stripe() to be called. This time, based on the flags, handle_stripe() calls raid_run_ops(), which in turn calls ops_run_biofill() to copy the previously read data into the buffer associated with the user-submitted bio. When this asynchronous copy completes, ops_complete_biofill() is called, which sets the STRIPE_HANDLE flag again.
10. handle_stripe() is called once more, but this time it does nothing except clear the STRIPE_HANDLE flag.
11. Control returns to make_request(), which calls bio_endio() to return the user-submitted bio.
12. Finally, release_stripe() is called, which releases the current stripe and, if necessary, wakes up raid5d to deal with other stripes.
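The remapping in step 1 can be illustrated with a small user-space sketch. It assumes the left-symmetric parity layout (the md default for RAID5) and ignores reshape, RAID6 and the other layouts that raid5_compute_sector() handles. The `toy_map` struct and `toy_compute_sector()` are hypothetical names for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where an array sector lands in the stripe. */
struct toy_map {
    uint64_t dev_sector; /* sector offset within the member disk */
    unsigned dd_idx;     /* index of the disk that holds the data */
    unsigned pd_idx;     /* index of the disk that holds this stripe's parity */
};

/* Simplified mapping for a left-symmetric RAID5 layout (an assumption). */
struct toy_map toy_compute_sector(uint64_t r_sector, unsigned raid_disks,
                                  unsigned sectors_per_chunk)
{
    assert(raid_disks >= 3 && sectors_per_chunk > 0);

    unsigned data_disks = raid_disks - 1;
    uint64_t chunk_number = r_sector / sectors_per_chunk;
    unsigned chunk_offset = (unsigned)(r_sector % sectors_per_chunk);

    /* Each stripe holds data_disks data chunks plus one parity chunk. */
    uint64_t stripe = chunk_number / data_disks;
    unsigned dd = (unsigned)(chunk_number % data_disks);

    struct toy_map m;
    /* Parity rotates backwards one disk per stripe. */
    m.pd_idx = data_disks - (unsigned)(stripe % raid_disks);
    /* Data chunks start on the disk after parity and wrap around. */
    m.dd_idx = (m.pd_idx + 1 + dd) % raid_disks;
    m.dev_sector = stripe * sectors_per_chunk + chunk_offset;
    return m;
}
```

For a four-disk array with 128-sector chunks, array sector 389 falls in chunk 3 at offset 5. That is the first data chunk of stripe 1, whose parity sits on disk 2, so the data lands on disk 3 at sector 133.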