blktrace

blktrace is a tool developed by Jens Axboe to trace the Linux io request. Here is a nice blktrace guide about installation and use. I have a live example to show why blktrace is useful.

Andre post a new io mode, blockio, for IET last month. The code runs well for him with various Linux iSCSI initiator. Later Ross found one problem with it. A Windows host can not create file system on any iSCSI volume export with blockio mode while Linux host can. And more funny thing is once a NTFS or FAT32 partition is created in advanced on an iSCSI volume, the Windows host can recognize the file system and do various file system operation without problem.

So what is the problem?

We used Wireshark to get all the iSCSI traffic. Windows format will read immediately after a write and the data read out is not the same as previous write. This confused the Window and lead to failure. But why normal I/O after format can read correct data out? We need a way to find out what happened exactly in various Linux IO layers. blktrace shines here.

We reran the format with blktrace running. Here is what we got.

….

8,16 0 768 7.012762981 4916 Q W 6736 + 32 [istiod1]
8,16 0 769 7.012766958 4916 G W 6736 + 32 [istiod1]
8,16 0 770 7.012769506 4916 P R [istiod1]
8,16 0 771 7.012770511 4916 I W 6736 + 32 [istiod1]
8,16 0 772 7.012777259 4916 D W 6736 + 32 [istiod1]
8,16 0 773 7.012981610 4916 C W 6736 + 32 [0]
8,16 0 774 7.013543284 4916 Q R 6736 + 32 [istiod1]
8,16 0 775 7.013546301 4916 G R 6736 + 32 [istiod1]
8,16 0 776 7.013548430 4916 P R [istiod1]
8,16 0 777 7.013549233 4916 I R 6736 + 32 [istiod1]
8,16 0 778 7.014866251 0 UT R [swapper] 1
8,16 0 779 7.014880793 9 U R [kblockd/0] 1
8,16 0 780 7.014885958 9 D R 6736 + 32 [kblockd/0]
8,16 0 781 7.015813374 9 C R 6736 + 32 [0]

Till now it looks ok.
8,16 0 782 7.025277381 4915 Q W 6768 + 32 [istiod1]
8,16 0 783 7.025283850 4915 G W 6768 + 32 [istiod1]
8,16 0 784 7.025286799 4915 P R [istiod1]
8,16 0 785 7.025287794 4915 I W 6768 + 32 [istiod1]

A new write(W) request was inserted (Q) into the queue.

8,16 0 786 7.026059876 4915 Q R 6768 + 32 [istiod1]
8,16 0 787 7.026064451 4915 G R 6768 + 32 [istiod1]
8,16 0 788 7.026066369 4915 I R 6768 + 32 [istiod1]

Before we see the write request finishes, a new read (R) request was also inserted (Q) into the queue.

8,16 0 789 7.034883766 0 UT R [swapper] 2
8,16 0 790 7.034904284 9 U R [kblockd/0] 2

The block device IO queue was unplugged.
8,16 0 791 7.045272094 9 D R 6768 + 32 [kblockd/0]
8,16 0 792 7.045654039 9 C R 6768 + 32 [0]

Bingo! Here the read request was sent to device and finished before write.

8,16 0 793 7.045669809 9 D W 6768 + 32 [kblockd/0]
8,16 0 794 7.049840970 0 C W 6768 + 32 [0]

Repeat Linux mkfs and do not see such IO pattern. So because Windows and Linux do I/O differently during creating file system, such error does not show up in Linux. And since read the same block from last write request (especially bypass the initiator file system cache and send to storage directly) is so rare that we do not see error in normal I/O.

The root reason is because that version of blockio does not preserve the I/O order in all time. Andre, Ross, and I are working on that.

So with a tool that can tell you what precisely happen, life is much easier!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: