Saturday, January 17, 2009

It is all 'fsck'ed up

A couple of days ago, my MacBook Pro (MBP) refused to boot. When booting from the Mac OS X partition, I got the usual spinning logo, and then the machine would shut off completely. It took about five failed boot attempts before I realized there was a real problem.

One option would have been to boot the MBP in target disk mode, mount the volume on another Mac, and run Disk Utility there to look for problems. Since I had no other Mac handy, I decided to boot from the Mac OS X install disk and run the copy of Disk Utility on it.

While verifying the catalog file, Disk Utility reported a rather cryptic error: 'invalid sibling link'. Interestingly, the 'Repair Disk' option in Disk Utility was also grayed out.

I was desperate now. That partition held a lot of data that I needed and had not backed up. Thankfully, I was able to fix the problem, and here is how I did it.

But what is an 'invalid sibling link' error?

To understand this, let us first look, very briefly, at how an HFS+ volume is structured.

An HFS+ volume contains five special files:
  1. Catalog file: Describes the folder and file hierarchy of the volume. It is organized as a "balanced tree" for fast and efficient searches.
  2. Extents overflow file: Keeps track of the allocation blocks assigned to each file, recorded as extents. (An extent is a contiguous run of allocation blocks; extent-based file systems allocate disk blocks in large groups at a time, which encourages sequential allocation. This is, of course, something of an oversimplification.)
  3. Allocation file: Specifies whether an allocation block is free. This is stored in a bitmap, specifying a free allocation block with a "clear bit".
  4. Attributes file: Contains attribute information for files and folders.
  5. Startup file: Allows computers that do not have built-in support for HFS+ file systems to boot.
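
To make the "clear bit" idea from item 3 concrete, here is a minimal Python sketch of looking up one allocation block in such a bitmap. The function name and the sample bytes are made up for illustration, and the bit order (most significant bit of each byte maps to the lowest-numbered block) is my reading of Apple's HFS Plus format notes, so treat it as an assumption rather than a reference implementation.

```python
def is_block_free(bitmap: bytes, block: int) -> bool:
    """Return True if the given allocation block is marked free (bit clear)."""
    byte = bitmap[block // 8]          # each byte covers eight allocation blocks
    mask = 0x80 >> (block % 8)         # assumed: MSB = lowest-numbered block
    return (byte & mask) == 0


# Hypothetical two-byte bitmap: blocks 0-3 and block 15 in use, the rest free.
sample_bitmap = bytes([0b11110000, 0b00000001])

for block in (0, 4, 8, 15):
    state = "free" if is_block_free(sample_bitmap, block) else "in use"
    print(block, state)
```
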
The catalog file, extents overflow file, and attributes file are stored in data structures known as B-trees. (Technically, the HFS B-trees are a variant of B+-trees, which Apple's technical documentation calls B*-trees.)

The catalog tree contains records (each describing a file or a folder) which, for efficiency, are grouped into fixed-length blocks called nodes. The B-tree is structured in such a way that these records live in the leaf nodes. The records are kept sorted and indexed for fast searching, which can leave the nodes physically scattered around the B-tree file. So the leaf nodes are linked to their neighbors, forming a doubly linked list that allows the leaves belonging to the same directory to be traversed in order.

If, for whatever reason, the link between the leaves breaks (technically, becomes an invalid or dangling pointer), then we end up with an error like 'invalid sibling link'.
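
To illustrate what a broken sibling link looks like, here is a small Python sketch that walks a list of leaf nodes and reports any forward or backward link that does not match up. This is not how fsck actually works internally; the LeafNode class and the node numbers are invented for the example, and the convention that a link of 0 means "no sibling" is an assumption borrowed from my reading of Apple's HFS Plus format notes.

```python
from dataclasses import dataclass


@dataclass
class LeafNode:
    number: int  # position of this node in the B-tree file
    flink: int   # node number of the next sibling, 0 if none (assumed convention)
    blink: int   # node number of the previous sibling, 0 if none


def find_invalid_sibling_links(nodes):
    """Return (node_number, problem) pairs for links that do not match up."""
    by_number = {node.number: node for node in nodes}
    problems = []
    for node in nodes:
        if node.flink != 0:
            nxt = by_number.get(node.flink)
            if nxt is None:
                problems.append((node.number, "forward link points to a missing node"))
            elif nxt.blink != node.number:
                problems.append((node.number, "next node does not link back"))
        if node.blink != 0:
            prev = by_number.get(node.blink)
            if prev is None:
                problems.append((node.number, "backward link points to a missing node"))
            elif prev.flink != node.number:
                problems.append((node.number, "previous node does not link forward"))
    return problems


# A healthy chain 1 <-> 2 <-> 3, and a copy where node 2's forward link is corrupt.
healthy = [LeafNode(1, 2, 0), LeafNode(2, 3, 1), LeafNode(3, 0, 2)]
damaged = [LeafNode(1, 2, 0), LeafNode(2, 7, 1), LeafNode(3, 0, 2)]

print(find_invalid_sibling_links(healthy))
print(find_invalid_sibling_links(damaged))
```

In the damaged chain, one bad pointer is flagged from both sides: node 2 points at a node that does not exist, and node 3 finds that its predecessor no longer points back at it.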

Now that we understand what the problem really is (through the rather oversimplified write-up that preceded this), here is how to fix the problem.
  1. Start in single-user mode by holding down Command-S while booting (in my case the problem occurred on the boot volume, so this was required).
  2. Once the text has scrolled all the way down, you should be at a command prompt.
  3. Enter fsck -fy (fsck is a file system consistency check and repair tool. The 'f' option forces a check even on a clean file system, and the 'y' option blindly answers 'yes' to every question that may come up. Read the man page for fsck for more detailed explanations).
  4. If it reports 'FILE SYSTEM WAS MODIFIED', repeat from step 3.
  5. Keep repeating until it says 'The volume appears to be OK' (or something to that effect) or you give up ;-).
  6. If successful, type reboot to reboot.
Running fsck multiple times may be necessary. In my case I had to run it 4 times before the problem was fixed.
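
The repeat-until-clean logic of steps 3 to 5 can be written out as a short Python sketch. At the single-user prompt you would simply retype the command by hand; this only makes the loop explicit. The function names, the attempt limit, and the two text markers it searches for are assumptions based on the messages quoted above, not a description of fsck's exact output.

```python
import subprocess


def run_fsck() -> str:
    """Run `fsck -fy` once and return everything it printed."""
    result = subprocess.run(["fsck", "-fy"], capture_output=True, text=True)
    return result.stdout + result.stderr


def repair_until_clean(run=run_fsck, max_attempts=10):
    """Repeat the check until the volume looks OK.

    Returns the number of passes needed, or None if we gave up.
    """
    for attempt in range(1, max_attempts + 1):
        output = run()
        if "FILE SYSTEM WAS MODIFIED" in output:
            continue  # repairs were made, so check again (step 4)
        if "appears to be OK" in output:
            return attempt  # clean pass (step 5)
    return None


# Simulated run: three passes that modify the volume, then a clean one.
fake_outputs = iter(["***** FILE SYSTEM WAS MODIFIED *****"] * 3
                    + ["** The volume appears to be OK."])
print(repair_until_clean(run=lambda: next(fake_outputs)))  # prints 4
```

The simulated run stands in for a real disk so the loop can be tried safely; passing nothing for run would invoke the real fsck, which you should only do where running the command by hand would also be appropriate.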

Googling for 'invalid sibling link' returns thousands of results, but none of them really explains what causes the problem. Here is another good solution (and a great discussion) to the same problem.

If neither of these solutions works, then Alsoft's DiskWarrior (which will set you back $99.95) could be another option, although even this tool has varying degrees of success.

The first thing I did after successfully rebooting was to set up Time Machine. This episode made me realize the need for a good backup solution. If things are ever beyond repair, then I at least know that my data is safe. :-)
