I remember the old days when you could find those geeky guys who would use a hex editor to "patch" your favorite game so that you became bullet-proof, or had 99...9 (still counting) lives, or could do whatever you wanted in order to win. At that time I was sure that hex editors were powerful enough to save you from a "disaster", but I couldn't imagine what such a disaster might be.
We use GPFS as the network file system for our clusters. Apart from serving as plain scratch space for MPI jobs, it also holds the home directories of some local users. A local team needed to expand their GPFS filesystem, so we had to add a few disks to our array. The procedure sounded trivial: add the new disks to the array, create a new logical volume, and finally add the new raw device to the GPFS filesystem.
But of course something went wrong. The new volume was about 10TB in size, which, due to a GPFS limitation, we had to split into at least 2 partitions. Easy work with parted, but what happens when parted (for a reason still unknown) "modifies" the partition table of another logical volume, one that is part of the GPFS filesystem as a whole device (without a partition table)?
Well, parted simply ruins the first sectors of a GPFS NSD, which means it ruins all the valuable information stored on this disk (the disk IDs and NSD IDs as well as the filesystem definition). The users report "We are receiving 'Input/Output error' when using the X file" and everything gets worse and worse.
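To see whether a disk that should carry no partition table has picked one up, you can look for the well-known partition-table markers in its first two sectors. The following is only a rough sketch: it assumes 512-byte sectors, the function names are made up for illustration, and it says nothing about the GPFS data itself. It only checks for the MBR boot signature (bytes 0x55 0xAA at offset 510) and the GPT header signature ("EFI PART" at the start of the second sector).

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors


def partition_table_signs(first_sectors):
    """Report partition-table markers found in the first two sectors (as bytes)."""
    signs = []
    # An MBR (or a GPT protective MBR) ends sector 0 with 0x55 0xAA.
    if first_sectors[510:512] == b"\x55\xaa":
        signs.append("MBR boot signature")
    # A GPT header starts the second sector with the ASCII text "EFI PART".
    if first_sectors[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART":
        signs.append("GPT header")
    return signs


def check_device(path):
    """Read the first two sectors of a device or image (read-only) and check them."""
    with open(path, "rb") as dev:
        return partition_table_signs(dev.read(2 * SECTOR_SIZE))
```

Calling check_device on a device path (a hypothetical /dev/sdX, as root) returns an empty list when neither marker is present. An empty list does not prove the disk is healthy, it only means no partition table was detected.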
Fortunately there IS a solution to this disaster. Although we couldn't find any official IBM documentation on this (apart from some posts in GPFS's forum), there is a way to recover from this situation. What you need is a hex editor, the famous "dd" and a lot of patience.
First copy sector 8 from each disk within the GPFS filesystem. This sector is the File System Descriptor and it is common to all disks. Next we have to recover sector 2 and sector 1. Sector 2 holds the GPFS disk identifier (also known as the NSD-ID). Finally, sector 1 contains information about the disk, which is called the disk descriptor.
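Before touching anything, it is worth saving a copy of these sectors from every disk. dd can do this directly; the sketch below shows the same idea in Python. It is a minimal illustration, not our actual procedure: it assumes 512-byte sectors numbered from 0 (the way dd's skip= counts them), and the function names and output file naming are invented for the example. It only reads from the disk and never writes to it.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte sectors, numbered from 0


def read_sector(path, sector, sector_size=SECTOR_SIZE):
    """Return the raw bytes of one sector from a device or image file."""
    with open(path, "rb") as dev:
        dev.seek(sector * sector_size)
        data = dev.read(sector_size)
    if len(data) != sector_size:
        raise IOError(f"short read at sector {sector} of {path}")
    return data


def backup_sectors(path, sectors=(1, 2, 8), out_dir=".", sector_size=SECTOR_SIZE):
    """Save each listed sector to its own file and return the file paths."""
    saved = []
    base = os.path.basename(path)
    for sector in sectors:
        # Hypothetical naming scheme: <device name>.sector<N>.bin
        out_path = os.path.join(out_dir, f"{base}.sector{sector}.bin")
        with open(out_path, "wb") as out:
            out.write(read_sector(path, sector, sector_size))
        saved.append(out_path)
    return saved
```

For example, backup_sectors("/dev/sdX", out_dir="/root/nsd-backup") (both paths are placeholders) would leave three small files that can be opened in a hex editor and compared between a healthy disk and the damaged one.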
For legal reasons I'm not sure whether I'm allowed to reveal more about how to do this, but by carefully studying the sectors, starting from 8 and then moving on to 2 and 1, you should be able to recover your FS.