yucky days!

well, the last couple of days have been all panicky and horrid... seems there was something a little buggy in one of the tools I've been using to try to make a USB stick bootable... anyway, the Volume Boot Record (VBR; that is, the first sector of a volume/partition (or "drive" in Windows-speak)) on one of my data HDs got screwed up... which isn't an issue until you reboot! And then it breaks access to that volume entirely!


Yuckiness...!

OK, quick description of what a VBR is... so, on a hard disk, the 1st block is called the Master Boot Record (MBR)... this contains the primary bootstrap program, the partition table, an "id" field and a special identifier sequence (see http://littlemissgoth.livejournal.com/85333.html for a bit more on the MBR and bootstrap loaders)... so, the VBR is the 1st block in a partition (or on a floppy disk/whole-disk volume; iow, a "partitionless" disk)... it's similar to the MBR... except it doesn't have the partition table; if memory serves, it has the "top" of the filesystem structures instead (or maybe a pointer to it)... anyway, the VBR contains another bit of bootstrap code... this is what gets chainloaded by the MBR. The VBR bootstrap's job is to load the next stage of boot-up... so in an NT-based Windows the VBR will load ntldr, in MS-DOS it'll go for IO.SYS (actually, it may even *be* IO.SYS... I can't remember... (although I suspect probably not...)).
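To make the MBR layout a bit more concrete, here's a rough Python sketch that picks the partition table out of a 512-byte MBR. It assumes the classic MBR layout (four 16-byte entries starting at byte 446, then the 0x55 0xAA signature in the last two bytes); the function name `parse_mbr` and the dictionary keys are made up for illustration.

```python
import struct

SECTOR_SIZE = 512

def parse_mbr(sector: bytes):
    """Return the non-empty partition entries from a classic MBR sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    # The "special identifier sequence": the last two bytes are 0x55 0xAA.
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    partitions = []
    # Four 16-byte entries follow the bootstrap code, starting at byte 446.
    for i in range(4):
        entry = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        # status, (CHS start skipped), type, (CHS end skipped), first LBA, length
        boot_flag, ptype, first_lba, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:  # type 0 marks an unused slot
            continue
        partitions.append({
            "index": i,
            "bootable": boot_flag == 0x80,
            "type": ptype,
            "first_lba": first_lba,
            "sectors": sectors,
        })
    return partitions
```

Fed the first 512 bytes of a disk, each entry's `first_lba` says where that partition (and so its VBR) starts.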

Oh yeah, bootstraps generally contain a "bitch'n'wait" chunk in case the disk *doesn't* contain a bootable OS... that's the "Insert system disk and retry..." type message that DOS displays when you try and boot from a floppy that doesn't contain DOS boot files.

So yeah, that's *fundamentally* it...

Anywho, in NTFS, the VBR contains a pointer to the Master File Table (it records which cluster the MFT starts at, rather than holding the table itself)... this is rather, um, important.... without it, well, no File Table == no filesystem!!!
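For the curious, a minimal Python sketch of reading those pointers out of an NTFS boot sector. It assumes the usual NTFS field offsets (bytes-per-sector at 0x0B, sectors-per-cluster at 0x0D, total sectors at 0x28, the $MFT cluster number at 0x30 and the $MFTMirr cluster number at 0x38). It ignores the special encoding used for very large clusters, and `parse_ntfs_vbr` is just a name picked for the example.

```python
import struct

def parse_ntfs_vbr(sector: bytes):
    """Pull the geometry and MFT pointers out of an NTFS boot sector."""
    if len(sector) < 512:
        raise ValueError("need a full 512-byte sector")
    # The OEM ID field (bytes 3..10) reads "NTFS" padded with four spaces.
    if sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    total_sectors, mft_lcn, mftmirr_lcn = struct.unpack_from("<QQQ", sector, 0x28)
    # Assumption: plain sectors-per-cluster value (true for common cluster sizes).
    cluster_size = bytes_per_sector * sectors_per_cluster
    return {
        "cluster_size": cluster_size,
        "total_sectors": total_sectors,
        # Byte offsets of $MFT and $MFTMirr from the start of the volume.
        "mft_offset": mft_lcn * cluster_size,
        "mftmirr_offset": mftmirr_lcn * cluster_size,
    }
```

The point being: wipe this one sector and nothing can find the MFT any more, even though the MFT itself is still sitting on the disk.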

So I've spent two days trying to recover this one block...!

First thing was to try and find out if there was anything to recover... at this point, I only really had a theory as to what had happened...

So, I pull the Windows XP hdd out and swap it for the shiny new 300GB drive I'm gonna use for my reinstall (lucky I've got it and it's empty really!)... this gives me somewhere to make a backup to...

But this means I have no bootable OS...

Anyway, I want Linux for this job (I much prefer the flexibility of Linux/Unix for disk-access...)

So anyway, I dig out a Knoppix DVD and boot it, and try to mount the damaged FS... and no joy...

So, next is to check the partition table is ok... a bit of fdisk (1st reason for not using Windows... I *like* Linux FDisk... nice and powerful), and everything looks fine... so it's looking a bit like my theory was right...

Next is to look at the raw filesystem... (2nd reason...!) something I don't know how to do (or if you even *can*) in Windows, but it's easy in Linux :-) Anyway, "less /dev/sdb" (cos less pages through its target rather than trying to read it into memory all in one go before showing it... I do not have 300GB of RAM!!!!)... Two strings immediately jump out of the first screen-full... something like "MS-DOS 5" and "UBCDForWindows".... that just *reeks* of this being a VBR for a FAT filesystem (that is, the old MS-DOS filesystem) that has been created by something to do with the Ultimate Boot CD For Windows (the thing I was trying to make a bootable USB stick of)... so, now I'm about 99.9999999999999999999% certain my theory is correct!
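The same eyeball check can be sketched in Python: look at the well-known string fields of a boot sector and guess what wrote it. This assumes the conventional offsets (OEM ID at bytes 3..10, the FAT12/FAT16 type label at byte 54, the FAT32 one at byte 82). Those labels are informational rather than authoritative, so treat the answer as a hint; `identify_boot_sector` is a made-up name.

```python
def identify_boot_sector(sector: bytes) -> str:
    """Guess the filesystem type from the label strings in a boot sector."""
    if len(sector) < 512:
        raise ValueError("need a full 512-byte sector")
    # NTFS announces itself in the OEM ID field.
    if sector[3:11] == b"NTFS    ":
        return "NTFS"
    # FAT32 keeps its type label further in than FAT12/FAT16 do.
    if sector[82:90] == b"FAT32   ":
        return "FAT32"
    if sector[54:62] in (b"FAT12   ", b"FAT16   "):
        return sector[54:62].decode("ascii").strip()
    return "unknown"

def oem_id(sector: bytes) -> str:
    """Return the 8-byte OEM ID as text (whatever the formatting tool wrote)."""
    return sector[3:11].decode("ascii", errors="replace").rstrip()
```

An NTFS volume whose first sector comes back as FAT with a DOS-style OEM ID is a pretty strong sign that something has formatted over the top of it.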

But what to do about it?

Well, for some reason, I can't get Knoppix to go online, and I don't have another machine with Internet access in here... and my parents were using their machines....

So, I could wait for them to finish, or I could just try and bodge it! ;-)

Well, first things first, before we try to fix it, let's back it up... after all, with filesystem repairs, fixes that fail have a tendency to be rather destructive!!!

So, a bit of "dd" magic and 3 1/2 hours later(!) I have a copy of the damaged filesystem in a 280GiB file on a somewhat tuned ext2 filesystem (tuned in so much as it was created with "sparse superblocks", no reserved space and as few inodes (iow, as many bytes-per-inode) as I dare go with... which basically just means I get the overheads of the filesystem down to a minimum and gives me more actual usable space to play with) on my new drive...
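What dd is doing here is nothing more exotic than a block-by-block copy. A minimal Python sketch of the same idea, assuming plain file paths (a device node such as /dev/sdb1 can be opened the same way on Linux, given permission); `copy_image` and its 1MiB default block size are choices made for the example, not anything dd mandates.

```python
def copy_image(src_path, dst_path, block_size=1024 * 1024):
    """Copy src to dst in fixed-size blocks and return the bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:  # empty read means end of the source
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Unlike dd with conv=noerror, this sketch simply stops with an exception if the source throws a read error, so it's only a fair stand-in for a healthy disk.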

At this point though, I have no idea if that is a backup of anything! It could be an image of a totally screwed filesystem (that is an ex-filesystem, RIP!).

Right, so now I can at least start to try things... so, the first thing is to try and run Windows' ChkDsk on the volume and see whether it can fix it... so, swap drives back and reboot... try "chkdsk /f z:" from a cmd prompt... and it fails... can't remember the exact message, but it was something like it couldn't recognise the filesystem... Drive Properties reports a "RAW" filesystem... this is obviously not good :-(

So back to Knoppix... let's try and throw a new VBR on the volume and see what happens... after all, I know this bit is broken... or rather, let's throw a new-ish VBR at it...

So, dd the first 512 bytes of the NTFS partition on my other external drive to the broken filesystem, and try to mount it...
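In Python terms that transplant is roughly the following sketch. It assumes both arguments are paths to partitions or image files, and it hands back the sector it overwrote so the change can be undone; `copy_first_sector` is a hypothetical name. Worth bearing in mind that a donor boot sector also carries the donor volume's own size and MFT location, which needn't match the target's.

```python
SECTOR_SIZE = 512

def copy_first_sector(donor_path, target_path):
    """Overwrite the target's first sector with the donor's; return the old one."""
    with open(donor_path, "rb") as donor:
        new_sector = donor.read(SECTOR_SIZE)
    if len(new_sector) != SECTOR_SIZE:
        raise ValueError("donor is shorter than one sector")
    # "r+b" updates in place without truncating the rest of the target.
    with open(target_path, "r+b") as target:
        old_sector = target.read(SECTOR_SIZE)
        target.seek(0)
        target.write(new_sector)
    return old_sector
```

Keeping the returned bytes somewhere safe costs nothing and gives a way back if the transplant makes things worse.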

And it mounts! !yay!

OK, so the next step seemed sensible at the time.... in retrospect it was a Bad Thing™ to do!!!

Reboot into Windows again and run chkdsk /f again... and it finds LOTS of errors... not good, not pleasant; not happy... :-(

See, one of the things on this partition is that there are files that are essentially just "zeroed out"... they're incomplete downloads where the program allocates the size of the whole file on disk up front... means that there's room for the download to finish (unless compression is turned on, and then they get treated pretty much as sparse files... and only take up the room of the downloaded bits!)... problem is, chkdsk has decided to truncate all these files to zero bytes.... and the download program's status-tracking doesn't get reset/updated to reflect this... so they all break!!!

Damn!!!!!

Hmmmph, -sulk-

Back to Knoppix... oh, it won't mount... apparently my $MFT and $MFTMirr are now different!! sounds painful!!!

(actually, sounds like there's a backup MFT and just copying a "new-ish" VBR over has caused trouble...)

So, back to Windows so I can get online... Google "NTFS MFT" to try and find out where the backup MFT lives... plan now is to restore the "broken" FS from my backup and dd the backup MFT to the VBR...

Except one of the results is "Advanced NTFS Boot and MFT Repair," and references a (reasonably mature) open-source disk tool called TestDisk... and it gives me a potentially better plan...
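As far as I know, what NTFS keeps spare is a copy of the boot sector itself, sitting in the very last sector of the partition (on NT4-and-later formats), and that is the sort of thing a boot-sector repair can draw on. A hedged Python sketch of looking for it, assuming a path to a partition or a partition image; both function names are invented for the example, and this isn't a claim about how TestDisk does it internally.

```python
import os

SECTOR_SIZE = 512

def looks_like_ntfs_boot_sector(sector: bytes) -> bool:
    """True if the sector has the NTFS OEM ID and the 0x55AA signature."""
    return (
        len(sector) == SECTOR_SIZE
        and sector[3:11] == b"NTFS    "
        and sector[510:512] == b"\x55\xaa"
    )

def read_backup_boot_sector(path):
    """Return the last sector of the partition if it looks like an NTFS
    boot sector, otherwise None."""
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek returns the new position
        if size < 2 * SECTOR_SIZE:
            return None
        f.seek(size - SECTOR_SIZE)
        sector = f.read(SECTOR_SIZE)
    return sector if looks_like_ntfs_boot_sector(sector) else None
```

If that comes back with a sector, it belongs to this volume, so its size and MFT pointers should actually match, which a donor sector from another drive can't promise.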

So, back to Knoppix, back to dd and restore the backup to the partition....

This doesn't go well... Either I didn't specify the blocksize when I copied the volume to the file, or it just did a better job copying in that direction, but after five hours I stopped it (I initially just thought writing to the USB drive was slower than writing to the SATA internal... so gave it some extra time, but after that long, I decided it was getting beyond a joke)... turned out it was at 33GB complete!!!!!!! So yeah, I messed around with hdparm, KDE Device Manager and some other tools, trying to figure out if there was a setting at fault somewhere... and then messed with dd... dd-ing from the SATA drive to the SATA was lovely and fast, 120MB/second or thereabouts, but dd-ing to the USB drive was a stunning 1.9MB/second or something.... so 300GB would be, well, days....! I even tried rebooting just to make sure I'd told Knoppix to enable DMA... all to no avail...

In the end I managed to get something like 22MB/second out of dd by specifying a block size of 1048576... (that is, 1MiB)... I did try 10MiB, but it was no faster, and I couldn't be bothered trying 2MiB... or rather, calculating (it was late, I was tired..!) and typing in 2097152!!!!
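The sums behind those numbers, as a tiny Python sketch. It assumes decimal units for the drive size and transfer rates (GB = 10^9 bytes, MB = 10^6 bytes) and binary units for the block sizes; `transfer_hours` is just a helper name for the example.

```python
GB = 10**9           # decimal gigabyte, as drive makers count
MB = 10**6           # decimal megabyte
MiB = 1024 * 1024    # binary mebibyte, as used for the dd block size

def transfer_hours(total_bytes, bytes_per_second):
    """How long a copy takes at a steady rate, in hours."""
    return total_bytes / bytes_per_second / 3600

slow = transfer_hours(300 * GB, 1.9 * MB)   # roughly 44 hours
fast = transfer_hours(300 * GB, 22 * MB)    # a bit under 4 hours

# The block sizes mentioned in the text, in bytes.
one_mib = 1 * MiB    # 1048576
two_mib = 2 * MiB    # 2097152
```

It also squares with the aborted run: 33GB in five hours works out at about 1.8MB/second.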

So, this morning I got up to a restored damaged filesystem...!

iow, I was back to square one!

except, now I had a plan!!!

So, back into Windows again... and this time run TestDisk... Nice; it's a console app, not a GUI tool... it's only a little thing, but it does suggest the programmer(s) focussed more on function than form... and the TUI is plenty powerful enough (*nix folk, think b&w curses-esque)...

About 5 minutes later, I have a Windows-mountable filesystem again... and my files aren't zero bytes at this point... and the download program seems happy with them...

That's where I'm at now... the downloader is currently checking all the files... and cos there's a fair bit of stuff in there it's taking a while... (and it's peer2peer, so it's scanning the finished files too....) seems to be trying to do every 'job' in parallel too... which I suspect is possibly not the fastest way to try to scan 300+ files... would've thought serial execution would have avoided I/O bottlenecking better... but then, what do I know??? ;-)

Anyway so far today, so good...

I haven't (yet) dared run chkdsk.... but once the file checking is completed, that's my next job....

If it fails for some reason (or decides to break all my files again... -grrrr!-) then I at least know how to get back to this point (that's why this entry is rather more detailed than perhaps is necessary for pure journalling reasons....)

And I also know that I need to try and come up with a way to protect the files from chkdsk.... the problem is that I don't have a 2nd 300GB drive to dump everything onto... I don't think... I wonder how big the drive is on my mum's machine...........!