wednesday, 14 february 2007

posted at 21:44
[packet] in init
[packet] in open
[packet] devicename 'fdsk.device' unit 0 dosname 'PKT'
[packet] couldn't load fat.phandler
[packet] couldn't load L:fat.phandler
[packet] loaded DEVS:fat.phandler
[packet] starting handler process
[packet] in packet_startup
[packet] calling handler
[fat] starting up
[packet] started, process structure is 0xb7616c48
[packet] sending startup packet

FATFS: opening libraries.
        FS task: b7616c48, port b7616ca8, version: 0.1debug [AROS] (Feb 14 2007)
        Device successfully opened
        Disk change interrupt handler installed
        Initiated device: "PKT"
Returning packet: ffffffff 0
Handler init finished.

[packet] handler fat.phandler for mount PKT now online

Got disk change request
        Disk has been inserted
        Reading FAT boot block.
        Reading sector 0
        DoIO returned 0

        Boot sector:
        SectorSize = 2
        SectorSize Bits = 1
        SectorsPerCluster = 4
        ClusterSize = 8
        ClusterSize Bits = 3
        Cluster Sectors Bits = 2
        First FAT Sector = 256
        FAT Size = 5120
        Total Sectors = 256
        RootDir Sectors = 32
        Data Sectors = -10448
        Clusters Count = 1073739212
        First RootDir Sector = 10496
        First Data Sector = 10528
        Invalid FAT Boot Sector

That there is the output of the moment of truth, where you know that you're on the right track and every is going to work out OK. I had the same kind of magical moment when working on tap.device, where the foundation is in place and the rest is just adding features. Its extremely satisfying.

For the uninitiated, this is the debug output from FATFileSystem as it mounts a ten-megabyte image created under Linux with mkfs.vfat and made available to AROS via fdsk.device. It seems to be correctly reading the image, which means my replacement block code is correct, and the handler is happy doing its own thing.

This comes on the end of over two days of completely depressing debugging work. I've been deep inside LoadSeg(), I've disassembled the handler code, I've looked desperately for any kind of unusual AROS-ness that might be causing gdb to spit up some truly outrageous output, such as the same function appearing multiple times in the stack trace.

The problem was eventually found in AllocDosObject(). This function, among other things, allocates struct StandardPacket objects, which are a kind of wrapper around the normal DOSPacket structure, providing an Exec message as well as the packet itself. The thing it doesn't do is link the packet and the message, so any attempt to access one via the other resulted in a null-pointer dereference and a crash.

I have no idea why gdb was handling it so badly, but even stepping the code before the crash produced the wrong values. Checking the value for a certain global pointer yielded the "base" value of that symbol in the executable itself before relocation, which is what led me down into LoadSeg(). In the end, printing the value showed that gdb was quite wrong, and thus leading me off the scent.

I was really excited when I finally got this all sorted out and got the thing to run. So excited that I very nearly cheered and punched the air in the business meeting I was in at the time. I was bored, and coding excites me :)

Next step is to implement IOFS-to-packet translation in packet.handler. Soon I should be reading actual files :)