monday, 4 june 2012

posted at 11:22

I use IRC a lot, both for work and personal stuff. I use bip, an IRC proxy, to keep me in my channels all the time and log stuff so that I never miss a thing. I run it on my home server and connect to it from XChat from work, home or wherever else I happen to be. It works well.

I also use IRC from my phone, using AndChat. I connect directly to the networks and channels I'm interested in with that. It works very nicely and lets me keep track of things as I move around, which happens a lot. Unfortunately it's at the mercy of the madness that is mobile connectivity, but that's hardly its fault.

Lately though, I've had a problem. AndChat has been unable to hold a connection to Freenode. It will connect fine, but then after a little while, if I go to send something, I find the connection has actually dropped in the background. AndChat dutifully reconnects, but by that time I've lost any conversation that was happening. It also meant that the other people in the channel were seeing lots of connects and disconnects from me. It's fairly normal for IRC, but it looks messy and I'm not keen on that.

The thing that I found curious through all of this was that my connection to work's IRC server never dropped. So it's likely not AndChat at fault, but something lower down. I have been upgrading the Android version on my phone quite a bit, trying to find the "best" community version of ICS for it. It's likely the change is somewhere in there.

After a lot of searching and piecing things together, the conclusion I've come to is that this particular build of Android (or maybe all 4.0.4 builds) doesn't send TCP keepalives as often as previous versions did. Whatever interval is set is longer than the connection idle timeout set by my service provider. That's not a problem for work's IRC server, as it sends keepalives far more regularly. Freenode, however, does not seem to send any at all.

I pointed the phone at the bip proxy for both services, and saw them both losing the connection. This appears to confirm my suspicions, and unfortunately also shows that bip doesn't send keepalives. Happily it's open-source, so I can fix it. Into the code we go!

The way to enable keepalives on a socket is quite simple:

int on = 1;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

Keepalives have three parameters: the time the connection has to be idle before keepalives start, the interval between keepalives, and the number of keepalives sent without response before the connection is declared invalid. These parameters are defined at the OS level for all sockets, and on Linux default to 7200, 75 and 9. That's right, two hours idle before starting keepalives. Not at all suitable for what we need.

There's no standard interface for changing these parameters on a per-socket basis, but Linux helpfully provides its own socket options that allow applications to do it. I'm hardly concerned with portability for this hack, so these are exactly what we need:

int idle = 60;
int interval = 60;
int count = 5;

setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));

That is, start sending keepalives after one minute idle, send every minute after that, and five missed responses mean the connection is dead. These seemed like reasonable numbers. I don't want to ping too often as each ping makes the phone do work and thus use a little bit of battery. This seems to be working very well though!

thursday, 17 july 2008

posted at 09:07

So, long time huh. Time to write something, I guess.

It's hard to write again after a break, not least because so much has happened in the last three weeks that unless I write for another three weeks I'm never going to cover it all. So right now I'm not going to try. I'll just write about what I've been working on, get caught up to some degree, and maybe come back to any other interesting stuff down the track. So here we go!

I've long threatened picking up a DS homebrew kit and doing something interesting with it. I finally snagged an M3 DS Real from the Monash post office, of all places. It's a cute little thing. I got a 4GB microSD card with it, so I should have enough grunt to do anything I could ever hope to do with it. The next thing is just to work out what that is. I have an idea of course, one probably rather predictable if you know me.

I have an interest bordering on an infatuation with the game Frontier: First Encounters; specifically the JJFFE variant. It's a great game, and the effort and style that went into making an old Windows game come to life on more modern systems still totally impresses me. John Jordan took the game binary and ran it through a custom disassembler that produced an assembler source file that would compile on whatever platform he chose. His disassembler also identified operating system calls for graphics, sound, input and filesystem access, which he then abstracted and reimplemented for DirectX, SDL, and whatever else. So now the game runs on (i386) Linux without issue as well as on modern Windows systems. He even fixed a heap of bugs. That's great!

I've messed with this code at various times. I implemented the OS abstraction for AROS a couple of years before I got involved in AROS proper. (That work later led to me working on some graphics subsystem speedups and an SDL graphics driver for AROS). I've also long dreamed of somehow converting it to pure C so that it could be hacked on properly. I've dabbled with this at various times, using both automatic and manual techniques, but haven't really got very far, mostly because of the limited success others have had with the general problem of decompiling assembly back to C.

So anyway, I got a DS kit, and of course started to think about how cool it would be to play FFE on it, and also about how to take advantage of the dual screen and touch screen. I've been dreaming of interesting UI changes that would make the game work much better on the DS, but of course first I have to get the game working there. That is not a trivial task, and has been the subject of inquiry for the last couple of weeks.

The problem is obvious. The DS has a pair of ARM CPUs. The JJFFE source is in i386 assembly. So there are really only two options - some sort of emulation, or converting the code to a higher-level language and then recompiling it for ARM.

Emulation would perhaps have been the easier option, since it would only really require a processor emulator for the game core (all the systems interaction could be done in C), but it doesn't help much with my eventual goal (or "hope", rather) of making significant modifications to the code to support the DS hardware properly. So instead, I've again returned to converting the assembler code to C.

As mentioned above though, this is something I'd pretty much given up on as being too difficult. I thought about it for a while and realised that as a first cut, I don't need to convert it back to anything resembling its original C. Instead, what if I were to write an assembler that produced C code implementing the processor instructions, rather than producing raw machine code? The result would only barely resemble handwritten C: we'd essentially have a kind of static CPU emulator built into the program code itself, with global variables representing the processor stack and registers. But, it could be recompiled for another CPU, which is the point of the exercise.

This seemed like a reasonable approach, but writing an assembler is insanely complicated. After attempting a hacky parser in Perl, I decided that nothing short of a full assembler would be able to do the job. NASM proved too complicated to penetrate, but then I found YASM, which is a highly modular clone of NASM.

So I took YASM and started writing a custom object format, one that would output C code. However, after experimenting to gain some experience with the interface, I realised that I was just getting the raw machine code and then converting it to C, with a little bit of symbol table gymnastics to identify and produce simple unadorned C functions. This reminded me of a project I worked on for a while in 2004 that turns out to be much better suited. That project is a custom disassembler/decompiler of the same kind as was used to produce JJFFE in the first place! Let me explain.

Another old game that I love is Transport Tycoon (actually its sequel, Transport Tycoon Deluxe). At the time, it was Windows-only. There was a project called TTDPatch which would take the game binary and hook all sorts of stuff into it to add new features and fix bugs and whatever else. This worked well, but it was still Windows-only. Wine did a reasonable job with it, but it was still less than ideal. So I decided that I'd give it the same treatment as FFE got, and produce a disassembly and system abstraction that could be run anywhere.

I spent a lot of time studying JJFFE and Jordan's decompiler and even had a series of email discussions with him to get a feel for just how to do this. After several weeks I managed to get my decompiler to the stage where it produced a viable disassembly and C file of OS call stubs. But, as fate would have it, the day it compiled and ran for the first time (segfaulting of course, as I hadn't yet learnt about how Windows uses the %fs segment register), OpenTTD was announced, which was essentially a conversion of the original game back to C. My decompiler had no further reason to exist, and so I abandoned it.

The way it worked was pretty straightforward. It implemented what is essentially a Portable Executable (ie the Windows binary format, like ELF for Unix) loader with calls into the NASM disassembler to analyse the code and produce a disassembly and a stub file. Simplified, it does the following:

  • Inspect the program binary and find the code, data, bss, import and relocation segments.
  • Load the program binary into RAM.
  • Apply the relocations to produce a complete program image, additionally creating a "label" for each relocation.
  • Inspect the import section to build a list of all the external libraries and functions that the program wants.
  • Disassemble the code segments to find all the relocation labels that are in use and what they point to. From the instruction type, we can determine whether the target is code, data, bss (ie uninitialised data), a vector table, etc.
  • Disassemble from each code label to the next to find any other labels missed in the first disassembly run. That might have happened, for example, if there were "garbage" bytes in between the end of one function and the start of another that caused the wrong disassembly to be produced crossing the function boundary.
  • Repeat this disassembly over and over until no new labels are produced.
  • Run through any relocation labels that have not been processed yet, and make them data labels. This works off the assumption that if the linker thought it important enough to include a relocation, we should probably include whatever that relocation points to in the output, even though we couldn't actually find it in the code.
  • Output EXTERN declarations for each external function name.
  • Disassemble from each code label again, this time producing actual output. Any memory references in the output (ie things beginning with 0x) get replaced with their corresponding label, if there is one.
  • "Disassemble" from each data label, producing raw byte output (ie db or dd). For any data that was referenced via a relocation, produce the corresponding label.
  • "Disassemble" from each bss label, producing a memory reservation in the output (ie resb).

Theoretically, the resulting output from that is just about usable. There's a bit of manual cleanup that has to happen (like the mentioned deal with the %fs register), but this output should at least compile and link, which is most of the fun. Theoretically you implement the stubs for your platform and you're away.

So, back to our original problem of producing C from a binary. I realised that in this code, I'd already done most of what I needed. I know where all the function boundaries, jump points, vector tables and data are. All that needs to change is that instead of producing formatted assembly, I produce some equivalent bit of C code. There's some complication of course, like the fact that sometimes several instructions map to a single C construct (like if or while), but I figure I'm most of the way there.

So right now, I'm working on cleaning up and generalising the decompiler, which I've christened opsoup. It was pretty heavily tied to the PE format before, which of course is no good for me - I need ELF. I'm not bothering with trying to keep it compatible with PE at this point, as I have a pretty specific purpose. I can always add that back in later if I ever need it.

I have absolutely no idea how this is going to go, but it's fun finding out. In addition to playing my game, I'm hoping that having the code in C, even horrible not-quite-C, will make it much easier to gradually convert some of the code into actual nice C (due to the availability of things like gdb and printf). I don't expect it to happen fast, but I've been hacking at this code on-and-off for the last five years, so messing with it for another five doesn't really concern me that much.

tuesday, 18 december 2007

posted at 11:17

Now that I've (apparently) fixed the loader, my mammoth WebKit test binary loads and runs, and so I've begun implementing the stub functions in earnest. To start with, my method has been to run the program until it crashes, find out where the crash happened (usually a NULL pointer dereference), and then provide a basic implementation of the class that that thing is supposed to be pointing to.

The current problem is a crash that occurs inside a regular method call, for no apparent reason. The offending method, in its entirety:

void DocumentLoader::setFrame(Frame* frame)
{
    if (m_frame == frame)
        return;
    ASSERT(frame && !m_frame);
    m_frame = frame;
    attachToFrame();
}

Good old printf() tracing shows that the crash occurs after m_frame = frame but before attachToFrame(). That is, that method is never called. This is highly unusual, and tedious to debug, because it means we have no choice but to drop down to assembly code, which I can muddle through well enough but can't really wrap my brain around.

Disassembling the last two lines of the method, we get this:

    mov    0x8(%ebp),%edx
    mov    0xc(%ebp),%eax
    mov    %eax,0xc(%edx)

    mov    0x8(%ebp),%eax
    mov    (%eax),%eax
    add    $0x8,%eax
    mov    (%eax),%eax
    sub    $0xc,%esp
    pushl  0x8(%ebp)
    call   *%eax
    add    $0x10,%esp

The pointer to the current object, this, is on the stack, 8 bytes in, as is the frame pointer, 12 bytes in. So we see the value of this being dereferenced through the stack and stored in %edx, and then the same for the frame pointer, which is stored in %eax. Then the location 12 bytes into the object proper is computed (which is where m_frame is stored), and %eax (the location of the frame object) is stored in it. Thus, m_frame = frame.

The next chunk, predictably, is the call to attachToFrame(). The important thing about this method is that it's what C++ calls a virtual method. It wasn't until Friday that it was actually explained to me what that meant, and I found it hilarious. Consider:

    Object *o = new Object;
    o->method();

    o = new SubObject;
    o->method();

(where SubObject is a subclass of Object).

Now, if method() is a virtual function, this will do what you'd expect from most other OO languages: the first call will call Object::method(), the second SubObject::method(). If it's not virtual, then both calls will go to Object::method(), because it's taken from the type of the pointer, not the type of the object itself.

I don't know if this was considered counterintuitive when it was first designed, but it's certainly not the way most OO languages work these days. Usually you have to be explicit when you want to call the superclass version.

In any case, the code generated is different. In the simple non-virtual case, the call can be done via an absolute address, as the compiler can know exactly where the method() function is for the type. The virtual case is more complicated as the object itself needs to be interrogated to find out where its function is.

To do this, a table for each class that the object inherits from is placed inside the object, containing pointers to the functions that the object wants to use for its virtual methods. A virtual method call might then be rendered in C as:

    o->_vtbl_Object.method();

That is, go through the table of implementations of methods defined in the Object class to find the method, and call it.

So, getting back to our disassembly. attachToFrame() is a virtual method. The code gets this from the stack, 8 bytes in, and puts it in %eax. Then it dereferences the pointer to find the actual memory location of the object. It then adds 8 to that to get the location of the virtual method table, and dereferences that to get a pointer to the attachToFrame() function, which goes into %eax.

Then it does the usual function call setup, making room on the stack for the arguments and return address, and then calls the function at the location in %eax. It is here that the crash occurs, because %eax has 0 in it.

I was floored when I first saw this. I checked a number of times in different places, finally checking the constructor itself. And sure enough, the virtual table contains all zeroes. To me this smelt suspiciously like a relocation problem: if the ELF loader is not correctly doing the relocations for virtual tables, then they'll point to garbage memory, causing a crash.

I'm not entirely sure how this can be, and haven't figured it out yet. I need to check the place where the virtual table is normally initialised, but I don't know where that is! I can theorise by thinking about the structure of an object and the virtual table internally.

The first critical thing is that the virtual table inside the object is a pointer. That is, when the memory for the object is allocated, space is not allocated for the virtual table too. The pointer needs to be set to point to a valid virtual table. There are two ways this could be done: setting the pointer to some known static data that contains the table for this class, or allocating some more memory and copying the pointers from the same known static data.

The former seems the more likely to me. The extra allocation and copy seem unnecessary, as the table for the object will not change during its lifetime. There are separate tables for each class the object inherits from, so there's no need for a group of tables to be merged into a single one.

So given that as a theory, we should be able to find some code somewhere around the constructor that sets up the virtual table pointer. It'll probably be the first thing after the memory allocation is done. This code might not exist in the binary itself, but may be part of a language support library (libgcc or similar). Regardless, the thing that will need to be there is the virtual table location.

I'm expecting to find that the location of the virtual table is not being relocated properly by the ELF loader. Basically, I trust GCC to produce correct code more than I trust our loader to do the right thing. The problem could also be in our linker, collect-aros, but it's so simple that I'm happy to rule it out initially.

Stuart, get back to work!

Update 3pm: Found it. I missed one section header table index conversion when I was updating the loader for large numbers of sections. Stupid, but it never hurts to exercise my brain on the really low level stuff.

monday, 10 december 2007

posted at 14:46
tags:

Just a followup about the whole _GLOBAL_OFFSET_TABLE_ thing. Apparently this symbol is provided by the linker, so it makes sense that it doesn't work, since the AROS linker is actually a custom ld script known as collect-aros which doesn't handle position independent code at all.

If we were to ever have real ELF-style shared libraries, this is one thing we'd need to implement. The other thing we'd need is a whole load of stuff in LoadSeg(), which is our "runtime linker".

Nothing to see here, just some notes for posterity.

saturday, 3 november 2007

posted at 12:40
tags:

I just finished implementing posix_memalign(). It will help with JavaScriptCore porting, as its allocator/garbage collector wants to do lots of memory tricks, including unusual alignments. I'll write more about my WebKit porting progress later.

I love doing pointer arithmetic. It spices up C so that I feel like I'm writing Perl one-liners:

int posix_memalign (void **memptr, size_t alignment, size_t size) {
    UBYTE *mem = NULL, *orig;

    /* check the alignment is valid */
    if (alignment % sizeof(void *) != 0 || !powerof2(alignment))
        return EINVAL;

    /* allocate enough space to satisfy the alignment and save some info */
    mem = AllocPooled(__startup_mempool, size + alignment + AROS_ALIGN(sizeof(size_t)) + AROS_ALIGN(sizeof(void *)));
    if (mem == NULL)
        return ENOMEM;

    /* store the size for free(). it will add sizeof(size_t) itself */
    *((size_t *) mem) = size + alignment + AROS_ALIGN(sizeof(void *));
    mem += AROS_ALIGN(sizeof(size_t));

    /* if its already aligned correctly, then we just use it as-is */
    if (((IPTR) mem & (alignment-1)) == 0) {
        *memptr = mem;
        return 0;
    }

    orig = mem;

    /* move forward to an even alignment boundary */
    mem = (UBYTE *) (((IPTR) mem + alignment - 1) & -alignment);

    /* store a magic number in the place that free() will look for the
     * allocation size, so it can handle this specially */
    ((size_t *) mem)[-1] = MEMALIGN_MAGIC;

    /* then store the original pointer before it, for free() to find */
    ((void **) &(((size_t *) mem)[-1]))[-1] = orig;

    *memptr = mem;
    return 0;
}

monday, 10 september 2007

posted at 09:33
tags:
  • mood: hacker

So after the compiler shenanigans of last week I finally managed to write some actual code on Friday. I started with just calls to SDL_Init() and SDL_Quit(), but the compile blew up in my face. The problem came from the fact that I was linking with -lSDL, which would have been fine except that AROS has its own libSDL for SDL apps running inside of AROS. The linker found that first, which is entirely not what was wanted, though even if it had found the right one I guess we'd be looking at namespace clashes for anyone who wanted to run an SDL app inside AROS.

After a bit of thought, it seemed to me that the only way out was to not link to the system libSDL at all, but instead load it at runtime using dlopen() and friends. This can work, but isn't without its problems, as loading a library is not the same as linking.

When you write code, you call lots of functions that exist somewhere other than in your .c file. When you compile your .c file, it leaves placeholders for all those functions in the resultant .o file. Linking is the process of pulling in a pile of objects (.o), object archives (.a, also known as static libraries) and shared libraries (.so) and updating all the placeholders to point to the right bits of code.

When you link with a shared library, the link process replaces the function placeholders with stubs that refer to a library file that exists on disk somewhere. When you run the program, a program called the runtime linker (known as ld.so on Linux) looks through it, finds all the stubs, loads all the needed libraries and then fills in all the pieces to make a fully working program.

The idea is simple. By not having to carry a full copy of every required library with every program, program binaries are smaller and so use less disk space. Additionally, it's possible for the runtime linker to keep only a single copy of a shared library in memory and point all programs at it, so you save memory when there are lots of programs running. The downside to the whole mess is the increased complexity in linking, the runtime linker needing to find all the pieces (/etc/ld.so.conf, LD_LIBRARY_PATH and ld's -rpath option), the fact that programs can't be as easily copied around because they have libraries that they need, etc. You don't notice this most of the time because we have smart tools to take care of all this stuff.

So back to AROS. dlopen() is not a linker. It merely opens a shared library and allows you to get at pointers inside it. You can obtain a pointer to a function, and then use that pointer to call the function inside the library. So this is possible:

    void *handle = dlopen("libSDL.so", RTLD_NOW | RTLD_LOCAL);
    void *fn = dlsym(handle, "SDL_Init");

The problem here is that the library does not contain prototypes, so we have no idea how to pass arguments to the function. We could build the stack by hand (assuming we knew the arguments), but then you don't get the benefit of the compiler doing type and prototype checking.

The normal home for prototypes is in the header files that come with the library. The problem here is that they define functions as real "first-class" functions. If we used them, it would cause the compiler to leave a placeholder for the function which would never get resolved, because we never link -lSDL. That's a build failure. Obviously though, we need the headers, as they have all the prototype information, as well as other things we'll need like structure definitions.

Another problem we have is that we're going to need many, many functions from this library. libSDL has almost 200 functions. While we won't need all of them we can expect to need a fair few, so we need prototypes and calls to dlsym() for each one.

All this really has to be bruteforced. The method is to create a giant struct with space to store many, many pointers, and then, for each wanted function, call dlsym() and populate the list. Function pointers can be declared with the same name as a first-class function (as they're not in the same namespace) and with a prototype. An example is SDL_SetVideoMode, which has the prototype:

    SDL_Surface * SDL_SetVideoMode (int width, int height, int bpp, Uint32 flags);

We can create storage for a function pointer with the same prototype like so:

    SDL_Surface * (*SDL_SetVideoMode) (int width, int height, int bpp, Uint32 flags);

Once we have a struct with all the function pointers declared and initialised, then we'd call a function in it like so:

    struct sdl_funcs *funcs = <allocate and initialise>;
    funcs->SDL_SetVideoMode(640, 480, 16, 0);

The "allocate and initialise" portion of that is a loop that runs through all the function names (stored in a big array), calls dlsym() on each and stows the returned pointer in the struct.

All this is heaps of setup, but it works very well. To help with the setup, I've written a script called soruntime. It takes a shared library and one or more header files as input. It scans the library (using nm) and extracts the names of all the functions that the library provides, then expands the headers (using cpp -E) looking for prototypes for those functions. Once it finds them, it outputs a header file with the library struct (ie all the prototypes), and a code file that has functions to setup and teardown a library.

I'm currently integrating this into my source tree for the SDL HIDD. It could (and probably will) be extended to the X11 HIDD as well, which will provide some uniformity and make it so that if we ever do get an X server ported to AROS, there will be no clashes.

Another thought. With a HIDD that provides facilities for an AROS program/driver to ask the host to load and provide access to a shared library, the graphics HIDDs would not have to be compiled into the kernel anymore, and instead could just be standard pieces "inside" AROS. If the UnixIO HIDD was extended to provide better file access features, the other HIDDs (parallel, serial, and the emul filesystem handler) could be modified to use it and thus also be moved into AROS-space. This gives a tight kernel with basically no dependencies. I've started stubbing a hostlib.hidd which will expose dlopen() and friends to AROS for just this purpose.

wednesday, 21 march 2007

posted at 22:32
tags:

I'm currently on a tangent within fat.handler. I started to rip out the caching, but because I hoped to add it back in later in some form I started making everything that needed disk data call a single function that gets the block from disk. Once I had that though, it really just seemed easier to actually implement a basic cache right now, one that could be expanded into something suitable for use by all filesystems later.

My basic requirement was that at all times the cache knows exactly what the contents of a block is and whether it needs to be written out or not. For this reason, I've decided to go with a model where there is only ever a single copy of a block in memory, each with a reference count. That way if one part of the code modifies a block, other parts of the code will see those changes and not have an out-of-date copy. And at all times, the cache can know what's going on.

The cache will maintain a list of dirty blocks and write them out at regular intervals based on some (configurable) criteria. Basically, it'll do all the hard stuff. The filesystem should just be able to say at any time "I need block X" and the cache will sort it out.

To do this I need an efficient data structure to store the blocks. My first thought was a kind of hashtable without the hashing bit: just modulus/bitmask the block number. We threw it around the office over lunch and did the maths, and it turned out that the overhead would be huge. B-trees (specifically, B+trees) looked to be the way forward, so I spent quite a bit of time trying to implement one.

I used to be a CS major, but for some reason I just can't work with proper algorithms and data structures, only wacky hacks. I still haven't been able to make my B-tree work, but thinking about it further I realised that a flat array and a binary search will actually do just as good a job in this case. B-trees really shine when the nodes are stored somewhere that's slow to get at (like on disk). When it's all in memory, their advantages are much reduced.

Again, my brain conspires against me. It took me about three hours to implement a basic binary search over an array. I'm sorely disappointed in myself - this stuff is supposed to be child's play. At least it works. The basic stuff is in, with reference counting and all the rest. The array is currently eight bytes per entry - four for the key int, four for the data pointer. That may go up to twelve if I end up needing a 64-bit type for the key, but the overhead is still minimal. The entries get allocated in chunks (which will probably be configurable), and the array grows (and probably shrinks) as necessary.

Tomorrow I'll start adding the code that actually loads blocks from disk. After that I should be able to start refactoring the handler code properly.

sunday, 11 february 2007

posted at 10:53
tags:

"I must've put a decimal point in the wrong place or something. Shit, I always do that. I always mess up some mundane detail." -- Michael Bolton, Office Space.

And so it is with me. I consider myself a fairly good programmer, but I always make the most ridiculous mistakes, which usually cost me hours or days in debugging. Case in point: packet.handler creates a process to mimic the environment required for traditional filesystems. The structure of the startup code for these processes is the same as everywhere else in AROS - create the process and have it wait around until signalled by the main (creating) process to run.

For some reason though, whenever my newly created processes called WaitPort() the whole thing would segfault. I chased this around for over two days. Then, in desperation, I started writing a small program to try and capture just the relevant pieces of this setup for testing. In these cases I usually copy by hand rather than doing a real clipboard copypasta, so I make sure my brain sees all the code as I go.

As I was copying, I noticed something that clearly wasn't going to work, so I fixed it on the fly. A few seconds later, my brain kicked in. Sure enough, the same problem appeared there. Same fix, recompile, run. No crash!

The problem? CreateNewProc() returns a pointer to the newly created process. I store this in an (effectively) global chunk of memory. The new process was accessing this chunk of memory to get its process structure, but of course, it was doing this before CreateNewProc() returned in the main process. Invalid piece of memory, crash!

The solution is easy. Have the new process call FindTask() to get hold of its process structure, and all is well.

Avoiding this kind of thing is kiddie stuff for multithreaded programming. I've done it hundreds of times before. It's simple, and thus exactly the kind of thing I screw up.

saturday, 19 august 2006

posted at 14:18

Back in November 2005, I was learning a lot about Javascript. I'd used closures (essentially anonymous functions) in Perl for years without thinking about it, but the way Javascript scopes variables makes them not work quite the same, so I was suddenly forced to think about closures in more detail.

I had this on the old website and would probably have been content to let it die there, but for two things Daniel did this week:

  • He pointed the Rails IRC channel at it, and I just can't ignore that kind of exposure.
  • He dared me to make function currying work in C, and the two are related enough that they should be nearby.

So consider this use of closures in Perl:

@c = ();
for($i = 0; $i < 10; $i++) {
    my $x = $i;
    push @c, sub { print "x is $x\n" };
}
$_->() for @c;

Or the equivalent in Javascript:

c = [];
for(i = 0; i < 10; i++) {
    var x = i;
    c.push(function () { print("x is " + x) });
}
for(f in c)
    c[f]();

The C version would look like this:

#include <stdio.h>
#include "closure.h"

int main(int argc, char **argv) {
    closure c[10];
    int i, x;

    for(i = 0; i < 10; i++) {
        CLOSURE_INIT(c[i]);
    }

    for(i = 0; i < 10; i++) {
        CLOSURE_START(c[i]);
            int x = i;
            printf("x is %d\n", x);
        CLOSURE_END(c[i]);
    }

    for(i = 0; i < 10; i++) {
        CLOSURE_CALL(c[i]);
    }

    return 0;
}

Compile and run:

% gcc -Wall -ggdb -o closure closure.c
% ./closure 
x is 0
x is 1
x is 2
x is 3
x is 4
x is 5
x is 6
x is 7
x is 8
x is 9

closure.h provides the magic:

#ifndef __CLOSURE_H
#define __CLOSURE_H 1

#include <ucontext.h>

typedef struct closure {
    ucontext_t  enter;
    ucontext_t  exit;
    int         inside;
} closure;

#define CLOSURE_INIT(c) \
    c.inside = 0;

#define CLOSURE_START(c) \
    getcontext(&c.enter); \
    if(c.inside) { \
        ucontext_t __closure_end;

#define CLOSURE_END(c) \
        swapcontext(&__closure_end, &c.exit); \
    }

#define CLOSURE_CALL(c) \
    c.inside = 1; \
    swapcontext(&c.exit, &c.enter); \
    c.inside = 0;

#endif

I hope to be able to have an implementation of currying/partial application soon - got some great ideas already :)