Dereferencing NULL: Mac OS X Debugging Adventures

For the past few months, I've been spending some of my free time working on a cross-platform screensaver, one that runs under Windows and under Mac OS X. Getting the Windows version up and running was relatively easy. A Windows screensaver is just an exe with a special extension (.scr) and some code to handle a few special command line parameters. Having written Windows game code before helped there. Getting the Mac OS X version running was quite a bit trickier, due in part to my inexperience with OS X. I place some of the blame on OS X's screensaver engine though, which ended up being rather tricky to work with. It handles screensavers very differently from Windows. OS X expects screensavers to be packaged not as standalone applications, but as dynamically loadable modules, which are loaded by one of a few different OS X-provided applications.

Every OS X screensaver module must adhere to a certain set of guidelines. Among them, they must be represented as an OS X bundle, they must provide a subclass of the OS-provided ScreenSaverView class, and they should play nice with other screensaver modules. Sounds ok, except I found the implementation of these guidelines to be a bit tricky. Why?

OS X screensavers are driven by an external main loop, which each screensaver hooks into (for initialization, shutdown, time-stepping, drawing, etc.). This design seems ok in some ways: with no main loop to write, each screensaver needs less code, and I could see this being easier for a beginner. In my case, though, there have been times when I've wanted to debug and profile portions of my screensaver logic using a main loop of my own, without having to deal with the OS X screensaver engine.
All OS X screensaver modules get loaded into the same address space. Furthermore, they get loaded in such a way that no two modules can share the same symbol names. If two modules implement a function with the same name, there's a good chance one or both of them will crash, even if only one of the two modules is actively displaying a screensaver. One solution to this is to name functions such that they don't clash, such as by appending a unique name to each function. Another solution, the one I chose to work with, is to load almost everything in a second dynamically loaded module, which gets loaded in such a way as not to cause naming clashes (via dlopen with RTLD_LOCAL.)
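
A minimal sketch of that second approach, in C. This is an illustration rather than the project's actual code: the helper name load_private_library is made up, and the choice of RTLD_NOW over RTLD_LAZY is an assumption, not something the screensaver engine requires.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared library so that its symbols stay
   private to the returned handle. Returns NULL on failure. */
void *load_private_library(const char *path)
{
    void *handle;

    assert(path != NULL);

    /* RTLD_LOCAL keeps the library's symbols out of the global namespace.
       RTLD_NOW binds its undefined symbols right away, so a missing
       dependency shows up here rather than at the first call. */
    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
    }
    return handle;
}
```

The screensaver's ScreenSaverView subclass would call something like this once, then fetch the functions it needs from the handle with dlsym.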

For the times when I want to debug or profile my app without OS X's screensaver engine getting involved, I can switch to Windows and use that version. I don't always want to switch to Windows though, so I created a small Mac app to help with debugging. It loads the platform-independent portions of the screensaver code and drives them via its own main loop, avoiding the OS X screensaver engine altogether. I recently ran into a problem with that approach though. I created the app, ran it, confirmed that it worked, then hit a case where I wanted to step into the screensaver code. When I tried to launch the app with a debugger attached, the app would immediately crash. Its main function would never get called, and I'd be stuck with a debugger command prompt and the following error message:


dyld: Library not loaded: @loader_path/../Frameworks/libBox2D.dylib
Referenced from: /Users/davidl/Documents/Code/Platformer/trunk/MacOS/build/Debug/Lugnut_Native.so
Reason: image not found
Program received signal: “SIGTRAP”.
Xcode: Introspection dylib not loaded because thread 1 has function: __dyld_dyld_fatal_error on stack

To note, Lugnut_Native.so is the current name of the shared library that contains most of my screensaver code, and libBox2D.dylib is another shared library that it depends on. "Platformer" is the original name of the project, which started out life as a 2d platformer, and eventually evolved into a screensaver having nothing to do with 2d platformers. Such is life.

The error message listed above says that my screensaver library, Lugnut_Native.so, can't load one of its dependencies, libBox2D.dylib. It's looking for it, and it can't find it.

A bit of background info: "@loader_path" is a token that has special meaning to the Mac OS X dynamic linker. When one module tries to load another module, it can use "@loader_path" to indicate that the module to be loaded is located relative to the module doing the loading. In this case, the module Lugnut_Native.so was trying to load another module, libBox2D.dylib, from a location relative to itself. Both modules were supposed to be located on my hard disk in predictable locations relative to each other. This is where "@loader_path" came in. It says that Lugnut_Native.so should expect its dependent library, libBox2D.dylib, to be located at a spot relative to itself. The OS X dynamic linker replaces "@loader_path" with the path of the directory that Lugnut_Native.so exists in.
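
The substitution can be sketched in a few lines of C. This only mimics the behavior described above; it is not how the dynamic linker is actually implemented, and the function name expand_loader_path is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: expands a leading "@loader_path" in install_name
   using the directory that contains loader_file.
   Returns 0 on success, -1 on error or if out is too small. */
int expand_loader_path(const char *loader_file, const char *install_name,
                       char *out, size_t out_size)
{
    static const char token[] = "@loader_path";
    const size_t token_len = sizeof(token) - 1;
    const char *slash;
    int written;

    assert(loader_file != NULL && install_name != NULL && out != NULL);

    if (strncmp(install_name, token, token_len) != 0) {
        /* No token present: the name is used unchanged. */
        written = snprintf(out, out_size, "%s", install_name);
    } else {
        /* The loader's directory is everything before its last '/'. */
        slash = strrchr(loader_file, '/');
        if (slash == NULL) {
            return -1; /* this sketch expects a full path */
        }
        written = snprintf(out, out_size, "%.*s%s",
                           (int)(slash - loader_file), loader_file,
                           install_name + token_len);
    }
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Applied to the error message above, a copy of Lugnut_Native.so sitting in build/Debug turns the reference into build/Debug/../Frameworks/libBox2D.dylib, so which copy of Lugnut_Native.so gets loaded decides where libBox2D.dylib is looked for.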

The problem ended up being that the wrong copy of Lugnut_Native.so was being loaded. There were two copies of the file, both getting created in the app's XCode-driven build process. The first was created by the linker. This was the copy that was getting inadvertently loaded. The second copy, the one I wanted to load, was a copy of the first placed inside the app's bundle, a location where OS X was supposed to be able to find it. In some cases, OS X would find it. If I ran the app outside of XCode, everything was ok. When I ran it from within XCode with a debugger attached, it'd crash immediately. Why would it do this?

The latter copy was the one I wanted to load. It existed inside the app's bundle. The copy that was getting loaded wasn't. When the OS X dynamic linker tried to load the incorrect copy, it was unable to find libBox2D.dylib in the specified location!

The solution I used, and there may have been several, was to tell XCode that when it launched the app, the dynamic linker should try to load dynamic libraries from a specific location, namely the directory where the desired copy of Lugnut_Native.so existed (inside the main application's bundle, to note). This was done by making sure an environment variable called DYLD_LIBRARY_PATH was set before the app launched. DYLD_LIBRARY_PATH, when set, tells the OS X dynamic linker to search a given list of directories first when loading dynamic libraries. With this variable set to the location of the desired copy of Lugnut_Native.so, the dynamic linker should load that copy, or so I hoped at the time.
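
Roughly speaking, each directory listed in DYLD_LIBRARY_PATH gets paired with the library's file name to form a candidate location. The C sketch below illustrates that pairing only; it is not the dynamic linker's real code, and the function name library_candidate is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: writes the n-th (0-based) candidate path built from a
   colon-separated search path and a library's file name.
   Returns 0 on success, -1 if there is no such entry or out is too small. */
int library_candidate(const char *search_path, const char *library,
                      size_t n, char *out, size_t out_size)
{
    const char *leaf;
    const char *entry = search_path;

    assert(search_path != NULL && library != NULL && out != NULL);

    /* Only the file name matters here: drop any directory part. */
    leaf = strrchr(library, '/');
    leaf = (leaf != NULL) ? leaf + 1 : library;

    for (;;) {
        size_t len = strcspn(entry, ":");
        if (n == 0) {
            int written;
            if (len == 0) {
                return -1; /* empty entry */
            }
            written = snprintf(out, out_size, "%.*s/%s", (int)len, entry, leaf);
            return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
        }
        if (entry[len] == '\0') {
            return -1; /* ran out of entries */
        }
        entry += len + 1; /* skip past the ':' */
        n--;
    }
}
```

Whichever candidate exists first wins, which is why a DYLD_LIBRARY_PATH pointing at the wrong directory is enough to pull in the wrong copy of a library.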

When the app crashed, it presented me with a command prompt, with which I was able to list the environment variables exposed to the app and to the dynamic linker. The command, "show environment" (without the quotes), said that DYLD_LIBRARY_PATH was already set and that it pointed to the path with the incorrect copy of Lugnut_Native.so. If there were a way to tell XCode to set DYLD_LIBRARY_PATH to something else, maybe I would be able to debug my app.

As it turns out, there was a way to set DYLD_LIBRARY_PATH before the app ran, thus making sure that the correct copy of Lugnut_Native.so would load, and thus allowing me to debug my screensaver in the manner I was hoping for. Here are the steps I took, minus most of the annoying missteps I ran into:

Under the "Executables" section of the XCode project, I clicked on the app to debug, then pressed Command-I to bring up its Info dialog.
I clicked on the "Arguments" tab of the dialog that came up.
In the "Variables to be set in the environment:" section, I clicked on the plus sign to add an entry.
In the new entry, I set the name to DYLD_LIBRARY_PATH, and the value to "$SRCROOT/$CONFIGURATION_BUILD_DIR/$FRAMEWORKS_FOLDER_PATH", without the quotes. This value told the dynamic linker to try loading modules from the application's bundle first and foremost, which is where the correct copy of Lugnut_Native.so lives. Furthermore, it gave the dynamic linker a full path name, rather than a relative one. This turned out to be important. When I tried setting DYLD_LIBRARY_PATH to just "$CONFIGURATION_BUILD_DIR/$FRAMEWORKS_FOLDER_PATH", it didn't work, which is not what I was expecting: $CONFIGURATION_BUILD_DIR, when used elsewhere in XCode, usually resolves to a full path name. Prefixing "$SRCROOT" to this value fixed it.
I closed the dialog and then launched the app in the debugger (via XCode's "Run" menu). It worked!
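
One way to catch the relative-path problem early would be a startup check in the debugging app itself. The C sketch below is an illustration, not code from the project; both function names are hypothetical, and treating any entry that does not start with '/' as suspicious is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the entries in a colon-separated search path
   that do not start with '/', i.e. that are relative or empty. */
int count_relative_entries(const char *search_path)
{
    int count = 0;
    const char *entry = search_path;

    assert(search_path != NULL);

    for (;;) {
        size_t len = strcspn(entry, ":");
        if (entry[0] != '/') {
            count++;
        }
        if (entry[len] == '\0') {
            break;
        }
        entry += len + 1; /* skip past the ':' */
    }
    return count;
}

/* Hypothetical startup check: prints DYLD_LIBRARY_PATH and warns about
   relative entries. Returns the number of relative entries found. */
int check_dyld_library_path(void)
{
    const char *value = getenv("DYLD_LIBRARY_PATH");
    int relative;

    if (value == NULL) {
        fprintf(stderr, "DYLD_LIBRARY_PATH is not set\n");
        return 0;
    }
    relative = count_relative_entries(value);
    fprintf(stderr, "DYLD_LIBRARY_PATH=%s\n", value);
    if (relative > 0) {
        fprintf(stderr, "warning: %d relative entries found\n", relative);
    }
    return relative;
}
```

Calling check_dyld_library_path() at the top of the app's main function would show the value the app actually received, the same information that "show environment" gives from the debugger prompt.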

From then on, I've been able to debug my screensaver under OS X using a main loop of my own creation, which is a bit more flexible than trying to debug the screensaver via OS X's screensaver engine. Listing the reasons why is beyond the scope of this blog post; perhaps I'll list them some other time. Time for a break. :-)

4 comments:

Unknown said...: Hi,
I think I have the same problem.

I've kind of a screensaver template for flash content. The name of the main class is always the same so I can't install several of those screensavers at the same time.

You wrote:
"the one I chose to work with, is to load almost everything in a second dynamically loaded module, which gets loaded in such a way as not to cause naming clashes (via dlopen with RTLD_LOCAL.)"

Can you give me example how to do this?
Would be great.
Thanks, Florian; 2:53 PM
David Ludwig said...: Some code has to be unique for each screensaver, as far as I can tell. Each screensaver needs at least one unique subclass of ScreenSaverView. If you're looking to create a lot of screensavers, my guess is that you'll need to create a subclass for each of them, though I could be wrong.

Beyond creating unique subclasses for each screensaver, I created a shared library (aka, a .dylib file, creatable using XCode's provided templates) and put my code in there. Then I had my subclass of ScreenSaverView open the .dylib file via dlopen(), ask it for pointers to a few of its functions, then I called the functions as needed.

Anyways, the general concept behind dlopen and dlsym is that you can open a shared library given its file name. For example:

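/* RTLD_LAZY (or RTLD_NOW) selects the binding mode; RTLD_LOCAL keeps the library's symbols private. */
void * lib_handle = dlopen("path/to/my_shared_library.dylib", RTLD_LAZY | RTLD_LOCAL);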

Once you've opened a shared library, you can retrieve pointers to C functions within that library, which can be called like any other C function. For example:

/* Function pointer type matching the library function's signature. */
typedef int (*my_function)(const char *, float);
my_function f = (my_function) dlsym(lib_handle, "my_function");
int x = f("blah blah", 123.45f);

When you're done with the library, to note, be sure to close the handle via dlclose. For example:

dlclose(lib_handle);
lib_handle = NULL;

I ended up wrapping my existing screensaver code in a C++ class, which I placed in the shared library, along with C functions that could instantiate and delete instances of the class. The class had member functions that could draw the scene (relevant as I was using OpenGL directly), tick the simulation, retrieve config values, set config values, etc.
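
A minimal C sketch of that create/destroy pattern, using a plain struct in place of the C++ class. All names here are hypothetical, not the project's actual API, and the struct holds only a time counter to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the class that lives inside the shared library. */
typedef struct Screensaver {
    double elapsed_seconds; /* simulation time accumulated so far */
} Screensaver;

/* Allocate the instance inside the library. Returns NULL on failure. */
Screensaver *screensaver_create(void)
{
    Screensaver *s = malloc(sizeof *s);
    if (s != NULL) {
        s->elapsed_seconds = 0.0;
    }
    return s;
}

/* Advance the simulation by dt_seconds. */
void screensaver_tick(Screensaver *s, double dt_seconds)
{
    assert(s != NULL);
    s->elapsed_seconds += dt_seconds;
}

/* Read back the accumulated simulation time. */
double screensaver_elapsed(const Screensaver *s)
{
    assert(s != NULL);
    return s->elapsed_seconds;
}

/* Free the instance inside the library as well, so allocation and
   deallocation always happen in the same module. */
void screensaver_destroy(Screensaver *s)
{
    free(s);
}
```

The host would look up each of these functions with dlsym and only ever hold the pointer returned by screensaver_create, never allocating or freeing the struct itself.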

One note on dlopen/dlsym/dlclose: if you've never worked with those before, be careful about where and when you allocate and free memory. Typically, memory allocated within a particular shared library can only be deleted within that same shared library. Likewise, memory allocated within a particular application, or screen saver, can only be deleted within that app. Allocating memory in one library or app and then deleting it in another is a no-no.; 3:52 PM
Unknown said...: Hi Daniel,
thanks for your answer!
The unique subclass is really a problem because I can only compile one template.

But I found another workaround: I can replace the class name in the binary file using a hex editor. I just have to run a search and replace with the original class name and a new string with the same length.

Perhaps I can write a little app that does this job …; 5:18 PM
Anonymous said...: Thanks for this post! It helped me a ton!!!

-ATLAS Physicist; 10:00 AM

Sunday, December 07, 2008
