LLVM Module Maps to the rescue!

Alex Nachbaur

on March 11, 2019

I recently wrote about Cocoa / Cocoa Touch frameworks, and in writing about them I was sorely tempted to dive into Modules, since they are pretty important to modern frameworks. But it was such a huge topic that I decided to break it out into a separate post.

In a nutshell, LLVM Module Maps were invented as a way to improve how source code imports other frameworks.

If you’ve ever worked on traditional C/C++ software projects (Makefile, CMake, gcc…any of these ring a bell?) you’ll know that the more code you add, the longer it takes to build, and the more likely you are to have conflicting types or macro definitions.

Preprocessor

This is because of the C preprocessor. Source code is usually scrubbed up before it’s compiled. The preprocessor is controlled through rudimentary directives beginning with a “#” character. If you’ve ever seen #define, #ifdef, or something similar, you’re looking at preprocessor directives. These are used to add to, filter out, or redefine the contents of source files. And #include is pretty much nothing more than a way to copy and paste the contents of another file into your current one.

So if you #include “a.h” into “b.h”, the two files would essentially be smooshed together into one much longer source file (in-memory only, of course). This enables the compiler to “see” all the relevant symbols, allowing it to know the difference between a syntax error and a valid function call.

This had many problems, not the least of which was repeated inclusion: in our above example, what would happen if “b.h” included “c.h”, which in turn included “a.h”? Chaos! Actually, just a pile of redefinition errors, but you get the idea.

To work around this, developers would resort to the tedious method of wrapping their headers in yet another preprocessor macro to ensure the statements in the headers, while included multiple times, are only defined once.

#ifndef MY_FILE_NAME_H
#define MY_FILE_NAME_H

// Actual header contents

#endif
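To see the guard at work without juggling several files, here is a minimal sketch that pastes a hypothetical header into one source file twice, the way the preprocessor would after two #include lines. The names POINT_H, Point, point_sum, and guard_demo are invented for illustration and are not from any real header.

```c
#include <assert.h>

/* First copy of a hypothetical "point.h", as the preprocessor would paste it. */
#ifndef POINT_H
#define POINT_H

typedef struct {
    int x;
    int y;
} Point;

/* static inline keeps the definition private to this translation unit. */
static inline int point_sum(Point p) {
    return p.x + p.y;
}

#endif /* POINT_H */

/* Second copy of the same header. POINT_H is already defined, so the
 * preprocessor skips everything up to the matching #endif. Without the
 * guard, the repeated struct typedef would be a redefinition error. */
#ifndef POINT_H
#define POINT_H

typedef struct {
    int x;
    int y;
} Point;

static inline int point_sum(Point p) {
    return p.x + p.y;
}

#endif /* POINT_H */

/* Small helper so the example has something to call. */
int guard_demo(void) {
    Point p = { 3, 4 };
    int sum = point_sum(p);
    assert(sum == 7); /* 3 + 4 */
    return sum;
}
```

Deleting the two guard lines around the second copy should make the compiler reject the file, which is exactly the failure the guard exists to prevent.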

That works all right, but compiler engineers figured that they could just create a new and improved #import directive (the one Objective-C developers use) that includes each file at most once, so we lucked out there for a while!

Compiler performance woes

Now our code no longer has these pesky recursive header problems, but we still have the compiler performance problem, because the preprocessor is still essentially copying and pasting every header together each time you build each file.

You might say “That’s not too big of a deal, what’s a few files copied together in memory?” Well, as an iOS developer, the first thing you do is add #import <UIKit/UIKit.h> to your header.

That’s an umbrella header that imports 226 other files, as well as the Foundation framework which consists of 120 files. If anything imports the Notifications framework, MapKit, or any of the other frameworks, you can imagine a huge gob of source that needs to be parsed together. And traditionally, all of that would need to be done for every file you build!

Apple tried to improve matters with cached, or precompiled headers, in the DerivedData directory, but if something unexpected were to change you could end up with the precompiled headers becoming out of sync with the libraries themselves (thus the dreaded DerivedData reboot many people are familiar with).

Even if you ignore compile speed, the problem is compounded when you want to have a snappy IDE with code completion, syntax highlighting, and other whiz-bang features that require an in-depth knowledge of which symbols are valid, what their types are, etc.

Headers, please go talk to the Linker

Let’s assume you’ve written your code, imported your libraries, built your app, and everything works great! Then you decide to include (or import) a header from a framework you hadn’t used yet, such as adding MapKit support to your app. You build your app and while the source all compiles great, the linker starts complaining about missing symbols.

You say “I imported MapKit right there, see! Isn’t that enough?” Unfortunately, no. You see, the compiler and linker are two separate programs. The compiler just needs to see the headers for a framework to know which symbols will be there later, but it doesn’t need the implementation of them right away.

At the linking stage though, all symbols must be resolved so symbol A knows how to transfer control to symbol B, wherever those might be.

So in traditional GCC projects, if you didn’t remember to explicitly add each and every library you used to the linker build phase, chaos would ensue. This also became a problem when you’d later remove a framework, but forgot to remove it from the linker phase: you’d be linking code in that you’d never need, adding to your binary size and/or app launch time!

LLVM to the rescue!

With the switch from a monolithic compiler to a modular one, the creators of LLVM (and Clang, its C-family front end) took the opportunity to improve things. Instead of being managed through umbrella headers that merely group similar headers together, frameworks would be elevated to official status as a “Module”.

A module defines a package with its name (with support for namespaces), defines what will be exported, what sub-modules exist (if any), indicates the name of the umbrella header, and which linked libraries are needed.

With LLVM and modern versions of Xcode, all frameworks get a module map, whether you know it or not. Unless you define the MODULEMAP_FILE build setting to indicate which file you want to use as the module map, Xcode will auto-generate one for you. The default usually looks like this:

framework module MyFramework {
    umbrella header "MyFramework.h"

    export *
    module * { export * }
}

Let’s break this down. This says a module named “MyFramework” will be defined, and that it follows Cocoa’s “framework” semantics. The header to be included in the framework is called “MyFramework.h”, and it should be considered an umbrella, so other sub-headers should be added as well. Within those headers, all symbols should be exported, and all submodules should have all their symbols exported as well.

Essentially this reproduces the same behaviour as our traditional recursive #import statements, with one notable exception: the symbols managed by these modules are uniformly namespaced, and can be indexed together much more easily.

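Instead of re-parsing a filesystem tree of headers for every source file, the compiler can parse a module once and store the result in an on-disk cache (Clang uses a serialized binary form of the parsed headers, though in principle any data structure would do) that records the symbols, where they’re defined, which module they’re in, etc. Furthermore, this cache can be consulted quickly, and repeatedly, by the tools built on the same libraries: the compiler, the indexer behind code completion, and so on.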

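The benefit of this is that since LLVM is modular, and not strictly limited to C-based languages, the module map gives other languages, such as Swift, a uniform description of a C or Objective-C library that they can import without caring how its headers are laid out.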

The net result is faster compiling, faster code inspection, and much more interoperable code across languages.

I see some buttons and levers, I want to push them!

Everyone reading this should know the Spider-Man quote “With great power comes great responsibility”. But I like to put it another way, which is “With great power comes the great ability to shoot yourself in the foot if you’re not careful”.

The documentation for LLVM Modules is amazing and very rich, if you’re into that sort of thing. In it, the docs reference many different keywords that were obviously put in there to support Apple’s own ability to progress their technology without having to rewrite the whole stack. Much of it is of little use to us regular mortals, but there are some bits that are useful if you know what to do with them, and you have a sufficiently complex use-case.

At Salesforce, I’ve been working on an SDK project for several years, and I was faced with the seemingly conflicting challenges of making a binary distributable framework easy to integrate, and easy to maintain internally. I wanted to be able to abstract and separate out the implementation details for how the SDK itself was built internally. Some of the design goals included:

Separating core vs UI capabilities into their own separate targets in Xcode;
Dividing the SDK into logical feature buckets that could be maintained by different teams without needing to ship separate disparate frameworks (e.g. “Live Chat” vs “Knowledge Articles”);
Exposing private APIs internally without symbol conflicts, while stripping unsupported APIs from public use.

Luckily, LLVM Modules came to the rescue! The implementation details are beyond the scope of this article, but suffice it to say that we combined custom LLVM module maps with sub-modules to expose the different internal targets. This allowed us to build each sub-library as its own static library, define their structure in a module map, and link the resulting static libraries together into one unified dynamic library. The explicit keyword allowed us to define a set of private modules, declared using the MODULEMAP_PRIVATE_FILE Xcode build setting, which would later be stripped by our release process.

Pre-declaring dependent libraries

Another nifty feature of LLVM Modules is the “link” keyword. This allows your framework to declare which libraries or frameworks your product depends upon. So instead of writing documentation to your users stating “Please add libz, libxml, and MapKit to your linker phase”, you can just explicitly declare those dependencies within your modulemap. When an application or another framework imports your module, the toolchain picks up those dependencies and automatically adds them to the list of libraries to link.
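As a rough sketch of what that might look like, here is the default module map from earlier with link declarations added. The library names are assumptions chosen to mirror the ones mentioned above (“z” for libz, “xml2” standing in for libxml), not taken from a real project.

```
framework module MyFramework {
    umbrella header "MyFramework.h"

    export *
    module * { export * }

    // Hypothetical dependencies; adjust to whatever your framework really uses.
    link "z"                 // a plain library, as if passing -lz
    link "xml2"              // a plain library, as if passing -lxml2
    link framework "MapKit"  // a framework, as if passing -framework MapKit
}
```

Note that this only helps clients who import the framework as a module; a project built with modules turned off would presumably still need to list the libraries by hand.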

With this feature, adding a pre-compiled framework to an application is as easy as drag-and-drop and everything necessary is linked automatically.

What was that about private modules?

Ah yes, the other feature of LLVM Modules. There are actually two build settings in Xcode for module maps: MODULEMAP_FILE and MODULEMAP_PRIVATE_FILE. This lets you specify two different maps which, as you may have guessed, are for public and private access. Usually, within the private modulemap, you would utilize the “explicit” keyword when defining modules available for import.

Normally when using #import or the new module-aware @import variant, all exported symbols are imported into the file. The exception is sub-modules marked with the explicit keyword: their contents only become visible when the module is explicitly referenced by name in the import statement. For example:

@import MyFramework;
@import MyFramework_Private;

This gives a lot more control over which headers are available at a given time.
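A private module map along those lines might look something like the sketch below. The sub-module name Internal and the header MyFrameworkInternal.h are hypothetical, made up for illustration, and your own layout may well differ.

```
// Hypothetical private module map, supplied via MODULEMAP_PRIVATE_FILE.
framework module MyFramework_Private {
    // "explicit" means these declarations stay hidden unless a client
    // names this sub-module in its import statement.
    explicit module Internal {
        header "MyFrameworkInternal.h"
        export *
    }
}
```

Under these assumptions, a client that wants the internal API would write @import MyFramework_Private.Internal; while a plain @import MyFramework; would never see it.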

Summary

LLVM Modules are one of those bits of glue that help hold things together, but thankfully most people don’t know, and don’t need to know, that they even exist. It’s a great piece of technology that improves our lives in Objective-C, exposes low-level C and C++ to higher-level languages, and lets those languages interoperate easily with Swift. It tidies up some of the legacy problems with C-based languages, improves performance, and makes our day-to-day lives as developers better for it.

For those rare cases when you think to yourself “If only I could easily manage my public and private headers”, “I really wish I could divide my framework up into smaller modules”, or “I’d really like to access that C API from Swift”, now you have the knowledge necessary to make that a reality. For more information, please refer to the documentation for LLVM Modules.

What are your thoughts on module maps? What sorts of problems do you foresee being able to solve with custom maps in your projects? Let me know in the comments, or let me know if you found this helpful. And if you haven’t read it already, go check out my article on Dynamic Frameworks since that dovetails nicely into this article.
