Richard Smith authored
requires ! feature The purpose of this is to allow (for instance) the module map for /usr/include to exclude <tgmath.h> and <complex.h> when building in C++ (these headers are instead provided by the C++ standard library in this case, and the glibc C <tgmath.h> header would otherwise try to include <complex.h>, resulting in a module cycle). git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@193549 91177308-0d34-0410-b5e6-96231b3b80d8
Modules
Warning
The functionality described on this page is supported for C and Objective-C. C++ support is experimental.
Introduction
Most software is built using a number of software libraries, including libraries supplied by the platform, internal libraries built as part of the software itself to provide structure, and third-party libraries. For each library, one needs to access both its interface (API) and its implementation. In the C family of languages, the interface to a library is accessed by including the appropriate header file(s):
#include <SomeLib.h>
The implementation is handled separately by linking against the appropriate library, for example by passing -lSomeLib to the linker.
Modules provide an alternative, simpler way to use software libraries that provides better compile-time scalability and eliminates many of the problems inherent to using the C preprocessor to access the API of a library.
Problems with the current model
The #include mechanism provided by the C preprocessor is a very poor way to access the API of a library, for a number of reasons:
- Compile-time scalability: Each time a header is included, the compiler must preprocess and parse the text in that header and every header it includes, transitively. This process must be repeated for every translation unit in the application, which involves a huge amount of redundant work. In a project with N translation units and M headers included in each translation unit, the compiler is performing M x N work even though most of the M headers are shared among multiple translation units. C++ is particularly bad, because the compilation model for templates forces a huge amount of code into headers.
- Fragility: #include directives are treated as textual inclusion by the preprocessor, and are therefore subject to any active macro definitions at the time of inclusion. If any of the active macro definitions happens to collide with a name in the library, it can break the library API or cause compilation failures in the library header itself. For an extreme example, #define std "The C++ Standard" and then include a standard library header: the result is a horrific cascade of failures in the C++ Standard Library's implementation. More subtle real-world problems occur when the headers for two different libraries interact due to macro collisions, and users are forced to reorder #include directives or introduce #undef directives to break the (unintended) dependency.
- Conventional workarounds: C programmers have adopted a number of conventions to work around the fragility of the C preprocessor model. Include guards, for example, are required for the vast majority of headers to ensure that multiple inclusion doesn't break the compile. Macro names are written with LONG_PREFIXED_UPPERCASE_IDENTIFIERS to avoid collisions, and some library/framework developers even use __underscored names in headers to avoid collisions with "normal" names that (by convention) shouldn't even be macros. These conventions are a barrier to entry for developers coming from non-C languages, are boilerplate for more experienced developers, and make our headers far uglier than they should be.
- Tool confusion: In a C-based language, it is hard to build tools that work well with software libraries, because the boundaries of the libraries are not clear. Which headers belong to a particular library, and in what order should those headers be included to guarantee that they compile correctly? Are the headers C, C++, Objective-C++, or one of the variants of these languages? What declarations in those headers are actually meant to be part of the API, and what declarations are present only because they had to be written as part of the header file?
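The fragility and include-guard points above can be illustrated with a minimal sketch. It simulates, in a single file, what textual inclusion does: the text of a header is pasted after whatever macros the user has already defined. The names somelib.h, SOMELIB_H, BUFFER_LEN, and somelib_capacity are hypothetical and chosen only for illustration; they do not belong to any real library.

```c
/* --- user code that precedes the inclusion ------------------------- */
/* Hypothetical application macro. The user does not know that the
   library happens to use the same name for its own configuration. */
#define BUFFER_LEN 8

/* --- text of a hypothetical "somelib.h", as #include would paste it -- */
#ifndef SOMELIB_H            /* include guard: the conventional workaround */
#define SOMELIB_H            /* that keeps a second inclusion from redefining things */

#ifndef BUFFER_LEN           /* the library intends 256 as its default ... */
#define BUFFER_LEN 256
#endif

/* ... but the user's macro is already active, so the default above is
   skipped and BUFFER_LEN expands to 8 inside the library's own code. */
static inline int somelib_capacity(void) {
    return BUFFER_LEN;
}

#endif /* SOMELIB_H */
```

In this sketch, calling somelib_capacity() yields 8 rather than the library's intended 256, even though the user never meant to configure the library. Nothing in the header can defend against this except choosing a longer, less collision-prone macro name.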
Semantic import
Modules improve access to the API of software libraries by replacing the textual preprocessor inclusion model with a more robust, more efficient semantic model. From the user's perspective, the code looks only slightly different, because one uses an import declaration rather than a #include preprocessor directive:
import std.io; // pseudo-code; see below for syntax discussion
However, this module import behaves quite differently from the corresponding #include <stdio.h>: when the compiler sees the module import above, it loads a binary representation of the std.io module and makes its API available to the application directly. Preprocessor definitions that precede the import declaration have no impact on the API provided by std.io, because the module itself was compiled as a separate, standalone module. Additionally, any linker flags required to use the std.io module will automatically be provided when the module is imported [1].
This semantic import model addresses many of the problems of the preprocessor inclusion model:
- Compile-time scalability: The std.io module is only compiled once, and importing the module into a translation unit is a constant-time operation (independent of module system). Thus, the API of each software library is only parsed once, reducing the M x N compilation problem to an M + N problem.
- Fragility: Each module is parsed as a standalone entity, so it has a consistent preprocessor environment. This completely eliminates the need for __underscored names and similarly defensive tricks. Moreover, the current preprocessor definitions when an import declaration is encountered are ignored, so one software library can not affect how another software library is compiled, eliminating include-order dependencies.
- Tool confusion: Modules describe the API of software libraries, and tools can reason about and present a module as a representation of that API. Because modules can only be built standalone, tools can rely on the module definition to ensure that they get the complete API for the library. Moreover, modules can specify which languages they work with, so, e.g., one can not accidentally attempt to load a C++ module into a C program.
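To make the M x N versus M + N comparison concrete, here is a deliberately simplified cost model. It counts one unit of work per header parse or per compilation and ignores transitive includes, header sizes, and the (assumed constant) cost of an import. The function names are hypothetical.

```c
/* Textual inclusion: each of the n translation units re-parses all m
   headers it includes, so the parsing work grows as n * m. */
static inline unsigned long textual_parse_count(unsigned long n,
                                                unsigned long m) {
    return n * m;
}

/* Modules: each of the m library interfaces is compiled once, and each
   of the n translation units is compiled once, so the work grows as
   n + m under this simplified model. */
static inline unsigned long module_parse_count(unsigned long n,
                                               unsigned long m) {
    return n + m;
}
```

Under these assumptions, a project with 1000 translation units that each include the same 50 headers performs 50000 header parses with textual inclusion, against 1050 units of work with modules.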