Justin Lebar authored
Reviewers: rsmith Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D20457 git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@270279 91177308-0d34-0410-b5e6-96231b3b80d8
Clang Compiler User's Manual
- Introduction
- Command Line Options
- Language and Target-Independent Features
- C Language Features
- C++ Language Features
- Objective-C Language Features
- Objective-C++ Language Features
- OpenMP Features
- Target-Specific Features and Limitations
- clang-cl
Introduction
The Clang Compiler is an open-source compiler for the C family of programming languages, aiming to be the best in class implementation of these languages. Clang builds on the LLVM optimizer and code generator, allowing it to provide high-quality optimization and code generation support for many targets. For more general information, please see the Clang Web Site or the LLVM Web Site.
This document describes important notes about using Clang as a compiler for an end-user, documenting the supported features, command line options, etc. If you are interested in using Clang to build a tool that processes code, please see :doc:`InternalsManual`. If you are interested in the Clang Static Analyzer, please see its web page.
Clang is designed to support the C family of programming languages, which includes :ref:`C <c>`, :ref:`Objective-C <objc>`, :ref:`C++ <cxx>`, and :ref:`Objective-C++ <objcxx>` as well as many dialects of those. For language-specific information, please see the corresponding language specific section:
- :ref:`C Language <c>`: K&R C, ANSI C89, ISO C90, ISO C94 (C89+AMD1), ISO C99 (+TC1, TC2, TC3).
- :ref:`Objective-C Language <objc>`: ObjC 1, ObjC 2, ObjC 2.1, plus variants depending on base language.
- :ref:`C++ Language <cxx>`
- :ref:`Objective-C++ Language <objcxx>`
In addition to these base languages and their dialects, Clang supports a broad variety of language extensions, which are documented in the corresponding language section. These extensions are provided to be compatible with the GCC, Microsoft, and other popular compilers as well as to improve functionality through Clang-specific features. The Clang driver and language features are intentionally designed to be as compatible with the GNU GCC compiler as reasonably possible, easing migration from GCC to Clang. In most cases, code "just works". Clang also provides an alternative driver, :ref:`clang-cl`, that is designed to be compatible with the Visual C++ compiler, cl.exe.
In addition to language specific features, Clang has a variety of features that depend on what CPU architecture or operating system is being compiled for. Please see the :ref:`Target-Specific Features and Limitations <target_features>` section for more details.
The rest of the introduction introduces some basic :ref:`compiler terminology <terminology>` that is used throughout this manual and contains a basic :ref:`introduction to using Clang <basicusage>` as a command line compiler.
Terminology
Front end, parser, backend, preprocessor, undefined behavior, diagnostic, optimizer
Basic Usage
Intro to how to use a C compiler for newbies.
- compile + link
- compile then link
- debug info
- enabling optimizations
- picking a language to use (defaults to C11; autosensed based on the file extension)
- using a makefile
Command Line Options
This section is generally an index into other sections. It does not go into depth on the ones that are covered by other sections. However, the first part introduces the language selection and other high level options like :option:`-c`, :option:`-g`, etc.
Options to Control Error and Warning Messages
-Werror=foo
Turn warning "foo" into an error.
Formatting of Diagnostics
Clang aims to produce beautiful diagnostics by default, particularly for new users that first come to Clang. However, different people have different preferences, and sometimes Clang is driven not by a human, but by a program that wants consistent and easily parsable output. For these cases, Clang provides a wide range of options to control the exact output format of the diagnostics that it generates.
- -f[no-]show-column
-
Print column number in diagnostic.
This option, which defaults to on, controls whether or not Clang prints the column number of a diagnostic. For example, when this is enabled, Clang will print something like:
test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //
When this is disabled, Clang will print "test.c:28: warning..." with no column number.
The printed column numbers count bytes from the beginning of the line; take care if your source contains multibyte characters.
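To make the byte-versus-character distinction concrete, here is a minimal sketch in C. The helper names `byte_column` and `utf8_chars_before` are hypothetical and are not part of Clang; the sketch only assumes the source line is UTF-8 encoded.

```c
#include <stddef.h>
#include <string.h>

/* Return the 1-based byte column at which `needle` first starts in `line`,
 * or 0 if it does not occur. This counts bytes, not characters. */
size_t byte_column(const char *line, const char *needle) {
    const char *hit = strstr(line, needle);
    return hit ? (size_t)(hit - line) + 1 : 0;
}

/* Count UTF-8 code points in the first `nbytes` bytes of `line`.
 * Continuation bytes have the bit pattern 10xxxxxx and are skipped. */
size_t utf8_chars_before(const char *line, size_t nbytes) {
    size_t chars = 0;
    for (size_t i = 0; i < nbytes && line[i] != '\0'; ++i) {
        if (((unsigned char)line[i] & 0xC0) != 0x80)
            ++chars;
    }
    return chars;
}
```

For a line containing one two-byte character before the token of interest, the byte column is one larger than the column an editor that counts characters would show.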
- -f[no-]show-source-location
-
Print source file/line/column information in diagnostic.
This option, which defaults to on, controls whether or not Clang prints the filename, line number and column number of a diagnostic. For example, when this is enabled, Clang will print something like:
test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //
When this is disabled, Clang will not print the "test.c:28:8: " part.
- -f[no-]caret-diagnostics
-
Print source line and ranges from source code in diagnostic. This option, which defaults to on, controls whether or not Clang prints the source line, source ranges, and caret when emitting a diagnostic. For example, when this is enabled, Clang will print something like:
test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //
- -f[no-]color-diagnostics
-
This option, which defaults to on when a color-capable terminal is detected, controls whether or not Clang prints diagnostics in color.
When this option is enabled, Clang will use colors to highlight specific parts of the diagnostic, e.g.,
When this is disabled, Clang will just print:
test.c:2:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //
- -fansi-escape-codes
- Controls whether ANSI escape codes are used instead of the Windows Console API to output colored diagnostics. This option is only used on Windows and defaults to off.
- -f[no-]diagnostics-show-option
-
Enable [-Woption] information in diagnostic line.
This option, which defaults to on, controls whether or not Clang prints the associated :ref:`warning group <cl_diag_warning_groups>` option name when outputting a warning diagnostic. For example, in this output:
test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //
Passing -fno-diagnostics-show-option will prevent Clang from printing the [:ref:`-Wextra-tokens <opt_Wextra-tokens>`] information in the diagnostic. This information tells you the flag needed to enable or disable the diagnostic, either from the command line or through :ref:`#pragma GCC diagnostic <pragma_GCC_diagnostic>`.
- -f[no-]diagnostics-fixit-info
-
Enable "FixIt" information in the diagnostics output.
This option, which defaults to on, controls whether or not Clang prints the information on how to fix a specific diagnostic underneath it when it knows. For example, in this output:
test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //
Passing -fno-diagnostics-fixit-info will prevent Clang from printing the "//" line at the end of the message. This information is useful for users who may not understand what is wrong, but can be confusing for machine parsing.
- -fdiagnostics-print-source-range-info
-
Print machine parsable information about source ranges. This option makes Clang print information about source ranges in a machine parsable format after the file/line/column number information. The information is a simple sequence of brace enclosed ranges, where each range lists the start and end line/column locations. For example, in this output:
exprs.c:47:15:{47:8-47:14}{47:17-47:24}: error: invalid operands to binary expression ('int *' and '_Complex float')
   P = (P-42) + Gamma*4;
       ~~~~~~ ^ ~~~~~~~
The {}'s are generated by -fdiagnostics-print-source-range-info.
The printed column numbers count bytes from the beginning of the line; take care if your source contains multibyte characters.
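As an illustration of how a tool might consume this format, the following C sketch extracts the brace-enclosed ranges from one diagnostic line. The struct and function names are hypothetical, and the sketch assumes the ranges appear exactly as in the example above, with no '{' earlier in the file name.

```c
#include <stdio.h>

/* One {startLine:startCol-endLine:endCol} group from a diagnostic line. */
struct source_range {
    unsigned start_line, start_col, end_line, end_col;
};

/* Parse up to `max` brace-enclosed ranges from `text`, beginning at the
 * first '{'. Returns the number of ranges stored in `out`. */
int parse_source_ranges(const char *text, struct source_range *out, int max) {
    int count = 0;
    const char *p = text;
    while (*p != '\0' && *p != '{')
        ++p;
    while (count < max && *p == '{') {
        struct source_range r;
        int consumed = 0;
        /* %n is only reached if the closing '}' matched as well. */
        if (sscanf(p, "{%u:%u-%u:%u}%n", &r.start_line, &r.start_col,
                   &r.end_line, &r.end_col, &consumed) != 4 || consumed == 0)
            break;
        out[count++] = r;
        p += consumed;
    }
    return count;
}
```

Called on the first line of the example output, this should yield two ranges, one for each underlined operand.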
Individual Warning Groups
TODO: Generate this from tblgen. Define one anchor per warning group.
Options to Control Clang Crash Diagnostics
As unbelievable as it may sound, Clang does crash from time to time. Generally, this only occurs to those living on the bleeding edge. Clang goes to great lengths to assist you in filing a bug report. Specifically, Clang generates preprocessed source file(s) and associated run script(s) upon a crash. These files should be attached to a bug report to ease reproducibility of the failure. Below are the command line options to control the crash diagnostics.
The -fno-crash-diagnostics flag can be helpful for speeding the process of generating a delta reduced test case.
Options to Emit Optimization Reports
Optimization reports trace, at a high level, all the major decisions made by compiler transformations. For instance, when the inliner decides to inline function foo() into bar(), or the loop unroller decides to unroll a loop N times, or the vectorizer decides to vectorize a loop body.
Clang offers a family of flags which the optimizers can use to emit a diagnostic in three cases:
- When the pass makes a transformation (:option:`-Rpass`).
- When the pass fails to make a transformation (:option:`-Rpass-missed`).
- When the pass determines whether or not to make a transformation (:option:`-Rpass-analysis`).
NOTE: Although the discussion below focuses on :option:`-Rpass`, the exact same options apply to :option:`-Rpass-missed` and :option:`-Rpass-analysis`.
Since there are dozens of passes inside the compiler, each of these flags takes a regular expression that identifies the name of the pass which should emit the associated diagnostic. For example, to get a report from the inliner, compile the code with:
$ clang -O2 -Rpass=inline code.cc -o code
code.cc:4:25: remark: foo inlined into bar [-Rpass=inline]
int bar(int j) { return foo(j, j - 2); }
                        ^
Note that remarks from the inliner are identified with [-Rpass=inline]. To request a report from every optimization pass, you should use :option:`-Rpass=.*` (in fact, you can use any valid POSIX regular expression). However, do not expect a report from every transformation made by the compiler. Optimization remarks do not really make sense outside of the major transformations (e.g., inlining, vectorization, loop optimizations) and not every optimization pass supports this feature.
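For experimenting with the inliner report, a source file along the following lines could be used. Only the `bar` line comes from the remark shown above; the body of `foo` is an assumption made up for illustration, and whether the inliner actually inlines it depends on the optimization level and compiler version.

```c
/* Small helper that an optimizer is likely, but not guaranteed, to inline.
 * Its body is illustrative only. */
static int foo(int a, int b) { return a * b + 1; }

/* The call site that the example remark points at. */
int bar(int j) { return foo(j, j - 2); }
```

The remark is reported at the call to `foo` inside `bar`, which is why the caret in the example points at column 25 of that line.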
Current limitations
- Optimization remarks that refer to function names will display the mangled name of the function. Since these remarks are emitted by the back end of the compiler, it does not know anything about the input language, nor its mangling rules.
- Some source locations are not displayed correctly. The front end has a more detailed source location tracking than the locations included in the debug info (e.g., the front end can locate code inside macro expansions). However, the locations used by :option:`-Rpass` are translated from debug annotations. That translation can be lossy, which results in some remarks having no location information.
Other Options
Clang options that don't fit neatly into other categories.
When Clang emits a dependency file (e.g., you supplied the -M option) most filenames can be written to the file without any special formatting. Different Make tools will treat different sets of characters as "special" and use different conventions for telling the Make tool that the character is actually part of the filename. Normally Clang uses backslash to "escape" a special character, which is the convention used by GNU Make. The -MV option tells Clang to put double-quotes around the entire filename, which is the convention used by NMake and Jom.
Language and Target-Independent Features
Controlling Errors and Warnings
Clang provides a number of ways to control which code constructs cause it to emit errors and warning messages, and how they are displayed to the console.
Controlling How Clang Displays Diagnostics
When Clang emits a diagnostic, it includes rich information in the output, and gives you fine-grain control over which information is printed. Clang has the ability to print this information, and these are the options that control it:
- A file/line/column indicator that shows exactly where the diagnostic occurs in your code [:ref:`-fshow-column <opt_fshow-column>`, :ref:`-fshow-source-location <opt_fshow-source-location>`].
- A categorization of the diagnostic as a note, warning, error, or fatal error.
- A text string that describes what the problem is.
- An option that indicates how to control the diagnostic (for diagnostics that support it) [:ref:`-fdiagnostics-show-option <opt_fdiagnostics-show-option>`].
- A :ref:`high-level category <diagnostics_categories>` for the diagnostic for clients that want to group diagnostics by class (for diagnostics that support it) [:ref:`-fdiagnostics-show-category <opt_fdiagnostics-show-category>`].
- The line of source code that the issue occurs on, along with a caret and ranges that indicate the important locations [:ref:`-fcaret-diagnostics <opt_fcaret-diagnostics>`].
- "FixIt" information, which is a concise explanation of how to fix the problem (when Clang is certain it knows) [:ref:`-fdiagnostics-fixit-info <opt_fdiagnostics-fixit-info>`].
- A machine-parsable representation of the ranges involved (off by default) [:ref:`-fdiagnostics-print-source-range-info <opt_fdiagnostics-print-source-range-info>`].
For more information please see :ref:`Formatting of Diagnostics <cl_diag_formatting>`.
Diagnostic Mappings
All diagnostics are mapped into one of these 6 classes:
- Ignored
- Note
- Remark
- Warning
- Error
- Fatal
Diagnostic Categories
Though not shown by default, diagnostics may each be associated with a high-level category. This category is intended to make it possible to triage builds that produce a large number of errors or warnings in a grouped way.
Categories are not shown by default, but they can be turned on with the :ref:`-fdiagnostics-show-category <opt_fdiagnostics-show-category>` option. When set to "name", the category is printed textually in the diagnostic output. When it is set to "id", a category number is printed. The mapping of category names to category IDs can be obtained by running 'clang --print-diagnostic-categories'.
Controlling Diagnostics via Command Line Flags
TODO: -W flags, -pedantic, etc
Controlling Diagnostics via Pragmas
Clang can also control what diagnostics are enabled through the use of pragmas in the source code. This is useful for turning off specific warnings in a section of source code. Clang supports GCC's pragma for compatibility with existing source code, as well as several extensions.
The pragma may control any warning that can be used from the command line. Warnings may be set to ignored, warning, error, or fatal. The following example code will tell Clang or GCC to ignore the -Wall warnings:
#pragma GCC diagnostic ignored "-Wall"
In addition to all of the functionality provided by GCC's pragma, Clang also allows you to push and pop the current warning state. This is particularly useful when writing a header file that will be compiled by other people, because you don't know what warning flags they build with.
In the below example :option:`-Wmultichar` is ignored for only a single line of code, after which the diagnostics return to whatever state had previously existed.
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wmultichar"
char b = 'df'; // no warning.
#pragma clang diagnostic pop
The push and pop pragmas will save and restore the full diagnostic state of the compiler, regardless of how it was set. That means that it is possible to use push and pop around GCC compatible diagnostics and Clang will push and pop them appropriately, while GCC will ignore the pushes and pops as unknown pragmas. It should be noted that while Clang supports the GCC pragma, Clang and GCC do not support the exact same set of warnings, so even when using GCC compatible #pragmas there is no guarantee that they will have identical behaviour on both compilers.
In addition to controlling warnings and errors generated by the compiler, it is possible to generate custom warning and error messages through the following pragmas:
// The following will produce warning messages
#pragma message "some diagnostic message"
#pragma GCC warning "TODO: replace deprecated feature"
// The following will produce an error message
#pragma GCC error "Not supported"
These pragmas operate similarly to the #warning and #error preprocessor directives, except that they may also be embedded into preprocessor macros via the C99 _Pragma operator, for example:
#define STR(X) #X
#define DEFER(M,...) M(__VA_ARGS__)
#define CUSTOM_ERROR(X) _Pragma(STR(GCC error(X " at line " DEFER(STR,__LINE__))))
CUSTOM_ERROR("Feature not available");
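The same _Pragma operator can also carry the push, ignored, and pop pragmas described earlier, so that a suppression can be wrapped in a pair of macros. The following is a minimal sketch; the macro and function names are hypothetical, and compilers other than Clang are expected to treat the clang pragmas as unknown pragmas.

```c
/* Hypothetical helper macros; the names are illustrative only. */
#define MULTICHAR_OK_BEGIN \
    _Pragma("clang diagnostic push") \
    _Pragma("clang diagnostic ignored \"-Wmultichar\"")
#define MULTICHAR_OK_END _Pragma("clang diagnostic pop")

/* Returns a two-character constant; its numeric value is
 * implementation-defined. */
int two_char_tag(void) {
    MULTICHAR_OK_BEGIN
    int tag = 'df'; /* no -Wmultichar warning under Clang */
    MULTICHAR_OK_END
    return tag;
}
```

Because the pop restores whatever state was pushed, code after MULTICHAR_OK_END is diagnosed with the warning flags the user originally selected.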
Controlling Diagnostics in System Headers
Warnings are suppressed when they occur in system headers. By default, an included file is treated as a system header if it is found in an include path specified by -isystem, but this can be overridden in several ways.
The system_header pragma can be used to mark the current file as being a system header. No warnings will be produced from the location of the pragma onwards within the same file.
char a = 'xy'; // warning
#pragma clang system_header
char b = 'ab'; // no warning
The :option:`--system-header-prefix=` and :option:`--no-system-header-prefix=` command-line arguments can be used to override whether subsets of an include path are treated as system headers. When the name in a #include directive is found within a header search path and starts with a system prefix, the header is treated as a system header. The last prefix on the command-line which matches the specified header name takes precedence.
For instance:
$ clang -Ifoo -isystem bar --system-header-prefix=x/ \
--no-system-header-prefix=x/y/
Here, #include "x/a.h" is treated as including a system header, even if the header is found in foo, and #include "x/y/b.h" is treated as not including a system header, even if the header is found in bar.
A #include directive which finds a file relative to the current directory is treated as including a system header if the including file is treated as a system header.
Enabling All Diagnostics
In addition to the traditional -W flags, one can enable all diagnostics by passing :option:`-Weverything`. This works as expected with :option:`-Werror`, and also includes the warnings from :option:`-pedantic`.
Note that when combined with :option:`-w` (which disables all warnings), that flag wins.
Controlling Static Analyzer Diagnostics
While not strictly part of the compiler, the diagnostics from Clang's static analyzer can also be influenced by the user via changes to the source code. See the available annotations and the analyzer's FAQ page for more information.
Precompiled Headers
Precompiled headers are a general approach employed by many compilers to reduce compilation time. The underlying motivation of the approach is that it is common for the same (and often large) header files to be included by multiple source files. Consequently, compile times can often be greatly improved by caching some of the (redundant) work done by a compiler to process headers. Precompiled header files, which represent one of many ways to implement this optimization, are literally files that represent an on-disk cache that contains the vital information necessary to reduce some of the work needed to process a corresponding header file. While details of precompiled headers vary between compilers, precompiled headers have been shown to be highly effective at speeding up program compilation on systems with very large system headers (e.g., Mac OS X).
Generating a PCH File
To generate a PCH file using Clang, one invokes Clang with the :option:`-x <language>-header` option. This mirrors the interface in GCC for generating PCH files:
$ gcc -x c-header test.h -o test.h.gch
$ clang -x c-header test.h -o test.h.pch
Using a PCH File
A PCH file can then be used as a prefix header when a :option:`-include` option is passed to clang:
$ clang -include test.h test.c -o test
The clang driver will first check if a PCH file for test.h is available; if so, the contents of test.h (and the files it includes) will be processed from the PCH file. Otherwise, Clang falls back to directly processing the content of test.h. This mirrors the behavior of GCC.
Note
Clang does not automatically use PCH files for headers that are directly included within a source file. For example:
$ clang -x c-header test.h -o test.h.pch
$ cat test.c
#include "test.h"
$ clang test.c -o test
In this example, clang will not automatically use the PCH file for test.h since test.h was included directly in the source file and not specified on the command line using :option:`-include`.
Relocatable PCH Files
It is sometimes necessary to build a precompiled header from headers that are not yet in their final, installed locations. For example, one might build a precompiled header within the build tree that is then meant to be installed alongside the headers. Clang permits the creation of "relocatable" precompiled headers, which are built with a given path (into the build directory) and can later be used from an installed location.
To build a relocatable precompiled header, place your headers into a subdirectory whose structure mimics the installed location. For example, if you want to build a precompiled header for the header mylib.h that will be installed into /usr/include, create a subdirectory build/usr/include and place the header mylib.h into that subdirectory. If mylib.h depends on other headers, then they can be stored within build/usr/include in a way that mimics the installed location.
Building a relocatable precompiled header requires two additional arguments. First, pass the --relocatable-pch flag to indicate that the resulting PCH file should be relocatable. Second, pass :option:`-isysroot /path/to/build`, which makes all includes for your library relative to the build directory. For example:
$ clang -x c-header --relocatable-pch -isysroot /path/to/build /path/to/build/mylib.h -o mylib.h.pch
When loading the relocatable PCH file, the various headers used in the PCH file are found from the system header root. For example, mylib.h can be found in /usr/include/mylib.h. If the headers are installed in some other system root, the :option:`-isysroot` option can be used to provide a different system root from which the headers will be based. For example, :option:`-isysroot /Developer/SDKs/MacOSX10.4u.sdk` will look for mylib.h in /Developer/SDKs/MacOSX10.4u.sdk/usr/include/mylib.h.
Relocatable precompiled headers are intended to be used in a limited number of cases where the compilation environment is tightly controlled and the precompiled header cannot be generated after headers have been installed.
Controlling Code Generation
Clang provides a number of ways to control code generation. The options are listed below.
- -f[no-]sanitize=check1,check2,...
-
Turn on runtime checks for various forms of undefined or suspicious behavior.
This option controls whether Clang adds runtime checks for various forms of undefined or suspicious behavior, and is disabled by default. If a check fails, a diagnostic message is produced at runtime explaining the problem. The main checks are:
- -fsanitize=address: :doc:`AddressSanitizer`, a memory error detector.
- -fsanitize=thread: :doc:`ThreadSanitizer`, a data race detector.
- -fsanitize=memory: :doc:`MemorySanitizer`, a detector of uninitialized reads. Requires instrumentation of all program code.
- -fsanitize=undefined: :doc:`UndefinedBehaviorSanitizer`, a fast and compatible undefined behavior checker.
- -fsanitize=dataflow: :doc:`DataFlowSanitizer`, a general data flow analysis.
- -fsanitize=cfi: :doc:`control flow integrity <ControlFlowIntegrity>` checks. Requires -flto.
- -fsanitize=safe-stack: :doc:`safe stack <SafeStack>` protection against stack-based memory corruption errors.
There are more fine-grained checks available: see the :ref:`list <ubsan-checks>` of specific kinds of undefined behavior that can be detected and the :ref:`list <cfi-schemes>` of control flow integrity schemes.
The -fsanitize= argument must also be provided when linking, in order to link to the appropriate runtime library.
It is not possible to combine more than one of the -fsanitize=address, -fsanitize=thread, and -fsanitize=memory checkers in the same program.
-f[no-]sanitize-recover=check1,check2,...
-f[no-]sanitize-recover=all
Controls which checks enabled by the -fsanitize= flag are non-fatal. If a check is fatal, the program will halt after the first error of this kind is detected and the error report is printed.
By default, non-fatal checks are those enabled by :doc:`UndefinedBehaviorSanitizer`, except for -fsanitize=return and -fsanitize=unreachable. Some sanitizers may not support recovery (or may not support it by default, e.g. :doc:`AddressSanitizer`), and always crash the program after the issue is detected.
Note that the -fsanitize-trap flag has precedence over this flag. This means that if a check has been configured to trap elsewhere on the command line, or if the check traps by default, this flag will not have any effect unless that sanitizer's trapping behavior is disabled with -fno-sanitize-trap.
For example, if a command line contains the flags -fsanitize=undefined -fsanitize-trap=undefined, the flag -fsanitize-recover=alignment will have no effect on its own; it will need to be accompanied by -fno-sanitize-trap=alignment.
-f[no-]sanitize-trap=check1,check2,...
Controls which checks enabled by the -fsanitize= flag trap. This option is intended for use in cases where the sanitizer runtime cannot be used (for instance, when building libc or a kernel module), or where the binary size increase caused by the sanitizer runtime is a concern.
This flag is only compatible with :doc:`control flow integrity <ControlFlowIntegrity>` schemes and :doc:`UndefinedBehaviorSanitizer` checks other than vptr. If this flag is supplied together with -fsanitize=undefined, the vptr sanitizer will be implicitly disabled.
This flag is enabled by default for sanitizers in the cfi group.
-f[no-]sanitize-coverage=[type,features,...]
Enable simple code coverage in addition to certain sanitizers. See :doc:`SanitizerCoverage` for more details.
-f[no-]sanitize-stats
Enable simple statistics gathering for the enabled sanitizers. See :doc:`SanitizerStats` for more details.
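To give a feel for what the runtime checks catch, here is a small C sketch containing two functions that are well-defined for ordinary inputs but can be misused in ways the sanitizers are designed to report. The function names are hypothetical and the code is illustrative only; it is not taken from the sanitizer documentation.

```c
#include <stddef.h>

/* Sum the first n elements. If n exceeds the real array length, the read
 * is out of bounds: the kind of error -fsanitize=address reports. */
int sum_first(const int *values, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}

/* Signed overflow is undefined behavior when x is the largest int value:
 * the kind of error -fsanitize=undefined reports. */
int increment(int x) { return x + 1; }
```

Built with the corresponding -fsanitize= flag at both compile and link time, a call such as sum_first on a three-element array with n set to 4 would be expected to produce a runtime diagnostic instead of silently reading past the array.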
- -f[no-]max-type-align=[number]
-
Instruct the code generator to not enforce a higher alignment than the given number (of bytes) when accessing memory via an opaque pointer or reference. This cap is ignored when directly accessing a variable or when the pointee type has an explicit “aligned” attribute.
The value should usually be determined by the properties of the system allocator. Some builtin types, especially vector types, have very high natural alignments; when working with values of those types, Clang usually wants to use instructions that take advantage of that alignment. However, many system allocators do not promise to return memory that is more than 8-byte or 16-byte-aligned. Use this option to limit the alignment that the compiler can assume for an arbitrary pointer, which may point onto the heap.
This option does not affect the ABI alignment of types; the layout of structs and unions and the value returned by the alignof operator remain the same.
This option can be overridden on a case-by-case basis by putting an explicit “aligned” alignment on a struct, union, or typedef. For example:
#include <immintrin.h>
// Make an aligned typedef of the AVX-512 16-int vector type.
typedef __v16si __aligned_v16si __attribute__((aligned(64)));

void initialize_vector(__aligned_v16si *v) {
  // The compiler may assume that ‘v’ is 64-byte aligned, regardless of the
  // value of -fmax-type-align.
}
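The statement that the ABI alignment is unaffected can be checked with a small, target-independent sketch. It uses the GCC/Clang vector_size extension instead of the AVX-512 header, and the type and function names are hypothetical. The values returned should be the same whether or not -fmax-type-align is passed, since the option only limits what the code generator assumes about opaque pointers.

```c
#include <stddef.h>

/* 16-byte vector of four ints, using the GCC/Clang vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Same vector type with an explicit, larger alignment on the typedef. */
typedef v4si aligned_v4si __attribute__((aligned(32)));

/* ABI alignment of the plain vector type, as reported by _Alignof. */
size_t natural_vector_alignment(void) { return _Alignof(v4si); }

/* Alignment of the typedef that carries the explicit aligned attribute. */
size_t explicit_vector_alignment(void) { return _Alignof(aligned_v4si); }
```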
Profile Guided Optimization
Profile information enables better optimization. For example, knowing that a branch is taken very frequently helps the compiler make better decisions when ordering basic blocks. Knowing that a function foo is called more frequently than another function bar helps the inliner.
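As a rough illustration of the kind of code where this matters, consider the following C sketch. The function names and the assumption that negative readings are rare are hypothetical; without a profile, the compiler has no way of knowing which side of the branch is hot.

```c
#include <stddef.h>

/* Rarely taken path: only negative readings end up here. */
static int handle_bad_reading(int reading) { return -reading; }

/* Sum readings; the negative-reading branch is cold for typical inputs. */
long sum_readings(const int *readings, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (readings[i] < 0)
            total += handle_bad_reading(readings[i]);
        else
            total += readings[i];
    }
    return total;
}
```

A profile collected on representative inputs would record that the else branch dominates, which is the information block ordering and inlining decisions can then use.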
Clang supports profile guided optimization with two different kinds of profiling. A sampling profiler can generate a profile with very low runtime overhead, or you can build an instrumented version of the code that collects more detailed profile information. Both kinds of profiles can provide execution counts for instructions in the code and information on branches taken and function invocation.
Regardless of which kind of profiling you use, be careful to collect profiles by running your code with inputs that are representative of the typical behavior. Code that is not exercised in the profile will be optimized as if it is unimportant, and the compiler may make poor optimization choices for code that is disproportionately used while profiling.
Differences Between Sampling and Instrumentation
Although both techniques are used for similar purposes, there are important differences between the two:
- Profile data generated with one cannot be used by the other, and there is no conversion tool that can convert one to the other. So, a profile generated via -fprofile-instr-generate must be used with -fprofile-instr-use. Similarly, sampling profiles generated by external profilers must be converted and used with -fprofile-sample-use.
- Instrumentation profile data can be used for code coverage analysis and optimization.
- Sampling profiles can only be used for optimization. They cannot be used for code coverage analysis. Although it would be technically possible to use sampling profiles for code coverage, sample-based profiles are too coarse-grained for code coverage purposes; it would yield poor results.
- Sampling profiles must be generated by an external tool. The profile generated by that tool must then be converted into a format that can be read by LLVM. The section on sampling profilers describes one of the supported sampling profile formats.
Using Sampling Profilers
Sampling profilers are used to collect runtime information, such as hardware counters, while your application executes. They are typically very efficient and do not incur a large runtime overhead. The sample data collected by the profiler can be used during compilation to determine what the most executed areas of the code are.
Using the data from a sample profiler requires some changes in the way a program is built. Before the compiler can use profiling information, the code needs to execute under the profiler. The following is the usual build cycle when using sample profilers for optimization:
- Build the code with source line table information. You can use all the usual build flags that you always build your application with. The only requirement is that you add -gline-tables-only or -g to the command line. This is important for the profiler to be able to map instructions back to source line locations.
$ clang++ -O2 -gline-tables-only code.cc -o code
- Run the executable under a sampling profiler. The specific profiler you use does not really matter, as long as its output can be converted into the format that the LLVM optimizer understands. Currently, there exists a conversion tool for the Linux Perf profiler (https://perf.wiki.kernel.org/), so these examples assume that you are using Linux Perf to profile your code.
$ perf record -b ./code
Note the use of the -b flag. This tells Perf to use the Last Branch Record (LBR) to record call chains. While this is not strictly required, it provides better call information, which improves the accuracy of the profile data.
- Convert the collected profile data to LLVM's sample profile format. This is currently supported via the AutoFDO converter create_llvm_prof. It is available at http://github.com/google/autofdo. Once built and installed, you can convert the perf.data file to LLVM using the command:
$ create_llvm_prof --binary=./code --out=code.prof
This will read perf.data and the binary file ./code and emit the profile data in code.prof. Note that if you ran perf without the -b flag, you need to use --use_lbr=false when calling create_llvm_prof.
- Build the code again using the collected profile. This step feeds the profile back to the optimizers. This should result in a binary that executes faster than the original one. Note that you are not required to build the code with the exact same arguments that you used in the first step. The only requirement is that you build the code with -gline-tables-only and -fprofile-sample-use.
$ clang++ -O2 -gline-tables-only -fprofile-sample-use=code.prof code.cc -o code
Sample Profile Formats
Since external profilers generate profile data in a variety of custom formats, the data generated by the profiler must be converted into a format that can be read by the backend. LLVM supports three different sample profile formats: