immediately after the name being declared.
For example, this applies the GNU ``unused`` attribute to ``a`` and ``f``, and
also applies the GNU ``noreturn`` attribute to ``f``.
.. code-block:: c++

  [[gnu::unused]] int a, f [[gnu::noreturn]] ();
Target-Specific Extensions
==========================
Clang supports some language features conditionally on some targets.
ARM/AArch64 Language Extensions
-------------------------------
Memory Barrier Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^
Clang implements the ``__dmb``, ``__dsb`` and ``__isb`` intrinsics as defined
in the `ARM C Language Extensions Release 2.0
<http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf>`_.
Note that these intrinsics are implemented as motion barriers that block
reordering of memory accesses and side effect instructions. Other instructions
like simple arithmetic may be reordered around the intrinsic. If you expect to
have no reordering at all, use inline assembly instead.
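As a minimal sketch of where such a barrier typically sits, the following code orders two stores that some other observer (for example a device or another core) is assumed to read. The ``Mailbox`` type, the ``publish`` function and the ``FULL_BARRIER`` macro are hypothetical names, and the non-ARM fallback to ``std::atomic_thread_fence`` is an assumption added only so the example builds on other hosts.

.. code-block:: c++

  #include <atomic>

  #if defined(__clang__) && (defined(__arm__) || defined(__aarch64__))
  #include <arm_acle.h>
  // 0xF selects the full-system (SY) option of the DMB instruction.
  #define FULL_BARRIER() __dmb(0xF)
  #else
  // Assumed fallback for non-ARM hosts; not part of ACLE.
  #define FULL_BARRIER() std::atomic_thread_fence(std::memory_order_seq_cst)
  #endif

  // Hypothetical mailbox read by some other observer.
  struct Mailbox {
    volatile int payload;
    volatile int ready;
  };

  // Writes the payload, then the ready flag, with a barrier in between so
  // the two memory accesses are not reordered across it.
  void publish(Mailbox &box, int value) {
    box.payload = value;
    FULL_BARRIER();
    box.ready = 1;
  }

Only the two stores are ordered here; as noted above, unrelated arithmetic may still move across the intrinsic.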
X86/X86-64 Language Extensions
------------------------------
The X86 backend has these language extensions:
Memory references to specified segments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Annotating a pointer with address space #256 causes it to be code generated
relative to the X86 GS segment register, address space #257 causes it to be
relative to the X86 FS segment, and address space #258 causes it to be
relative to the X86 SS segment. Note that this is a very low-level
feature that should only be used if you know what you're doing (for example in
an OS kernel).
Here is an example:
.. code-block:: c++

  #define GS_RELATIVE __attribute__((address_space(256)))
  int foo(int GS_RELATIVE *P) {
    return *P;
  }
Which compiles to (on X86-32):
.. code-block:: gas

  _foo:
          movl    4(%esp), %eax
          movl    %gs:(%eax), %eax
          ret
Extensions for Static Analysis
==============================
Clang supports additional attributes that are useful for documenting program
invariants and rules for static analysis tools, such as the `Clang Static
Analyzer <http://clang-analyzer.llvm.org/>`_. These attributes are documented
in the analyzer's `list of source-level annotations
<http://clang-analyzer.llvm.org/annotations.html>`_.
Extensions for Dynamic Analysis
===============================
Use ``__has_feature(address_sanitizer)`` to check if the code is being built
with :doc:`AddressSanitizer`.
Use ``__has_feature(thread_sanitizer)`` to check if the code is being built
with :doc:`ThreadSanitizer`.
Use ``__has_feature(memory_sanitizer)`` to check if the code is being built
with :doc:`MemorySanitizer`.

Use ``__has_feature(safe_stack)`` to check if the code is being built
with :doc:`SafeStack`.
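The checks above can be combined in a single translation unit. The following is a minimal sketch; ``active_sanitizer`` is a hypothetical helper, and the fallback definition of ``__has_feature`` is there so that compilers without the extension treat every query as false.

.. code-block:: c++

  // Fallback for compilers that do not provide __has_feature.
  #ifndef __has_feature
  #define __has_feature(x) 0
  #endif

  // Hypothetical helper: reports which of the sanitizers listed above
  // (if any) this translation unit is being built with.
  const char *active_sanitizer() {
  #if __has_feature(address_sanitizer)
    return "address";
  #elif __has_feature(thread_sanitizer)
    return "thread";
  #elif __has_feature(memory_sanitizer)
    return "memory";
  #elif __has_feature(safe_stack)
    return "safe-stack";
  #else
    return "none";
  #endif
  }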
Extensions for selectively disabling optimization
=================================================
Clang provides a mechanism for selectively disabling optimizations in functions
and methods.
To disable optimizations in a single function definition, the GNU-style or C++11
non-standard attribute ``optnone`` can be used.
.. code-block:: c++

  // The following functions will not be optimized.
  // GNU-style attribute
  __attribute__((optnone)) int foo() {
    // ... code
  }
  // C++11 attribute
  [[clang::optnone]] int bar() {
    // ... code
  }
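Code that must also build with compilers lacking ``optnone`` can hide the attribute behind a macro. This is a sketch under that assumption; ``DEBUG_NO_OPT`` and ``sum_to`` are hypothetical names.

.. code-block:: c++

  // Fallback for compilers that do not provide __has_attribute.
  #ifndef __has_attribute
  #define __has_attribute(x) 0
  #endif

  // Expands to the GNU-style attribute only where it is supported.
  #if __has_attribute(optnone)
  #define DEBUG_NO_OPT __attribute__((optnone))
  #else
  #define DEBUG_NO_OPT
  #endif

  // With Clang this function is left unoptimized, which can make it easier
  // to step through in a debugger even in an optimized build.
  DEBUG_NO_OPT int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i)
      total += i;
    return total;
  }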
To facilitate disabling optimization for a range of function definitions, a
range-based pragma is provided. Its syntax is ``#pragma clang optimize``
followed by ``off`` or ``on``.
All function definitions in the region between an ``off`` and the following
``on`` will be decorated with the ``optnone`` attribute unless doing so would
conflict with explicit attributes already present on the function (e.g. the
ones that control inlining).
.. code-block:: c++

  #pragma clang optimize off
  // This function will be decorated with optnone.
  int foo() {
    // ... code
  }

  // optnone conflicts with always_inline, so bar() will not be decorated.
  __attribute__((always_inline)) int bar() {
    // ... code
  }
  #pragma clang optimize on
If no ``on`` is found to close an ``off`` region, the end of the region is the
end of the compilation unit.
Note that a stray ``#pragma clang optimize on`` does not selectively enable
additional optimizations when compiling at low optimization levels. This feature
can only be used to selectively disable optimizations.
The pragma has an effect on functions only at the point of their definition; for
function templates, this means that the state of the pragma at the point of an
instantiation is not necessarily relevant. Consider the following example:
.. code-block:: c++

  template<typename T> T twice(T t) {
    return 2 * t;
  }

  #pragma clang optimize off
  template<typename T> T thrice(T t) {
    return 3 * t;
  }

  int container(int a, int b) {
    return twice(a) + thrice(b);
  }
  #pragma clang optimize on
In this example, the definition of the template function ``twice`` is outside
the pragma region, whereas the definition of ``thrice`` is inside the region.
The ``container`` function is also in the region and will not be optimized, but
it causes the instantiation of ``twice`` and ``thrice`` with an ``int`` type; of
these two instantiations, ``twice`` will be optimized (because its definition
was outside the region) and ``thrice`` will not be optimized.

Extensions for loop hint optimizations
======================================
The ``#pragma clang loop`` directive is used to specify hints for optimizing the
subsequent ``for``, ``while``, ``do-while``, or C++11 range-based ``for`` loop. The directive
provides options for vectorization, interleaving, unrolling and
distribution. Loop hints can be specified before any loop and will be ignored if
the optimization is not safe to apply.
Vectorization and Interleaving
------------------------------

A vectorized loop performs multiple iterations of the original loop
in parallel using vector instructions. The instruction set of the target
processor determines which vector instructions are available and their vector
widths. This restricts the types of loops that can be vectorized. The vectorizer
automatically determines if the loop is safe and profitable to vectorize. A
vector instruction cost model is used to select the vector width.
Interleaving multiple loop iterations allows modern processors to further
improve instruction-level parallelism (ILP) using advanced hardware features,
such as multiple execution units and out-of-order execution. The vectorizer uses
a cost model that depends on the register pressure and generated code size to
select the interleaving count.
Vectorization is enabled by ``vectorize(enable)`` and interleaving is enabled
by ``interleave(enable)``. This is useful when compiling with ``-Os`` to
manually enable vectorization or interleaving.
.. code-block:: c++

  #pragma clang loop vectorize(enable)
  #pragma clang loop interleave(enable)
  for(...) {
    ...
  }
The vector width is specified by ``vectorize_width(_value_)`` and the interleave
count is specified by ``interleave_count(_value_)``, where
``_value_`` is a positive integer. This is useful for specifying the optimal
width/count for the set of target architectures supported by your application.
.. code-block:: c++

  #pragma clang loop vectorize_width(2)
  #pragma clang loop interleave_count(2)
  for(...) {
    ...
  }
Specifying a width/count of 1 disables the optimization, and is equivalent to
``vectorize(disable)`` or ``interleave(disable)``.
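For a concrete loop, the hints might be applied as in the sketch below. The ``scale_add`` function is a hypothetical kernel, the width of 4 and count of 2 are arbitrary illustrative values, and the ``__clang__`` guard is an assumption added so that other compilers do not warn about an unknown pragma.

.. code-block:: c++

  #include <cstddef>

  // Hypothetical kernel: y[i] += a * x[i] for the first n elements.
  void scale_add(int *y, const int *x, int a, std::size_t n) {
  #if defined(__clang__)
  // Request 4-wide vectors and 2 interleaved iterations. These are hints:
  // they are ignored if the loop is not safe to vectorize.
  #pragma clang loop vectorize_width(4) interleave_count(2)
  #endif
    for (std::size_t i = 0; i < n; ++i)
      y[i] += a * x[i];
  }

The result of the loop is the same whether or not the hints are honored; only the generated code differs.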
Loop Unrolling
--------------
Unrolling a loop reduces the loop control overhead and exposes more
opportunities for ILP. Loops can be fully or partially unrolled. Full unrolling
eliminates the loop and replaces it with an enumerated sequence of loop
iterations. Full unrolling is only possible if the loop trip count is known at
compile time. Partial unrolling replicates the loop body within the loop and
reduces the trip count.
If ``unroll(enable)`` is specified the unroller will attempt to fully unroll the
loop if the trip count is known at compile time. If the fully unrolled code size
is greater than an internal limit the loop will be partially unrolled up to this
limit. If the trip count is not known at compile time the loop will be partially
unrolled with a heuristically chosen unroll factor.
.. code-block:: c++

  #pragma clang loop unroll(enable)
  for(...) {
    ...
  }
If ``unroll(full)`` is specified and the trip count is known at compile time,
the unroller will attempt to fully unroll the loop, exactly as with
``unroll(enable)``. However, with ``unroll(full)`` the loop will not be unrolled
at all if the trip count is not known at compile time.
.. code-block:: c++

  #pragma clang loop unroll(full)
  for(...) {
    ...
  }
The unroll count can be specified explicitly with ``unroll_count(_value_)`` where
``_value_`` is a positive integer. If this value is greater than the trip count the
loop will be fully unrolled. Otherwise the loop is partially unrolled subject
to the same code size limit as with ``unroll(enable)``.
.. code-block:: c++

  #pragma clang loop unroll_count(8)
  for(...) {
    ...
  }
Unrolling of a loop can be prevented by specifying ``unroll(disable)``.
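The following sketch applies an explicit unroll count to a reduction whose trip count is only known at run time. ``sum_array`` is a hypothetical function, the count of 4 is an arbitrary illustrative value, and the ``__clang__`` guard is an assumption for portability to other compilers.

.. code-block:: c++

  // Hypothetical reduction over the first n elements of data.
  int sum_array(const int *data, int n) {
    int total = 0;
  #if defined(__clang__)
  // n is a run-time value, so the loop can only be partially unrolled;
  // the hint asks for the body to be replicated 4 times.
  #pragma clang loop unroll_count(4)
  #endif
    for (int i = 0; i < n; ++i)
      total += data[i];
    return total;
  }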
Loop Distribution
-----------------
Loop Distribution allows splitting a loop into multiple loops. This is
beneficial for example when the entire loop cannot be vectorized but some of the
resulting loops can.
If ``distribute(enable)`` is specified and the loop has memory dependencies
that inhibit vectorization, the compiler will attempt to isolate the offending
operations into a new loop. This optimization is not enabled by default; only
loops marked with the pragma are considered.
.. code-block:: c++

  #pragma clang loop distribute(enable)
  for (i = 0; i < N; ++i) {
    S1: A[i + 1] = A[i] + B[i];
    S2: C[i] = D[i] * E[i];
  }
This loop will be split into two loops between statements S1 and S2. The
second loop containing S2 will be vectorized.
Loop Distribution is currently not enabled by default in the optimizer because
it can hurt performance in some cases. For example, instruction-level
parallelism could be reduced by sequentializing the execution of the
statements S1 and S2 above.
If Loop Distribution is turned on globally with
``-mllvm -enable-loop-distribution``, specifying ``distribute(disable)`` can
be used to disable it on a per-loop basis.
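The S1/S2 loop above can be written as a self-contained function, as in this sketch. The ``update`` function and its parameter layout are hypothetical; ``A`` is assumed to hold at least ``n + 1`` elements, and the ``__clang__`` guard is an assumption for portability to other compilers.

.. code-block:: c++

  // Hypothetical wrapper around the S1/S2 loop. A needs n + 1 elements.
  void update(int *A, const int *B, int *C, const int *D, const int *E,
              int n) {
  #if defined(__clang__)
  #pragma clang loop distribute(enable)
  #endif
    for (int i = 0; i < n; ++i) {
      A[i + 1] = A[i] + B[i]; // S1: depends on the previous iteration
      C[i] = D[i] * E[i];     // S2: independent across iterations
    }
  }

Whether the compiler actually splits the loop depends on its dependence analysis (for instance, on whether it can rule out aliasing between the arrays); the computed values are the same either way.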
Additional Information
----------------------

For convenience multiple loop hints can be specified on a single line.
.. code-block:: c++

  #pragma clang loop vectorize_width(4) interleave_count(8)
  for(...) {
    ...
  }
If an optimization cannot be applied any hints that apply to it will be ignored.
For example, the hint ``vectorize_width(4)`` is ignored if the loop is not
proven safe to vectorize. To identify and diagnose optimization issues use
``-Rpass``, ``-Rpass-missed``, and ``-Rpass-analysis`` command line options. See the
user guide for details.
Extensions to specify floating-point flags
==========================================
The ``#pragma clang fp`` pragma allows floating-point options to be specified
for a section of the source code. This pragma can only appear at file scope or
at the start of a compound statement (excluding comments). When used within a
compound statement, the pragma is active within the scope of the compound
statement.
Currently, only FP contraction can be controlled with the pragma. ``#pragma
clang fp contract`` specifies whether the compiler should contract a multiply
and an addition (or subtraction) into a fused FMA operation when supported by
the target.
The pragma can take three values: ``on``, ``fast`` and ``off``. The ``on``
option is identical to using ``#pragma STDC FP_CONTRACT ON`` and it allows
fusion as specified by the language standard. The ``fast`` option allows fusion
in cases when the language standard does not make this possible (e.g. across
statements in C).
.. code-block:: c++

  for(...) {
    #pragma clang fp contract(fast)
    a = b[i] * c[i];
    d[i] += a;
  }
The pragma can also be used with ``off`` which turns FP contraction off for a
section of the code. This can be useful when fast contraction is otherwise
enabled for the translation unit with the ``-ffp-contract=fast`` flag.
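As a sketch of the ``off`` case, the function below keeps the multiply and the add as separate operations even if contraction is enabled elsewhere. ``mul_add_unfused`` is a hypothetical name, and the ``__clang__`` guard is an assumption added so that other compilers do not warn about an unknown pragma.

.. code-block:: c++

  // Hypothetical helper computing a * b + c without FMA contraction.
  double mul_add_unfused(double a, double b, double c) {
  #if defined(__clang__)
  // The pragma is placed at the start of the compound statement and
  // applies until the end of the function body.
  #pragma clang fp contract(off)
  #endif
    return a * b + c;
  }

Keeping the operations separate means the product is rounded before the addition, which can matter when results must match a build that does not use FMA instructions.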