
``memory_order`` enumeration.

(Note that Clang additionally provides GCC-compatible ``__atomic_*``
builtins and OpenCL 2.0 ``__opencl_atomic_*`` builtins. The OpenCL 2.0
atomic builtins are an explicit form of the corresponding OpenCL 2.0
builtin function, and are named with a ``__opencl_`` prefix. The macros
``__OPENCL_MEMORY_SCOPE_WORK_ITEM``, ``__OPENCL_MEMORY_SCOPE_WORK_GROUP``,
``__OPENCL_MEMORY_SCOPE_DEVICE``, ``__OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES``,
and ``__OPENCL_MEMORY_SCOPE_SUB_GROUP`` are provided, with values
corresponding to the enumerators of OpenCL's ``memory_scope`` enumeration.)

Low-level ARM exclusive memory builtins
---------------------------------------

Clang provides overloaded builtins giving direct access to the three key ARM
instructions for implementing atomic operations.

.. code-block:: c

  T __builtin_arm_ldrex(const volatile T *addr);
  T __builtin_arm_ldaex(const volatile T *addr);
  int __builtin_arm_strex(T val, volatile T *addr);
  int __builtin_arm_stlex(T val, volatile T *addr);
  void __builtin_arm_clrex(void);

The types ``T`` currently supported are:

* Integer types with width at most 64 bits (or 128 bits on AArch64).
* Floating-point types.
* Pointer types.

Note that the compiler does not guarantee that it will not insert stores which
clear the exclusive monitor between an ``ldrex``-type operation and its paired
``strex``. In practice this is usually only a risk when the extra store is on
the same cache line as the variable being modified. Clang will only insert
stack stores on its own, so it is best not to use these operations on variables
with automatic storage duration.

Also, loads and stores may be implicit in code written between the ``ldrex`` and
``strex``. Clang will not necessarily mitigate the effects of these either, so
care should be exercised.

For these reasons the higher level atomic primitives should be preferred where
possible.
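
As a rough sketch of this trade-off, the hypothetical helper below (the names
``atomic_increment`` and ``HAVE_ARM_EXCLUSIVE`` are illustrative assumptions,
not part of Clang) uses an ``ldrex``/``strex`` retry loop only when both
builtins are available, and otherwise uses the GCC-compatible
``__atomic_fetch_add`` builtin, which is the form to prefer in most code.

.. code-block:: c++

  // Detect the ARM exclusive builtins; other targets take the portable path.
  #if defined(__has_builtin)
  #  if __has_builtin(__builtin_arm_ldrex) && __has_builtin(__builtin_arm_strex)
  #    define HAVE_ARM_EXCLUSIVE 1
  #  endif
  #endif

  // Hypothetical helper: atomically increments *addr and returns the previous
  // value. The object at addr should not have automatic storage duration, so
  // that compiler-inserted stack stores are unlikely to disturb the monitor.
  inline int atomic_increment(volatile int *addr) {
  #ifdef HAVE_ARM_EXCLUSIVE
    int old;
    do {
      old = __builtin_arm_ldrex(addr);             // load-exclusive
    } while (__builtin_arm_strex(old + 1, addr));  // non-zero: store failed, retry
    return old;
  #else
    // Relaxed ordering mirrors a plain ldrex/strex pair, which adds no barrier.
    return __atomic_fetch_add(addr, 1, __ATOMIC_RELAXED);
  #endif
  }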

Non-temporal load/store builtins
--------------------------------

Clang provides overloaded builtins allowing generation of non-temporal memory
accesses.

.. code-block:: c

  T __builtin_nontemporal_load(T *addr);
  void __builtin_nontemporal_store(T value, T *addr);

The types ``T`` currently supported are:

* Integer types.
* Floating-point types.
* Vector types.

Note that the compiler does not guarantee that non-temporal loads or stores
will be used.
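
A minimal sketch of how the store builtin might be used for a streaming copy
follows. The helper name ``stream_copy`` is an assumption made for
illustration; the ``__has_builtin`` guard lets the same code fall back to
ordinary stores on compilers that lack the builtin.

.. code-block:: c++

  #include <cstddef>

  #ifndef __has_builtin
  #define __has_builtin(x) 0  // Compatibility with compilers lacking the macro.
  #endif

  // Hypothetical helper: copies n floats from src to dst, hinting that the
  // destination will not be read again soon.
  inline void stream_copy(float *dst, const float *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
  #if __has_builtin(__builtin_nontemporal_store)
      // Only a hint: the compiler may still emit a regular store.
      __builtin_nontemporal_store(src[i], dst + i);
  #else
      dst[i] = src[i];
  #endif
    }
  }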

C++ Coroutines support builtins
--------------------------------

.. warning::
  This is a work in progress. Compatibility across Clang/LLVM releases is not 
  guaranteed.

Clang provides experimental builtins to support C++ Coroutines as defined by
http://wg21.link/P0057. The following four are intended to be used by the
standard library to implement the ``std::experimental::coroutine_handle`` type.

**Syntax**:

.. code-block:: c

  void  __builtin_coro_resume(void *addr);
  void  __builtin_coro_destroy(void *addr);
  bool  __builtin_coro_done(void *addr);
  void *__builtin_coro_promise(void *addr, int alignment, bool from_promise);

**Example of use**:

.. code-block:: c++

  template <> struct coroutine_handle<void> {
    void resume() const { __builtin_coro_resume(ptr); }
    void destroy() const { __builtin_coro_destroy(ptr); }
    bool done() const { return __builtin_coro_done(ptr); }
    // ...
  protected:
    void *ptr;
  };

  template <typename Promise> struct coroutine_handle : coroutine_handle<> {
    // ...
    Promise &promise() const {
      return *reinterpret_cast<Promise *>(
        __builtin_coro_promise(ptr, alignof(Promise), /*from-promise=*/false));
    }
    static coroutine_handle from_promise(Promise &promise) {
      coroutine_handle p;
      p.ptr = __builtin_coro_promise(&promise, alignof(Promise),
                                                      /*from-promise=*/true);
      return p;
    }
  };


Other coroutine builtins are either for internal Clang use or for use during
development of the coroutine feature. See `Coroutines in LLVM
<http://llvm.org/docs/Coroutines.html#intrinsics>`_ for
more information on their semantics. Note that builtins matching the intrinsics
that take a token as the first parameter (``llvm.coro.begin``,
``llvm.coro.alloc``, ``llvm.coro.free`` and ``llvm.coro.suspend``) omit the
token parameter; Clang fills it in with an appropriate value during emission.

**Syntax**:

.. code-block:: c

  size_t __builtin_coro_size();
  void  *__builtin_coro_frame();
  void  *__builtin_coro_free(void *coro_frame);

  void  *__builtin_coro_id(int align, void *promise, void *fnaddr, void *parts);
  bool   __builtin_coro_alloc();
  void  *__builtin_coro_begin(void *memory);
  void   __builtin_coro_end(void *coro_frame, bool unwind);
  char   __builtin_coro_suspend(bool final);
  bool   __builtin_coro_param(void *original, void *copy);

Note that there is no builtin matching the ``llvm.coro.save`` intrinsic. LLVM
will automatically insert one if the first argument to ``llvm.coro.suspend`` is
``token none``. If a user calls ``__builtin_coro_suspend``, Clang will insert
``token none`` as the first argument to the intrinsic.

Non-standard C++11 Attributes
=============================

Clang's non-standard C++11 attributes live in the ``clang`` attribute
namespace.

Clang supports GCC's ``gnu`` attribute namespace. All GCC attributes which
are accepted with the ``__attribute__((foo))`` syntax are also accepted as
``[[gnu::foo]]``. This only extends to attributes which are specified by GCC
(see the list of `GCC function attributes
<http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html>`_, `GCC variable
attributes <http://gcc.gnu.org/onlinedocs/gcc/Variable-Attributes.html>`_, and
`GCC type attributes
<http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html>`_). As with the GCC
implementation, these attributes must appertain to the *declarator-id* in a
declaration, which means they must go either at the start of the declaration or
immediately after the name being declared.

For example, this applies the GNU ``unused`` attribute to ``a`` and ``f``, and
also applies the GNU ``noreturn`` attribute to ``f``.

.. code-block:: c++

  [[gnu::unused]] int a, f [[gnu::noreturn]] ();

Target-Specific Extensions
==========================

Clang supports some language features conditionally on some targets.

ARM/AArch64 Language Extensions
-------------------------------

Memory Barrier Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^
Clang implements the ``__dmb``, ``__dsb`` and ``__isb`` intrinsics as defined
in the `ARM C Language Extensions Release 2.0
<http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf>`_.
Note that these intrinsics are implemented as motion barriers that block
reordering of memory accesses and side effect instructions. Other instructions
like simple arithmetic may be reordered around the intrinsic. If you expect to
have no reordering at all, use inline assembly instead.

X86/X86-64 Language Extensions
------------------------------

The X86 backend has these language extensions:

Memory references to specified segments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Annotating a pointer with address space #256 causes it to be code generated
relative to the X86 GS segment register, address space #257 causes it to be
relative to the X86 FS segment, and address space #258 causes it to be
relative to the X86 SS segment.  Note that this is a very low-level feature
that should only be used if you know what you're doing (for example in an OS
kernel).

Here is an example:

.. code-block:: c++

  #define GS_RELATIVE __attribute__((address_space(256)))
  int foo(int GS_RELATIVE *P) {
    return *P;
  }

Which compiles to (on X86-32):

.. code-block:: gas

  _foo:
          movl    4(%esp), %eax
          movl    %gs:(%eax), %eax
          ret

Extensions for Static Analysis
==============================

Clang supports additional attributes that are useful for documenting program
invariants and rules for static analysis tools, such as the `Clang Static
Analyzer <http://clang-analyzer.llvm.org/>`_. These attributes are documented
in the analyzer's `list of source-level annotations
<http://clang-analyzer.llvm.org/annotations.html>`_.


Extensions for Dynamic Analysis
===============================

Use ``__has_feature(address_sanitizer)`` to check if the code is being built
with :doc:`AddressSanitizer`.

Use ``__has_feature(thread_sanitizer)`` to check if the code is being built
with :doc:`ThreadSanitizer`.

Use ``__has_feature(memory_sanitizer)`` to check if the code is being built
with :doc:`MemorySanitizer`.

Use ``__has_feature(safe_stack)`` to check if the code is being built
with :doc:`SafeStack`.
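
These checks are typically wrapped in a small configuration macro. The sketch
below is one possible arrangement; ``BUILT_WITH_ASAN`` and
``scratch_buffer_size`` are hypothetical names chosen for illustration, and the
fallback definition of ``__has_feature`` keeps the code compiling on compilers
that do not provide it.

.. code-block:: c++

  #ifndef __has_feature
  #define __has_feature(x) 0  // Compatibility with non-Clang compilers.
  #endif

  #if __has_feature(address_sanitizer)
  #define BUILT_WITH_ASAN 1
  #else
  #define BUILT_WITH_ASAN 0
  #endif

  // Hypothetical helper: picks a smaller scratch buffer in instrumented
  // builds, where memory overhead is higher.
  inline int scratch_buffer_size() {
    return BUILT_WITH_ASAN ? 256 : 4096;
  }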


Extensions for selectively disabling optimization
=================================================

Clang provides a mechanism for selectively disabling optimizations in functions
and methods.

To disable optimizations in a single function definition, the GNU-style or C++11
non-standard attribute ``optnone`` can be used.

.. code-block:: c++

  // The following functions will not be optimized.
  // GNU-style attribute
  __attribute__((optnone)) int foo() {
    // ... code
  }
  // C++11 attribute
  [[clang::optnone]] int bar() {
    // ... code
  }

To facilitate disabling optimization for a range of function definitions, a
range-based pragma is provided. Its syntax is ``#pragma clang optimize``
followed by ``off`` or ``on``.

All function definitions in the region between an ``off`` and the following
``on`` will be decorated with the ``optnone`` attribute unless doing so would
conflict with explicit attributes already present on the function (e.g. the
ones that control inlining).

.. code-block:: c++

  #pragma clang optimize off
  // This function will be decorated with optnone.
  int foo() {
    // ... code
  }

  // optnone conflicts with always_inline, so bar() will not be decorated.
  __attribute__((always_inline)) int bar() {
    // ... code
  }
  #pragma clang optimize on

If no ``on`` is found to close an ``off`` region, the end of the region is the
end of the compilation unit.

Note that a stray ``#pragma clang optimize on`` does not selectively enable
additional optimizations when compiling at low optimization levels. This feature
can only be used to selectively disable optimizations.

The pragma has an effect on functions only at the point of their definition; for
function templates, this means that the state of the pragma at the point of an
instantiation is not necessarily relevant. Consider the following example:

.. code-block:: c++

  template<typename T> T twice(T t) {
    return 2 * t;
  }

  #pragma clang optimize off
  template<typename T> T thrice(T t) {
    return 3 * t;
  }

  int container(int a, int b) {
    return twice(a) + thrice(b);
  }
  #pragma clang optimize on

In this example, the definition of the template function ``twice`` is outside
the pragma region, whereas the definition of ``thrice`` is inside the region.
The ``container`` function is also in the region and will not be optimized, but
it causes the instantiation of ``twice`` and ``thrice`` with an ``int`` type; of
these two instantiations, ``twice`` will be optimized (because its definition
was outside the region) and ``thrice`` will not be optimized.

Extensions for loop hint optimizations
======================================

The ``#pragma clang loop`` directive is used to specify hints for optimizing the
subsequent for, while, do-while, or C++11 range-based for loop. The directive
provides options for vectorization, interleaving, unrolling and
distribution. Loop hints can be specified before any loop and will be ignored if
the optimization is not safe to apply.

Vectorization and Interleaving
------------------------------

A vectorized loop performs multiple iterations of the original loop
in parallel using vector instructions. The instruction set of the target
processor determines which vector instructions are available and their vector
widths. This restricts the types of loops that can be vectorized. The vectorizer
automatically determines if the loop is safe and profitable to vectorize. A
vector instruction cost model is used to select the vector width.

Interleaving multiple loop iterations allows modern processors to further
improve instruction-level parallelism (ILP) using advanced hardware features,
such as multiple execution units and out-of-order execution. The vectorizer uses
a cost model that depends on the register pressure and generated code size to
select the interleaving count.

Vectorization is enabled by ``vectorize(enable)`` and interleaving is enabled
by ``interleave(enable)``. This is useful when compiling with ``-Os`` to
manually enable vectorization or interleaving.

.. code-block:: c++

  #pragma clang loop vectorize(enable)
  #pragma clang loop interleave(enable)
  for(...) {
    ...
  }

The vector width is specified by ``vectorize_width(_value_)`` and the interleave
count is specified by ``interleave_count(_value_)``, where ``_value_`` is a
positive integer. This is useful for specifying the optimal width/count of the
set of target architectures supported by your application.

.. code-block:: c++

  #pragma clang loop vectorize_width(2)
  #pragma clang loop interleave_count(2)
  for(...) {
    ...
  }

Specifying a width/count of 1 disables the optimization, and is equivalent to
``vectorize(disable)`` or ``interleave(disable)``.

Loop Unrolling
--------------

Unrolling a loop reduces the loop control overhead and exposes more
opportunities for ILP. Loops can be fully or partially unrolled. Full unrolling
eliminates the loop and replaces it with an enumerated sequence of loop
iterations. Full unrolling is only possible if the loop trip count is known at
compile time. Partial unrolling replicates the loop body within the loop and
reduces the trip count.

If ``unroll(enable)`` is specified the unroller will attempt to fully unroll the
loop if the trip count is known at compile time. If the fully unrolled code size
is greater than an internal limit the loop will be partially unrolled up to this
limit. If the trip count is not known at compile time the loop will be partially
unrolled with a heuristically chosen unroll factor.

.. code-block:: c++

  #pragma clang loop unroll(enable)
  for(...) {
    ...
  }

If ``unroll(full)`` is specified, the unroller will attempt to fully unroll the
loop if the trip count is known at compile time, exactly as with
``unroll(enable)``. However, with ``unroll(full)`` the loop will not be unrolled
if the trip count is not known at compile time.

.. code-block:: c++

  #pragma clang loop unroll(full)
  for(...) {
    ...
  }

The unroll count can be specified explicitly with ``unroll_count(_value_)``,
where ``_value_`` is a positive integer. If this value is greater than the trip
count, the loop will be fully unrolled. Otherwise the loop is partially
unrolled, subject to the same code size limit as with ``unroll(enable)``.

.. code-block:: c++

  #pragma clang loop unroll_count(8)
  for(...) {
    ...
  }

Unrolling of a loop can be prevented by specifying ``unroll(disable)``.
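
The snippets above use ``for(...)`` placeholders. A compilable sketch, with
hypothetical helper names, might look like the following: the first loop has a
compile-time trip count and is a candidate for full unrolling, while the second
has a run-time trip count and requests a partial unroll factor.

.. code-block:: c++

  // Hypothetical helper: the trip count (8) is known at compile time, so
  // unroll(full) can replace the loop with straight-line code.
  inline int sum8(const int *v) {
    int total = 0;
  #pragma clang loop unroll(full)
    for (int i = 0; i < 8; ++i)
      total += v[i];
    return total;
  }

  // Hypothetical helper: n is only known at run time, so an explicit count
  // asks for partial unrolling by a factor of 4.
  inline int sum_n(const int *v, int n) {
    int total = 0;
  #pragma clang loop unroll_count(4)
    for (int i = 0; i < n; ++i)
      total += v[i];
    return total;
  }

The hints change only how the loops are compiled; both functions return the
same values with or without them.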

Loop Distribution
-----------------

Loop Distribution allows splitting a loop into multiple loops.  This is
beneficial for example when the entire loop cannot be vectorized but some of the
resulting loops can.

If ``distribute(enable)`` is specified and the loop has memory dependencies
that inhibit vectorization, the compiler will attempt to isolate the offending
operations into a new loop.  This optimization is not enabled by default; only
loops marked with the pragma are considered.

.. code-block:: c++

  #pragma clang loop distribute(enable)
  for (i = 0; i < N; ++i) {
    S1: A[i + 1] = A[i] + B[i];
    S2: C[i] = D[i] * E[i];
  }

This loop will be split into two loops between statements S1 and S2.  The
second loop containing S2 will be vectorized.

Loop Distribution is currently not enabled by default in the optimizer because
it can hurt performance in some cases.  For example, instruction-level
parallelism could be reduced by sequentializing the execution of the
statements S1 and S2 above.

If Loop Distribution is turned on globally with
``-mllvm -enable-loop-distribution``, specifying ``distribute(disable)`` can
be used to disable it on a per-loop basis.

Additional Information
----------------------

For convenience multiple loop hints can be specified on a single line.

.. code-block:: c++

  #pragma clang loop vectorize_width(4) interleave_count(8)
  for(...) {
    ...
  }

If an optimization cannot be applied, any hints that apply to it will be ignored.
For example, the hint ``vectorize_width(4)`` is ignored if the loop is not
proven safe to vectorize. To identify and diagnose optimization issues use the
``-Rpass``, ``-Rpass-missed``, and ``-Rpass-analysis`` command line options. See
the user guide for details.
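
As a concrete sketch, the hypothetical ``saxpy`` function below combines two
hints on one line. Building it with, for example,
``clang++ -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize`` should
report whether the loop was vectorized; the exact remark text depends on the
Clang version and target.

.. code-block:: c++

  #include <cstddef>

  // Hypothetical helper: computes y[i] = a * x[i] + y[i] for n elements.
  // The hints request vectorization and an interleave count of 2; they are
  // ignored if the transformation cannot be applied.
  inline void saxpy(float a, const float *x, float *y, std::size_t n) {
  #pragma clang loop vectorize(enable) interleave_count(2)
    for (std::size_t i = 0; i < n; ++i)
      y[i] = a * x[i] + y[i];
  }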

Extensions to specify floating-point flags
==========================================

The ``#pragma clang fp`` pragma allows floating-point options to be specified
for a section of the source code. This pragma can only appear at file scope or
at the start of a compound statement (excluding comments). When used within a
compound statement, the pragma is active within the scope of the compound
statement.

Currently, only FP contraction can be controlled with the pragma. ``#pragma
clang fp contract`` specifies whether the compiler should contract a multiply
and an addition (or subtraction) into a fused FMA operation when supported by
the target.

The pragma can take three values: ``on``, ``fast`` and ``off``.  The ``on``
option is identical to using ``#pragma STDC FP_CONTRACT ON`` and it allows
fusion as specified by the language standard.  The ``fast`` option allows fusion
in cases when the language standard does not make this possible (e.g. across
statements in C).

.. code-block:: c++

  for(...) {
    #pragma clang fp contract(fast)
    a = b[i] * c[i];
    d[i] += a;
  }


The pragma can also be used with ``off`` which turns FP contraction off for a
section of the code. This can be useful when fast contraction is otherwise
enabled for the translation unit with the ``-ffp-contract=fast`` flag.
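
A minimal sketch of the ``off`` form, using a hypothetical helper name, is
shown below. The pragma sits at the start of the function body, so it covers
only that compound statement.

.. code-block:: c++

  // Hypothetical helper: three-element dot product with contraction disabled,
  // so each multiply and each add is rounded separately even when the
  // translation unit is built with -ffp-contract=fast.
  inline double dot3_unfused(const double *a, const double *b) {
    #pragma clang fp contract(off)
    double sum = 0.0;
    for (int i = 0; i < 3; ++i)
      sum += a[i] * b[i];
    return sum;
  }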

Specifying an attribute for multiple declarations (#pragma clang attribute)
===========================================================================

The ``#pragma clang attribute`` directive can be used to apply an attribute to
multiple declarations. The ``#pragma clang attribute push`` variation of the
directive pushes a new attribute to the attribute stack. The declarations that
follow the pragma receive the attributes that are on the attribute stack, until
the stack is cleared using a ``#pragma clang attribute pop`` directive. Multiple
push directives can be nested inside each other.

The attributes that are used in the ``#pragma clang attribute`` directives
can be written using the GNU-style syntax:

.. code-block:: c++

  #pragma clang attribute push(__attribute__((annotate("custom"))), apply_to = function)

  void function(); // The function now has the annotate("custom") attribute

  #pragma clang attribute pop

The attributes can also be written using the C++11 style syntax:

.. code-block:: c++

  #pragma clang attribute push([[noreturn]], apply_to = function)

  void function(); // The function now has the [[noreturn]] attribute

  #pragma clang attribute pop

The ``__declspec`` style syntax is also supported:

.. code-block:: c++

  #pragma clang attribute push(__declspec(dllexport), apply_to = function)

  void function(); // The function now has the __declspec(dllexport) attribute

  #pragma clang attribute pop

A single push directive accepts only one attribute regardless of the syntax
used.

Subject Match Rules
-------------------

The set of declarations that receive a single attribute from the attribute stack
depends on the subject match rules that were specified in the pragma. Subject
match rules are specified after the attribute. The compiler expects an
identifier that corresponds to the subject set specifier. The ``apply_to``
specifier is currently the only supported subject set specifier. It allows you
to specify match rules that form a subset of the attribute's allowed subject
set, i.e. the compiler doesn't require all of the attribute's subjects. For
example, an attribute like ``[[nodiscard]]`` whose subject set includes
``enum``, ``record`` and ``hasType(functionType)``, requires the presence of at
least one of these rules after ``apply_to``:

.. code-block:: c++

  #pragma clang attribute push([[nodiscard]], apply_to = enum)

  enum Enum1 { A1, B1 }; // The enum will receive [[nodiscard]]

  struct Record1 { }; // The struct will *not* receive [[nodiscard]]

  #pragma clang attribute pop

  #pragma clang attribute push([[nodiscard]], apply_to = any(record, enum))

  enum Enum2 { A2, B2 }; // The enum will receive [[nodiscard]]

  struct Record2 { }; // The struct *will* receive [[nodiscard]]

  #pragma clang attribute pop

  // This is an error, since [[nodiscard]] can't be applied to namespaces:
  #pragma clang attribute push([[nodiscard]], apply_to = any(record, namespace))

  #pragma clang attribute pop

Multiple match rules can be specified using the ``any`` match rule, as shown
in the example above. The ``any`` rule applies attributes to all declarations
that are matched by at least one of the rules in the ``any``. It doesn't nest
and can't be used inside the other match rules. Redundant match rules or rules
that conflict with one another should not be used inside of ``any``.

Clang supports the following match rules:

- ``function``: Can be used to apply attributes to functions. This includes C++
  member functions, static functions, operators, and constructors/destructors.

- ``function(is_member)``: Can be used to apply attributes to C++ member
  functions. This includes members like static functions, operators, and
  constructors/destructors.

- ``hasType(functionType)``: Can be used to apply attributes to functions, C++
  member functions, and variables/fields whose type is a function pointer. It
  does not apply attributes to Objective-C methods or blocks.

- ``type_alias``: Can be used to apply attributes to ``typedef`` declarations
  and C++11 type aliases.

- ``record``: Can be used to apply attributes to ``struct``, ``class``, and
  ``union`` declarations.

- ``record(unless(is_union))``: Can be used to apply attributes only to
  ``struct`` and ``class`` declarations.

- ``enum``: Can be used to apply attributes to enumeration declarations.

- ``enum_constant``: Can be used to apply attributes to enumerators.

- ``variable``: Can be used to apply attributes to variables, including
  local variables, parameters, global variables, and static member variables.
  It does not apply attributes to instance member variables or Objective-C
  ivars.

- ``variable(is_thread_local)``: Can be used to apply attributes to thread-local
  variables only.

- ``variable(is_global)``: Can be used to apply attributes to global variables
  only.

- ``variable(is_parameter)``: Can be used to apply attributes to parameters
  only.

- ``variable(unless(is_parameter))``: Can be used to apply attributes to all
  the variables that are not parameters.

- ``field``: Can be used to apply attributes to non-static member variables
  in a record. This includes Objective-C ivars.

- ``namespace``: Can be used to apply attributes to ``namespace`` declarations.

- ``objc_interface``: Can be used to apply attributes to ``@interface``
  declarations.

- ``objc_protocol``: Can be used to apply attributes to ``@protocol``
  declarations.

- ``objc_category``: Can be used to apply attributes to category declarations,
  including class extensions.

- ``objc_method``: Can be used to apply attributes to Objective-C methods,
  including instance and class methods. Implicit methods like implicit property
  getters and setters do not receive the attribute.

- ``objc_method(is_instance)``: Can be used to apply attributes to Objective-C
  instance methods.

- ``objc_property``: Can be used to apply attributes to ``@property``
  declarations.

- ``block``: Can be used to apply attributes to block declarations. This does
  not include variables/fields of block pointer type.

The use of ``unless`` in match rules is currently restricted to a strict set of
sub-rules that are used by the supported attributes. That means that even though
``variable(unless(is_parameter))`` is a valid match rule,
``variable(unless(is_thread_local))`` is not.

Supported Attributes
--------------------

Not all attributes can be used with the ``#pragma clang attribute`` directive.
Notably, statement attributes like ``[[fallthrough]]`` or type attributes
like ``address_space`` aren't supported by this directive. You can determine
whether or not an attribute is supported by the pragma by referring to the
:doc:`individual documentation for that attribute <AttributeReference>`.

The attributes are applied to all matching declarations individually, even when
the attribute is semantically incorrect. The attributes that aren't applied to
any declaration are not verified semantically.

Specifying section names for global objects (#pragma clang section)
===================================================================

The ``#pragma clang section`` directive provides a means to assign section-names
to global variables, functions and static variables.

The section names can be specified as:

.. code-block:: c++

  #pragma clang section bss="myBSS" data="myData" rodata="myRodata" text="myText"

A section name can be reverted to its default by supplying an empty string for
the section kind, for example:

.. code-block:: c++

  #pragma clang section bss="" data="" text="" rodata=""

The ``#pragma clang section`` directive obeys the following rules:

* The pragma applies to all global variable, static variable and function
  declarations from the pragma to the end of the translation unit.

* The ``#pragma clang section`` directive is enabled automatically; no flags
  are needed.

* This feature is only defined to work sensibly for ELF targets.

* If a section name is specified through
  ``__attribute__((section("myname")))``, then the attribute name takes
  precedence.

* Global variables that are initialized to zero will be placed in the named
  bss section, if one is present.

* The ``#pragma clang section`` directive does not try to infer section-kind
  from the name. For example, naming a section "``.bss.mySec``" does NOT mean
  it will be a bss section name.

* The decision about which section-kind applies to each global is taken in the
  back-end. Once the section-kind is known, the appropriate section name, as
  specified by the user with the ``#pragma clang section`` directive, is
  applied to that global.
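
The rules above can be sketched as follows, assuming an ELF target. The
variable and function names are hypothetical, and the placement noted in each
comment is what the rules predict rather than something the code itself
checks; a tool such as ``objdump`` or ``readelf`` can confirm it on the
resulting object file.

.. code-block:: c++

  #pragma clang section bss="myBSS" data="myData" rodata="myRodata" text="myText"

  int call_count;       // zero-initialized: expected in "myBSS"
  int first_id = 100;   // initialized data: expected in "myData"

  // Function definitions are expected in "myText".
  int next_id() {
    return first_id + call_count++;
  }

  // Revert to the default section names for the rest of the translation unit.
  #pragma clang section bss="" data="" text="" rodata=""

The pragma affects only where the definitions are placed, not how they behave.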