[CUDA] Implement __ldg using intrinsics. (58d65b28) · Commits · HPC2SE-Project / pacxx-clang

Commit 58d65b28 authored 8 years ago by Justin Lebar

[CUDA] Implement __ldg using intrinsics.

Summary:
Previously it was implemented as inline asm in the CUDA headers.

This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions.  This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
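
For illustration only, here is a minimal sketch of the kind of code this change is meant to help: a kernel that calls `__ldg` from within an unrolled loop. The names `sum4_ldg` and `run_sum4` are hypothetical and do not come from this commit or its benchmarks. Once the loop is unrolled, the four loads differ only by a constant offset from `p`, so a compiler that lowers `__ldg` through an intrinsic may fold those offsets into `ld.global.nc` instructions of the form `[addr+imm]`. Whether it does depends on the compiler version and target. `__ldg` requires compute capability 3.5 or newer.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread sums four consecutive floats, reading
// them through the read-only data cache with __ldg.
__global__ void sum4_ldg(const float* __restrict__ in, float* out, int n_groups) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_groups) return;

    const float* p = in + 4 * i;  // base address for this thread's group
    float acc = 0.0f;
#pragma unroll
    for (int k = 0; k < 4; ++k) {
        // After unrolling, these are loads at constant offsets from p,
        // which is the pattern the [addr+imm] addressing mode can express.
        acc += __ldg(p + k);
    }
    out[i] = acc;
}

// Hypothetical host helper: runs the kernel on host_in and returns one sum
// per complete group of four elements. Error checking is omitted for brevity.
std::vector<float> run_sum4(const std::vector<float>& host_in) {
    int n_groups = static_cast<int>(host_in.size() / 4);
    std::vector<float> host_out(n_groups, 0.0f);
    if (n_groups == 0) return host_out;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, host_in.size() * sizeof(float));
    cudaMalloc(&d_out, n_groups * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), host_in.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n_groups + threads - 1) / threads;
    sum4_ldg<<<blocks, threads>>>(d_in, d_out, n_groups);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host_out.data(), d_out, n_groups * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

Calling `run_sum4` from a host `main` with the values 0 through 15 yields one sum per group of four. Inspecting the generated PTX for `sum4_ldg` is the way to check which addressing mode a given compiler actually emitted.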

Reviewers: tra, rsmith

Subscribers: jhen, cfe-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D19990

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@270150 91177308-0d34-0410-b5e6-96231b3b80d8

parent 52b5c697

Showing with 447 additions and 3 deletions
