Skip to content
Snippets Groups Projects
Commit ac41a1b7 authored by James Molloy's avatar James Molloy
Browse files

Rewrite ARM NEON intrinsic emission completely.

There comes a time in the life of any amateur code generator when dumb string
concatenation just won't cut it any more. For NeonEmitter.cpp, that time has
come.

There were a bunch of magic type codes which meant different things depending on
the context. There were a bunch of special cases that really had no reason to be
there but the whole thing was so creaky that removing them would cause something
weird to fall over. There was a 1000 line switch statement for code generation
involving string concatenation, which actually did lexical scoping to an extent
(!!) with a bunch of semi-repeated cases.

I tried to refactor this three times in three different ways without
success. The only way forward was to rewrite the entire thing. Luckily the
testing coverage on this stuff is absolutely massive, both with regression tests
and the "emperor" random test case generator.

The main change is that previously, in arm_neon.td a bunch of "Operation"s were
defined with special names. NeonEmitter.cpp knew about these Operations and
would emit code based on a huge switch. Actually this doesn't make much sense -
the type information was held as strings, so type checking was impossible. Also
TableGen's DAG type actually suits this sort of code generation very well
(surprising that...)

So now every operation is defined in terms of TableGen DAGs. There are a bunch
of operators to use, including "op" (a generic unary or binary operator), "call"
(to call other intrinsics) and "shuffle" (take a guess...). One of the main
advantages of this apart from making it more obvious what is going on, is that
we have proper type inference. This has two obvious advantages:

  1) TableGen can error on bad intrinsic definitions easier, instead of just
     generating wrong code.
  2) Calls to other intrinsics are typechecked too. So
     we no longer need to work out whether the thing we call needs to be the Q-lane
     version or the D-lane version - TableGen knows that itself!

Here's an example: before:

  case OpAbdl: {
    std::string abd = MangleName("vabd", typestr, ClassS) + "(__a, __b)";
    if (typestr[0] != 'U') {
      // vabd results are always unsigned and must be zero-extended.
      std::string utype = "U" + typestr.str();
      s += "(" + TypeString(proto[0], typestr) + ")";
      abd = "(" + TypeString('d', utype) + ")" + abd;
      s += Extend(utype, abd) + ";";
    } else {
      s += Extend(typestr, abd) + ";";
    }
    break;
  }

after:

  def OP_ABDL     : Op<(cast "R", (call "vmovl", (cast $p0, "U",
                                                       (call "vabd", $p0, $p1))))>;

As an example of what happens if you do something wrong now, here's what happens
if you make $p0 unsigned before the call to "vabd" - that is, $p0 -> (cast "U",
$p0):

arm_neon.td:574:1: error: No compatible intrinsic found - looking up intrinsic 'vabd(uint8x8_t, int8x8_t)'
Available overloads:
  - float64x2_t vabdq_v(float64x2_t, float64x2_t)
  - float64x1_t vabd_v(float64x1_t, float64x1_t)
  - float64_t vabdd_f64(float64_t, float64_t)
  - float32_t vabds_f32(float32_t, float32_t)
... snip ...

This makes it seriously easy to work out what you've done wrong in fairly nasty
intrinsics.

As part of this I've massively beefed up the documentation in arm_neon.td too.

Things still to do / on the radar:
  - Testcase generation. This was implemented in the previous version and not in
    the new one, because
    - Autogenerated tests are not being run. The testcase in test/ differs from
      the autogenerated version.
    - There were a whole slew of special cases in the testcase generation that just
      felt (and looked) like hacks.
    If someone really feels strongly about this, I can try and reimplement it too.
  - Big endian. That's coming soon and should be a very small diff on top of this one.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@211101 91177308-0d34-0410-b5e6-96231b3b80d8
parent b511fe98
No related branches found
No related tags found
No related merge requests found
This diff is collapsed.
...@@ -44,5 +44,5 @@ float32x4_t test_vcvtx_high_f32_f64(float32x2_t x, float64x2_t v) { ...@@ -44,5 +44,5 @@ float32x4_t test_vcvtx_high_f32_f64(float32x2_t x, float64x2_t v) {
return vcvtx_high_f32_f64(x, v); return vcvtx_high_f32_f64(x, v);
// CHECK: llvm.aarch64.neon.fcvtxn.v2f32.v2f64 // CHECK: llvm.aarch64.neon.fcvtxn.v2f32.v2f64
// CHECK: shufflevector // CHECK: shufflevector
// CHECK-NEXT: ret // CHECK: ret
} }
...@@ -17,7 +17,7 @@ float32x2_t test2(uint32x2_t x) { ...@@ -17,7 +17,7 @@ float32x2_t test2(uint32x2_t x) {
float32x2_t test3(uint32x2_t x) { float32x2_t test3(uint32x2_t x) {
// FIXME: The "incompatible result type" error is due to pr10112 and should be // FIXME: The "incompatible result type" error is due to pr10112 and should be
// removed when that is fixed. // removed when that is fixed.
return vcvt_n_f32_u32(x, 0); // expected-error {{argument should be a value from 1 to 32}} expected-error {{incompatible result type}} return vcvt_n_f32_u32(x, 0); // expected-error {{argument should be a value from 1 to 32}}
} }
typedef signed int vSInt32 __attribute__((__vector_size__(16))); typedef signed int vSInt32 __attribute__((__vector_size__(16)));
......
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
// rdar://13527900 // rdar://13527900
void vcopy_reject(float32x4_t vOut0, float32x4_t vAlpha, int t) { void vcopy_reject(float32x4_t vOut0, float32x4_t vAlpha, int t) {
vcopyq_laneq_f32(vOut0, 1, vAlpha, t); // expected-error {{argument to '__builtin_neon_vgetq_lane_f32' must be a constant integer}} expected-error {{initializing 'float32_t' (aka 'float') with an expression of incompatible type 'void'}} vcopyq_laneq_f32(vOut0, 1, vAlpha, t); // expected-error {{argument to '__builtin_neon_vgetq_lane_f32' must be a constant integer}}
} }
// rdar://problem/15256199 // rdar://problem/15256199
......
This diff is collapsed.
...@@ -61,6 +61,9 @@ void EmitClangCommentCommandList(RecordKeeper &Records, raw_ostream &OS); ...@@ -61,6 +61,9 @@ void EmitClangCommentCommandList(RecordKeeper &Records, raw_ostream &OS);
void EmitNeon(RecordKeeper &Records, raw_ostream &OS); void EmitNeon(RecordKeeper &Records, raw_ostream &OS);
void EmitNeonSema(RecordKeeper &Records, raw_ostream &OS); void EmitNeonSema(RecordKeeper &Records, raw_ostream &OS);
void EmitNeonTest(RecordKeeper &Records, raw_ostream &OS); void EmitNeonTest(RecordKeeper &Records, raw_ostream &OS);
void EmitNeon2(RecordKeeper &Records, raw_ostream &OS);
void EmitNeonSema2(RecordKeeper &Records, raw_ostream &OS);
void EmitNeonTest2(RecordKeeper &Records, raw_ostream &OS);
void EmitClangAttrDocs(RecordKeeper &Records, raw_ostream &OS); void EmitClangAttrDocs(RecordKeeper &Records, raw_ostream &OS);
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment