Skip to content
Snippets Groups Projects
  • James Molloy's avatar
    ac41a1b7
    Rewrite ARM NEON intrinsic emission completely. · ac41a1b7
    James Molloy authored
    There comes a time in the life of any amateur code generator when dumb string
    concatenation just won't cut it any more. For NeonEmitter.cpp, that time has
    come.
    
    There were a bunch of magic type codes which meant different things depending on
    the context. There were a bunch of special cases that really had no reason to be
    there but the whole thing was so creaky that removing them would cause something
    weird to fall over. There was a 1000 line switch statement for code generation
    involving string concatenation, which actually did lexical scoping to an extent
    (!!) with a bunch of semi-repeated cases.
    
    I tried to refactor this three times in three different ways without
    success. The only way forward was to rewrite the entire thing. Luckily the
    testing coverage on this stuff is absolutely massive, both with regression tests
    and the "emperor" random test case generator.
    
    The main change is that previously, in arm_neon.td a bunch of "Operation"s were
    defined with special names. NeonEmitter.cpp knew about these Operations and
    would emit code based on a huge switch. Actually this doesn't make much sense -
    the type information was held as strings, so type checking was impossible. Also
    TableGen's DAG type actually suits this sort of code generation very well
    (surprising that...)
    
    So now every operation is defined in terms of TableGen DAGs. There are a bunch
    of operators to use, including "op" (a generic unary or binary operator), "call"
    (to call other intrinsics) and "shuffle" (take a guess...). One of the main
    advantages of this apart from making it more obvious what is going on, is that
    we have proper type inference. This has two obvious advantages:
    
      1) TableGen can error on bad intrinsic definitions easier, instead of just
         generating wrong code.
      2) Calls to other intrinsics are typechecked too. So
         we no longer need to work out whether the thing we call needs to be the Q-lane
         version or the D-lane version - TableGen knows that itself!
    
    Here's an example: before:
    
      case OpAbdl: {
        std::string abd = MangleName("vabd", typestr, ClassS) + "(__a, __b)";
        if (typestr[0] != 'U') {
          // vabd results are always unsigned and must be zero-extended.
          std::string utype = "U" + typestr.str();
          s += "(" + TypeString(proto[0], typestr) + ")";
          abd = "(" + TypeString('d', utype) + ")" + abd;
          s += Extend(utype, abd) + ";";
        } else {
          s += Extend(typestr, abd) + ";";
        }
        break;
      }
    
    after:
    
      def OP_ABDL     : Op<(cast "R", (call "vmovl", (cast $p0, "U",
                                                           (call "vabd", $p0, $p1))))>;
    
    As an example of what happens if you do something wrong now, here's what happens
    if you make $p0 unsigned before the call to "vabd" - that is, $p0 -> (cast "U",
    $p0):
    
    arm_neon.td:574:1: error: No compatible intrinsic found - looking up intrinsic 'vabd(uint8x8_t, int8x8_t)'
    Available overloads:
      - float64x2_t vabdq_v(float64x2_t, float64x2_t)
      - float64x1_t vabd_v(float64x1_t, float64x1_t)
      - float64_t vabdd_f64(float64_t, float64_t)
      - float32_t vabds_f32(float32_t, float32_t)
    ... snip ...
    
    This makes it seriously easy to work out what you've done wrong in fairly nasty
    intrinsics.
    
    As part of this I've massively beefed up the documentation in arm_neon.td too.
    
    Things still to do / on the radar:
      - Testcase generation. This was implemented in the previous version and not in
        the new one, because
        - Autogenerated tests are not being run. The testcase in test/ differs from
          the autogenerated version.
        - There were a whole slew of special cases in the testcase generation that just
          felt (and looked) like hacks.
        If someone really feels strongly about this, I can try and reimplement it too.
      - Big endian. That's coming soon and should be a very small diff on top of this one.
    
    git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@211101 91177308-0d34-0410-b5e6-96231b3b80d8
    ac41a1b7
    History
    Rewrite ARM NEON intrinsic emission completely.
    James Molloy authored
    There comes a time in the life of any amateur code generator when dumb string
    concatenation just won't cut it any more. For NeonEmitter.cpp, that time has
    come.
    
    There were a bunch of magic type codes which meant different things depending on
    the context. There were a bunch of special cases that really had no reason to be
    there but the whole thing was so creaky that removing them would cause something
    weird to fall over. There was a 1000 line switch statement for code generation
    involving string concatenation, which actually did lexical scoping to an extent
    (!!) with a bunch of semi-repeated cases.
    
    I tried to refactor this three times in three different ways without
    success. The only way forward was to rewrite the entire thing. Luckily the
    testing coverage on this stuff is absolutely massive, both with regression tests
    and the "emperor" random test case generator.
    
    The main change is that previously, in arm_neon.td a bunch of "Operation"s were
    defined with special names. NeonEmitter.cpp knew about these Operations and
    would emit code based on a huge switch. Actually this doesn't make much sense -
    the type information was held as strings, so type checking was impossible. Also
    TableGen's DAG type actually suits this sort of code generation very well
    (surprising that...)
    
    So now every operation is defined in terms of TableGen DAGs. There are a bunch
    of operators to use, including "op" (a generic unary or binary operator), "call"
    (to call other intrinsics) and "shuffle" (take a guess...). One of the main
    advantages of this apart from making it more obvious what is going on, is that
    we have proper type inference. This has two obvious advantages:
    
      1) TableGen can error on bad intrinsic definitions easier, instead of just
         generating wrong code.
      2) Calls to other intrinsics are typechecked too. So
         we no longer need to work out whether the thing we call needs to be the Q-lane
         version or the D-lane version - TableGen knows that itself!
    
    Here's an example: before:
    
      case OpAbdl: {
        std::string abd = MangleName("vabd", typestr, ClassS) + "(__a, __b)";
        if (typestr[0] != 'U') {
          // vabd results are always unsigned and must be zero-extended.
          std::string utype = "U" + typestr.str();
          s += "(" + TypeString(proto[0], typestr) + ")";
          abd = "(" + TypeString('d', utype) + ")" + abd;
          s += Extend(utype, abd) + ";";
        } else {
          s += Extend(typestr, abd) + ";";
        }
        break;
      }
    
    after:
    
      def OP_ABDL     : Op<(cast "R", (call "vmovl", (cast $p0, "U",
                                                           (call "vabd", $p0, $p1))))>;
    
    As an example of what happens if you do something wrong now, here's what happens
    if you make $p0 unsigned before the call to "vabd" - that is, $p0 -> (cast "U",
    $p0):
    
    arm_neon.td:574:1: error: No compatible intrinsic found - looking up intrinsic 'vabd(uint8x8_t, int8x8_t)'
    Available overloads:
      - float64x2_t vabdq_v(float64x2_t, float64x2_t)
      - float64x1_t vabd_v(float64x1_t, float64x1_t)
      - float64_t vabdd_f64(float64_t, float64_t)
      - float32_t vabds_f32(float32_t, float32_t)
    ... snip ...
    
    This makes it seriously easy to work out what you've done wrong in fairly nasty
    intrinsics.
    
    As part of this I've massively beefed up the documentation in arm_neon.td too.
    
    Things still to do / on the radar:
      - Testcase generation. This was implemented in the previous version and not in
        the new one, because
        - Autogenerated tests are not being run. The testcase in test/ differs from
          the autogenerated version.
        - There were a whole slew of special cases in the testcase generation that just
          felt (and looked) like hacks.
        If someone really feels strongly about this, I can try and reimplement it too.
      - Big endian. That's coming soon and should be a very small diff on top of this one.
    
    git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@211101 91177308-0d34-0410-b5e6-96231b3b80d8
Code owners
Assign users and groups as approvers for specific file changes. Learn more.