Make an example to show the expected level of detail
Adjust the "D=scalar constant on e and T=affine linear" setting from matrix-free/quadrature-loop.md:
- We're interested in the theoretical limit here, so don't hesitate to use a different loop ordering than the one actually used in the particular implementation that the code generator emits
- Do try to model the effects of caches. Well, one cache. In one of two ways:
  - Assume that the slower level of the memory hierarchy dominates. Possibly make a case that it indeed dominates, or state which measurements need to be taken to show that it dominates
  - Estimate the amount of transfers done at each level of the hierarchy, e.g. count only compulsory data movement per level, and apply the corresponding $t^\text{transfer}$
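
The two ways of modelling the cache could be compared with a small script along these lines. This is only a sketch under stated assumptions: the `Level` class, the function names, and all numbers are hypothetical, and $t^\text{transfer}$ is assumed here to be a cost per byte at the given level. Summing the per-level times additionally assumes that transfers on different levels do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One level of the memory hierarchy (all numbers are placeholders)."""
    name: str
    bytes_moved: float  # compulsory traffic through this level, in bytes
    t_transfer: float   # assumed cost per byte at this level, in seconds


def transfer_times(levels):
    """Map each level name to bytes_moved * t_transfer."""
    return {lv.name: lv.bytes_moved * lv.t_transfer for lv in levels}


def dominant_level(levels):
    """First way: name of the level with the largest transfer time."""
    times = transfer_times(levels)
    return max(times, key=times.get)


def total_time(levels):
    """Second way: sum over levels, assuming transfers do not overlap."""
    return sum(transfer_times(levels).values())


# Made-up numbers, only to exercise the functions.
levels = [
    Level("cache", bytes_moved=64e6, t_transfer=1e-11),
    Level("main memory", bytes_moved=8e6, t_transfer=1e-10),
]
print(transfer_times(levels))
print(dominant_level(levels))
print(total_time(levels))
```

Comparing the per-level times is one possible way to make the case that the slower level dominates; if the times turn out to be close, that would suggest which measurements are still needed.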