Sequential execution of `jacobian_apply_volume()` on the GPU

The goal here is to run the function jacobian_apply_volume() of the local operator on the GPU for the first time. For now it runs sequentially, element by element: we first want to make sure that the function is GPU-capable at all, and fix it where necessary.

This requires the following changes to nonlinear_jacobian_apply() in gridoperator.hh:

The number of basis functions, size, must become constexpr so that it can serve as a template parameter for std::array. The .size() method of the LocalBasis should already be static constexpr; it just needs to be called on the class rather than on the object.
zl and wl need to become std::arrays rather than std::vectors so that they can be captured by value in the kernel lambda.
The call to jacobian_apply_volume() needs to be wrapped in a kernel lambda, which PACXX then executes with a single thread.
Storage for yl needs to be allocated on the device, outside the element loop.
The kernel lambda needs to initialize the on-device storage for yl to 0.0 before calling jacobian_apply_volume().
After the call, the computed contents of yl need to be transferred back to the CPU.
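
The steps above could look roughly like the following host-only sketch. It is only an illustration under stated assumptions: `LocalBasis`, `LocalOperator`, `apply_on_element()` and `launch_single_thread()` are hypothetical stand-ins invented for this example, the toy `jacobian_apply_volume()` does not have the real signature, and `launch_single_thread()` just calls the lambda on the host. It is not the PACXX API, and the "device" buffer is ordinary host memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a local basis: only the static constexpr size() matters.
struct LocalBasis {
  static constexpr std::size_t size() { return 4; }
};

// Hypothetical stand-in for the local operator. The real jacobian_apply_volume()
// takes different arguments; this toy version accumulates yl += 2*zl + wl.
struct LocalOperator {
  template <class Z, class W>
  void jacobian_apply_volume(const Z& zl, const W& wl, double* yl) const {
    for (std::size_t i = 0; i < zl.size(); ++i)
      yl[i] += 2.0 * zl[i] + wl[i];
  }
};

// Hypothetical placeholder for "execute this kernel with a single thread".
// It simply runs the lambda on the host; it is NOT a PACXX call.
template <class Kernel>
void launch_single_thread(Kernel&& kernel) { kernel(); }

template <class LB, class LOP>
std::array<double, LB::size()> apply_on_element(const LOP& lop) {
  // size() is called on the class, not on an object, so it is a constant expression.
  constexpr std::size_t size = LB::size();

  // Fixed-size arrays can be captured by value in the kernel lambda.
  std::array<double, size> zl{}; zl.fill(1.0);
  std::array<double, size> wl{}; wl.fill(0.5);

  // Stands in for device storage for yl, allocated outside the element loop.
  // Filled with garbage to show that the kernel itself must zero it.
  std::array<double, size> yl_device{}; yl_device.fill(-1.0);
  double* yl = yl_device.data();

  launch_single_thread([=] {
    for (std::size_t i = 0; i < size; ++i) yl[i] = 0.0;  // initialize yl to 0.0
    lop.jacobian_apply_volume(zl, wl, yl);
  });

  // Stands in for the device-to-host transfer of yl after the call.
  std::array<double, size> yl_host = yl_device;
  return yl_host;
}
```

Calling `apply_on_element<LocalBasis>(LocalOperator{})` returns an array of `LocalBasis::size()` entries. The by-value capture is the reason for preferring `std::array` here: a `std::vector` captured by value would only copy a handle to host heap memory, whereas a `std::array` carries its elements with it.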

Blocked by: