pacxx-projectseminar-2019 merge requests

Include fix
2019-09-29T17:28:33+02:00
Alexander Gerwing

Follow move of pacxx-docker to hpc2se namespace
2019-08-21T12:03:38+02:00
Dr. Jorrit Fahlke
Addresses: HPC2SE-Project/pacxx-ci#8

WIP: Scatterkernelreorder optimization
2019-09-29T17:38:54+02:00
Alexander Gerwing
Some smaller optimizations to scatterkernelreorder.

Resolve "Figure out why RV times out with accumulation of results on host"
2019-08-06T17:21:11+02:00
Alexander Gerwing
Closes #70 and #83

Bucket colouring
2019-08-08T13:37:44+02:00
Alexander Gerwing

Fix variant visitation
2019-07-23T15:27:15+02:00
Dr. Jorrit Fahlke
So, as it turns out, pacxx supports C++17 on agamemnon, probably due to Ubuntu
18.04 (compared to Ubuntu 16.04 in the CI). This means `Dune::Std::variant`
is just `std::variant`, rather than using Dune's fallback implementation.
Now, with the fallback implementation, it was impossible to find `visit()` via
ADL for some reason, so I had been using the member function `visit()`, which
isn't part of `std::variant`. To satisfy both, try `using Dune::Std::visit`
aren't doing that for the individual threads running on the device either.
Addresses: #88
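The `using` plus unqualified-call idiom mentioned in this description can be sketched roughly as follows. The namespace `compat` is a hypothetical stand-in for `Dune::Std`; here it only re-exports `std::variant` and `std::visit` (the C++17 case described above) and does not reproduce Dune's fallback implementation. `Describe` and `describe` are likewise made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-in for Dune::Std in the C++17 case: the names are
// simply re-exported from namespace std.
namespace compat {
  using std::variant;
  using std::visit;
}

// Visitor with one overload per alternative of the variant.
struct Describe {
  std::string operator()(int i) const { return "int:" + std::to_string(i); }
  std::string operator()(const std::string& s) const { return "string:" + s; }
};

std::string describe(const compat::variant<int, std::string>& v) {
  // Bring the free function into scope, then call it unqualified.
  // The call does not rely on ADL alone, and it does not need a
  // member visit(), which std::variant does not have.
  using compat::visit;
  return visit(Describe{}, v);
}
```

With a fallback implementation living in `compat`, the same call site would pick up that namespace's `visit()` through the using-declaration, so the calling code would not need to change.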

Human-friendly category/mpi mode printing
2019-07-23T11:54:51+02:00
Dr. Jorrit Fahlke

Enable MPI for device
2019-07-23T12:26:31+02:00
Dr. Jorrit Fahlke
Addresses: #88

Somewhat simplify test definition for device tests
2019-07-23T11:23:37+02:00
Dr. Jorrit Fahlke

Compute and check L2 error
2019-07-18T10:50:00+02:00
Dr. Jorrit Fahlke
Addresses: #88
This is needed for MPI-parallel computation. In a parallel setting the partitioning is not always deterministic, so it becomes difficult to generate reference output files to compare against. This introduces a check on the [$`L^2`$ error norm](https://en.wikipedia.org/wiki/Lp_space#Lp_spaces) of the computed solution with respect to the known analytical solution. In particular, for a given computation with refinement level $`l`$ we check that the following holds:
```math
\|x_l-x_\text{ref}\|_{L^2} < C' \cdot h_l^2 = C \cdot 2^{-2l}
```
- $`x_l`$ is the computed solution at refinement level $`l`$
- $`x_\text{ref}`$ is the analytic reference solution
- $`h_l`$ is the size of the mesh elements at refinement level $`l`$. For the structured refinement we are using, we have $`h_l=2^{-l} \cdot h_0`$. The exact definition of "size of the element" isn't that important; what is important is that it halves with every step of refinement.
- $`C`$ is a parameter that needs to be determined experimentally, such that the above holds for all refinement levels $`l`$ we are interested in. It is usually something like the $`L^2`$ error norm at refinement level 0. But it can happen that the error at refinement level 0 is "too good": the above inequality only makes a statement about the upper limit for the error, not the lower limit. In that case $`C`$ needs to be enlarged artificially so that the inequality also holds for the other levels we are interested in.
- $`C'`$ is just $`\frac{C}{h_0^2}`$ (since $`h_l^2 = 2^{-2l} \cdot h_0^2`$); it is only used to write the right-hand side of the inequality in a more familiar form that might be found in a textbook.

The square in $`h_l^2`$ (or equivalently, the 2 in the exponent in $`2^{-2l}`$) is actually a property of the finite element scheme we are using. (For Q1 ansatz functions you usually have this 2, for Q2 you would have 3, etc. This is the reason why people bother with higher-order ansatz functions: they allow for much coarser meshes while still keeping the error below a certain level.)
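As a rough illustration of the check described in this MR, the following sketch tests measured errors against the bound $`C \cdot 2^{-2l}`$. The function names `l2_error_bound` and `l2_errors_within_bound` and the layout of the `errors` vector are assumptions made for this example, not the code of the merge request.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Upper bound C * 2^(-2*l) for the L2 error at refinement level l.
double l2_error_bound(double C, unsigned level) {
  // ldexp(C, e) computes C * 2^e.
  return std::ldexp(C, -2 * static_cast<int>(level));
}

// Returns true if every measured error stays strictly below its bound.
// Assumption: errors[l] holds the measured L2 error at refinement level l.
bool l2_errors_within_bound(const std::vector<double>& errors, double C) {
  assert(C > 0.0);
  for (std::size_t l = 0; l < errors.size(); ++l)
    if (!(errors[l] < l2_error_bound(C, static_cast<unsigned>(l))))
      return false;
  return true;
}
```

For example, with made-up errors `{0.01, 0.005}` the level-0 error is "too good": choosing `C = 0.011` (just above the level-0 error) gives a level-1 bound of `0.00275`, which the level-1 error violates, whereas an enlarged `C = 0.021` lets both levels pass.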

Zero the interior of the initial solution
2019-07-17T15:47:00+02:00
Dr. Jorrit Fahlke
Closes: #92

Add mpi support
2019-07-24T09:07:32+02:00
Dr. Jorrit Fahlke
Implement MPI parallelism for the matrix-free host code. Leaving parallel device code for other MRs.
Addresses: #88

Resolve "device-v2 flavor is not included in the default flavours in the bench script"
2019-06-24T12:41:27+02:00
Alexander Gerwing
Closes #87

Optimization version 3
2019-08-06T14:36:03+02:00
Alexander Gerwing
Try to reduce the number of uploads per iteration and change the order of execution for the scatter kernel.

Resolve "yl_tmp (re)initialization"
2019-06-24T18:03:54+02:00
Alexander Gerwing
Closes #82

First optimization attempts
2019-06-20T12:57:39+02:00
Alexander Gerwing
Closes #55

Resolve "minimal reproducer to test += vs. = assignment in gridoperator.h:153"
2019-06-19T15:41:42+02:00
Alexander Gerwing
Closes #83

Gridoperator device v2
2019-06-24T12:25:27+02:00
Alexander Gerwing
Introduced a second version of the device gridoperator. Optimizations can now be applied to the PPS::v2:: classes, without affecting the original brute-force version.

CRTP-less way to semi-automatically determine timer names
2019-06-17T20:37:32+02:00
Dr. Jorrit Fahlke
Just make the `trafo` argument to `nonlinear_jacobian_apply` a proper functor rather than a lambda, so that its type name demangles to something benign.
Closes: #81
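A minimal sketch of why a named functor helps here, assuming timer names are derived from demangled type names. `ScaleTrafo`, `scale_lambda` and `type_name` are hypothetical names for this example only; `abi::__cxa_demangle` from `<cxxabi.h>` is available with GCC and Clang, and the sketch falls back to the raw `typeid` name elsewhere.

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>
#include <typeinfo>
#if __has_include(<cxxabi.h>)
#  include <cxxabi.h>
#  define HAVE_CXXABI 1
#endif

// Returns a human-readable name for T where demangling is available,
// otherwise the implementation-specific raw name.
template <class T>
std::string type_name() {
  const char* raw = typeid(T).name();
#ifdef HAVE_CXXABI
  int status = 0;
  std::unique_ptr<char, void (*)(void*)> demangled(
      abi::__cxa_demangle(raw, nullptr, nullptr, &status), std::free);
  if (status == 0 && demangled) return demangled.get();
#endif
  return raw;
}

// A named functor: its type has a stable, readable name.
struct ScaleTrafo {
  double factor;
  double operator()(double x) const { return factor * x; }
};

// A lambda has an unnamed closure type; its printed name is
// implementation-specific and depends on where the lambda is defined.
const auto scale_lambda = [](double x) { return 2.0 * x; };
```

Calling `type_name<ScaleTrafo>()` yields a string containing `ScaleTrafo`, while `type_name<decltype(scale_lambda)>()` yields whatever the compiler uses for the closure type.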