Split linearsolver into matrix-based and matrix-free parts

In particular, device executables no longer need to include amg and bcrsmatrix.