[walker] use parallel_for instead of parallel_reduce
Currently, we use `parallel_reduce` with an empty `join` function, which, if I am not mistaken, is equivalent to a `parallel_for`. This MR therefore changes the walker to use `parallel_for` directly. Any objections, @ag-ohlberger/dune-community? It probably makes little difference in terms of performance, though.
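For reviewers, here is why an empty `join` makes the reduction collapse into a plain parallel loop. This is a toy sketch in plain C++ with `std::thread`, **not** TBB. The names `toy_parallel_for`, `toy_parallel_reduce` and `ApplyToElements` are hypothetical, and the fixed contiguous chunking is a simplification of TBB's recursive range splitting. It assumes, as in the walker, that the per-element work writes only to disjoint storage and keeps no partial result that needs combining.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical toy stand-ins for the two patterns (NOT TBB's API):
// [0, n) is cut into contiguous chunks, each processed by its own std::thread.
// Assumes num_chunks >= 1.

// "for" pattern: the same body is applied to every chunk; nothing is combined.
template <class Body>
void toy_parallel_for(std::size_t n, std::size_t num_chunks, const Body& body)
{
  const std::size_t chunk = (n + num_chunks - 1) / num_chunks;
  std::vector<std::thread> threads;
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    const std::size_t end = std::min(n, begin + chunk);
    threads.emplace_back([&body, begin, end] { body(begin, end); });
  }
  for (auto& t : threads)
    t.join();
}

// "reduce" pattern: each chunk works on its own copy of the body (its private
// partial result); afterwards the copies are merged back via join().
template <class Body>
void toy_parallel_reduce(std::size_t n, std::size_t num_chunks, Body& body)
{
  const std::size_t chunk = (n + num_chunks - 1) / num_chunks;
  std::vector<Body> copies;
  for (std::size_t begin = 0; begin < n; begin += chunk)
    copies.push_back(body); // all copies exist before any thread starts
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < copies.size(); ++i) {
    const std::size_t begin = i * chunk;
    const std::size_t end = std::min(n, begin + chunk);
    threads.emplace_back([&copies, i, begin, end] { copies[i](begin, end); });
  }
  for (auto& t : threads)
    t.join();
  for (auto& c : copies)
    body.join(c); // combine partial results
}

// A walker-like body: it writes a per-element result into shared storage
// (each index is touched by exactly one chunk) and keeps no partial result.
struct ApplyToElements
{
  std::vector<double>* out;

  void operator()(std::size_t begin, std::size_t end) const
  {
    for (std::size_t i = begin; i < end; ++i)
      (*out)[i] = 2.0 * static_cast<double>(i); // stand-in for the local per-element work
  }

  void join(const ApplyToElements&) {} // empty join: there is nothing to combine
};

std::vector<double> run_with_reduce(std::size_t n, std::size_t num_chunks)
{
  std::vector<double> out(n, 0.0);
  ApplyToElements body{&out};
  toy_parallel_reduce(n, num_chunks, body);
  return out;
}

std::vector<double> run_with_for(std::size_t n, std::size_t num_chunks)
{
  std::vector<double> out(n, 0.0);
  toy_parallel_for(n, num_chunks, ApplyToElements{&out});
  return out;
}
```

In this toy, the only extra work the reduce variant does is copying the body and running the join loop. Because `join()` is empty, those copies are simply discarded, so `run_with_reduce(n, k)` and `run_with_for(n, k)` produce the same output. That matches the expectation above that the switch is mostly a clarity change rather than a performance one.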