Iterate over a reordered element list to avoid write conflicts
We can try to build groups of elements that share no vertices, so that all elements of a group can be computed in parallel on the GPU and their values written to the vertices without write conflicts.
For an initial test, sets of buckets need to be built:
- Define a bucket size (e.g. 32) and put one element in the bucket. For the bucket, store all element IDs and all vertex IDs of those elements.
- Add an element that has no common vertices with the elements already in the bucket. (Compare all vertices of the element with the list of vertex IDs stored for the bucket.)
- Repeat the previous step until the bucket is full or no more elements can be added, then start a new bucket with the remaining elements.
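
The steps above can be sketched as a simple greedy loop. This is only a minimal illustration, not the actual implementation: `Element`, `Bucket` and `buildBuckets` are hypothetical names, and an element is assumed to be given simply as the list of its global vertex IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Assumption: an element is represented only by its global vertex IDs.
using Element = std::vector<std::size_t>;

// Hypothetical bucket type: element IDs plus all vertex IDs of those elements.
struct Bucket {
    std::vector<std::size_t> elementIds;
    std::unordered_set<std::size_t> vertexIds;
};

// Greedily fill buckets of at most bucketSize elements such that no two
// elements in the same bucket share a vertex.
std::vector<Bucket> buildBuckets(const std::vector<Element>& elements,
                                 std::size_t bucketSize)
{
    assert(bucketSize > 0);
    std::vector<Bucket> buckets;
    std::vector<bool> assigned(elements.size(), false);
    std::size_t remaining = elements.size();

    while (remaining > 0) {
        Bucket bucket;
        for (std::size_t e = 0; e < elements.size(); ++e) {
            if (bucket.elementIds.size() == bucketSize) break;  // bucket is full
            if (assigned[e]) continue;

            // Compare the element's vertices with the bucket's vertex IDs.
            bool conflict = false;
            for (std::size_t v : elements[e]) {
                if (bucket.vertexIds.count(v) != 0) { conflict = true; break; }
            }
            if (conflict) continue;

            bucket.elementIds.push_back(e);
            bucket.vertexIds.insert(elements[e].begin(), elements[e].end());
            assigned[e] = true;
            --remaining;
        }
        // The first unassigned element always fits into an empty bucket,
        // so every pass assigns at least one element and the loop terminates.
        buckets.push_back(std::move(bucket));
    }
    return buckets;
}
```

For a 1D chain of four elements with vertices {0,1}, {1,2}, {2,3}, {3,4}, this sketch puts elements 0 and 2 into the first bucket and elements 1 and 3 into the second. The scan over all elements for every bucket is quadratic in the worst case, which should be acceptable for an initial test but probably not for large grids.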
Additionally, the gridoperator needs to be modified so that it iterates over all elements of one bucket before switching to the next bucket.
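
The intended iteration order might look roughly like the following CPU-side sketch. It does not use the real gridoperator interface: `assembleByBuckets`, `ElementBucket` and the `localContribution` callback are hypothetical stand-ins, and each element simply adds one scalar value to each of its vertices.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Assumption: an element is represented only by its global vertex IDs.
using Element = std::vector<std::size_t>;
// Hypothetical: a bucket is a list of element IDs that share no vertices.
using ElementBucket = std::vector<std::size_t>;

// Visit the elements bucket by bucket and scatter a per-element value
// to the vertices of that element.
void assembleByBuckets(const std::vector<Element>& elements,
                       const std::vector<ElementBucket>& buckets,
                       const std::function<double(std::size_t)>& localContribution,
                       std::vector<double>& vertexValues)
{
    for (const ElementBucket& bucket : buckets) {
        // Elements within one bucket share no vertices, so this inner loop
        // could be mapped to parallel GPU threads without atomic writes.
        for (std::size_t e : bucket) {
            const double value = localContribution(e);
            for (std::size_t v : elements[e]) {
                assert(v < vertexValues.size());
                vertexValues[v] += value;
            }
        }
        // On a GPU, a synchronization point would be needed here before
        // the next bucket starts writing.
    }
}
```

With the four-element chain {0,1}, {1,2}, {2,3}, {3,4}, the buckets {0, 2} and {1, 3}, and a contribution of 1.0 per element, the five vertex values end up as 1, 2, 2, 2, 1, which is the same result as a plain element-by-element loop.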