Thanks for your help guys!!
So here is my update:
- I have replaced all of my risers (16x and 1x cards) and the USB cables.
- I removed the drivers with DDU.
- I installed the latest NVIDIA drivers.
- I installed the cards one by one and restarted after each graphics card.
...and now the 9 GPUs run stable for about an hour, and then I get the same error as before, just ~60 min. later...
I'm still oc-ing with MSI AB --> thx for the hint, jmigdlc99!
I always try to get a 1500 MHz core clock.
my oc settings:
evga cards:
pt: 70 --> before the reinstallation: 67
cclock: 0 --> before the reinstallation: most of the time between -20 and -100
memclock: 700 --> before the reinstallation: between 760 and 800
msi 1070 cards:
pt: 52 --> before the reinstallation: 50
cclock: 0 --> before the reinstallation: most of the time between -80 and -100
memclock: 700 --> before the reinstallation: between 760 and 800
palit card:
pt: 65 --> before the reinstallation: 63
cclock: 0 --> before the reinstallation: most of the time between -20 and -100
memclock: 700 --> before the reinstallation: between 760 and 850
msi 1080 card:
pt: 50 --> before the reinstallation: 42
cclock: 0 --> before the reinstallation: most of the time between -20 and -100
memclock: 700 --> before the reinstallation: between 800 and 880
Could it be my "1 to 4 PCI-E riser"? Unfortunately I cannot swap it, because I only have one...
...and now it has crashed again, after about 60 min., as I already mentioned above.

Same error message... Threads not responding, etc...
edit:
I just saw that the core clock of the 1st GPU is jumping around..?!
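To check whether it is really only the 1st GPU that jumps around, the clocks could be logged while the miner runs, so the last lines before a crash show which card dropped first. This is only a rough sketch in Python, not a tested tool: it assumes nvidia-smi can be found on the PATH (on Windows it may have to be added first), and the function names and the gpu_log.csv file name are made up for illustration. Some cards may report a field as not supported, so the values are kept as plain text.

```python
import csv
import io
import subprocess
import time

# Properties requested from nvidia-smi via --query-gpu.
FIELDS = ["index", "name", "clocks.sm", "clocks.mem", "power.draw", "temperature.gpu"]


def parse_smi_csv(text):
    """Turn nvidia-smi CSV output (noheader, nounits) into a list of dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != len(FIELDS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(FIELDS, (v.strip() for v in rec))))
    return rows


def sample():
    """Query all GPUs once. Assumption: nvidia-smi is on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_csv(out)


def log_forever(path="gpu_log.csv", interval_s=5):
    """Append one timestamped row per GPU every interval_s seconds."""
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for row in sample():
                writer.writerow([stamp] + [row[f] for f in FIELDS])
            fh.flush()  # keep the last rows on disk if the rig hangs
            time.sleep(interval_s)
```

Calling log_forever() before starting the miner and opening gpu_log.csv after the next crash should show whether one card's core clock or power draw collapses shortly before the threads stop responding.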
Is there anything else I can do?