People who deal with NVIDIA GPUs on Linux for computation: do you have a favorite stress test program that will reliably make marginal GPUs fail (e.g., drop off the PCIe bus)? I thought I had found one, but it failed to consistently reproduce our problem. People here can trigger it frequently, but only with opaque SLURM jobs that we can't really pick up and use for diagnostics (plus graduate students are busy people).