@artagnon @uucidl @pervognsen @asb My toot hit the character limit, but I also had a point specifically about kernels. Take even the simplest kernel, matrix multiplication: a generic kernel won't beat the vendor-specific GPU ones (CUDA, ROCm) that are used in existing ML workloads.
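As a rough illustration of the gap being described (not from the original post), here is a minimal CUDA sketch: a naive hand-rolled SGEMM kernel next to the cuBLAS call an ML framework would typically dispatch to. The vendor library layers on tiling, shared-memory blocking and tensor-core paths that a generic one-thread-per-element kernel simply doesn't have, which is why the generic version rarely competes.

```cuda
// Build (assumption, typical setup): nvcc matmul_compare.cu -lcublas
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <vector>

// Generic matmul kernel: one thread per output element, no tiling,
// no shared memory, no tensor cores. Correct, but far from peak throughput.
__global__ void naive_sgemm(int M, int N, int K,
                            const float* A, const float* B, float* C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];  // row-major A and B
        C[row * N + col] = acc;
    }
}

int main() {
    const int M = 1024, N = 1024, K = 1024;
    std::vector<float> hA(M * K, 1.0f), hB(K * N, 1.0f), hC(M * N, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, M * K * sizeof(float));
    cudaMalloc(&dB, K * N * sizeof(float));
    cudaMalloc(&dC, M * N * sizeof(float));
    cudaMemcpy(dA, hA.data(), M * K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), K * N * sizeof(float), cudaMemcpyHostToDevice);

    // 1) Generic hand-written kernel.
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    naive_sgemm<<<grid, block>>>(M, N, K, dA, dB, dC);
    cudaDeviceSynchronize();

    // 2) Vendor-tuned path: cuBLAS SGEMM. cuBLAS is column-major, so we
    // compute C^T = B^T * A^T, which in row-major storage is the same C.
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, M, K,
                &alpha, dB, N,   // row-major B seen as B^T (N x K), ld = N
                dA, K,           // row-major A seen as A^T (K x M), ld = K
                &beta, dC, N);   // row-major C seen as C^T (N x M), ld = N
    cudaDeviceSynchronize();

    cudaMemcpy(hC.data(), dC, M * N * sizeof(float), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

Timing the two paths (e.g. with CUDA events) is the quickest way to see the point: the naive kernel is a small fraction of what cuBLAS reaches on the same sizes, and the same story holds for rocBLAS on the ROCm side.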