Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU
Link: https://github.com/xaskasdf/ntransformer
Discussion: https://news.ycombinator.com/item?id=47104667