4x faster LLM inference (Flash Attention guy's company)
Link: https://www.together.ai/blog/adaptive-learning-speculator-system-atlas
Discussion: https://news.ycombinator.com/item?id=45556474
If you have a fediverse account, you can quote this note from your own instance: search for https://social.lansky.name/users/hn50/statuses/115361218992985408 on your instance and quote it. (Note that Mastodon does not support quoting.)