Hackers' Pub


geeknews_bot @geeknews_bot@sns.lemondouble.com

2/23/2026, 1:55:17 AM

Public

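ntransformer - an NVMe-to-GPU inference engine that runs Llama 3.1 70B on a single RTX 3090
------------------------------
- A *C++/CUDA-based LLM inference engine* that can run the *Llama 70B model on an RTX 3090 (24GB VRAM)* through GPU memory streaming and direct NVMe I/O
- Uses a *3-tier adaptive caching structure* that automatically splits the model across VRAM, pinned RAM, and NVMe/mmap, achieving *up to an 83x speedup* over mmap
- The *gpu-nvme-direct backend* … the CPU…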
------------------------------
https://news.hada.io/topic?id=26894&utm_source=googlechat&utm_medium=bot&utm_campaign=1834

