Q*Bert Reynolds@sh.itjust.works to Technology@lemmy.ml • 1-bit LLM performs similarly to full-precision Transformer LLMs with the same model size and training tokens but is much more efficient in terms of latency, memory, throughput, and energy consumption.
7 months ago

Says 1-bit then goes on to describe inputs as -1, 0, or 1. That’s 2-bit. Am I missing something here?
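For what it's worth, the arithmetic behind the naming: a weight restricted to three values {-1, 0, 1} is ternary, and three states carry log2(3) ≈ 1.58 bits of information, which is why the work in question (BitNet b1.58) is branded "1.58-bit" rather than 2-bit. Two bits would encode four distinct states, one more than needed. A minimal check in Python:

```python
import math

# Information content of one ternary weight: three equally likely
# states need log2(3) bits, compared with 2 bits for four states.
bits_per_ternary_weight = math.log2(3)
bits_for_four_states = math.log2(4)

print(f"ternary weight: {bits_per_ternary_weight:.3f} bits")  # ~1.585
print(f"2-bit encoding: {bits_for_four_states:.3f} bits")     # 2.000
```

So the headline "1-bit" is a simplification in the other direction; the honest per-weight figure sits between 1 and 2 bits.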
“Engineers have been circulating an old, famous-among-programmers web comic about how all modern digital infrastructure rests on a project maintained by some random guy in Nebraska. (In their telling, Mr. Freund is the random guy from Nebraska.)”
That’s not quite right. Lasse Collin is the random guy in Nebraska. Freund is the guy who noticed the whole thing was about to topple.