INT3 compression + fused Metal kernels [R]

Hey guys, I am a researcher and solo founder. I compress models to INT3 at +0.14 nats and built a 2-bit KV cache for long-horizon tasks. I shipped both (INT3 model + INT2 KV cache) with custom fused Metal kernels for Mac (M-series). Qwen 7B is currently available in preview.
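For readers new to low-bit KV caches, here is a minimal sketch of what storing values in 2 bits can look like. It assumes plain asymmetric quantization with one (minimum, scale) pair per small group; the function names and `group_size` are hypothetical, and this is not necessarily the scheme spiral uses.

```python
def quantize_2bit(values, group_size=4):
    """Asymmetric 2-bit quantization: one (minimum, scale) pair per group.

    Illustrative assumption only, not the scheme used by spiral.
    """
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        # 2 bits give 4 levels (codes 0..3); fall back to 1.0 for a constant group.
        scale = (hi - lo) / 3 or 1.0
        codes = [min(3, max(0, round((v - lo) / scale))) for v in chunk]
        groups.append((lo, scale, codes))
    return groups


def dequantize_2bit(groups):
    """Map each 2-bit code back to an approximate float value."""
    return [lo + scale * c for lo, scale, codes in groups for c in codes]


if __name__ == "__main__":
    kv = [0.1, -0.4, 0.9, 0.3, 2.0, 2.0, 2.0, 2.0]
    print(dequantize_2bit(quantize_2bit(kv)))
```

With this kind of scheme the reconstruction error per value is bounded by half the group's scale, so smaller groups trade extra metadata for lower error.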

    # install
    brew install reinforceai/spiral/spiral

    # chat
    spiral-chat

I am optimizing the kernels further and working on Triton kernels for GPU support. There is still room to pack the weights more efficiently, and I will share more models soon. I would appreciate any feedback, or requests for any model of up to 100B parameters that you want me to compress.
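To make the packing point concrete: 3-bit codes do not align to byte boundaries, so a common baseline is to pack eight codes into three bytes (24 bits). The sketch below shows that baseline under the assumption of a simple little-endian layout; `pack_int3` and `unpack_int3` are hypothetical names, and the layout used by the actual Metal kernels may differ.

```python
def pack_int3(codes):
    """Pack 3-bit codes (0..7) into bytes, eight codes per three bytes.

    Assumed little-endian layout for illustration; not spiral's actual format.
    """
    if len(codes) % 8:
        raise ValueError("number of codes must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(codes), 8):
        word = 0
        for j, code in enumerate(codes[i:i + 8]):
            if not 0 <= code <= 7:
                raise ValueError("each code must fit in 3 bits")
            word |= code << (3 * j)  # code j occupies bits 3j .. 3j+2
        out += word.to_bytes(3, "little")
    return bytes(out)


def unpack_int3(packed):
    """Inverse of pack_int3: recover eight 3-bit codes from every three bytes."""
    codes = []
    for i in range(0, len(packed), 3):
        word = int.from_bytes(packed[i:i + 3], "little")
        codes.extend((word >> (3 * j)) & 0b111 for j in range(8))
    return codes


if __name__ == "__main__":
    codes = [0, 1, 2, 3, 4, 5, 6, 7]
    packed = pack_int3(codes)
    print(len(packed), unpack_int3(packed) == codes)
```

This layout stores exactly 3 bits per weight before any scales or zero points are counted, so further savings would have to come from that metadata or from a different encoding.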

github.com/ReinforceAI/spiral

submitted by /u/Financial_Buy_2287
