•1 min read•from Machine Learning
INT3 compression + fused Metal kernels [R]
Hey guys, I am a researcher and solo founder. I compress models to INT3 at +0.14 nats and built a 2-bit KV cache for long-horizon tasks. I shipped both (INT3 model + INT2 KV cache) with custom fused Metal kernels for Mac (M-series). Qwen 7B is currently available in preview.
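For readers new to low-bit KV caches, here is a minimal sketch of what storing a group of cache values in 2 bits can look like. It assumes plain asymmetric min/max quantization over one small group, which is a generic textbook scheme and not necessarily what the post's kernels do; the function names `quantize_2bit` and `dequantize_2bit` are made up for illustration.

```python
def quantize_2bit(values):
    """Map one group of floats to 2-bit codes (0..3) plus a scale and offset.

    Assumption: simple min/max (asymmetric) quantization per group.
    """
    lo, hi = min(values), max(values)
    # Four levels means three steps between lo and hi; fall back to 1.0
    # when the group is constant so we never divide by zero.
    scale = (hi - lo) / 3 or 1.0
    codes = [min(3, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_2bit(codes, scale, lo):
    """Reconstruct approximate floats from 2-bit codes."""
    return [lo + c * scale for c in codes]


group = [-1.0, -0.2, 0.3, 2.0]
codes, scale, lo = quantize_2bit(group)
approx = dequantize_2bit(codes, scale, lo)
worst = max(abs(a - b) for a, b in zip(group, approx))
print(codes, approx, worst)
```

With this scheme the reconstruction error per value is at most half a step (`scale / 2`), so smaller groups with their own scale and offset trade extra metadata for lower error.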
    # install
    brew install reinforceai/spiral/spiral
    # chat
    spiral-chat

I am optimizing the kernels further and working on Triton kernels for GPU support. There is still room to pack more efficiently, and I will share more models soon. I would appreciate any feedback, and requests for any model up to 100B parameters that you want me to compress.
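On why packing matters at 3 bits: 3 does not divide 8, so a simple layout such as two codes per byte wastes 2 of every 8 bits, while a dense layout fits 8 codes into exactly 3 bytes. The sketch below shows the dense layout in plain Python. It is an illustration only; the post does not say which bit layout is used, and `pack_3bit` / `unpack_3bit` are hypothetical names.

```python
def pack_3bit(codes):
    """Pack 3-bit codes (0..7) densely: every 8 codes become 3 bytes."""
    if len(codes) % 8:
        raise ValueError("length must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(codes), 8):
        word = 0
        for j, c in enumerate(codes[i:i + 8]):
            if not 0 <= c <= 7:
                raise ValueError("code out of range for 3 bits")
            word |= c << (3 * j)  # code j occupies bits 3j .. 3j+2
        out += word.to_bytes(3, "little")
    return bytes(out)


def unpack_3bit(data):
    """Inverse of pack_3bit: every 3 bytes yield 8 codes."""
    codes = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "little")
        codes.extend((word >> (3 * j)) & 0b111 for j in range(8))
    return codes


codes = [0, 1, 2, 3, 4, 5, 6, 7] * 2
packed = pack_3bit(codes)
print(len(codes), "codes ->", len(packed), "bytes")
```

A real kernel would also care about alignment and how cheaply the codes unpack on the GPU, which is where a less dense but faster layout can win.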