LLM tools to help with quantization
Quantization is a technique for decreasing the size and memory footprint of machine learning models. Much like lossy image compression, it converts model weights into lower-precision formats such as 8-bit or 4-bit integers. This significantly reduces resource demands, but, as with image compression, it can cost some of the model's accuracy.
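To make the idea concrete, here is a minimal sketch of symmetric 8-bit quantization in plain NumPy: each float weight is rounded to an integer times a single scale factor, and the rounding error is exactly the "lossy" part. Real tools like the ones below are far more sophisticated (per-group scales, zero points, calibration data), so treat this only as an illustration of the principle.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric 8-bit quantization: float weights -> int8 values plus one scale."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights; the rounding error is the accuracy loss."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max reconstruction error:", np.abs(w - dequantize(q, scale)).max())
```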
Tools you can use (a usage sketch follows the list):
https://github.com/ModelCloud/GPTQModel
https://github.com/casper-hansen/AutoAWQ
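As a rough sketch of how these libraries are typically driven, here is an AutoAWQ-style 4-bit quantization flow. The model path, output directory, and quantization settings are illustrative assumptions, so check the project's README for the current API before relying on this.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # any HF causal LM; illustrative choice
quant_path = "mistral-7b-awq"                      # output directory (assumed name)
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantize the weights to 4-bit; AWQ runs its own calibration pass internally
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer for later inference
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

GPTQModel follows a similar load / quantize / save pattern, though it expects an explicit calibration dataset.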