Listening with LLM

The post outlines the author’s effort to fine-tune Large Language Models (LLMs) to process audio. Motivated by the prospect of LLMs that can describe human voices, the author walks through the process: adapting cross-domain encoders, debugging issues, and reaching promising training results. The ultimate goal is to extend the model to tasks such as transcription and speaker identification.
