The post outlines the author’s endeavor to fine-tune Large Language Models (LLMs) for audio processing. Motivated by the potential to create LLMs capable of describing human voices, the author discusses their process, including adapting cross-domain encoders, debugging issues, and achieving promising training results. The ultimate goal is to expand the model’s capabilities to tasks such as transcription and speaker identification.
Listening with LLM