Meta has unveiled NotebookLlama, an open-source version of the podcast generation feature popularized by Google’s NotebookLM. This new tool leverages Meta’s Llama models to deliver podcast-style content by processing text files, allowing users to create engaging audio digests.
The process begins with NotebookLlama converting uploaded documents, such as PDFs of articles or blog posts, into transcripts. It then rewrites these transcripts to add dramatization and interruptions between speakers, and finally passes the result to open text-to-speech models to produce the audio.
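The three stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Meta's actual code: the function names (`make_podcast`, `parse_dialogue`), the prompts, and the "Speaker: line" script format are all hypothetical, the language model and text-to-speech engine are passed in as plain callables, and the document text is assumed to have been extracted from the PDF already.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not NotebookLlama's API):
# an LLM maps a prompt to text; a TTS engine maps (speaker, text) to audio bytes.
LLM = Callable[[str], str]
TTS = Callable[[str, str], bytes]


def parse_dialogue(script: str) -> List[Tuple[str, str]]:
    """Split 'Speaker: line' text into (speaker, utterance) pairs."""
    turns = []
    for raw in script.splitlines():
        speaker, sep, text = raw.partition(":")
        # Skip blank lines and anything that is not a 'Speaker: text' turn.
        if sep and text.strip():
            turns.append((speaker.strip(), text.strip()))
    return turns


def make_podcast(document_text: str, llm: LLM, tts: TTS) -> List[bytes]:
    """Run the three stages: transcript, dramatization, speech synthesis."""
    # Stage 1: turn the source text into a two-speaker transcript.
    transcript = llm(
        "Write a two-speaker podcast transcript about:\n" + document_text
    )
    # Stage 2: rewrite the transcript with more drama and interruptions.
    dramatized = llm(
        "Rewrite with more drama and natural interruptions:\n" + transcript
    )
    # Stage 3: synthesize one audio clip per speaker turn.
    return [tts(speaker, line) for speaker, line in parse_dialogue(dramatized)]
```

Keeping the model and the speech engine as injected callables means either one can be swapped out without touching the pipeline, which matters here because the team names the text-to-speech model as the main limit on how natural the result sounds.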
However, early reviews indicate that NotebookLlama's audio quality falls short of Google's NotebookLM. The samples reviewed sound distinctly robotic, and the voices sometimes overlap in unnatural ways. Despite this, the researchers behind the project are optimistic about future improvements.
On the GitHub page for NotebookLlama, the team noted, “The text-to-speech model is the limitation of how natural this will sound. Another potential improvement could involve having two agents engage in a debate on the topic, rather than relying on a single model to draft the podcast outline.”
NotebookLlama is not the first attempt to replicate NotebookLM’s podcast functionality. Several projects have emerged, but none has solved the pervasive tendency of AI models to fabricate information, commonly referred to as the “hallucination problem.” Users should therefore expect some inaccuracies or invented details in the generated podcasts, a challenge that continues to affect AI technologies.
By releasing NotebookLlama as open source, Meta aims to make podcast generation more accessible and open to collaboration, while acknowledging the challenges that still lie ahead in refining AI-generated audio content.