Speech recognition can be complex, but Whisper.php simplifies the process. Whisper.php is a PHP wrapper for whisper.cpp, a C/C++ port of OpenAI's Whisper model.
This package was created by Kyrian Obikwelu, who recently released v1.0.0, enabling fully local, API-free transcription directly in your projects.
Whisper.php requires the FFI (Foreign Function Interface) extension to be installed and enabled in PHP. This extension allows you to interact with C libraries directly from PHP.
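Before installing, it can help to confirm that FFI is actually available. The following is a minimal sketch using only PHP's built-in extension_loaded() and ini_get(); the ffiStatus() helper is a hypothetical name introduced here for illustration and is not part of Whisper.php.

```php
<?php
// Hypothetical helper (not part of Whisper.php): reports whether the FFI
// extension is loaded and which ffi.enable mode is configured.
function ffiStatus(): string
{
    if (!extension_loaded('ffi')) {
        return 'FFI extension is not loaded';
    }

    // ffi.enable is typically "preload" (the default), "true"/"1", or "false"/"0".
    $mode = ini_get('ffi.enable');
    if ($mode === false || $mode === '') {
        $mode = 'unknown';
    }

    return "FFI extension is loaded (ffi.enable = {$mode})";
}

echo ffiStatus(), "\n";
```

With the default "preload" setting, FFI is usable from CLI scripts, while web requests generally need ffi.enable set to true or the FFI code preloaded.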
With FFI enabled, install Whisper.php with Composer:
composer require codewithkyrian/whisper.php
Whisper.php offers both low-level and high-level APIs.
– The low-level API provides fine-grained control over the transcription process, closely mirroring the original C implementation.
– The high-level API offers a simpler, more abstracted interface for a streamlined workflow.
For this article, we will use the high-level API.
use Codewithkyrian\Whisper\Whisper;
use function Codewithkyrian\Whisper\readAudio;
use function Codewithkyrian\Whisper\toTimestamp;
// Transcribe Audio
$whisper = Whisper::fromPretrained('tiny.en', baseDir: __DIR__.'/models');
$audio = readAudio(__DIR__.'/audio/laravel-news-227-sample.mp3');
$segments = $whisper->transcribe($audio, 4); // second argument: number of threads to use
// Output transcribed segment data
foreach ($segments as $segment) {
    echo toTimestamp($segment->startTimestamp) . ': ' . $segment->text . "\n";
}
Whisper.php relies on platform-specific shared libraries, which are downloaded automatically the first time you initialize a model with Whisper::fromPretrained() and stored in our models directory. The initial download causes a slight delay on the first run, but once the libraries are cached, subsequent runs are much faster. Supported Whisper base models include tiny.en, base, and base.en, among others.
Next, the readAudio() function simplifies audio processing by reading the file and resampling the audio to 16 kHz, a balance between audio quality and efficiency. This rate captures the core frequencies of human speech while reducing the amount of data to process.
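To make the data reduction concrete, here is a small arithmetic sketch. It assumes a one-minute mono source recorded at 44.1 kHz, which is an illustrative figure and not something stated about the podcast file; sampleCount() is a hypothetical helper and does not reflect how readAudio() works internally.

```php
<?php
// Hypothetical helper: number of samples needed to hold $seconds of mono
// audio at $sampleRate Hz.
function sampleCount(float $seconds, int $sampleRate): int
{
    return (int) round($seconds * $sampleRate);
}

$seconds = 60.0;                               // assumed clip length
$original = sampleCount($seconds, 44100);      // assumed source sample rate
$resampled = sampleCount($seconds, 16000);     // rate the model works with

printf("44.1 kHz: %d samples\n", $original);
printf("16 kHz:   %d samples (%.1f%% of the original)\n", $resampled, 100 * $resampled / $original);
```

A 16 kHz sample rate can represent frequencies up to 8 kHz, which covers most of the energy in human speech, so under these assumptions the model processes roughly a third of the samples.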
The transcribe() method then takes the resampled audio and breaks it up into segments, each with start and end timestamps and the transcribed text, which we can output in our desired format.
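As one example of a custom output format, the sketch below renders segments as SRT-style subtitle cues. To stay self-contained it uses plain arrays with made-up values and millisecond timestamps as stand-ins for the library's segment objects; formatMilliseconds() and toSrt() are hypothetical helpers, not part of Whisper.php, and the time units the library uses internally may differ.

```php
<?php
// Hypothetical helper: formats a millisecond offset as HH:MM:SS,mmm.
function formatMilliseconds(int $ms): string
{
    $hours = intdiv($ms, 3600000);
    $minutes = intdiv($ms % 3600000, 60000);
    $seconds = intdiv($ms % 60000, 1000);
    $millis = $ms % 1000;

    return sprintf('%02d:%02d:%02d,%03d', $hours, $minutes, $seconds, $millis);
}

// Hypothetical helper: renders segments as numbered SRT cues separated by
// blank lines. Each segment is an array with 'start', 'end' (ms) and 'text'.
function toSrt(array $segments): string
{
    $cues = [];
    foreach (array_values($segments) as $index => $segment) {
        $cues[] = sprintf(
            "%d\n%s --> %s\n%s\n",
            $index + 1,
            formatMilliseconds($segment['start']),
            formatMilliseconds($segment['end']),
            trim($segment['text'])
        );
    }

    return implode("\n", $cues);
}

// Stand-in data for illustration only.
$segments = [
    ['start' => 0, 'end' => 5040, 'text' => ' First segment'],
    ['start' => 5040, 'end' => 6400, 'text' => ' Second segment'],
];

echo toSrt($segments);
```

With real transcription results, the array keys would be replaced by the corresponding segment properties, converting the timestamps to milliseconds first if needed.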
As a test, we used a recent episode of the Laravel News Podcast. As the output below shows, the transcription is not perfect (the show's name appears to come out as "level this"), but it does a good job:
00:00:00,000: Hey everybody how's it going welcome to the level this podcast episode 227 today is November
00:00:05,040: 26th
00:00:06,400: 2024
00:00:07,680: Glad to have you hanging out with us and glad that Michael finally figured out his microphone...
You can learn more about this package and view the source code on GitHub.