Trying Voice Synthesis with COEIROINK v2 API in Python
I tried running the COEIROINK voice synthesis software from Python.
I'll specifically introduce the methods and results from that attempt.
The code is also uploaded on GitHub as basic-codes-coeiroink-api, so feel free to check it out!
What You Will Learn from This Article
In this article, I'll introduce specific steps and example code for running COEIROINK (v2) from Python.
For an overview of COEIROINK, its terms of use, and what AI voice synthesis is, check out my previous COEIROINK article.
Steps
If you want to run it with minimal code, follow these steps, including the one-time preparation.
You only need to do steps 2, 5, and 6 each time.
If that feels like too much hassle, check out the Handy Scripts section below.
1. Install Python and its libraries (only once).
   - Install Python.
     - I use version 3.10, but anything from 3.6 onwards should work.
   - Install requests and pydub via pip. (json is a standard library, so there's no need to install it if Python is already installed.)
2. Set up COEIROINK in a usable state via the GUI and start it.
   - The method is described in my previous COEIROINK article.
   - If you want to confirm the local server is actually reachable, see the connectivity-check sketch right after this list.
3. Run the check_speaker_info.py script and note the results (only when adding new models).
4. Based on the check_speaker_info.py execution results, update SPEAKER_INFO in text2audio.py.
5. Write the conversation content you want to convert to audio in input.txt.
6. Run text2audio.py.
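By the way, if you want to double-check that COEIROINK's local server is actually up before running the scripts (step 2), a minimal sketch like the following works. It simply queries the same v1/speakers endpoint that the scripts below use; the 5-second timeout is an arbitrary value I picked.

import requests

# Local server for communication (same URL as in the scripts below)
URL = "http://localhost:50032/"

try:
    # Ask the COEIROINK server for the list of installed voice models
    response = requests.get(URL + "v1/speakers", timeout=5)
    response.raise_for_status()
    print(f"Server is up: {len(response.json())} voice model(s) available.")
except requests.exceptions.RequestException as error:
    print(f"Could not reach the COEIROINK server: {error}")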
Code
The code below automatically creates a spoken conversation between AI Voice Actor Ayaka and AI Voice Actor Kanae from a text file.
We'll use the following three files.
- check_speaker_info.py (only when adding new models)
  - This script retrieves information about the voice models (speaker) and their speaking styles (style).
  - Specify the obtained information in your program.
  - You can run this script from any directory.
- text2audio.py
  - Converts the conversation written in input.txt into an audio file.
  - Update SPEAKER_INFO based on the output from check_speaker_info.py.
- input.txt
  - Written in the format "Voice Model Name: Dialogue".
  - When adding voice models, ensure the voice model names match those in the program.
Here are the contents of each file.
The functionality of text2audio.py is minimal, so customize it as needed!
check_speaker_info.py
This script retrieves information about the voice models (speaker) and their speaking styles (style).
Specify the obtained information in your program.
You can run this script from any directory.
import json
import requests

# Local server for communication
URL = "http://localhost:50032/"

# List speaker information
def check_speaker_info():
    response = requests.get(URL+"v1/speakers")
    speakers = response.json()
    for speaker in speakers:
        print("\n"+"="*30)
        print(f"name: {speaker['speakerName']}\nid: {speaker['speakerUuid']}")
        styles = speaker['styles']
        print('styles:')
        for style in styles:
            print(f" {style['styleId']:>4} : {style['styleName']}")

check_speaker_info()
Here is an example output.
==============================
name: AI Voice Actor Ayaka
id: d1143ac1-c486-4273-92ef-a30938d01b91
styles:
50 : normalv2
==============================
name: AI Voice Actor Kanae
id: d41bcbd9-f4a9-4e10-b000-7a431568dd01
styles:
100 : normal
101 : joy A type
103 : joy B type
102 : happiness
==============================
name: Tsukuyomi-chan
id: 3c37646f-3881-5374-2a83-149267990abc
styles:
0 : calm
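To read this output: the id line is the speakerUuid, and the number to the left of each style name is the styleId. For example, picking the normal style (100) for Kanae and normalv2 (50) for Ayaka gives exactly the SPEAKER_INFO entries used in text2audio.py below (the style-name comments are just my annotations):

SPEAKER_INFO = {
    "kanae": ['d41bcbd9-f4a9-4e10-b000-7a431568dd01', '100'],  # AI Voice Actor Kanae, style "normal"
    "ayaka": ['d1143ac1-c486-4273-92ef-a30938d01b91', '50' ],  # AI Voice Actor Ayaka, style "normalv2"
}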
input.txt (example)
ayaka: 今日はいい天気だね~
ayaka: そうだ! お姉ちゃん、外に行こうよ!
kanae: うーん、でもちょっと寒くない?
text2audio.py
import json
import requests
from pydub import AudioSegment, playback

# Local server for communication
URL = "http://localhost:50032/"

# Voice model information
# Voice model name: [voice model (speaker) id, speaking style (style) id]
SPEAKER_INFO = {
    "kanae": ['d41bcbd9-f4a9-4e10-b000-7a431568dd01', '100'],  # AI Voice Actor Kanae
    "ayaka": ['d1143ac1-c486-4273-92ef-a30938d01b91', '50' ],  # AI Voice Actor Ayaka
}

def make_wav(text, speed_scale, speaker_info):
    # Function to generate wav from text using the voice model
    speaker_uuid, style_id = speaker_info
    response = requests.post(
        URL+"v1/predict",
        json={
            "speakerUuid": speaker_uuid,
            "styleId": style_id,
            "text": text,
            "speedScale": speed_scale
        }
    )
    if response.content == b'Internal Server Error':
        print('Internal Server Error')
    return response.content

def play_wav(wav):
    # Function to play wav
    playback.play(
        AudioSegment(
            wav,
            sample_width=2,
            frame_rate=44100,
            channels=1
        )
    )
    return 0

def write_wav_file(file_path, audio_data, sample_width=2, sample_rate=44100, channels=1):
    # Function to write wav file
    # file_path: Path to save the file
    # audio_data: Audio data to save (binary string)
    # sample_width: Sample bit width (byte size), default is 2 (16bit)
    # sample_rate: Sample rate, default is 44100Hz
    # channels: Number of channels, default is 1 (mono)
    audio_segment = AudioSegment(audio_data, sample_width=sample_width, frame_rate=sample_rate, channels=channels)
    # Save as wav file
    audio_segment.export(file_path, format="wav")
# Main process
def text2audio():
    # Read text file
    with open("input.txt", "r", encoding='utf-8') as file:
        # Read each line as a list
        lines = file.readlines()
    speed_scale = 1.0  # Speaking speed
    # Voice synthesis
    audio_data = b''
    for line in lines:
        if not line.strip():
            continue  # Skip blank lines
        # Split only on the first colon so the dialogue itself may contain ":"
        speaker, text = line.split(":", 1)
        w0 = make_wav(text, speed_scale, SPEAKER_INFO[speaker])
        # _ = play_wav(w0)  # Use this to play each part individually
        audio_data += w0
    # Output concatenated audio
    play_wav(audio_data)  # Play
    write_wav_file("output.wav", audio_data)  # Save

text2audio()
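Since text2audio.py is deliberately minimal, here is one example of a customization: inserting a short pause between lines by appending silence generated with pydub. This is my own addition rather than part of the original script (the 300 ms duration is arbitrary), and it assumes the functions and SPEAKER_INFO defined in text2audio.py above.

from pydub import AudioSegment

# 300 ms of silence as raw bytes; AudioSegment.silent() produces 16-bit mono audio,
# which matches the format assumed by play_wav() and write_wav_file() above
PAUSE = AudioSegment.silent(duration=300, frame_rate=44100).raw_data

def text2audio_with_pauses():
    # Same flow as text2audio(), but with a short pause appended after each line
    with open("input.txt", "r", encoding='utf-8') as file:
        lines = file.readlines()
    speed_scale = 1.0  # Speaking speed
    audio_data = b''
    for line in lines:
        if not line.strip():
            continue  # Skip blank lines
        speaker, text = line.split(":", 1)
        audio_data += make_wav(text, speed_scale, SPEAKER_INFO[speaker]) + PAUSE
    play_wav(audio_data)
    write_wav_file("output.wav", audio_data)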
Results
Here are the results of running text2audio.py above.
The intonation may still be a bit unnatural, but considering it's free and easy to use, it's pretty impressive.
If you find it insufficient, I've included hints for further adjustments in the next section, so take a look!
Handy Scripts
There are too many manual operations, so I've made things a bit easier with these three scripts.
The procedure is now as follows:
1. Run start.bat.
   - COEIROINK will start automatically.
   - The API documentation page will also open automatically.
     - If you want to improve the program, refer to this! It might be a bit hard to understand...
   - check_speaker_info.py will also run automatically.
     - When adding models, update SPEAKER_INFO in text2audio.py based on this output.
2. Write the conversation content you want to convert to audio in input.txt.
3. Run run.bat.
   - text2audio.py will run automatically.
Please note that this assumes the following folder structure.
(It's neat to keep the scripts organized in a folder.)
- scripts
  - text2audio.py
  - check_speaker_info.py
  - connect.ps1
- input.txt
- run.bat
- start.bat
start.bat
You need to replace the path to COEIROINKv2.exe with the correct one, so make sure to edit the second line before using it!
@echo off
start C:\Users\xxxxxxxxxxxxx\COEIROINKv2.exe
powershell -ExecutionPolicy Bypass -File "./scripts/connect.ps1"
python ./scripts/check_speaker_info.py
pause
connect.ps1
A file called by start.bat.
This script opens the API documentation page in Chrome once the local server is ready for communication.
If the path to Google Chrome is different, edit the second-to-last line.
$maxAttempts = 100
$waitTimeSeconds = 0.5

for ($i = 1; $i -le $maxAttempts; $i++) {
    try {
        $response = Invoke-WebRequest -Uri "http://localhost:50032/docs" -Method Head
        if ($response.StatusCode -eq 200) {
            Write-Host "HTTP connection successful: HTTP status code 200 OK"
            break
        } else {
            Write-Host "HTTP connection succeeded, but the HTTP status code is not 200."
        }
    } catch {
        Write-Host "waiting for HTTP connection..."
    } finally {
        Start-Sleep -Seconds $waitTimeSeconds
    }
}

$chromePath = "C:\Program Files\Google\Chrome\Application\chrome.exe"
Start-Process -FilePath $chromePath -ArgumentList "http://localhost:50032/docs"
run.bat
A batch file to execute text2audio.py.
python ./scripts/text2audio.py
pause
Summary
I introduced specific methods and results for running COEIROINK from Python.
In the future, I plan to explore more advanced applications.