Can AI help to create high-quality content in any language while adhering to corporate language and specific rules?
Today we’re interviewing David Heider, the owner of a STAR partner sound studio in the Czech Republic, to shed light on a fascinating question: can artificial intelligence be used effectively in video and audio production?
STAR: David, when did you start offering professional audio productions?
Our recording studio has been providing its services since 1999, and we’ve specialised in the spoken word. We cover two different areas. The first is the “corporate world”, with recordings of material for internal purposes, such as e-learning. This also includes the localisation of internal company systems and software: training material, web-based platforms with voice output, automated phone operators, sat navs and so on. In short, these are applications where we often have to cut the sound word by word or even syllable by syllable, and where a system then puts everything together into sentences and whole messages.
The second area is more artistic in nature and covers advertising and promotional videos, among other content. It differs from the “corporate world” in that it’s not just about conveying content, but about a form that appeals to listeners and attracts them. So we need professionals who can express themselves artistically and use their voice skilfully. To summarise, you might say that our first area is about providing information. Here the listeners, to put it plainly, don’t have much choice, as they generally have to listen. In contrast, artistic productions aim to win over the “audience”, not only through their content but also through their form.

STAR: This inevitably leads me on to the next question – can AI be used in your work?
AI is an amazing tool and offers numerous advantages. For example, we don’t need to contact a voice-over artist and make an appointment; the AI is always available.
STAR: Are you already using AI?
Yes. We use AI in some cases for preparing and producing audio material. But there’s also a downside. In most languages, the AI voice seems artificial or boring, especially after listening to it for a long time.
STAR: Can’t AI intonate?
Intonation in itself isn’t usually a problem, but the AI uses unnatural inflections, which is a real drawback. Often it doesn’t stress the core message, which a person would normally bring out through a particular emphasis. And when you listen to an AI recording, you hear the same unnatural inflection over and over, and it starts to get annoying after a while, because you can’t shake the feeling that it’s actually just “copy-paste”. I find it much better in English, where the AI can work with variable intonation and make the voice sound very natural and lively. In all the other languages, we still have a long way to go before we reach that point. At the moment, the other languages still sound very “plastic”.
STAR: Are there any other disadvantages to AI voices?
There’s a second point that I think is more serious, especially with e-learning. As with any AI, the quality of the output depends on the quality of the input, so you always have to prepare the content correctly for AI voices. The AI may not read all the abbreviations correctly, i.e. in the way they are read in a specific corporate culture. Every company has its own corporate jargon and the AI won’t take this into account. The same applies to product names, place names and foreign words. For example, if French names appear in an English text, should they be pronounced the French way or the English way?
STAR: How can this be explained?
Only the employees at a company are really familiar with the corporate language and know why a certain linguistic rule can sometimes be ignored for internal company content or for marketing reasons. And the listeners are insiders, i.e. they usually know what the content is about. The recordings therefore have to be consistent with company usage, otherwise they will sound strange to the listeners’ ears. Sometimes, of course, a term, abbreviation or name is pronounced in a way that seems wrong to outsiders, but that’s just the way it’s done at the company and we should respect it.
STAR: What other challenges are there?
Abbreviations and other specific features are a major challenge for AI. They usually need a lot of adjustments and corrections, which can result in the final price being similar to that of a traditional voice-over. We need to create pronunciation tips or edit the text so that it’s easy for the AI to read. This is very time-consuming, so AI makes little sense for a one-off project. In addition, we also “proof-listen”, i.e. we listen through the AI output from start to finish to check it.
STAR: Don’t you “proof-listen” for human speakers too?
If there are two of us in addition to the speaker during the recording, we don’t do this any more because we can hear and check everything during the recording. The exceptions are languages that we don’t understand, such as Asian languages. But, in the case of AI, we don’t know beforehand what it knows and what it can read. I’ll give you an example. Take the unit “megapascal”. It has the abbreviation “MPa”, and the AI may read this as “em-pee-ay”, which is complete nonsense to a technical expert. So we’ve got to figure out how to get the AI to read it correctly as “megapascal”.
Sometimes we go through the recording and it seems right to us, but then the customer finds something that doesn’t fit their corporate culture. I think AI is a useful tool for certain informational texts, where it can make work faster and cheaper, and I’m happy to recommend it. In the hands of an inexperienced user, however, AI can behave unpredictably, and the end product will cause more disappointment than enthusiasm about the resources saved.
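The preparation step described above, where text is edited so that the AI reads “MPa” as “megapascal”, can be sketched in Python. This is only an illustrative sketch, not the studio’s actual workflow: the `LEXICON` table and the `expand_abbreviations` function are hypothetical names, and in practice the entries would come from the customer’s own corporate glossary.

```python
import re

# Hypothetical company-specific lexicon: abbreviation -> spoken form.
# In a real project these entries would be agreed with the customer.
LEXICON = {
    "MPa": "megapascal",
    "kPa": "kilopascal",
}


def expand_abbreviations(text, lexicon):
    """Replace whole-word abbreviations with their spoken forms
    before the text is passed to a text-to-speech system."""
    if not lexicon:
        return text
    # Try longer abbreviations first so they win over shorter ones.
    keys = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    # Only match when the abbreviation is not part of a longer word.
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda match: lexicon[match.group(1)], text)


prepared = expand_abbreviations("The pressure must not exceed 5 MPa.", LEXICON)
print(prepared)
```

Matching is case-sensitive and whole-word, so “mPa” or “MPax” is left alone. Plural forms and abbreviations attached directly to a number (“5MPa”) would need extra rules, and the result still has to be proof-listened.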
STAR: Is there a financial difference?
Yes, using AI reduces the budget to around half to two-thirds of the usual amount, as the work is mainly done by a machine and no voice professionals are involved in the process.
STAR: What do you do if a recording isn’t suitable for AI?
We are the guarantor of quality, and if we have serious and justified doubts about whether AI will lead to the right result, we’ll inform the customer. But customers also want to try it out for themselves. So I try to prepare them first by saying, “Don’t be disappointed, but I don’t think AI is suitable for this particular project.” When I feel that I’ve outlined everything, I leave the decision up to them. In some cases, customers themselves are unsure and are grateful for our support.
STAR: Thank you, David, for this very interesting discussion about AI in audio recordings.

AI voices aren’t yet perfect, and human voices are still winning the race: they’re able to convey emotions and leave a strong impression. However, AI voices are an inexpensive alternative. Please feel free to contact us for advice.
David Heider,
owner of a STAR partner sound studio in the Czech Republic