Author: Pajupuu, Hille
  1. Introduction

    The voice conveys a great deal of information about the speaker, which is why it plays an important role in communication. Even if we cannot see the speaker, for instance in a phone conversation, we can form an impression of them: their native language, age, gender, emotional state (whether they are sad or happy, bored or excited), intentions, social status, character and even appearance. People have preferences as to which voices they like or dislike. People with likable voices are considered socially attractive: friendly, competent, self-assured and trustworthy (see McAleer et al. 2014, Schweitzer et al. 2017, Ueda et al. 2013). A pleasant voice is an asset in many professions, for example for politicians, news presenters, customer support staff, teachers and voice actors. The last decade has seen a noticeable increase in devices that use the voice for communication and information transfer (e.g. smartphones, reading assistants, car applications). Whether a voice is human or synthesised, one criterion for choosing it for a technical application is that it is likable to a wide range of people.

    We take likability to mean "how much we like a speaker based on the sound of her/his voice and manner of speaking" (Burkhardt et al. 2011). Schuller and Batliner (2014) consider likability a long-term personality trait. Previous studies have shown that although listeners' ratings may differ on an absolute scale, listeners agree on which voices are likable and which are not (see Altrov et al. 2018, Ding et al. 2018, Goy et al. 2016, Obuchi 2017). A likable voice can be described in terms of acoustic parameters. Depending on the field, studies have used either a classical set of features (e.g. voice pitch, energy, speaking rate) or a subset selected from a large pool of candidate parameters for its discriminatory power. Because the studies differ in cultural background, aims and choice of parameters, their results are not always comparable, and generalisations about the acoustics of likable voices are therefore difficult to make.
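The point that listeners may disagree on an absolute scale yet agree on which voices are likable can be made concrete with a rank correlation. The Python sketch below is illustrative only: the two listeners and their ratings are hypothetical, and Spearman's rank correlation is used here as one standard way of measuring agreement in ordering, not necessarily the measure used in the studies cited. The lenient listener rates every voice higher than the strict listener, yet both order the five voices identically, so the mean ratings differ considerably while Spearman's rho is 1.

```python
def ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical mean likability ratings of the same five voices (1-7 scale).
lenient = [6.5, 5.8, 6.1, 4.9, 5.2]
strict = [4.1, 3.0, 3.6, 2.2, 2.5]

mean_gap = sum(lenient) / len(lenient) - sum(strict) / len(strict)
print(f"difference in mean rating: {mean_gap:.2f}")
# Both listeners order the voices the same way, so rho is 1.000.
print(f"Spearman rho: {spearman(lenient, strict):.3f}")
```

A plain Pearson correlation on the raw scores would also be insensitive to a constant offset, but ranking additionally discards differences in how widely each listener spreads their ratings, which is why a rank-based measure fits the question of whether listeners "concur in terms of which voices are likable".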

    Despite markedly increased interest over the last few decades in recognising speaker traits and states from the voice, there is still little research on voice likability and its acoustics (see Schuller et al. 2015). Some studies have addressed cross-gender perception of voice likability/attractiveness and determined relevant acoustic parameters (e.g. Babel et al. 2014, Bruckert et al. 2006, Collins 2000, Fraccaro et al. 2013, Zuta 2009). Other studies have originated from technical applications that use voices, for example studying likable voices for speech synthesis (e.g. Coelho et al. 2008, Ding et al. 2018, Hinterleitner et al. 2014, Syrdal et al. 1998) or classifying voices by likability (e.g. Coelho et al. 2011, Montacie and Caraty 2012, Pinto-Coelho et al. 2013, Schuller et al. 2012, 2015). Research has also examined the relation between speaker age and voice likability (e.g. Deal and Oyer 1991, Gampel and Ferreira 2017, Goy et al. 2016) and how voice likability should be assessed and annotated in speech corpora (e.g. Baumann 2017, Gallardo 2016, Gallardo et al. 2017, Schuller and Batliner 2014:170). A few studies have focused on the connections between culture, language and voice likability (e.g. Biadsy et al. 2008, Dahlback et al. 2007, Ding et al. 2017, 2018, Trouvain and Zimmerer 2017).

    In our study we sought to determine how culture influences voice likability, that is, how voice likability is perceived across cultures: whether people within a single culture perceive the same voices as likable and the same voices as unlikable, and whether people from different cultures like the same voices. More precisely, we were interested in which voices were perceived as likable by Finns and Estonians, who are geographically close and whose languages belong to the Finnic branch of the Uralic language family.

    1.1. Cross-cultural studies on voice likability

    There are remarkably few cross-cultural or cross-linguistic studies on the perceived likability of speech, although some studies address adjacent subjects (see Schuller et al. 2013, 2015). Dahlback et al. (2007) studied how Americans and Swedes assessed the speaker's knowledge of the topic, voice likability and information quality in an information system for tourists that spoke to them in English with either an American or a Swedish accent. The listeners preferred voices that shared their accent. The researchers explained this with the similarity-attraction effect: people trust those who are similar to them. Biadsy et al. (2008) reached similar findings in their study on the charisma of voices speaking native and foreign languages. In their research, American, Swedish and Palestinian listeners rated political speech in Standard American English, and Americans and Palestinians rated Palestinian Arabic speech, for charisma on a five-point Likert scale. Both experiments revealed that listeners gave native speech higher and non-native speech lower charisma ratings.

    Trouvain and Zimmerer (2017) obtained contrary results while studying how voice attractiveness ratings were affected by speaking in another language. German listeners, who assessed speech read by French and German speakers (both groups reading in both languages), judged French voices to be more attractive than German voices, in both French and German speech. French-accented German speech was perceived as more attractive than both the Germans' own native speech and French speech with a German accent. Thus, foreign-accented speech can be perceived as more attractive than native-accented speech, and speakers of a foreign language can be perceived as more attractive than speakers of the listeners' native language. The authors held these results to mirror "the stereotypical picture of French as a popular and sympathetic language for German speakers".

    Studies by Ding et al. (2017, 2018) confirmed that voices have prosodic features that lead listeners to prefer the same voices in both native and non-native speech. The aim of these studies was to find a likable donor voice for speech synthesis. In the first study, Chinese and German listeners rated Chinese voices (speaking Mandarin) and German voices, while in the second study, Chinese and German listeners rated German voices. Both studies showed a strong correlation between the German and Chinese listeners' ratings for both native and non-native voices. Therefore, listeners of different cultural backgrounds perceived similar voices as likable, whether the speech was in their native language or a foreign one.

    Previous studies have yielded contradictory results concerning the influence of culture and language on the perception of voice likability. With our study we wished to determine whether Finnish and Estonian listeners' voice preferences depend on the language heard, or whether Finnish and Estonian listeners prefer the same voices irrespective of language and culture.

    1.2. On the connections between gender and voice likability

    Researchers of voice likability have been interested in whether likability ratings are affected by the gender of the speaker. The connection between voice likability and gender remains an open question. A study with Californian English speakers and listeners by Babel et al. (2014) revealed that while listeners found the same voices attractive, female voices were perceived as more attractive. In a study by Altrov et al. (2018), Estonian women and men rated the likability of Estonian female and male voices, and raters preferred female voices. A further study conducted in a Chinese-speaking context also showed a significant preference for female voices (Chang et al. 2018).

    In contrast, in a study by Deal and Oyer (1991), English male voices were assessed as more pleasant than female voices. In a study by Jokisch et al. (2018), in which the charisma of German male and female politicians of different ages was rated, male voices also received higher scores. A study by Ueda et al. (2013) on Japanese voice likability showed that speaker gender had no significant effect on ratings.

    Although the studies are for the most part not directly comparable, their contradictory results hint that speaker gender might affect voice likability assessment differently in different cultures. With our study we wished to contribute to knowledge of how speaker gender affects the assessment of female and male voice likability, using the example of Finnish and Estonian cultures.

    1.3. On the connections between age and voice likability

    Voice likability perception may also be influenced by the age of the speaker and the listener, and this influence may vary from one culture to another. Previous research on the effect of age on voice likability can roughly be divided into two groups: studies that confirmed an effect of age on voice likability ratings and studies that found no such effect.

    The study by Deal and Oyer (1991) showed that age has an effect on likability. In their study, five groups of North American English-speaking listeners of different ages rated the likability of speakers of different ages. The results showed that younger speakers were rated as more likable. Weiss and Burkhardt (2012) reached the same conclusion in their study, in which German voices from three age groups (youths, adults and seniors) were rated, and speakers from the younger group were assessed more positively than those from the older group. Goy et al.'s (2016) study also supported the effect of age on likability. They had English-speaking listeners of different ages rate younger and older voices for likability and suitability for voicing audiobooks. Comparing the ratings of younger and older listeners, they found that younger raters gave older voices lower scores. However, both groups considered voices rated as likable and suitable...

