Voice assistants are a core technology for human-machine interaction, providing access to products and services via natural language. So far, companies from the US and Asia have dominated the market for voice assistant technology. However, demand for voice assistant solutions in the German production and retail industries is enormous, especially with regard to data sovereignty: there is a need for better protection and secure exchange of personal data. A German-made voice assistant solution would make this possible by implementing European standards of data security. At the same time, it would enable a new level of quality in human-machine communication, going far beyond the semantic capabilities of current systems and resulting in much more user-friendly systems.
To this end, experts from the fields of speech signal processing, natural language understanding, artificial intelligence and software engineering have joined forces at Fraunhofer IIS and Fraunhofer IAIS. Fraunhofer IIS already holds a world-leading position in the field of acoustic signal processing technology, which forms the basis for the high reliability and robustness of speech processing. Fraunhofer IAIS has developed leading algorithms in the field of automatic speech recognition and question answering. The goal is to further expand this technological leadership and integrate it into a scalable, multilingual and open voice assistant platform. Fraunhofer technology can then be adapted to specific company requirements and support the data sovereignty required in the production and retail industry.
As part of the »Artificial intelligence as a driver for economically relevant ecosystems« innovation competition, Fraunhofer is working on a concept for SPEAKER, a large-scale research and development project funded by the German Federal Ministry for Economic Affairs and Climate Action.
The SPEAKER project seeks to develop a leading German-made voice assistant platform for business-to-business (B2B) applications. This platform should be open, modular and scalable, providing technologies, services and data via service interfaces. The SPEAKER platform will be embedded in a comprehensive ecosystem made up of large industrial companies, SMEs, start-ups and research partners that ensure a high capacity for innovation. The Fraunhofer Institutes for Intelligent Analysis and Information Systems IAIS and for Integrated Circuits IIS, which already possess the relevant technologies and experience in the field of voice assistant technologies, platforms (e.g. AI4EU – European AI on-demand platform) and global marketing strategies for voice and audio technologies (e.g. MP3), will ensure the development of the platform and the ecosystem.
The two Fraunhofer Institutes IIS and IAIS have conducted workshops with numerous companies to establish requirements, identify obstacles and recommend actions that will serve as a basis for platform design and development. Key arguments for a German-made voice assistant platform include data protection, security, privacy and trust. The lack of these has become particularly evident from recently reported incidents of non-GDPR-compliant speech analysis involving Google Assistant, Alexa and Siri. This applies all the more in the B2B environment, where internal company data needs to be protected. The SPEAKER platform therefore addresses the issues of data and technology sovereignty in this important emerging field of human-machine communication. Requirements were also identified with respect to domain-specific customizability, flexibility in the choice and use of modules, open interfaces to databases and applications, multilingualism, paralinguistics (e.g. recognizing emotions in voices), and the participation and development of a user community. In parallel with the requirements survey, current market research predicting strong growth in the voice assistant market was evaluated. On average, a 25 percent annual increase in devices with voice assistant functions is expected over the next four years.
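To put the cited growth figure in perspective, a 25 percent annual increase compounds to roughly a 2.4-fold increase in devices over four years; a quick check:

```python
# Compound growth of the voice assistant device base at the cited
# 25 percent annual rate over a four-year horizon.
annual_growth = 1.25
years = 4
factor = annual_growth ** years  # 1.25^4
print(round(factor, 2))  # 2.44, i.e. roughly 2.4x as many devices
```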
The SPEAKER platform’s aim is to provide open, transparent and secure voice assistant applications. To achieve this, leading technologies for audio preprocessing, speech recognition, natural-language understanding (NLU), question answering (QA), dialogue management and speech synthesis by means of artificial intelligence (AI) and machine learning must be made available for simple use. These key modules will be used to develop industrial voice assistant applications that, in turn, can be made available to other market players via the platform in the form of ready-made skills.
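The modular design described above can be pictured as a chain of interchangeable processing stages, each exposed behind a common interface so that individual modules can be swapped out. The following Python sketch is purely illustrative; all names and the toy stage functions are hypothetical and do not reflect actual SPEAKER APIs:

```python
from dataclasses import dataclass
from typing import Callable, List

# Each stage (preprocessing, speech recognition, NLU, ...) shares one
# interface, so modules can be exchanged without touching the pipeline.
Stage = Callable[[str], str]

@dataclass
class VoicePipeline:
    stages: List[Stage]

    def run(self, data: str) -> str:
        # Pass the input through every stage in order.
        for stage in self.stages:
            data = stage(data)
        return data

# Toy stand-ins for the real modules:
def preprocess(x: str) -> str:
    return x.strip().lower()

def recognize(x: str) -> str:
    return x  # a real ASR module would map audio to text here

def understand(x: str) -> str:
    return "intent:play_music" if "play" in x else "intent:unknown"

pipeline = VoicePipeline([preprocess, recognize, understand])
print(pipeline.run("  Play some jazz "))  # intent:play_music
```

Because the stages only agree on an interface, an application could replace, for example, the NLU stage with a domain-specific model without changing the rest of the chain.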
Compared with existing voice assistant environments (Alexa, Google Assistant), the following key characteristics are guaranteed and highlighted: modularity, data protection and privacy, openness with respect to technologies, connectivity and dissemination through an open ecosystem, and innovation capability. In addition, data diversity for B2B applications will be made possible by providing a data platform and integrating data and application partners. The infrastructure of the SPEAKER platform will enable data exchange (community approach), with international networks (MetaNet, European Language Grid) providing access to numerous language corpora. The SPEAKER platform will use industrial scaling mechanisms (e.g. Docker, Kubernetes, Redis). To this end, SPEAKER is working with the German company iNNOVO Cloud. This cooperation enables us to guarantee not only scalability, but also data protection based on GDPR principles. After the platform is transferred to the operating company, the public launch of the platform will help it become established quickly, setting up SPEAKER for a sustainable future. SPEAKER will be offered at a similar cost to established platforms and will focus primarily on B2B applications.
Collaborative partners (also called consortium partners) are partners who have agreed to define a use case and implement it together with the SPEAKER consortium.
Companies, associations, municipalities or other organizations that do not apply for funding can be included as associated partners in the project network and thus benefit from free access to the SPEAKER platform during the implementation phase.
Due to the current COVID-19 pandemic, no face-to-face events are currently planned. If you register for our Infomail service, we will be happy to keep you informed about upcoming events (virtual or in person).
Hub.Berlin on 18. and 19.04.2021 in Berlin
Fachseminar „Smart Living – intelligent, vernetzt, energieeffizient“ (specialist seminar on smart living) on 16. and 17.09.2020 in Nürnberg
Hannover Messe 2020 from 13.07. to 27.07.2020 in Hannover
1st International Workshop on Language Technology Platforms (IWLTP 2020) on 16.05.2020 in Marseille
Voice Connected Business on 14. and 15.05.2020 in Frankfurt
Start of the implementation phase of the SPEAKER Project on 01.04.2020
ITG-Fachgruppentreffen „Signalverarbeitung und maschinelles Lernen“ (ITG expert group meeting on signal processing and machine learning) on 06.03.2020 in Sankt Augustin
ITG Workshop on Voice Assistants (Sprachassistenten) on 03.03.2020 in Magdeburg
Submission of the overall project description on 15.10.2019
Opening ceremony of the Forum Digitale Technologien and announcement of the winners of the KI-Innovationswettbewerb on 19.09.2019 in Berlin
Lecture series at Fraunhofer IIS on Natural Language Processing with Dr. Xin Wang on 13.09.2019 in Erlangen
Submission of the implementation concept for the implementation phase on 16.08.2019
Project-internal workshops
07.04.2020 Project kick-off
30.07.2020 Voice UX Workshop
08.10.2020 1st Milestone Meeting
13.11.2020 Data Annotation Workshop
26.11.2020 Platform Workshop
09.12.2020 Model workshop for speech recognition
04.03.2021 Workshop on Dialogue Manager, Dialogue Editor and NLU
16.03.2021 Multimodality Workshop
18.03.2021 Workshop Text-to-Speech
15.04.2021 2nd Milestone Meeting
24.06.2021 Data Annotation Workshop
20.12.2021 3rd Milestone Meeting
24.02.2022 Question Answering over Knowledge Graphs
The AudioLabs System for the Blizzard Challenge 2023
F. Zalkow et al.: ISCA Proceedings, 2023
Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests
K. Kayyar, C. Dittmar, N. Pia, and E. A. P. Habets: ISCA Proceedings, 2023
Multi-Speaker Text-to-Speech Using ForwardTacotron with Improved Duration Prediction
K. Kayyar, C. Dittmar, N. Pia, and E. A. P. Habets: ITG Conference on Speech Communication Proceedings, 2023
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
K. Kayyar, C. Dittmar, N. Pia, and E. A. P. Habets: EUSIPCO Proceedings, 2023
Improving the Naturalness of Synthesized Spectrograms for TTS Using GAN-Based Post-Processing
P. Sani, J. Bauer, F. Zalkow, E. A. P. Habets, and C. Dittmar: ITG Conference on Speech Communication Proceedings, 2023
Evaluating Speech–Phoneme Alignment and its Impact on Neural Text-To-Speech Synthesis
F. Zalkow, P. Govalkar, M. Müller, E. A. P. Habets, and C. Dittmar: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
Uncertain yet rational - Uncertainty as an Evaluation Measure of Rational Privacy Decision-Making in Conversational AI
A. Leschanowsky, B. Popp, and N. Peters: 25th International Conference On Human-Computer Interaction, 2023
Knowledge Distillation Meets Few-Shot Learning: An Approach for Few-Shot Intent Classification Within and Across Domains
A. Sauer, S. Asaadi, and F. Küch: NLP Proceedings, 2022
WoS - Open Source Wizard of Oz for Speech Systems
B. Brüggemeier & P. Lalone: IUI Proceedings, 2019
A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction
P. Govalkar, J. Fischer, F. Zalkow & C. Dittmar: ISCA SSW Proceedings, 2019
Segmenting multi-intent queries for spoken language understanding
R. Shet, E. Davcheva & C. Uhle: ESSV Proceedings, 2019
Privacy in Speech Interfaces
T. Bäckström, B. Brüggemeier & J. Fischer: ITG News, 2020 (not available online)
User Experience of Alexa, Siri and Google Assistant when controlling music – comparison of four questionnaires
B. Brüggemeier, M. Breiter, M. Kurz & J. Schiwy: HCII 2020 – Late Breaking Papers, Springer LNCS Proceedings, Copenhagen, Denmark, 2020 (not freely available)
User Experience of Alexa when controlling music – comparison of face and construct validity of four questionnaires
B. Brüggemeier, M. Breiter, M. Kurz & J. Schiwy: 2nd Conference on Conversational User Interfaces (CUI 2020), Bilbao, Spain, 2020
Development of a leading language assistance platform
B. Brüggemeier, J. Fischer, D. Laqua, C. Möller, R. Usbeck, K. Wagener, H. Wedig, P. Theile, D. Steinigen & C. Dittmar: Final report on the SPEAKER project (Schlussbericht zum Vorhaben SPEAKER), 2020 (not available online)
Message Passing for Hyper-Relational Knowledge Graphs
M. Galkin, P. Trivedi, G. Maheshwari, R. Usbeck & J. Lehmann: 2020
Language Model Transformers as Evaluators for Open-domain Dialogues
R. Nedelchev, J. Lehmann & R. Usbeck: Proceedings of the 28th International Conference on Computational Linguistics, pages 6797–6808, Barcelona, Spain (Online), 2020. International Committee on Computational Linguistics
Towards an interoperable ecosystem of AI and LT platforms: A roadmap for the implementation of different levels of interoperability
G. Rehm, D. Galanis, P. Labropoulou, S. Piperidis, M. Welß, R. Usbeck, J. Köhler, M. Deligiannis, K. Gkirtzou, J. Fischer, C. Chiarcos, N. Feldhus, J. Moreno Schneider, F. Kintzel, E. Montiel-Ponsoda, V. Rodríguez-Doncel, J. Philip McCrae, D. Laqua, I. P. Theile, C. Dittmar, K. Bontcheva, I. Roberts, A. Vasiljevs & A. Lagzdins: G. Rehm, K. Bontcheva, K. Choukri, J. Hajic, S. Piperidis, and A. Vasiljevs [editors]: Proceedings of the 1st International Workshop on Language Technology Platforms, IWLTP@LREC 2020, Marseille, France, 2020, pages 96–107. European Language Resources Association, 2020
User Preference and Categories for Error Responses in Conversational User Interfaces
S. Yuan, B. Brüggemeier, S. Hillmann & T. Michael: 2nd Conference on Conversational User Interfaces (CUI 2020), Bilbao, Spain, 2020 (registration required)
Crowdsourcing Ecologically-Valid Dialogue Data for German
Y. Frommherz and A. Zarcone: Frontiers in Computer Science 2021
New Domain, Major Effort? How Much Data is Necessary to Adapt a Temporal Tagger to the Voice Assistant Domain
T. Alam, A. Zarcone and S. Padó: Proceedings of the 14th International Conference on Computational Semantics (IWCS), June 2021, Groningen, The Netherlands (online), Association for Computational Linguistics
Design Implications for Human-Machine Interactions from a Qualitative Pilot Study on Privacy
A. Leschanowsky, B. Brüggemeier & N. Peters: Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication, pages 76–79, doi: 10.21437/SPSC.2021-16
A Lightweight Neural TTS System for High-quality German Speech Synthesis
P. Govalkar, A. Mustafa, N. Pia, J. Bauer, M. Yurt, Y. Özer & C. Dittmar: ITG Conference on Speech Communication Proceedings, Kiel, 2021
Not So Fast, Classifier – Accuracy and Entropy Reduction in Incremental Intent Classification
Hrycyk, A. Zarcone & L. Hahn: Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI, Punta Cana, 2021
Small Data in NLU: Proposals towards a Data-Centric Approach
A. Zarcone, J. Lehmann & E. Habets: Proceedings of the NeurIPS Data-centric AI Workshop, pages 52–67, 2021
Success is not Final; Failure is not Fatal – Task Success and User Experience in Interactions with Alexa, Google Assistant and Siri
M. Kurz, B. Brüggemeier & M. Breiter: HCI, 2021
Perceptions and reactions to conversational privacy initiated by a conversational user interface
B. Brüggemeier & P. Lalone: Computer Speech & Language, Volume 71, 101269, 2022, https://doi.org/10.1016/j.csl.2021.101269
Adapting Debiasing Strategies for Conversational AI
A. Leschanowsky, B. Popp & N. Peters: Proceedings of the International Conference on Privacy-friendly and Trustworthy Technology for Society – COST Action CA19121 – Network on Privacy-Aware Audio- and Video-Based Applications for Active and Assisted Living, 2022 (not available online)
Predicting Request Success with Objective Features in German Multimodal Speech Assistants
M. Weber, M. M. Halimeh, W. Kellermann & B. Popp: Proceedings of Human Computer Interaction International (HCII) 2022, Artificial Intelligence in HCI, LNAI 13336, Volume 35, 2022
Chatbot Language – crowdsource perceptions and reactions to dialogue systems to inform dialogue design decisions
B. Popp, P. Lalone & A. Leschanowsky: Behavior Research Methods, 2022