
  • Voice Mosaic – Talking to the Web
  • Martha Carrer Cruz Gabriel

Introduction

Voice interfaces are a fascinating subject. The human dream of talking to computers in a natural way is not new. The science fiction books and movies that live in our imagination offer many examples of this aspiration: “Star Trek,” where the Enterprise’s crew talk to the ship’s systems and to androids such as Commander Data; “Lost in Space,” where Will Robinson had in his robot a very loyal and trusted friend; “Star Wars,” with its conversations and human interactions with the robots C-3PO and R2-D2; and “Blade Runner,” with its androids and voice-driven interfaces, among others [1].

Until recently, talking to computers was in the realm of fiction, and the web has been largely mute and deaf. However, at the beginning of the 21st century, talking to computers has become possible and easy, thanks to enormous advances in speech synthesis and voice recognition technologies as well as the open standards adopted by the W3C (such as VoiceXML). The accuracy that voice technologies have now reached allows us to use them widely on the web.

The potential of voice interfaces is explosive. From speech-only applications integrated into the whole web to multimodal applications that combine aural and visual abilities in web browsers, voice interfaces add a fundamental spice to the flavor of the web, one that is sure to change it.

Tim Berners-Lee said at SpeechTEK 2004 in New York, “Speech technology is an important ingredient for the Web to realize its full potential.” In fact, voice interfaces on the web bring clear benefits in several areas, such as convenience for mobile users, v-commerce, natural interactions, and usability. Beyond these more obvious uses, the ability to talk to the web also provides an important way to improve web accessibility, not only through multimodal applications but also through speech-only ones. In addition, speech-only applications free users from needing any client computer to access the Internet: all they need is a telephone, anywhere in the world. This is the alliance of the widest computing network with the most pervasive communication device on Earth, the Internet and the telephone.

However, talking to computers adds “ears” and “mouths” to the Internet organism, changing the way we interact with it, bringing new possibilities and new challenges as well. We must face the increasing complexity that voice interfaces bring to the web while we also open new channels for digital inclusion, provide more accessibility and increase mobility through voice. All these things affect the human role inside the high-tech social structure we live in, at once causing excitement and fear.

In this context, I created the Voice Mosaic, a web-art work that allows voice interactions on the web through the telephone, dissolving the border between the Internet and the telephone. As Hendrik Willem Van Loon once said [2], “The arts are an even better barometer of what is happening in our world than the stock market or the debates in congress.” I believe that artworks help people to understand and experience the new emergent technosocial world that surrounds us, where convergence and hybridization have become ubiquitous and easy, and where “to talk to computers or the web” is going to become common.

Since the technologies used in Voice Mosaic can be applied to other kinds of voice applications on the web, improving accessibility and digital inclusion, we next present the work and its main aspects, covering both the art concept and the technological implications. This artwork received several awards and was also presented at the SIGGRAPH 2006 Art Gallery in Boston, MA (USA).

Voice Mosaic

The Voice Mosaic (Fig. 1) is a web-art application that combines speech and image, building a visual mosaic on the web from the chosen colors and recorded voices of people who interact with it from anywhere on the globe [3]. The voice interface, developed with open standards for speech synthesis and voice recognition (VoiceXML), works through phone calls from any telephone, mobile or not. To participate in English, call in the US: (800) 289-5570 or (407...
