NottReal: A Tool for Voice-based Wizard of Oz studies

We present NottReal, an application designed for simulating Voice User Interfaces (VUIs) in Wizard of Oz studies. We briefly discuss the premise and advantages of the Wizard of Oz method before introducing the design of the application, which we have iteratively developed and refined through a number of studies.


THE WIZARD OF OZ METHOD
The recent growth in popularity of Voice User Interfaces (VUIs), from smartphone assistants (e.g. Siri) through to smart speakers (e.g. Amazon Echo), has led to a resurgence of research in the HCI (e.g. [10,16]) and CSCW (e.g. [12,15,17]) communities that examines the design and use of novel technologies such as 'natural language' interfaces. Implementing these sorts of technologies can be a complex, lengthy, and costly endeavour, involving a host of computational techniques including lookup [13], gestural/spatial recognition [7], robot control [18], mixed reality techniques [4], machine learning [3], or natural language processing [8]. Thus, when it comes to prototyping ideas or conducting research with these interfaces, the Wizard of Oz method or 'experiment' (often abbreviated to WOz or WoZ) is frequently used as part of the development process [13]. The method prescribes that, rather than actually implementing all the elements of a digital system, the 'intelligence' of a machine is performed by a human operator concealed from the participant, who is led to believe that the system or machine itself is 'intelligent' [5].
The method was originally referred to as "experimenter in the loop" [6, pp. 1-2] or given the epithet of "The Perfect System" [11, ... [2]. The story revolves around the characters' journey to meet a supposedly wonderful wizard that is later revealed to be a sham. The Wizard is, in fact, an 'ordinary' human behind a curtain controlling a machine. In other words, the wonderful wizard is an orchestrated illusion and this is what inspired the method's name.
CUI '20, July 22-24, 2020, Bilbao, Spain
The Wizard of Oz method offers designers and researchers the ability to develop medium-fidelity prototypes [13] that can be used for the exploration and testing of ideas as part of iterative design processes and research, and that feed into a number of different forms of analysis, from qualitative Conversation Analysis [19] through to quantitative task performance analysis [1,9], as well as product development. However, these studies require a flexible tool that lets the Wizard perform the VUI simulation live and respond effectively to participants.

NOTTREAL
NottReal is a cross-platform Python-based desktop application for Wizard-controlled voice interface studies, where the intent detection and slot filling of typical natural language interfaces [14] is completed by a human operator.
The primary window (see Figure 1a) consists of a number of controls for simulating a VUI. Through a number of internal research studies and our experience of the design of VUIs, we have progressively refined the application for quick operation during a study. The controls include: 1) tabbed lists of pre-scripted messages, 2) entry for custom messages, 3) currently queued messages, 4) previously sent messages, 5) previously filled slots, and 6) options to log events and send messages with a loading animation. Additional options are also shown; these mostly originate from features that are enabled through command-line arguments. We iterate through this list in more detail below, explaining how each feature works in practice.

Main features and interaction design
We now work through these features and the basis for their design.

Message queue and delivery.
NottReal, by default, queues messages to send to the participant and blocks queue processing while a message is being 'delivered' to participants, e.g. through a text-to-speech (TTS) engine. This allows multiple messages to be queued up, with the intent that they be delivered sequentially. This has been useful for us in situations where a long piece of dialogue is delivered to the participant in chunks.
For delivering these messages, NottReal supports various TTS engines including CereVoice (including support for CereVoice's ...
NottReal also has a simulated 'Mobile VUI' window (shown in Figure 1b). We have used this on a second, participant-facing screen, which has proved useful when we have wanted participants to face a camera throughout a study. The interface includes an 'orb' that displays the VUI's 'state'. The orb is a pink ring that fades in and out while the VUI is 'talking', a static blue ring while the VUI is 'resting', a white ring with a rotating 'slice' missing during 'computation', and a purple circle that flutters in brightness based on the microphone input sound level while 'listening'. In our experience, this provides a semblance of realism to the system.
We have also added the ability to interrupt the TTS engine and clear the queue by pressing Ctrl+C. This was useful in cases where a participant might try to interrupt the system by 'overtalking', as occurs if a user interrupts a consumer-grade VUI such as Siri or Alexa by uttering the 'wake word' during a response.
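The queueing behaviour described above can be sketched in a few lines of Python. This is an illustrative sketch only, not NottReal's actual implementation: the class `MessageQueue`, its method names, and the `deliver` and `stop` callables are hypothetical, and `deliver` is assumed to block until the TTS engine has finished speaking.

```python
import queue


class MessageQueue:
    """Sequential message delivery (hypothetical sketch, not NottReal's API)."""

    def __init__(self, deliver, stop=None):
        # `deliver` is assumed to block until the message has been spoken,
        # e.g. a wrapper around a TTS engine. `stop` is an optional callable
        # that silences the engine mid-utterance.
        self._deliver = deliver
        self._stop = stop
        self._pending = queue.Queue()
        self.sent = []  # previously sent messages, oldest first

    def enqueue(self, message):
        """Add a message to the end of the queue."""
        self._pending.put(message)

    def process(self):
        """Deliver pending messages in order, blocking on each one."""
        while True:
            try:
                message = self._pending.get_nowait()
            except queue.Empty:
                return
            self._deliver(message)
            self.sent.append(message)

    def interrupt(self):
        """Stop the engine (if possible) and drop all undelivered messages."""
        if self._stop is not None:
            self._stop()
        dropped = []
        while True:
            try:
                dropped.append(self._pending.get_nowait())
            except queue.Empty:
                return dropped
```

In a desktop application, `process` would typically run on a worker thread so that the Wizard's window stays responsive while a message is being spoken; `interrupt` corresponds to the Ctrl+C shortcut described above.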

Pre-scripted messages.
NottReal is primarily designed to allow the Wizard to deliver a message 'quickly' to a participant (i.e. within what may be considered a reasonable time for a VUI's latency). The primary configuration of the application consists of pre-scripted messages, separated into categories (displayed as tabs). In our studies, this categorisation has been used to delineate messages used in different experimental conditions or at different stages of a study. Each message consists of a label for the Wizard to identify it, as well as the message to be delivered. Double-clicking on a message, or pressing the Enter key while one of these messages is highlighted, automatically queues that message to send to the participant, whereas single-clicking copies the message to the text box to enable customisation.
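As a rough illustration of this structure, the sketch below models categories of labelled messages and the two ways of activating them. The in-memory layout, the names `ScriptedMessage`, `CATEGORIES` and `activate`, and the example messages are all assumptions made for illustration; they do not describe NottReal's configuration format or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptedMessage:
    label: str  # short name the Wizard uses to identify the message
    text: str   # what is delivered to the participant


# Hypothetical categories (one tab each) and messages.
CATEGORIES = {
    "Greetings": [
        ScriptedMessage("hello", "Hello, how can I help?"),
    ],
    "Errors": [
        ScriptedMessage("not_heard", "Sorry, I didn't catch that."),
        ScriptedMessage("try_again", "Could you say that again?"),
    ],
}


def activate(message, double_click, outgoing):
    """Handle a click on a pre-scripted message.

    A double-click (or Enter) queues the message unchanged by appending
    it to `outgoing`; a single click returns the text so it can be placed
    in the editable text box. Returns None when the message was queued.
    """
    if double_click:
        outgoing.append(message.text)
        return None
    return message.text
```

Keeping the label separate from the delivered text lets the Wizard scan short labels under time pressure while the participant hears the full wording.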

Slots and slot-tracking.
To allow for responses that incorporate participant/context-specific words (e.g. the words used by a participant in a request), NottReal allows messages to include 'slots', denoted in the pre-scripted messages as text [within square brackets]. If a pre-scripted message with a slot is selected, this message is automatically copied to the text box and the slot is highlighted (if there are multiple slots, the system moves between these on the press of the Tab or Enter keys). The Wizard can type the value of the slot and send the message with Ctrl/Cmd+Enter without moving the mouse pointer to select the slot. Previously used slot names and values are displayed in the window in a list.
Pre-scripted messages can also contain tracked slots, which are automatically substituted based on their previously substituted value if the pre-scripted message is double-clicked. This has been especially handy if multiple messages refer to the same thing (e.g. a participant's name). This can be enabled by appending an asterisk to the slot's name, e.g. [name*]. A pre-scripted message can clear the tracking for the slot by appending a dollar to the name, e.g. [name$]. We found this beneficial for responding to state changes in the study (e.g. a subsequent stage, task, or condition).
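The slot syntax lends itself to a small regular-expression sketch. The function `fill_slots` and its arguments are hypothetical and do not reflect NottReal's code. The sketch also makes one assumption that the description leaves open: a value typed into a [name$] slot is used for that message but is not remembered afterwards.

```python
import re

# Matches [name], [name*] (tracked) and [name$] (clear tracking).
SLOT = re.compile(r"\[([^\[\]*$]+)([*$]?)\]")


def fill_slots(template, values, tracked):
    """Substitute every slot in `template` and return the message text.

    values:  slot name -> text typed by the Wizard for this message
    tracked: slot name -> remembered value (updated in place)
    Raises KeyError if a slot has neither a typed nor a tracked value.
    """
    def replace(match):
        name, flag = match.group(1), match.group(2)
        if flag == "$":
            tracked.pop(name, None)  # forget any remembered value
        if name in values:
            value = values[name]
        elif flag == "*" and name in tracked:
            value = tracked[name]  # reuse the previously filled value
        else:
            raise KeyError(name)
        if flag == "*":
            tracked[name] = value  # remember for later messages
        return value

    return SLOT.sub(replace, template)
```

For example, filling "Hello [name*]!" with a typed name stores that name, so a later "Goodbye [name*]." can be sent without typing anything, until a message containing [name$] resets it.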

Logging.
NottReal records all activities including message delivery with timestamps. We have used this in conjunction with logging and recording from other equipment to index and segment participant data and parse interaction data semi-automatically.
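To illustrate how timestamped logs support this kind of semi-automatic segmentation, the sketch below writes and parses a simple log. The tab-separated layout with ISO 8601 timestamps, the event names, and the helper functions are assumptions chosen for illustration; NottReal's actual log format may differ.

```python
from datetime import datetime, timezone


def log_line(event, detail, when=None):
    """Format one log entry as 'timestamp<TAB>event<TAB>detail'."""
    when = when or datetime.now(timezone.utc)
    return f"{when.isoformat()}\t{event}\t{detail}"


def parse_log(lines):
    """Yield (timestamp, event, detail) tuples from log lines."""
    for line in lines:
        stamp, event, detail = line.rstrip("\n").split("\t", 2)
        yield datetime.fromisoformat(stamp), event, detail


def gaps_between_messages(lines):
    """Seconds elapsed between consecutive 'message' events."""
    times = [t for t, event, _ in parse_log(lines) if event == "message"]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

The same timestamps can be aligned with audio or video recordings to locate the moment each message was delivered.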

Summary
We have presented NottReal, an interactive tool to support Wizard of Oz studies by simulating a Voice User Interface, including both optional audio and visual output. Its main features include: 1) tabbed lists of pre-scripted messages, 2) entry for custom messages, 3) currently queued messages, 4) previously sent messages, 5) previously filled slots, and 6) options to log events and send messages with a loading animation.

Source code and continuing development
NottReal is open source and licensed under the MIT Licence, and is available from https://github.com/MixedRealityLab/nottreal/. We welcome support from the community in terms of bug reports and contributions and hope to expand the codebase to include support for additional TTS engines, offline automatic speech-to-text support, and features to make the software more capable in supporting the broadest possible range of voice-based Wizard of Oz studies.

ACKNOWLEDGMENTS
This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/N014243/1] and the Department for International Development.
No new data were created for this paper.