ARIA-AT Automation Driver

A WebSocket server that allows clients to observe the text enunciated by a screen reader and to simulate user input.

aria-at-automation · aria-at-automation-harness · aria-at-automation-driver · aria-at-automation-results-viewer

Requirements

Microsoft Windows
Node.js, including "Tools for Native Modules" as offered by the Node.js installer
the Microsoft Visual C++ runtime

Note for project maintainers

"Tools for Native Modules" is required to install the "robotjs" npm module, which is a dependency of this project.

The Visual C++ runtime includes VCRUNTIME140.dll, which is required by the automation voice.

Terminology

message - a JSON-formatted string that describes some occurrence of interest, emitted at the moment it occurred; the message should be a JSON object value with two string properties: type and data
message type - one of "lifecycle", "speech", or "error"
- "lifecycle" - signifies that the message data describes an expected lifecycle event of the automation voice (e.g. initialization and destruction)
- "speech" - signifies that the message data is text which a screen reader has asked the operating system to annunciate
- "error" - signifies that an exceptional circumstance has occurred
message data - information which refines the meaning of the message type
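
The message shape defined above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `parseMessage` that is not part of this project. It validates only what the terminology states: a JSON object with string `type` and `data` properties, where `type` is one of the three listed values.

```javascript
// Hypothetical helper for illustration; not part of this project's API.
const MESSAGE_TYPES = new Set(["lifecycle", "speech", "error"]);

function parseMessage(raw) {
  // A message is a JSON-formatted string that encodes a JSON object.
  const message = JSON.parse(raw);
  if (message === null || typeof message !== "object" || Array.isArray(message)) {
    throw new TypeError("message must be a JSON object");
  }
  if (typeof message.type !== "string" || !MESSAGE_TYPES.has(message.type)) {
    throw new TypeError("unrecognized message type");
  }
  if (typeof message.data !== "string") {
    throw new TypeError("message data must be a string");
  }
  return { type: message.type, data: message.data };
}

// Example: a "speech" message carrying text requested by a screen reader.
const example = parseMessage('{"type":"speech","data":"Hello, world"}');
console.log(example.type, example.data);
```

Rejecting unknown types early keeps a consumer from silently treating malformed input as speech.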

Protocol

This project implements AT Driver, a protocol published in 2024 as a W3C Draft Community Group Report. That document exhaustively describes the JSON-encoded WebSocket protocol. Please file a bug report against this project if you observe any discrepancies with AT Driver.

Architecture

This tool consists of two main components: a text-to-speech voice and a WebSocket server.

Text-to-speech voice

The text-to-speech voice is written in C++ and integrates with the Microsoft Speech API (SAPI). Because it interfaces with the Windows operating system (that is: "below" the screen reader in the metaphorical software stack), it can observe speech from many screen readers without coupling to any particular screen reader.

The voice has two responsibilities. First, it emits the observed speech data and related events to a Windows named pipe. This allows the second component to present a robust public interface for programmatic consumption of the data. (The named pipe is an implementation detail. Neither its content nor its presence is guaranteed, making it inappropriate for external use.)

Second, the voice annunciates speech data. It does this by forwarding speech data to the system's default text-to-speech voice. This ensures that a system configured to use the voice remains accessible to screen reader users.

WebSocket server

The WebSocket server is written in Node.js and allows an arbitrary number of clients to observe events on a standard interface. It has been designed as an approximation of an interface that may be exposed directly by screen readers in the future.
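
The idea of letting an arbitrary number of clients observe the same events can be sketched as a simple fan-out. This is an illustration only, not this project's server code: the `Broadcaster` class and its method names are hypothetical, and the stand-in clients are plain objects with a `send` method rather than real WebSocket connections.

```javascript
// Illustrative only: a fan-out hub, not this project's actual server code.
class Broadcaster {
  constructor() {
    this.clients = new Set();
  }

  // `client` is any object with a `send(string)` method, such as a WebSocket.
  add(client) {
    this.clients.add(client);
    // Return a function that stops delivery to this client.
    return () => this.clients.delete(client);
  }

  // Serialize the event once and deliver it to every registered client.
  broadcast(event) {
    const payload = JSON.stringify(event);
    for (const client of this.clients) {
      client.send(payload);
    }
    return this.clients.size;
  }
}

// Usage with stand-in clients that record what they receive.
const received = [];
const hub = new Broadcaster();
const removeFirst = hub.add({ send: (text) => received.push(["first", text]) });
hub.add({ send: (text) => received.push(["second", text]) });
hub.broadcast({ type: "speech", data: "Hello" });
removeFirst();
hub.broadcast({ type: "speech", data: "Goodbye" });
console.log(received.length);
```

Keeping clients in a `Set` means the same event reaches each observer exactly once, and a disconnecting client can be dropped without affecting the others.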

Contribution Guidelines

For details on contributing to this project, please refer to the file named CONTRIBUTING.md.

License

Licensed under the terms of the MIT Expat License; the complete text is available in the LICENSE file.

Copyright for portions of AT Driver is held by Microsoft as part of the "Sample Text-to-Speech Engine and MakeVoice" project. All other copyright for AT Driver is held by Bocoup.