MAKING THE GUI TALK (1991) by Rich Schwerdtfeger
Seminal how-to article on the development of graphical user interface (GUI) screen readers for Windows, OS/2, and the X Window System.
This article originally appeared (in print) in BYTE magazine, December 1991.
Reproduced with the kind permission of its author, Rich Schwerdtfeger.
New technology holds promise for blind and learning-disabled people who live in a GUI-oriented world
AUTHOR: Richard S. Schwerdtfeger
More than 120,000 people in the U.S. are legally blind, according to figures put out by the American Foundation for the Blind in New York. In addition, the National Institutes of Health in Washington, D.C., states that 10 percent to 15 percent of the U.S. population have learning disabilities.
For some years now, to communicate with the sighted world, blind people have used computers via screen-reader systems (SRDs) that can access a physical display buffer of characters and orally present the information (see “Opening Doors for the Disabled,” August 1990 BYTE). In the past few years, however, the widespread trend toward the development and use of GUIs has caused a multitude of problems for the visually impaired and some learning-disabled people. The advent of GUIs has made it virtually impossible for blind people to use much of the industry’s most popular software and has limited their chances for advancement in the now graphically oriented world.
The coming of GUIs has left some SRDs with no means to access graphical information. GUIs do not use a physical display buffer, but instead employ a pixel-based display buffer. Fortunately, new technology is coming into existence to correct this problem. Companies are now enhancing SRDs to work by intercepting low-level graphics commands and constructing a text database that models the display.
Developers have created new screen-reader software and combined it with a common user interface. One prototype of this type of application is Screen Reader/PM from IBM. It is an enhancement of the original DOS Screen Reader that the company brought out four years ago. The research project has a current installed base of more than 40 beta-test sites.
With a system like this, the disabled user can maneuver a mouse over the display and use the keyboard or a separate keypad, and a voice synthesizer will actually describe an icon the GUI has displayed or the graphical text shown on the screen. This new technology has alleviated many of the problems that GUIs created for blind and some learning-disabled users, and their future now looks much more promising. Recent technological developments are putting the computer back into the hands of blind and learning-disabled people.
Also spurring development of adaptive technology for the disabled are two new laws. An amendment to Public Law 99-506 (29 U.S.C. 794d) states that a company can sell a product to the U.S. government only if that device is accessible to people with disabilities. The Americans with Disabilities Act, signed into law by President Bush in 1990, states, in part, that if your company has more than 25 employees, you must provide reasonable accommodation (including technology) in the form of job adaptations for the handicapped.
Taking the First Step
A package called OutSpoken is available that reads and vocalizes a GUI. It was designed and developed in 1989 by Berkeley Systems (BSI) for the Macintosh GUI. OutSpoken is a screen reader that communicates through voice synthesis with blind users as they move the mouse around the GUI and come in contact with text and graphical objects on the screen.
BSI realized that it had to construct a database that the screen-reader software could access. This database, called an off-screen model (OSM), is the conceptual basis for GUI screen readers that vendors are developing today. The advent of OutSpoken was a significant breakthrough for the blind community.
Other companies and software developers are considering how to use this new technology to make their systems accessible to the disabled. I will define what an OSM is, how it is created, and how SRDs can access and use it. The information in this article is based on my work on the Screen Reader project at IBM’s T. J. Watson Research Center in Yorktown Heights, New York.
An OSM is a database reconstruction of what is visible on the screen and also what is not visible on the screen. A database must manage information resources, provide utilities for managing these resources, and supply utilities to access the database. The resources the OSM must manage are text, off-screen bit maps, icons, and cursors.
Text is obviously the largest resource the OSM must maintain. It is arranged in the database relative to its position on the display or to the bit map on which it is drawn. This situation gives the model an appearance like that of the conventional display buffer. Each character must have certain information associated with it: foreground and background color; font family and typeface name; point size; and font style, such as bold, italic, strike-out, underscore, and width.
Merging text by baseline (i.e., position on the display) gives the model a physical display buffer appearance with which current SRDs are accustomed to working. Therefore, to merge associated text in the database, the model combines text in such a way that characters in a particular window have the same baseline. Each string of text in the OSM has an associated bounding rectangle used for model placement, a window handle, and a handle to indicate whether the text is associated with a particular bit map or the display.
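As a rough illustration of the text resource just described, the following Python sketch models one string of OSM text and merges strings that share a surface, window handle, and baseline. Every name in it (TextRun, merge_by_baseline, line_text, and the field names) is hypothetical and is not taken from any shipping screen reader. It also stores attributes per string, whereas the model described above tracks them per character.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextRun:
    """One string of text in the model (hypothetical layout)."""
    text: str
    left: int            # bounding rectangle, in pixels
    top: int
    right: int
    bottom: int
    baseline: int        # y coordinate of the text baseline
    window: int          # handle of the window that owns the text
    bitmap: Optional[int] = None   # off-screen bit-map handle; None = display
    font: str = "System"
    point_size: int = 10
    foreground: str = "black"
    background: str = "white"


def merge_by_baseline(runs):
    """Group runs into lines keyed by (bitmap, window, baseline).

    Within a line, runs are ordered left to right, which gives the
    model the row-like look of a character display buffer.
    """
    lines = {}
    for run in runs:
        key = (run.bitmap, run.window, run.baseline)
        lines.setdefault(key, []).append(run)
    for line in lines.values():
        line.sort(key=lambda r: r.left)
    return lines


def line_text(line):
    """Join the runs of one merged line into a speakable string."""
    return " ".join(r.text for r in line)
```

Keying on the bit-map handle as well as the window handle keeps text drawn into memory separate from text that is actually visible, even when both share a baseline.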
GUI text is not always placed directly onto the screen. Text can be drawn into memory in bit-map format and then transferred to the screen. It can also be clipped from a section of the display and saved in memory for placement back onto the screen later. An example of this is a drop-down menu: Text in the area to be overlaid by the menu is transferred to memory and later transferred back onto the screen when the menu is deleted.
The operation required to transfer bit maps is called a BitBlt, or bit block transfer. GUIs use BitBlts for speed. For example, in a drop-down menu, it is much faster to transfer the pixels cut from the screen back onto the screen than to have the application redraw or “paint” that portion of the screen from scratch.
Icons are pictorial representations of an idea or object. In the GUI, icons represent many things, ranging from programs that are currently running to objects that receive user input to perform a specific action (e.g., a push button or check box). Icons are read, identified, and then stored as single text images or as completely separate, nontext entities.
People who are blind need to be able to use icons the way they use text; in other words, they need to be able to hear a verbal description of each icon. When an application is designed without a provision for keyboard access to particular icons, the blind user needs a mouse to locate the icons. A screen-reader application must be able to track mouse movements, match their positions with OSM icons, and vocalize the associated icon names. The OSM must also provide cursor information so a screen reader can vocalize the user’s cursor position on the display.
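Matching a mouse position against OSM icons amounts to a point-in-rectangle test. The sketch below shows one way that lookup could work. The Icon record, the function names, and the convention that icons later in the list are drawn on top are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Icon:
    name: str     # speakable description, e.g. "wastebasket"
    left: int
    top: int
    right: int    # exclusive edge
    bottom: int   # exclusive edge


def icon_at(icons, x, y):
    """Return the topmost icon containing the point, or None.

    Icons later in the list are assumed to be drawn on top.
    """
    for icon in reversed(icons):
        if icon.left <= x < icon.right and icon.top <= y < icon.bottom:
            return icon
    return None


def describe_pointer(icons, x, y):
    """Return the name a screen reader would speak, or None."""
    icon = icon_at(icons, x, y)
    return icon.name if icon else None
```

A screen reader would call describe_pointer on each mouse movement and hand any returned name to the voice synthesizer.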
In windowing systems, more than one application can be displayed on the screen. However, your keyboard input is directed to only one active window at a time. But each active window is often made up of many child windows. The application’s cursor is placed in at least one of these child windows. The OSM must keep track of a cursor’s window identification (i.e., handle) so that when a window becomes active, SRDs can determine if it has a cursor and vocalize it. An SRD cursor is the area on the display where the next potential action will occur or where users can enter their next text. An example of a cursor is the blinking insertion bar in an editor or the rectangle highlighting text in a menu of options.
The OSM must keep track of the cursor’s screen position, dimensions, the associated text string (i.e., speakable cursor text), and string character position. If the cursor is a blinking insertion bar, its character position is that of the associated character in the OSM string. In this case, the cursor’s text is the entire string.
A wide rectangular cursor is called a selector, since it is used to isolate screen text to identify an action. Examples of selector cursors are those used for spreadsheets or drop-down menus. The text that is enclosed within the borders of the selector is the text that the screen reader would speak.
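The two kinds of cursor can be sketched as follows. This is a minimal illustration, not the data layout of any real product: the record and function names are hypothetical, y coordinates are assumed to grow downward, and the insertion-bar calculation assumes fixed-width characters, which proportional GUI fonts do not have.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str
    left: int
    right: int
    baseline: int


@dataclass
class Cursor:
    window: int   # handle of the child window that owns the cursor
    left: int
    top: int
    right: int
    bottom: int


def insertion_index(cursor, run, char_width):
    """Character position of an insertion bar within its string.

    Assumes every character is char_width pixels wide.
    """
    index = (cursor.left - run.left) // char_width
    return max(0, min(len(run.text), index))


def selector_text(cursor, runs):
    """Text enclosed by a wide (selector) cursor, in reading order."""
    inside = [r for r in runs
              if cursor.left <= r.left and r.right <= cursor.right
              and cursor.top <= r.baseline <= cursor.bottom]
    inside.sort(key=lambda r: (r.baseline, r.left))
    return " ".join(r.text for r in inside)
```

For an insertion bar, the speakable text is the whole string and insertion_index gives the character position; for a selector, selector_text returns only what the rectangle encloses.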
Now, I’ll build a toolkit of utility functions that you can use for constructing and maintaining the resource information.
The software performing the display reconstruction is driven from low-level GUI graphics functions. Therefore, with some exceptions, OSM utilities used to construct and maintain the model appear like operations being performed by the graphics engine. These operations are performed on text or database representations of bit maps, cursors, or icons, as opposed to pictures or actual bit maps.
The utilities required to handle text are clipping, erasure, text merger, and text transfer. These tools must handle icons as well as text. Developers do not normally worry about clipping text to the windows in their GUI applications. Text running outside the window must be clipped off by the low-level graphics routines, and the code constructing the model must also perform the same operation. Therefore, the model must provide tools to clip the text to the clip region supplied by the software constructing the model. This region is represented by an array of rectangles defining the visible domain.
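Clipping a string to an array of rectangles can be sketched as below. The function name and parameters are hypothetical, and the sketch makes two simplifying assumptions: characters are fixed width, and a character is kept only if its whole cell lies inside a single clip rectangle (a character straddling two adjacent rectangles would be dropped).

```python
def clip_run(text, left, top, bottom, char_width, clip_rects):
    """Return the visible pieces of a text run as (x, substring) pairs.

    Rectangles are (left, top, right, bottom) tuples with exclusive
    right and bottom edges.
    """
    pieces = []
    current = None   # [start_x, chars] for the piece being built
    for i, ch in enumerate(text):
        x0 = left + i * char_width
        x1 = x0 + char_width
        visible = any(rl <= x0 and x1 <= rr and rt <= top and bottom <= rb
                      for (rl, rt, rr, rb) in clip_rects)
        if visible:
            if current is None:
                current = [x0, []]
            current[1].append(ch)
        elif current is not None:
            # A hidden character ends the current visible piece.
            pieces.append((current[0], "".join(current[1])))
            current = None
    if current is not None:
        pieces.append((current[0], "".join(current[1])))
    return pieces
```

Returning several pieces rather than one truncated string matters because a visible region made of more than one rectangle can hide the middle of a string while leaving both ends showing.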
You can use text-erasing tools when placing one window over another or when you reduce the size of a window. You can use text-merger tools when you combine text with other text having the same handle and screen baseline. Simply put, text is merged into an OSM bit-map representation using common baselines.
Performing a BitBlt requires transfer tools for text and cursors, for example, when you move a window from one part of the display to another or when text that was previously drawn to a memory bit map is transferred to the screen. Therefore, tools are also required for moving text between the visible display portion of the database and the nonvisible off-screen portion where OSM representations of bit maps are stored.
Now, on to building the model. The construction code, called the patch code, is operating-system dependent, because specific patch-code function calls are patched or hooked in with a subset of the low-level graphics calls. The calls to hook are those that draw text; perform BitBlts; save bit maps to display card memory; restore bit maps from display card memory; select and unselect bit maps for drawing; delete bit maps; map display buffers to windows; draw rectangles, boxes, borders, and regions; and update cursors.
For each of these graphics calls, you have a corresponding patch function call. The job of these patch functions is to directly mimic the low-level calls by performing the same operations on the model. For instance, the instruction drawtext would draw text on the screen as well as merge text into the model.
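The idea of a patch function that mirrors a graphics call can be sketched in a few lines. The GraphicsEngine and OffScreenModel classes and the hook helper here are stand-ins invented for illustration; real patch code is installed by operating-system-specific means rather than by reassigning a Python attribute.

```python
class GraphicsEngine:
    """Stand-in for the low-level graphics layer (hypothetical)."""

    def __init__(self):
        self.drawn = []

    def draw_text(self, window, x, y, text):
        # A real engine would rasterize glyphs here.
        self.drawn.append((window, x, y, text))
        return len(text)


class OffScreenModel:
    """Stand-in for the OSM; it only records what was merged."""

    def __init__(self):
        self.runs = []

    def merge_text(self, window, x, y, text):
        self.runs.append((window, x, y, text))


def hook(engine, name, patch):
    """Replace engine.<name> so that patch runs after the original call.

    The patch receives the same arguments as the graphics call.
    Returns the original function so the hook can be removed later.
    """
    original = getattr(engine, name)

    def hooked(*args, **kwargs):
        result = original(*args, **kwargs)
        patch(*args, **kwargs)
        return result

    setattr(engine, name, hooked)
    return original
```

After hook(engine, "draw_text", osm.merge_text), every call to engine.draw_text still draws as before and also updates the model with the same arguments.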
How you hook these calls depends on the GUI system architecture. X Window System is based on a client/server model. X applications (clients) communicate with the X server by sending and receiving messages over a socket. When the X server initializes, it creates a socket that clients use to establish new connections and returns the socket address to the calling environment. When a new client starts, it uses that socket address to connect to the X server.
To establish a new connection, the client connects to the socket and receives a new socket to use for communication. When the X server accepts a connection, it receives a new socket connected only to the new client, leaving the original socket open for new connections.
SRD patch software intercepts this communication by changing the socket address that clients use to connect to the X server. New clients use the modified socket address and connect directly to the screen reader instead of the X server. The SRD then connects to the real X server on behalf of the client and manages all socket I/O between the client and server.
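Once the screen reader holds one connection to the client and another to the real X server, managing the socket I/O is a matter of relaying bytes in both directions while observing them. The sketch below shows only that relay step, using Python's standard socket and threading modules. Accepting clients on the substitute address, connecting to the real server, and decoding X protocol requests are all left out, and the function names are hypothetical.

```python
import socket
import threading


def relay(src, dst, observe):
    """Forward bytes from src to dst until src reaches end of stream.

    observe is called with each chunk before it is forwarded; a screen
    reader would decode protocol requests there and update its model.
    """
    while True:
        data = src.recv(4096)
        if not data:
            break
        observe(data)
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)   # pass the end of stream along
    except OSError:
        pass


def splice(client, server, on_request, on_reply):
    """Relay both directions of one client connection on two threads."""
    threads = [
        threading.Thread(target=relay, args=(client, server, on_request)),
        threading.Thread(target=relay, args=(server, client, on_reply)),
    ]
    for t in threads:
        t.start()
    return threads
```

Using one thread per direction keeps the sketch short; a production proxy would more likely multiplex all connections in a single event loop.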
Screen Reader’s View
Regarding the screen reader’s interface to the model, the OSM must notify the SRD of cursor changes and window updates. Cursor changes and window-update notifications are performed either by setting flags, in the case of polling, or by various means of interprocess communication (IPC) on multitasking systems.
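Both notification styles can be shown side by side. In this sketch a set of window handles plays the role of the polling flags, and a standard-library queue stands in for an IPC channel; the class and method names are hypothetical.

```python
import queue


class ChangeNotifier:
    """Records OSM changes for a screen reader to pick up."""

    def __init__(self):
        self.dirty_windows = set()     # flags for the polling style
        self.events = queue.Queue()    # stand-in for an IPC channel

    def window_updated(self, window):
        self.dirty_windows.add(window)
        self.events.put(("window", window))

    def cursor_changed(self, window):
        self.events.put(("cursor", window))

    def poll(self):
        """Polling style: return and clear the set of updated windows."""
        dirty, self.dirty_windows = self.dirty_windows, set()
        return dirty
```

With the queue, the screen reader can block on events.get() and stay dormant until something changes, whereas polling requires it to call poll() at intervals.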
OSM change notifications result in the SRD’s referring to the OSM to check for important updates. Therefore, the SRD software writer needs to reduce the OSM viewing area to a particular application or window. This area of the model is called a view, and a subset of this view is called a viewport. OSM software must provide tools to set up and maintain views and viewports.
An example of the way you use views and viewports is the Lotus 1-2-3/G spreadsheet shown in the screen. To construct a view of the spreadsheet, the SRD must pass the OSM the spreadsheet’s bounding rectangle and all window identifications composing the spreadsheet. The OSM then extracts all view text and sorts it by baseline. The user then might ask the SRD to isolate a specific spreadsheet column for ease of use. Upon such a request, the SRD vocalizes each column (i.e., viewport) entry.
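The view and viewport steps described for the spreadsheet can be sketched as two filters over the model's text. The record and function names are hypothetical, and y coordinates are assumed to grow downward.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    text: str
    left: int
    right: int
    baseline: int
    window: int


def build_view(runs, rect, windows):
    """Extract the text of a view and sort it by baseline.

    rect is the view's bounding rectangle as (left, top, right, bottom);
    windows is the set of window handles that make up the view.
    """
    l, t, r, b = rect
    view = [run for run in runs
            if run.window in windows
            and l <= run.left and run.right <= r
            and t <= run.baseline <= b]
    view.sort(key=lambda run: (run.baseline, run.left))
    return view


def viewport(view, left, right):
    """Subset of a view between two x coordinates, e.g. one column."""
    return [run for run in view if left <= run.left and run.right <= right]
```

Because the view is already sorted by baseline, the entries of a column viewport come out top to bottom, which is the order in which a screen reader would vocalize them.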
Turning On the Lights
Here’s an example of how text drawn by an application GUI program is spoken by a screen-reader program. First, an application uses a standard application programming interface call to draw “Hello World” in an OS/2 Presentation Manager (PM) window. The application could use a function called WinDrawText to place the text in the window. The WinDrawText function results in a low-level graphics engine call to GreCharStringPos. For GreCharStringPos, there is a corresponding call to the patch-code function OsmCharStringPos, passing it the same parameters.
PM is very helpful in providing many query functions needed to build an OSM node before placing it into the model. The construction code uses functions such as GreGetClipRects and GreQueryFontAttributes to obtain clipping and font information. When the necessary information is acquired from the Gre functions, the patch code then determines the text colors by performing its own color mixing.
Next, a database node is constructed. OSM utilities create the node; add the font, color, spacing, text, and text bounds; and clip the node to the region. If the text is still visible, a window handle is retrieved from PM and placed in the node. Finally, OSM utilities are used to merge the text into its proper place in the database, which in this case is the visible (displayed) portion.
After the text is merged into the OSM, the SRD is notified. Assuming that “Hello World” is in the current view, on the next keystroke command, the blind user could hear the SRD speak what is in the window. If “Hello World” were the only text to be displayed in the window, the user could program the SRD to speak it automatically on update.
Twisting Screen Readers’ Arms
This is a good time to discuss the effects of making the current screen readers work in the new GUI system environment. Modifying current screen readers to accommodate the new GUI software is no easy task. Most of these systems run under DOS as TSR programs. Unlike DOS, leading GUI software operates in multitasking environments where applications are running concurrently. The SRD performs a juggling act as each application gets the user’s input.
Since an SRD is a new multitasking application itself, it must use forms of IPC, like queues. PM and Windows use message queuing so that windows can receive keyboard and mouse input and communicate with other windows. A program remains dormant most of the time, until it receives a message.
Operating systems like OS/2 provide for multiple full-screen sessions as well as a GUI. Due to the protected-mode environment of systems like OS/2, SRD developers must now write device drivers to access the protected full-screen display-buffer memory of non-GUI applications.
With macros or a full-profile language used by current fully functional SRD systems, you can program how each application is read to the user. The blind user or the SRD developer must write profiles to employ multitasking constructs and be able to distinguish GUI constructs like window classes for various system and application controls.
Writing new profiles for these systems will require major rewrites to the existing software base and is a hair-raising experience. If you resize the window, text or child windows within the frame may disappear. The writer can’t always count on the text or icons remaining in the window.
Capabilities and Shortcomings
With the new technology, screen readers will be able to work with most popular GUI software packages. These include the new GUI versions of Microsoft Word and Excel or Lotus 1-2-3 for Windows and PM, that is, any application that relies mainly on text or icons.
The new screen-reader world still won’t be a utopian one. The disabled will have problems with packages such as CorelDraw, which draws pictures as well as text. It includes no way to describe the pictures to the blind user. Disabled people will also have trouble with other packages that cannot tell them the location of graphical images on the screen. In addition, some software packages construct screen text from lines rather than from fixed fonts. In this case, the application performs the drawing instead of placing a fixed font on the screen. Access to this text requires that character recognition be built into off-screen models.
What Else Is Needed?
To fully exploit the new technology, developers should make OSM libraries accessible to the public. Access to an OSM would allow screen-reader developers to tailor their systems to people with learning disabilities. As an example, a modified screen reader could say “file disposer” rather than “wastebasket” for the Macintosh GUI, an enhancement that would help learning-disabled people who have problems translating symbolic representations into meaning.
Other screen-reader companies will want to develop their own OSMs. Developing the hook code for AIX or PM is a very large undertaking. Therefore, GUI systems should provide hook facilities that allow applications to intercept the low-level graphics calls and construct their own OSMs.
Release of this new software and promulgation of the new laws will motivate computer companies to develop more accessible systems. But there is still much more to be done.
For the past year, I have been part of a project team whose goal is to convert IBM’s Screen Reader for DOS to Screen Reader OS/2 and PM. In particular, I’ve been involved in developing PM’s OSM.
Commenting on the destiny of Screen Reader/PM, IBM’s Jim Thatcher, research staff member and project manager, says, “Currently, Screen Reader for PM is an IBM Research Division prototype being used on the job by more than 40 people in IBM and other companies. It hasn’t yet been determined whether or not it will become a product. However, we are demonstrating it at major conferences, including The World Congress on Technology in Washington, D.C., this month, and it is entered in the Johns Hopkins National Search for Computing to Assist Persons with Disabilities.” (See the text box “Chaotic Progress.”)
IBM’s Screen Reader/PM is the first fully functional SRD for a GUI. It is fully programmable. Project team members have completely rewritten the profile access language to accommodate PM and OS/2 multitasking constructs. Screen Reader/PM automatically switches profiles as the blind user switches between applications, either on the PM GUI or in full-screen sessions.
The group is also porting Screen Reader/PM to AIX and X. Screen Reader/AIX is to be the first GUI screen reader for a Unix-based system. With this software, the blind user will also be able to work with Motif. Development of Screen Reader/AIX marks the beginning of a portable OSM between two large multitasking operating systems.
Berkeley Systems is rewriting OutSpoken for Windows 3.0 and developing a portable OSM called the GUI Access Toolkit. This toolkit will make it possible for third-party developers to create and market new access software for Windows and the Macintosh without having to develop their own OSMs.
These efforts forecast a new era of independence for the visually impaired community. For the first time, blind and visually impaired people will be able to work with multitasking systems. They will be able to perform tasks such as formatting a disk and running a spreadsheet calculation while they are logged onto a mainframe and listening to the daily announcements in a window.
With software like IBM’s Screen Reader/AIX, blind people will be able to use workstations. This access will open many new jobs now unattainable by visually impaired people.
Richard S. Schwerdtfeger is an independent software consultant specializing in OS/2. He is working with the Screen Reader/PM development team at IBM’s T. J. Watson Research Center in Yorktown Heights, New York. You can reach him on BIX as “rschwer.”