Permission to reprint or excerpt is granted only if the following line appears at the top of the article:

ANTIC PUBLISHING INC., COPYRIGHT 1986. REPRINTED BY PERMISSION.

PROFESSIONAL GEM by Tim Oren
Column #14 - User Interfaces, part 2

This issue of ST PRO GEM (#14) continues the discussion of user interface design which began in episode eight. It begins where we left off, with a further treatment of the mode problem, and proceeds into topics such as visual grammar and layered interfaces.

Note that there is no download for this column. The downloads will return with the next issue, a discussion of using the GEM DOS file system within a GEM application. Specifically, it will include sample code for using the file selector, the GEM form_error alerts, and some utilities for manipulating file and path names. There will also be a feedback section.

The following two columns will be devoted to "graphics potpourri", a collection of small but useful GEM utilities such as popup menus, string editing, and source code for drag and rubber box operations.

MODES AGAIN. If a program is modeless, it acts predictably, which turns out to be very important. On the other hand, a good definition for "modes" is hard to find. In column eight, I suggested that a mode exists when you cannot use all of the capabilities of the program without performing some intermediate step. If this is less than clear, here are two alternate definitions offering different views of the problem.

THE "TWO USER TEST". Consider the following thought experiment: Imagine that your ST (and GEM) had two mice, two cursors, and two users. Could they both effectively use the program at the same time? If so, the application is modeless. If there are points where one user can be "locked out" by the actions of the other, then a mode exists at that point.

Let's consider some examples of this test. In any program which uses the GEM menu system, one user could stop the other by touching a menu hotspot and dropping a menu.
This constitutes an inherent mode in the GEM architecture.

On the GEM Desktop, two users could open windows and view files without interference. However, as soon as one person tries to delete a file (assuming the verify option is on), the other is brought to a halt as a dialog appears. Thus, we have found a modal dialog.

In many "Paint-type" programs, such as MacPaint, PC Paint, and GEM Paint, two artists could co-exist quite well, utilizing the on-screen palette and tool selection. Of course, these programs also contain modal dialogs for such operations as file and brush shape selection.

In contrast, consider the paint program DEGAS for the ST. Here, two artists could only work together as long as neither wanted to change tool or color. Then the display would have to be flipped to the selection screen, stopping the other user. This is a mode in the DEGAS interface.

(By the way, this test is not just academic. The grand-daddy of all mouse-based systems, NLS, demonstrated by Doug Engelbart in 1968, had two mice and two users, one of whom was physically remote. Cooperative techniques such as this are still largely unexplored and unexploited.)

ONE LINER. Here's a terse definition by Jef Raskin: A program is modeless if a given action has one and only one result.

Again, let's run a few examples. The menu dropdowns are clearly modal by this definition. Before the menu was activated, window control points could be activated with a click. However, when the dropdown is visible, a click action is interpreted as a menu selection or a dismissal of the dropdown. Similarly, dialogs are modal because the action of moving the mouse into the menu bar no longer causes the dropdown to appear.

I am typing this using the First Word editor program. It has a nice desktop level box full of characters where I can click to get symbols which the ST keyboard won't produce. However, if I invoke the find or replace string dialog, the click-in-the-box action doesn't work anymore.
This is a mode in the First Word interface.

Finally, consider an "old style" menu program, the kind where you type in the number of the desired action from a list. Since the number "2" might mean "Insert the record" in one menu, and "Purge the file" in another, such a program is clearly modal by Raskin's definition.

These three definitions say almost the same thing, but from different viewpoints. Depending on the situation, one or the other may be more intuitive for you. The goal of this type of analysis is to root out unnecessary modes, and to make sure that those which remain only appear when requested by the user, offer some visual cue such as a rubber line or standard dialog box, and are used consistently throughout the application.

PREDICTABILITY FOREVER AND EVER AND EVER. As Raskin's definition makes clear, when the modes go away, the interface becomes predictable. Predictability leads to the formation of habits of use. Habits reduce "think time" and become progressively faster due to the Power Law of Practice discussed in column eight. This is exactly what we want!

There is another benefit of predictability. A habit learned in one part of a program with a consistent interface can be transferred and used elsewhere in the application. If several programs share the same style of interface, the same habits can be used across a complete set of products. Learning time for the new functions becomes shorter, and the user is more likely to use the new feature.

IS A BOGEYMAN! Most casual users are scared silly of computers and programs. (If you have any doubt, eavesdrop on a secretary with a new word processor, or the doctor's receptionist coping with an insurance data entry program.) In most cases, they have a right to be frightened. Even experienced programmers, prone to toss the manuals and hack away, know that moderate paranoia is the best way to deal with an unknown program.
How must this feel to someone whose ability to perform (or lose) their job depends on an unpredictable (aha!) black box?

So here's another way in which predictability works. But to produce a truly fearless user, we need other qualities as well. One is robustness, meaning that the program will not crash given normal or even bizarre actions by the user. Another is feedback, which shuts off invalid options, reinforces correct actions, and gives reassurance that an operation is proceeding normally. Finally, we need forgiveness, in the form of inverse operations or Undo options, when the inevitable mistake is made.

The ultimate goal is to make the program discoverable. This means the user should be able to safely "wing it" after a short session with the application and its interface. This practice ought to be considered the norm anyway, since the manual is always across the office or missing when an esoteric and half-forgotten feature is needed. If it is possible to muddle through such a situation by trial and error, without causing damage, the immediate problem will be solved, and the user will gain confidence.

GOOD GRAMMAR OR... So exactly what are these habits that are supposed to be so helpful? One of the most useful patterns is a consistent command grammar for the program. This may sound strange, since we have supposedly abandoned command line interfaces in the graphics world, but in fact, the same type of rules apply.

For instance, in the world of A> we might issue the command:

    copy a:foobar.txt b:

By analogy to English grammar, this command contains a verb, "copy", a file as subject: "a:foobar.txt", and a location as an object: "b:".
The equivalent GEM Desktop operation is:

- Move mouse to foobar.txt icon in a: window
- Press mouse button
- Move mouse to b: icon
- Release mouse button

The operation can be described as a select-drag-drop sequence, with the select designating the subject file, the drag denoting the operation (copy), and the location of the drop showing the object. A grammar still exists, but its "terminal symbols" are composed of mouse actions interpreted in the context of the current screen display, rather than typed characters.

One useful way to analyze simple grammars, including those used as command languages, is to separate them into prefix, postfix, and infix forms.

In a prefix grammar, the operation to be performed precedes its operands, that is, its subject(s) and object(s). The DOS copy command given above is an example of a prefix command. LISP is an example of a language which uses prefix specification for its commands.

Postfix grammars specify the action after all of the operands have been given. This command pattern is familiar to many as the way in which Hewlett-Packard calculators work. FORTH is an example of a language which uses a postfix grammar.

Infix notation places the verb, or operator, between its subject and object. Conventional algebraic notation is infix, as are most computer languages such as C or PASCAL. The example GEM command given above is also infix, since the selection of a subject file preceded the action, which was followed by the designation of an object.

The "standard" GEM command grammar, as used in the products produced by Digital Research, is in fact infix. This is not to say that GEM enforces such a convention, or that it is rigorously followed. However, when there is no pressing reason for a change, adoption of an infix command grammar will make your application feel most like others which users may have seen.
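
To make the three forms concrete, here is a minimal sketch in Python (chosen only for brevity; it is not GEM code). It assumes every command has exactly one verb, one subject, and one object, which is a simplification of real command languages. The names Command and parse are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Command(NamedTuple):
    """A command reduced to its grammatical parts (hypothetical type)."""
    verb: str
    subject: str
    obj: str


def parse(tokens, form):
    """Normalize a three-token command given in prefix, infix, or postfix order."""
    if len(tokens) != 3:
        raise ValueError("expected exactly three tokens")
    if form == "prefix":
        # verb first, as in the DOS command "copy a:foobar.txt b:"
        verb, subject, obj = tokens
    elif form == "infix":
        # verb in the middle, as in select, drag (copy), drop
        subject, verb, obj = tokens
    elif form == "postfix":
        # verb last, as on a stack-oriented calculator
        subject, obj, verb = tokens
    else:
        raise ValueError(f"unknown form: {form!r}")
    return Command(verb, subject, obj)


if __name__ == "__main__":
    print(parse(["copy", "a:foobar.txt", "b:"], "prefix"))
    print(parse(["a:foobar.txt", "copy", "b:"], "infix"))
    print(parse(["a:foobar.txt", "b:", "copy"], "postfix"))
```

All three calls describe the same operation; only the order in which the user supplies the parts differs. That ordering is what a user's habits attach to, which is why mixing forms within one application carries a cost.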
The general problem of specifying a graphic command language can be difficult, but much of the problem has already been handled on the ST. Part of the solution is by constraint: the input and output hardware of the ST are predefined, so most developers will not need to worry about choosing a pointing device or screen resolution.

The other part of the standard solution is the GEM convention for mouse usage. I am going to review these rules, and then describe some of the situations in which they have been bent, and finally some alternate approaches which may prove useful to some developers.

SPECIFYING A SUBJECT. There are really two sets of methods for designating what is to be affected by an operation. One set is used when distinct objects are to be affected. Examples are file and disk icons in the Desktop and trees in the RCS. Another set of designation methods is used when continuous material, such as text or bit images, is being handled.

When dealing with objects, a single mouse click (down and up) over the object selects it. The application should show that the selection has occurred by changing the appearance of the object. The most common methods are inverting the object, or drawing "handles" around it.

Many operations allow "plural", or multiple object, selections. The GEM convention is that a click on an object while the shift key is held down extends the selection by adding that object. If the shift-clicked object was already selected, it is deleted from the selection list.

Another way to select multiple objects is to use a "rubber box" to enclose them. This operation begins with a drag on a part of the view where no object is present. The application then animates a rubber box on the screen as long as the mouse button is held down. When the button is released, all objects within the current extent of the box are selected. A shift-drag combination could be used to add the objects to an existing selection list.
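
The object selection rules above can be sketched as a small model, again in Python rather than as actual GEM code. Two points are my assumptions, not stated conventions: a plain click replaces any existing selection, and "within the box" means fully enclosed. The names Rect and Selection are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def encloses(self, other):
        """True if `other` lies entirely inside this rectangle."""
        return (self.x <= other.x
                and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


class Selection:
    """Tracks which named objects are currently selected."""

    def __init__(self, objects):
        self.objects = dict(objects)  # object name -> Rect on screen
        self.selected = set()

    def click(self, name, shift=False):
        if not shift:
            # Assumption: a plain click replaces the current selection.
            self.selected = {name}
        elif name in self.selected:
            # Shift-click on a selected object removes it from the list.
            self.selected.discard(name)
        else:
            # Shift-click on an unselected object extends the selection.
            self.selected.add(name)

    def rubber_box(self, box, shift=False):
        # Assumption: only objects fully enclosed by the box are hit.
        hits = {n for n, r in self.objects.items() if box.encloses(r)}
        self.selected = (self.selected | hits) if shift else hits


if __name__ == "__main__":
    sel = Selection({"A": Rect(0, 0, 10, 10), "B": Rect(20, 0, 10, 10)})
    sel.click("A")
    sel.click("B", shift=True)
    print(sorted(sel.selected))
```

A real application would also redraw each object as it enters or leaves the selection, since the visual change is what tells the user the click was understood.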
Selecting part of a text or bit plane display is also done with a rubber box. Since there are no "objects" in the view, any mouse drag is interpreted as the beginning of a selection operation. In the simplest case, a bit plane, the rectangle within the box when the button is released is the selected extent.

When the underlying data has structure, such as words and lines of text, the display should reflect this fact during the selection operation. Typically, text selection is indicated by inversion of the characters rather than a rubber box. The selection extends along the starting line so long as the mouse stays within the line. If the mouse moves off the starting text line, the implied selection is all characters between the starting character and the character currently under the mouse, which is not necessarily a rectangular area.

An extended "plural" selection may be supported in text editing. The use of the shift key is also conventional in this application.

ACTION. With the subject designated, the user can now choose an operation. In many cases, this will be picked from the menu, in which case the entire command is complete. Some menu selections will lead to dialogs, in which the interaction methods are regulated by the GEM form manager.

When the command is completed, it is often helpful if the application leaves the objects (or areas) selected and ready for another operation. A single click away from any object is interpreted as cancelling the selections.

Many operations are indicated by gestures on the screen. Usually, this is some variant of a drag operation. The interpretation of the gesture may depend on the type and location of the selected subject, which part of it is under the mouse, and in what location the drag terminates.

"Handles" are small boxes or dots displayed around an object when it is selected. A drag beginning with the mouse on a handle is usually interpreted as a resizing operation, if this is appropriate.
The pointing finger mouse form is displayed to indicate the operation in progress, and a rubber version of the object is animated on the screen to show the user the result if the button were released. In some cases, where an underlying "snap" grid exists, the animated object may change size in discrete steps.

Dragging a non-handle area of a selected object is usually interpreted as the beginning of a move function. In most applications, a move of a single object may be started without pre-selection. Simply beginning the drag on the object is taken to imply selection. The spread hand, or "grabber", mouse form is typically displayed during a drag operation.

Dragging may denote copying or movement, or more complex functions such as instantiation or generalization. The operation implied by movement on the screen will differ among applications, and often within the same application, depending on target location. This target is the recipient of the command's action, or its object, in an English grammar sense.

For example, a drag from window to window in the Desktop denotes a copy. On the other hand, dragging the same icon to the trashcan deletes it completely. Dragging an object from the RCS partbox to the editing view creates a new copy of that prototype object. Dragging the same object within the edit view simply changes its placement.

There are some mouse actions which are conventional "abbreviations". A double click on an object is interpreted as both a selection and an action. Usually, the double click action is the same as the Open entry in the "File" menu.

When the usual interpretation of a drag is movement, then shift-drag may be used as an enhanced variant implying copying. For instance, shift-dragging an object within the RCS editing window makes a copy of the object and places it in the final location.

To return to the beginning of this discussion, the reason for adopting these conventional usages is to build an interface that promotes habits.
Particularly, a standard grammar for giving commands helps answer the question "What comes next?". It breaks the user's actions into logical phrases, or chunks, which may be thought of as a whole, rather than one action at a time.

DIFFERENT FOLKS, DIFFERENT STROKES. There are always exceptions to a rule, or so it seems. In this case, consistency of the interface grammar is sometimes traded off against consistency of metaphor, preservation of screen space, and "fast path" methods for experts.

One example is the use of "tools" in Paint and Draw programs. In such programs, an initial click is made on a tool icon, denoting the operation to be applied to all following selections. This is a prefix style of grammar, and stands in contrast to the usual method of selecting subject object(s) first. Because of this contrast, it is sometimes called "moding the cursor". (Try applying the tests above to be sure it really is a mode.)

In these cases, there are two reasons for accepting the nonstandard method. The first is consistency of metaphor. The "user model" portrayed in the programs is an artist's work table, with tools, palette, and so on. The cursor moding action is equivalent to picking up a working tool.

The second reason is speed. In a Paint program, the "canvas" is often modified, and speed in creating or changing the bits is important. In more object oriented applications such as Desktop or RCS, the objects are more persistent. Speed is then more essential when adding or changing properties of the objects.

When command styles are mixed in this fashion, you must design very carefully to avoid conflicts or apparent side-effects in the command language. For example, in GEM Draw picking an action from the Edit menu cancels the current cursor mode without warning. Confusion from such side-effects may cancel out the benefits of the mixed grammar.

The subject of command speed deserves further attention.
While the novice approaching a program needs full feedback, a person who uses it day in and day out will learn the program, and want faster ways to get the job done, even if they are more arcane. This gives rise to a "layered" style of interface.

A layered interface is designed so that the visual grammar is obvious, as we have discussed. However, there are one or more sets of "accelerators" built into the program, which may be harder to find but faster to use. One example is condensed mouse actions such as the double-click.

For instance, attempting to select a block of text which extends beyond a window is impossible using the basic metaphor. The novice will simply do the operation in pieces. A layered interface might put a less obvious Mark Begin and Mark End option in the menus. Another way is to take a drag which extends outside the window as a request to begin scrolling in that direction, while extending the current selection.

One of the most common and useful accelerator methods is function keys. Using this approach, single key equivalents to actions are listed in the menu. Striking this key when an object is selected will cause the action to occur. Note that this is most useful if some keyboard driven method of object selection, such as tabbing, is also available. Otherwise, the time switching from the mouse, used to select the object, to the keyboard for command input, may well cancel any advantage.

Finally, radical departures from the GEM metaphor may be useful when attempting to replicate the look of another system, or trying to meet severe constraints, such as display space. One example would be discarding the standard GEM menus in favor of "popup" menus which appear next to the current mouse position in response to a click on the second button. This method has the advantage of preserving the menu space at the top of the screen, and is potentially faster because the menu appears right next to the current mouse position.
The drawbacks are lack of a visual cue for naive users trying to find the commands, and the need for custom coding to build the popups.

MORE TO COME. We have reached the end of the second sermon on user interface. In a future column, I will look at "higher level" topics relating to the design of the application's user metaphor. These include issues of object orientation, direct manipulation, and the construction of microworlds. In the meantime, several of the more practical columns will present implementations of techniques such as accelerator keys and popup menus which I have discussed this time.

THANKS AND APOLOGIES to the following people whose public and published remarks have formed part of the basis of this discussion: Jef Raskin, Bill Buxton, Adele Goldberg, James Foley, and Ben Shneiderman. As always, any errors are my own.