Using Augmented Reality as a Feedback System for Gesture Based Interfaces

Co-authored by: Alona Lerman, Shachar Oz, and Yaron Yanai

Part 2 of the Series: Designing a Practical UI

In this post we explore the challenges of providing effective and useful feedback within gesture-based systems, and offer our thoughts on how augmented reality can be implemented as a supporting tool for creating an intuitive and engaging interface. Read on to learn how we arrived at the approach shown in the video below:

 

Background

A feedback system is the application’s method of communicating with users: prompting them to take an action, informing them that a given task has been understood, and assuring them that the system is aware of and responding to their interaction. Feedback can come in the form of sounds, animations, color changes, highlights, or textual messages such as instructions, pop-ups, and balloons.

Omek’s UX Studio has spent much of the past six years researching best practices for gesture-based HMI. One of the most important lessons we have learned, time and again, is that an effective feedback system can make the difference between a frustrating, confusing experience and one that naturally guides the user through an application in a fun and successful way.

Gesture-based systems require their own form of feedback

What works well in one modality often does not work as well when applied to another. This is a lesson we cannot emphasize enough. There are attributes that are unique to gesture-based systems, and your application’s design should take these attributes into consideration:

  • No tactile feedback: This one may seem obvious but has important implications. Unlike most interfaces, gesture-based systems are “touch-less”: they lack physical, tactile feedback. With a mouse or keyboard, for example, there is a haptic response when you push down and release a key. In a gesture-based system, you need to find another means of letting users know that they have performed a given task.
  • Invisible interaction space: The interaction space in a gesture-based system refers to the effective field of view of the camera (check out the photo below). Without feedback, a user has no way of knowing if he is being “seen” by the application (a minimal boundary-check sketch follows this list). Check out our last blog post for more detail on this topic.

      Defining the Interaction Space

  • Even the best-designed systems sometimes fail: Yes, on occasion even the most accurate tracking systems may “lose” a user, for example because of occlusions. When this does happen, the application should fail gracefully to keep user frustration to a minimum.
  • No standardized gestural language: We’ve referenced before the article that Don Norman wrote on gesture wars for Core77. Something as simple as the action of “selecting” can be interpreted in many different ways: one user may hover their hand over a button to select it, while someone else may push their hand forward, mimicking the action used with a physical button.
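To make the “invisible interaction space” and “fail gracefully” points concrete, here is a minimal Python sketch of a boundary check with a fallback message. The tracker and UI calls (get_hand_position, show_edge_warning, show_message) are hypothetical placeholders used for illustration only, not Omek’s actual SDK.

    EDGE_MARGIN = 0.1            # fraction of the FOV treated as the warning zone
    LOST_FRAMES_TOLERATED = 15   # roughly a quarter second at 60 fps
    lost_frames = 0

    def update_feedback(tracker, ui):
        # Warn before the hand leaves the FOV; fail gracefully if tracking drops.
        global lost_frames
        pos = tracker.get_hand_position()   # (x, y) normalized to [0, 1], or None if lost
        if pos is None:
            lost_frames += 1
            if lost_frames > LOST_FRAMES_TOLERATED:
                ui.show_message("Raise your hand to continue")   # graceful failure
            return
        lost_frames = 0
        x, y = pos
        near_edge = min(x, 1 - x, y, 1 - y) < EDGE_MARGIN
        ui.show_edge_warning(near_edge)     # e.g. softly highlight the screen border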

Experiments in Feedback

For the Practical UI app we tested a number of different ways to provide users with feedback throughout their experience, from the first interaction all the way through to the end. Below we share a few different feedback methods, their pros and cons, and considerations to keep in mind when you apply them in your own applications.

1. The Traditional: Your hand as a “cursor” on the screen, similar to a mouse pointer. In this scenario, your hand (or, more likely, your finger) becomes the mouse cursor on screen, giving you visual feedback of where the “cursor” is at all times.

Here, a user’s finger becomes the mouse pointer

Pros:

  • Extension of the current paradigm, thus requiring little explanation. From both an application-design and a user perspective, this seems like the most natural way to provide feedback in a gesture-based system. Users easily relate to this method of feedback since it takes such a familiar form (who hasn’t used a mouse before?). During testing, we found that users almost expected this to be the means of interacting with the application. There’s no manual or additional guidance you need to offer users when they are getting started.
  • Constant feedback given to the user.  With a “cursor” method, the user always knows where his hand is in relation to the screen, providing invaluable information.

Cons:

  • Easily leads to user fatigue. The work here is placed on the user, who has to be very accurate in his selection. Users have to hold an extremely steady hand in order to ensure they make the intended selection, and all that sustained precision quickly leads to fatigue and frustration. Bernd Plontsch just wrote an excellent post on exactly these challenges when trying to implement this using a Leap Motion.
  • Applying Fitts’ Law in a NUI world. When the hand becomes a cursor in a gesture-based interface, a significant amount of thought must be put into the design and layout of the interface to ensure that users can quickly navigate from one selection to the next (see the worked example after this list). John Pavlus interviewed UX expert Francisco Inchauste on exactly this topic.
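For reference, Fitts’ Law in its common Shannon formulation predicts movement time as MT = a + b * log2(D/W + 1), where D is the distance to the target and W is its width. The short Python sketch below shows how enlarging a target lowers the index of difficulty; the coefficients a and b are illustrative placeholders, not values we measured.

    import math

    def fitts_movement_time(distance, width, a=0.2, b=0.15):
        # Shannon formulation of Fitts' Law; a and b are illustrative here and
        # are normally fitted per device and per user from measured selection times.
        index_of_difficulty = math.log2(distance / width + 1)   # in bits
        return a + b * index_of_difficulty

    # A wider target is easier (and predicted to be faster) to hit:
    print(fitts_movement_time(distance=800, width=60))    # small, distant button
    print(fitts_movement_time(distance=800, width=120))   # same distance, bigger button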

 

2. Active Regions: Highlighting selection buttons on screen when a user’s hand hovers over them (no cursor). 

Jinni Demo: The “selected” box is defined by a white outline

Check out a real-life example of this in our gesture-controlled media selection demo, created in partnership with Jinni for CES 2012. Instead of a cursor appearing on screen, when a user’s hand moves over the selection options, the selection buttons activate, indicating that the user is in the “area” of a specific selection (a minimal sketch of this hover-activation approach follows the pros and cons below). The specific feedback may be manifested by buttons lighting up, enlarging, or casting a “shadow”.

Pros:

  • No issue of “accuracy”. This approach significantly reduces the accuracy problem we saw above with the cursor method. As long as a user is in the general area of the selection button, he is able to make a selection. This means that the tracking feels much more stable to end users – they don’t see the small jumps and shakiness in the tracking data.
  • Fewer “false” selections. Bigger buttons mean less chance for error (see Fitts’ Law). By creating bigger, more clickable areas, it is much easier for the user to focus on their target, point at it, and reach it.

Cons:

  • Changes in the interface design. This feedback method requires that you construct your interface to fit the approach, creating large “selection buttons” from the get-go.
  • No constant feedback for users. In contrast to the mouse-cursor approach, users here only know whether or not their hand is selecting a certain icon. The feedback can be somewhat vague, offering no guidance on how far the hand must move to reach the next icon.
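As promised above, here is a minimal Python sketch of the active-region idea: large regions highlight while the hand hovers over them, and a selection fires after a short dwell. The dwell trigger and the region/UI API are assumptions made for illustration; they are not necessarily how the Jinni demo confirmed selections.

    import time

    DWELL_SECONDS = 0.8   # assumed dwell time before a hover becomes a selection

    class ActiveRegion:
        def __init__(self, name, x, y, w, h):
            self.name, self.rect = name, (x, y, w, h)
            self.hover_since = None

        def contains(self, px, py):
            x, y, w, h = self.rect
            return x <= px <= x + w and y <= py <= y + h

    def update_regions(regions, hand_xy, ui):
        px, py = hand_xy
        for region in regions:
            if region.contains(px, py):
                ui.highlight(region.name)                 # immediate hover feedback
                if region.hover_since is None:
                    region.hover_since = time.monotonic()
                elif time.monotonic() - region.hover_since > DWELL_SECONDS:
                    ui.select(region.name)                # confirmed selection
                    region.hover_since = None
            else:
                region.hover_since = None
                ui.unhighlight(region.name)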

 

3. 3D Hand Model: Creation of a hand avatar. A 3D model of the user’s hand is animated and turned into a “hand avatar”: every joint in the user’s hand is directly mapped to the model’s joints. Imagine the extension of a user’s hand into the application.

A user’s hand is transformed into a robot arm, showing all of the user’s joints


Pros:

  • Very detailed feedback. This example offers a clear physical representation of your hand on screen, representing each and every one of your actual movements (a minimal joint-mapping sketch follows this list). It’s almost as though your hand has extended into the application on screen. There are two options for presenting this:
    • Mirror – the screen reflects a mirrored version of your hand
    • First person – the user sees their hand on screen the way they see it in front of them
  • Creation of an immersive world. You can simulate collisions and physics in order to interact with virtual objects as if they were real – pick them up, push them, squash them, etc. You become the puppeteer – moving a virtual you in a virtual world.
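The joint-mapping sketch mentioned above: a few lines of Python showing how tracked joints could drive a hand avatar, with the mirror option handled by flipping the x axis. The tracker and model calls are assumed placeholders; real rigs are usually driven by joint rotations rather than raw positions.

    def drive_hand_avatar(tracker, model, mirror=True):
        joints = tracker.get_joints()      # assumed format: {"index_tip": (x, y, z), ...}
        for name, (x, y, z) in joints.items():
            if mirror:
                x = -x                     # mirror mode: the screen reflects the hand
            model.set_joint_position(name, (x, y, z))   # first person: pass through as-is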

Cons:

  • Issue of the “uncanny valley”. A hypothesis widely used in robotics and animation: the delicate balance here is making the virtual hand realistic enough to be engaging without getting so close to real that its remaining imperfections become unsettling.
  • Very sensitive to tracking issues. Since most of the data from the tracking system is used all of the time, any error will be instantly visible. Essentially, it risks introducing noise into the experience even when the actual points of interest (such as the index finger) are stable on their own. This issue can arise in any tracking system, even the most accurate ones.
  • Requires a certain amount of control. It sounds strange, but we are so used to controlling interfaces in a 2D interaction space that having full control of a 3D hand inside the screen can actually make for a clumsy and awkward experience.
  • High sensitivity to the perspective of the interface. Since this is a virtual world, the hand might be rendered from a different perspective than the user is used to, resulting in a disorienting experience for most of the users we tested.

Our current model for the Practical UI: Using Augmented Reality as a Feedback Method

An augmented hand means rendering the hand’s image in each frame and overlaying the user interface on top of it. Your own hand, virtually represented on screen, becomes the pointing device, and the feedback system becomes very responsive and genuinely fun and intuitive for most users (see Part 1). Why?

“Silhouette” of a user’s hand created using Omek’s Grasp to abstract just the hand

  • Incredibly intuitive system. A user raises their hand to get started and immediately sees it reflected on screen, requiring almost no explanation of how to use the system.
  • Constant feedback. The user always knows whether the camera is tracking them, based on whether their hand is being shown on the screen, which provides quiet reassurance. Moreover, the user has accurate and constant feedback on their position in space.
  • Use of all three dimensions. As the user’s hand moves closer to or farther from the camera, their augmented hand inside the application becomes larger or smaller, respectively. The user can easily understand their area of influence inside the screen.
  • No pointer required, thus offering the ability for lower latency. Every tracking system applies a smoothing algorithm to keep its data stable. In this instance, however, we aren’t rendering an actual pointer, since the user’s hand becomes the pointer. Therefore, in our “behind-the-scenes” calculations we were able to remove much of the smoothing, significantly reducing latency in the application (a minimal smoothing sketch follows this list).
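Here is the smoothing sketch mentioned above. It uses a plain exponential filter purely to illustrate the stability-versus-latency trade-off; the post does not specify which filter the tracking pipeline actually uses.

    class ExponentialSmoother:
        # alpha near 1.0: little smoothing, low latency (but jittery)
        # alpha near 0.0: heavy smoothing, high latency (but steady)
        def __init__(self, alpha):
            self.alpha = alpha
            self.state = None

        def update(self, sample):
            if self.state is None:
                self.state = sample
            else:
                self.state = tuple(self.alpha * s + (1 - self.alpha) * prev
                                   for s, prev in zip(sample, self.state))
            return self.state

    cursor_filter = ExponentialSmoother(alpha=0.3)   # a rendered cursor must look steady
    hand_filter = ExponentialSmoother(alpha=0.9)     # an augmented hand tolerates raw jitter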

There are, however, a few things to keep in mind:

  • First, it requires rendering a constant full-screen video stream at 60 fps, which can have an impact on performance.
  • This one is sneaky: what part of a user’s hand is considered the selector? Is it your index finger? The palm of your hand? More about this topic in an upcoming post.
  • Pay attention to the details: we didn’t use a classic augmented-reality hand. Instead we created almost a silhouette of the hand by cutting it out of the background. Rather than seeing a user’s face and arm, all you see is a subtle hand. This doesn’t just provide a more aesthetically pleasing interface; it also reduces the distraction level for the end user, allowing them to focus on the experience.
  • Finally, you will have to design the application so that the hand is visible in all circumstances. For example, when the hand is behind an element (i.e., a button or menu) it becomes obscured and the feedback is lost. Alternatively, if you render the hand above everything else you run the risk of blocking elements on screen (not to mention that it is strange to interact with elements that are behind your hand). We addressed this issue by rendering the hand twice – once opaque in the background and once as an outline in the foreground – which solves both problems (a minimal rendering sketch follows this list).
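The two-pass rendering mentioned in the last point, sketched in Python. The renderer calls are illustrative placeholders, and the depth-threshold segmentation is a simple stand-in for the hand mask that Omek’s Grasp middleware provided in the actual demo.

    import numpy as np

    def segment_hand(depth_frame, near_mm=300, far_mm=700):
        # Stand-in mask: keep pixels within an assumed "hand distance" band.
        return (depth_frame > near_mm) & (depth_frame < far_mm)

    def render_frame(renderer, depth_frame, ui_elements):
        hand_mask = segment_hand(depth_frame)
        renderer.draw_silhouette(hand_mask, opaque=True)   # pass 1: hand behind the UI
        for element in ui_elements:
            renderer.draw(element)                         # pass 2: buttons, menus, content
        renderer.draw_outline(hand_mask, thickness=2)      # pass 3: outline above everything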

In Brief…

Gesture recognition is an amazing technology that allows users to interact with devices on their own terms. But it is an entirely new paradigm that requires a different approach to design and feedback systems. If you simply extend traditional approaches that work well for a different modality (say, touch), you will almost always find that they don’t fit gesture.

“Feedback” in a gesture-based system should be subtle yet constant, informative yet fun, and always intuitive.

Gestural interfaces and 3D sensors offer us a new way of interacting with machines, computers, and applications. As designers, we need to keep these do’s and don’ts in mind in order to create clear and responsive feedback systems.

Designing a Practical UI for a Gesture-Based Interface

Co-authored by: Alona Lerman, Shachar Oz, and Yaron Yanai

Part 1: The Evolution of the Arc Menu

In the first article of the series, Omek UX Studio’s Creative Director Yaron Yanai and Lead Designer Shachar Oz talk about designing the application’s menu system called the Arc Menu.

Jump directly to Part 2, featuring how we used Augmented Reality to provide “feedback” to users.

The final product: a virtual bookshelf you can interact with using your hands

It is the mission of Omek’s UX Studio to explore new ways of interacting with computers using gesture and motion control. UX Studio team members research user experiences as the technology is being developed, in order to inspire developers and create better tools for using this exciting new means of control.

For CES 2013 we decided to create a demo that shows off a 3D content browser: essentially, being able to view your books, music, and pictures as three-dimensional models. You can “pick them up”, look at them from all sides, open them, and compare them – all using natural gestures, with no teaching required. The demo we created shows how to use this application for one’s own library of books, but it can easily be extended to an online retail application, giving customers the ability to “try out” products virtually, right in their own home: consumers can examine actual items, compare them with similar items, and then purchase them.

The “Practical UI”, as we’ve called it, was built from the ground up with the intention of deploying gesture recognition to control every aspect of the app. Its ease of use is the result of several months of development and user testing.

Making Sense of Gestures

At the start of designing the application, we defined several interactions we planned to create:

  1. Menu Navigation and selection: navigating between different collections
  2. Collection Navigation: navigating in 3D once inside a collection, i.e., panning and zooming
  3. Object Selection and Manipulation: Picking up an object, pulling it closer to see more details, rotating it in 3D to view it from every angle
  4. Object Comparison: Picking up two objects, one in each hand, and comparing them visually by rotating each of them
  5. Player: Looking inside an object, i.e., opening a book, playing a record

The challenge

Create an application with the functionality to perform all of the gestures and interactions listed above in an intuitive and comfortable fashion. Ensure that we are always providing a responsive, engaging, convenient and, most of all, fun experience. And finally, showcase the potential of gesture recognition to enable a more dynamic interface with better control of three-dimensional objects in a virtual world.

When we began the design process, we leveraged the Studio’s extensive experience building gesture-based applications to ensure we avoided a few of the classic pitfalls of translating standard mouse-and-keyboard or touch paradigms to a 3D environment.

  • Boundaries: The user must always know whether or not she is being tracked, i.e., being “seen” by the camera.
  • Location feedback: Provide constant feedback on where a user is located so they know how and where to move in order to reach their desired selection.
    • One simple option is to place a cursor on the screen, the way a regular mouse does, and have it follow the user’s finger. There are a lot of disadvantages to this method, however, because it requires the user to be very accurate, leading to increased fatigue and frustration.
  • Item selection: How does a user make a selection? There are several methods for selection using gestures; a simple “click”, however, is not one of them, since there’s no button to click.
  • Fatigue, responsiveness and accuracy: We found that these mechanisms are closely tied together, and fatigue, being one of the major drawbacks of this technology, is probably the problem we spent the most time on.

This first article in our series covers our approach to designing an engaging, easy-to-use menu system we’ve called the Arc Menu.

The Menu System

The first challenges we addressed were boundaries and location feedback. Often, boundaries are defined by a small “live feed” frame at the bottom of the screen and an “out of boundaries” alert when the user approaches the edge of the camera’s field of view.

This time, however, we decided to try something new: we stretched the camera’s feed (the depth data) to fill the entire screen, so that it became the background of the entire application. This way the user receives real-time feedback on whether his hand is inside the FOV of the camera; the application acts like a mirror of the user’s hands (a minimal sketch follows the figure below). When we tested it out, people’s reactions were very positive. We ended up liking it so much that we let it shape many other aspects of the application.

Figure 1: Stretching the live feed over the entire screen
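A minimal sketch (assuming OpenCV, NumPy, and a raw depth frame in millimeters) of turning the depth feed into a full-screen background image, as described above. Nearer pixels are drawn brighter; the depth range is a placeholder value.

    import cv2
    import numpy as np

    def depth_to_background(depth_frame, screen_w, screen_h, max_depth_mm=1000):
        depth = np.clip(depth_frame, 0, max_depth_mm).astype(np.float32)
        gray = (255 * (1.0 - depth / max_depth_mm)).astype(np.uint8)   # nearer = brighter
        return cv2.resize(gray, (screen_w, screen_h), interpolation=cv2.INTER_LINEAR)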

We decided to give the application an augmented-reality feel, where the hand itself becomes the “pointer” instead of a traditional cursor. This required the creation of a specialized tracking system to “understand” what the user is pointing at. To keep the application intuitive, it needed to support most hand-configuration scenarios (a minimal pointer-selection sketch follows the list); we designed for:

  • index finger pointing
  • full hand pointing
  • middle finger pointing
  • And more!
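The pointer-selection sketch mentioned above shows one possible heuristic: take the extended fingertip closest to the camera, and fall back to the palm center for full-hand pointing. The joint dictionary format is an assumption for illustration, not the tracker’s actual output.

    FINGERTIPS = ["thumb_tip", "index_tip", "middle_tip", "ring_tip", "pinky_tip"]

    def pointer_position(joints, extended_fingers):
        # joints: {"index_tip": (x, y, z), ..., "palm_center": (x, y, z)}
        # extended_fingers: set of fingertip names currently extended
        candidates = [joints[f] for f in FINGERTIPS if f in extended_fingers]
        if candidates:
            return min(candidates, key=lambda p: p[2])   # smallest z = closest to camera
        return joints["palm_center"]                     # full-hand pointing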

While we were testing the input system, we started designing the menu. We started out with a four-item menu system: 1) Books, 2) Music, 3) Photos and 4) Friends. We kept in mind a few key lessons from our previous experience building menu systems for long-range environments:

  • Buttons must be relatively large in order to be selected easily
  • Menu selections must be placed sufficiently far apart to avoid false selection
  • Menu buttons must enlarge on hover in order to avoid flickering at the edges (see the hysteresis sketch after this list)
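The hysteresis sketch referenced in the last point: a button activates when the hand enters its normal bounds but only deactivates once the hand leaves a larger, enlarged bounds, so a hand hovering right on an edge doesn’t flicker. The rectangle format and margin are assumptions for illustration.

    def inside(rect, px, py):
        x, y, w, h = rect
        return x <= px <= x + w and y <= py <= y + h

    def grow(rect, margin):
        x, y, w, h = rect
        return (x - margin, y - margin, w + 2 * margin, h + 2 * margin)

    def update_hover(button, hand_xy, margin=40):
        px, py = hand_xy
        if not button.hovered and inside(button.rect, px, py):
            button.hovered = True        # enter using the normal bounds
        elif button.hovered and not inside(grow(button.rect, margin), px, py):
            button.hovered = False       # exit only once outside the enlarged bounds
        return button.hovered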

We started off with a simple design: a linear horizontal distribution, with four buttons spread across the width of the screen.

Figure 2: Horizontal Menu

So, what did we find out from our initial user testing?

The good:

  • The augmented-reality style selection worked great for all of the test subjects in terms of intuitiveness and responsiveness. There was no need to explain how to use the interface, since users could immediately see their hand and whether it was being tracked. There was almost zero latency, since the hand itself was the cursor, and there was no need for accuracy, since the buttons were big and there was no tiny cursor to aim at a specific point.

The not-so-good:

  • Fatigue was high, even though the movements are relatively small
  • When a user moved her hand in a horizontal line across the screen, her hand would obscure parts of the menu and the screen
  • Right-handed users found it difficult to select items on the far left side of the screen

Small, incremental changes sometimes aren’t enough.

We started off by making small changes to the design to tackle the issues we identified during user testing.  First, we limited the menu to the right half of the screen only.  User testing showed that the experience was slightly better but not good enough.

We realized that the best way to deal with fatigue was to let users rest their elbow on a table or the arm of their chair. Our tests showed that for the two buttons on the right side, users had great results: fatigue was minimal even after several minutes of interaction. To reach the two buttons on the left, however, users had to raise their elbow, which brought back the problem of fatigue. This proved true even when we arranged all four buttons in a stack or a square formation. We kept coming up against the issue that only some of the positions were convenient to select without bending the wrist or elbow in an uncomfortable way.

Build Upon a User’s natural movements.

We took a step away from the computer and just observed the natural movements of our bodies. We had a few test users sit comfortably in front of their computer and had them move their hands horizontally and vertically, without lifting their elbow from the desk. Visualizing these movements as lines, it quickly became clear that the joints in our body don’t naturally move in a straight line, but rather along an arc. So why not build the menu in the shape of an arc?

Figure 3: Visualization of users’ natural hand movements

We gathered a wide set of hand-movement “arcs” from people of all sizes in order to create a “standard” Arc Menu that would work across a wide range of users (a minimal placement sketch follows).
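As a rough illustration of how such a menu could be laid out, the Python sketch below spreads buttons along an arc that pivots near where a resting right elbow would sit. The pivot point, radius, and angular range are placeholder values, not the measurements we collected from test users.

    import math

    def arc_menu_positions(n_buttons, pivot=(1700, 1100), radius=700,
                           start_deg=110, end_deg=170):
        # Returns (x, y) screen positions for n_buttons spread evenly along the arc.
        positions = []
        for i in range(n_buttons):
            t = i / (n_buttons - 1) if n_buttons > 1 else 0.5
            angle = math.radians(start_deg + t * (end_deg - start_deg))
            x = pivot[0] + radius * math.cos(angle)
            y = pivot[1] - radius * math.sin(angle)   # screen y grows downward
            positions.append((round(x), round(y)))
        return positions

    print(arc_menu_positions(4))   # four buttons: Books, Music, Photos, Friends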

We tested the new Arc Menu and the results were especially positive. The interaction was intuitive and fun, while fatigue was low even after several minutes of continued use. All of the buttons on the menu were equally accessible and it worked perfectly with the input system.

Finishing Touches

To ensure an elegant experience, we designed the application so that the arc menu only appears when needed. We accomplished this by folding the arc into a single button on the top right that unfolds automatically when a user hovers over it.

Figure 4: Final design of the arc menu

What’s next for the Arc Menu?

  1. Extend the experience to more than 4 buttons
  2. Make the arc size adjustable according to the screen’s size or even user preferences
  3. Make the arc flip for left-handed people

Conclusion

Close-range interaction is very different from long-range gesture-based experiences. Although the tracking is much more precise and responsive, we still face similar issues, such as fatigue. And at close range, these issues are often felt immediately and get worse over time.

When the user rests his elbow on a table or chair arm, fatigue is much lower and interaction can last for minutes or more. This, however, limits the user’s hand to pivoting around the elbow. All of this led us to create an arc-like menu, which was not only an answer to a problem but actually proved to be a very useful and even fun experience, extending the amount of time the user can work in front of the computer.

And if you’re interested in signing up for our upcoming Grasp beta…just click on the link below.

Thanks and stay tuned for the next chapter.