Designing a Practical UI for a Gesture-Based Interface

Co-authored by: Alona Lerman, Shachar Oz, and Yaron Yanai

Part 1: The Evolution of the Arc Menu

In the first article of the series, Omek UX Studio’s Creative Director Yaron Yanai and Lead Designer Shachar Oz talk about designing the application’s menu system called the Arc Menu.

Jump directly to Part 2, featuring how we used Augmented Reality to provide “feedback” to users.

The final product: a virtual bookshelf you can interact with using your hands

It is the mission of Omek’s UX Studio to explore new ways of interacting with computers using gesture and motion control. UX Studio team members research user experiences as the technology is being developed, in order to inspire developers and create better tools for this exciting new means of control.

For CES 2013 we decided to create a demo that shows off a 3D content browser: essentially, being able to view your books, music and pictures as three-dimensional models. You can “pick them up”, look at them from all sides, open them and compare them – all using natural gestures, with no teaching required. The demo we created shows how to use this application with your own library of books, but it could easily be extended to an online retail application, giving customers the ability to “try out” products virtually, right in their own home: consumers could examine actual items, compare them with similar ones, and then purchase them.

The “Practical UI”, as we’ve called it, was built from the ground up with the intention of deploying gesture recognition to control every aspect of the app. The ease of use of the tool is the result of several months of development and user testing.

Making Sense of Gestures

At the start of designing the application, we defined several interactions we planned to create:

  1. Menu Navigation and Selection: navigating between different collections
  2. Collection Navigation: navigating in 3D once inside a collection, i.e., panning and zooming
  3. Object Selection and Manipulation: picking up an object, pulling it closer to see more details, rotating it in 3D to view it from every angle
  4. Object Comparison: picking up two objects, one in each hand, and comparing them visually by rotating each of them
  5. Player: looking inside an object, i.e., opening a book, playing a record

The challenge

Create an application with the functionality to perform all of the gestures and interactions listed above in an intuitive and comfortable fashion. Ensure that we are always providing a responsive, engaging, convenient, and most of all, fun, experience. And finally, showcase the potential of gesture recognition to enable a more dynamic interface with better control of three dimensional objects in a virtual world.

When we began the design process, we leveraged the Studio’s extensive experience building gesture-based applications to ensure we avoided a few of the classic pitfalls people fall into when translating standard mouse-and-keyboard or touch paradigms to a 3D environment.

  • Boundaries: The user must always know whether or not she is being tracked, i.e., being “seen” by the camera
  • Location feedback: Provide constant feedback on where the user is located, so she knows how and where to move in order to reach her desired selection.
    • One simple option is to place a cursor on the screen, the way a regular mouse does, and have it follow the user’s finger. This method has significant drawbacks, however: it requires the user to be very accurate, which leads to increased fatigue and frustration.
  • Item selection: How does a user make a selection? There are several methods for selecting with gestures, but a simple “click” is not one of them, since there is no button to click (a rough sketch of one common approach follows this list).
  • Fatigue, responsiveness and accuracy: We found that these factors are closely tied together, and fatigue, one of the major drawbacks of this technology, is probably the problem we spent the most time on.
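As an aside on the selection question above: one common answer in gesture interfaces is dwell-time (hover-to-select). The Python sketch below is purely illustrative, with placeholder names and timings; it is not a description of our production code.

```python
import time

# Illustrative dwell-time selection: a button "fires" only after the hand
# has hovered over it continuously for DWELL_SECONDS.
DWELL_SECONDS = 1.0

class DwellSelector:
    def __init__(self, dwell=DWELL_SECONDS):
        self.dwell = dwell
        self.hovered_button = None   # button currently under the hand
        self.hover_started = None    # timestamp when hovering began

    def update(self, button_under_hand, now=None):
        """Call once per frame with the button the hand is over (or None).

        Returns the button that was just selected, or None.
        """
        now = time.monotonic() if now is None else now
        if button_under_hand != self.hovered_button:
            # Hand moved to a different button (or off the menu): restart timer.
            self.hovered_button = button_under_hand
            self.hover_started = now
            return None
        if self.hovered_button is not None and now - self.hover_started >= self.dwell:
            selected = self.hovered_button
            self.hovered_button, self.hover_started = None, None
            return selected
        return None
```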

This first article in our series covers our approach to designing an engaging, easy-to-use menu system we’ve called the Arc Menu.

The Menu System

The first challenges we addressed were Boundaries and Location Feedback. Often, boundaries are defined by a small “live feed” frame at the bottom of the screen and an “out of boundaries” alert when the user approaches the edge of the camera’s field of view.

This time, however, we decided to try something new: we stretched the camera’s feed (the depth data) to fill the entire screen, so that it became the background of the entire application. This way the user receives real-time feedback on whether her hand is inside the camera’s field of view; the application acts as a mirror of the user’s hands. When we tested it out, people’s reactions were very positive. We ended up liking it so much that we let it shape many other aspects of the application.

Figure 1: Stretching the live feed over the entire screen
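To make the idea concrete, here is a minimal, illustrative sketch of stretching a depth frame over the screen. It assumes a raw depth image as a NumPy array in millimetres and uses OpenCV for resizing; the actual cut-off distance and pipeline in our demo differ.

```python
import numpy as np
import cv2  # OpenCV, assumed available for resizing

def depth_to_background(depth_frame, screen_w, screen_h, max_depth_mm=1200):
    """Stretch a raw depth frame over the whole screen as a grayscale background.

    depth_frame: 2D uint16 array of millimetre values from the camera.
    Pixels beyond max_depth_mm (roughly arm's length plus margin) are blanked,
    so only the user's hands remain visible in the "mirror".
    """
    depth = depth_frame.astype(np.float32)
    depth[depth > max_depth_mm] = 0            # drop everything behind the interaction zone
    img = (depth / max_depth_mm * 255).astype(np.uint8)
    # Stretch the camera's (smaller) frame to the full display resolution.
    return cv2.resize(img, (screen_w, screen_h), interpolation=cv2.INTER_NEAREST)
```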

We decided to give the application an augmented reality feel, where the hand itself becomes the “pointer” instead of a traditional cursor. This required the creation of a specialized tracking system to “understand” what the user is pointing at. To keep the interaction intuitive, the system needed to support most hand configurations (a rough sketch of the idea follows the list below); we designed for:

  • index finger pointing
  • full hand pointing
  • middle finger pointing
  • And more!
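Our tracking system itself is more involved, but the core idea of the hand-as-pointer can be sketched simply: whichever part of the hand reaches closest to the camera becomes the hotspot, regardless of which finger (or how many) the user extends. An illustrative Python sketch over a segmented depth frame:

```python
import numpy as np

def hand_hotspot(depth_frame, max_depth_mm=1200):
    """Pick a single 'pointer' pixel from the hand blob in a depth frame.

    Works the same whether the user points with the index finger, the middle
    finger, or the whole hand: the in-range pixel closest to the camera wins.
    Assumes an integer depth format (e.g. uint16, millimetres).
    Returns (x, y) in image coordinates, or None if no hand is in range.
    """
    mask = (depth_frame > 0) & (depth_frame < max_depth_mm)
    if not mask.any():
        return None
    # Among in-range pixels, take the one nearest the camera (smallest depth).
    masked = np.where(mask, depth_frame, np.iinfo(depth_frame.dtype).max)
    y, x = np.unravel_index(np.argmin(masked), masked.shape)
    return int(x), int(y)
```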

While we were testing the input system, we began designing the menu. We started out with a four-item menu: 1) Books, 2) Music, 3) Photos and 4) Friends. We kept in mind a few key learnings from our previous experience building menu systems for long-range environments:

  • Buttons must be relatively large in order to be selected easily
  • Menu selections must be placed sufficiently far apart to avoid false selection
  • Menu buttons must enlarge on hover in order to avoid flickering at the edges (a rough sketch of this follows the list)
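The “enlarge on hover” rule is, in effect, hysteresis: the hovered button keeps a larger hit area, so small jitters at its edge don’t toggle the hover state. A minimal sketch of that logic (circular buttons and the numbers here are placeholders, not our actual values):

```python
import math

def hovered_button(hand_xy, buttons, current_hover, radius=80, grow=1.4):
    """Return the button under the hand, with hysteresis to prevent flicker.

    buttons: dict mapping a button name to its (x, y) centre on screen.
    The button that is already hovered keeps an enlarged hit radius, so tiny
    hand jitters at its edge do not deselect it.
    """
    hx, hy = hand_xy
    for name, (bx, by) in buttons.items():
        hit_radius = radius * grow if name == current_hover else radius
        if math.hypot(hx - bx, hy - by) <= hit_radius:
            return name
    return None
```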

We started off with a simple design: a linear horizontal distribution, with four buttons spread across the width of the screen.

Figure 2: Horizontal Menu

So, what did we find out from our initial user testing?

The good:

  • The Augmented Reality style selection worked great for all of the test subjects in terms of intuitiveness and responsiveness. There was no need to explain how to use the interface, since a user could immediately see their hand and whether it was being tracked. There was almost zero latency, since the hand itself was the cursor, and there was no need for pinpoint accuracy, since the buttons were big and the user didn’t have to land a tiny cursor on an exact spot.

The not-so-good:

  • Fatigue was high, even though the movements were relatively small
  • When a user moved her hand in a horizontal line across the screen, her hand would obscure parts of the menu and the screen
  • Right-handed users found it difficult to select items on the far left side of the screen

Small, incremental changes sometimes aren’t enough.

We started off by making small changes to the design to tackle the issues we identified during user testing.  First, we limited the menu to the right half of the screen only.  User testing showed that the experience was slightly better but not good enough.

We realized that the best way to deal with fatigue was to enable users to rest their elbow on a table or the arm of their chair. Our tests showed that for the first two buttons on the right side, the users had great results. Fatigue was minimal even after several minutes of interaction. To reach the two buttons on the left, however, the users had to raise their elbow, which brought back the problem of fatigue. This proved to be true even when we put all four buttons in a stack formation or a square formation. We kept coming up against the issue that only some of the points were convenient to select without having to bend the wrist or elbow in an uncomfortable way.

Build upon a user’s natural movements.

We took a step away from the computer and simply observed the natural movements of our bodies. We had a few test users sit comfortably in front of their computers and asked them to move their hands horizontally and vertically without lifting their elbows from the desk. Visualizing these movements as lines, it quickly became clear that the joints in our body don’t naturally move in a straight line, but rather in an arc. So why not build the menu in the shape of an arc?

Figure 3: Visualization of users’ natural hand movements

We gathered a wide set of examples of hand movement “arcs” from people of all sizes in order to create a “standard” Arc Menu that would work across a wide range of users.
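The geometry behind the Arc Menu is straightforward: buttons sit on a circular arc whose pivot roughly matches a resting elbow near the bottom-right of the screen. The sketch below spreads N buttons evenly along such an arc; the pivot, radius and angles are placeholders, since our real values came from the measured user data.

```python
import math

def arc_menu_positions(n_buttons, pivot, radius, start_deg=100, end_deg=170):
    """Place n_buttons evenly along an arc around a pivot point.

    pivot:  (x, y) screen position of the "elbow" the hand rotates around,
            typically near the bottom-right corner for right-handed users.
    radius: distance from the pivot to the buttons, in pixels.
    start_deg / end_deg: angular span of the arc, measured counter-clockwise
            from the positive x-axis (placeholder values, not our tuning).
    """
    px, py = pivot
    positions = []
    for i in range(n_buttons):
        t = i / (n_buttons - 1) if n_buttons > 1 else 0.5
        angle = math.radians(start_deg + t * (end_deg - start_deg))
        # Screen y grows downward, hence the minus sign on the sine term.
        positions.append((px + radius * math.cos(angle),
                          py - radius * math.sin(angle)))
    return positions

# Example: four buttons (Books, Music, Photos, Friends) on a 1920x1080 screen,
# pivoting around a point just off the bottom-right edge.
print(arc_menu_positions(4, pivot=(1800, 1100), radius=600))
```

Flipping the pivot and the angular span across the screen’s vertical centre would give the left-handed variant we mention further below.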

We tested the new Arc Menu and the results were especially positive. The interaction was intuitive and fun, while fatigue was low even after several minutes of continued use. All of the buttons on the menu were equally accessible and it worked perfectly with the input system.

Finishing Touches

To ensure an elegant experience, we designed the application so that the arc menu only appears when needed. We accomplished this by folding the arc into a single button on the top right that unfolds automatically when a user hovers over it.

Figure 4: Final design of the arc menu

What’s next for the Arc Menu?

  1. Extend the experience to more than 4 buttons
  2. Make the arc size adjustable according to the screen’s size or even user preferences
  3. Make the arc flip for left-handed people

Conclusion

Close-range interaction is very different from long-range gesture-based experiences. Although the tracking is much more precise and responsive, we still face similar issues, such as fatigue. And at close range, these issues are often felt immediately and get worse over time.

When the user rests his elbow on a table or chair arm, fatigue is much lower and interaction can last for minutes or more. This, however, limits the user’s hand to pivoting around the elbow. All of this led us to create an arc-shaped menu, which not only solved the problem but proved to be a useful and even fun experience, extending the amount of time the user can comfortably work in front of the computer.

And if you’re interested in signing up for our upcoming Grasp beta…just click on the link below.

Thanks and stay tuned for the next chapter.

Experience the Future of Close-Range Interaction at CES 2013

It’s been a very busy year for us at Omek and we’ve made great strides in continuing to develop and release gesture recognition software and developer tools to create new, touchless experiences. With CES 2013 right around the corner we couldn’t be more excited about the opportunity to showcase the results of our efforts.

We continue to be inspired by the market opportunities and the consistent interest in close-range interaction. And Omek’s Grasp solution is groundbreaking when it comes to addressing consumer and customer needs. At CES 2013, we will be there to demonstrate responsive, robust, accurate hand and finger tracking like you haven’t seen before! In addition, we will be presenting demos that showcase the deep research and learnings we’ve gleaned on how to create best-in-class gesture experiences for a broad set of industries. Come check out our solution for intuitive control of Windows 8 with just a wave of your hand! Or experience virtual 3D control and modeling as you shape clay using just your hands. The opportunities are limitless.

If you want to see this and much more, book a meeting with us today. Call us at +972-72-245-2424 or email info@omekinteractive.com. We look forward to seeing you in Vegas!

Another Release! Beckon Usability Framework: Motion Toolkit for Unity3D

As promised, Omek continues to expand the Beckon Usability Framework to help you get the most out of Beckon’s motion tracking technology, with the least effort. Last week, we detailed the launch of the Gesture Authoring Tool. In this post, we’re happy to announce the release of two additional new components:

  • Beckon Motion Toolkit for Unity – a Unity package that you simply import into your Unity application, to have all of Beckon’s features at your fingertips.

  • Beckon C# Extension – a .Net C# API that is easier to use, yet more powerful than the Beckon SDK. If you’re a .Net developer, this is your Beckon tool of choice.

We’ll focus today’s blog post on the Beckon Motion Toolkit for Unity but check in tomorrow for details on the C# Extension!

If you’re developing applications using the popular Unity game engine, check out the new Omek Motion Toolkit for Unity. It’s the easiest way to get your Unity application controlled by user motions, creating a new and exciting experience for your users.

We’ve taken our Beckon SDK, which offers sophisticated motion tracking, animation and player management features, and made it easier than ever to access all of these features right in the Unity environment.

Using the Motion Toolkit, you can easily add the following to your Unity application:

  • Animation – automatically map player movements onto animated avatars, using full-body tracking information. You can also map any player’s joint onto a Unity GameObject.
  • Player Display – display players as color or depth images, or as motion-controlled icons.
  • Gesture Control – have your application respond to player gestures – the key feature of a Natural User Interface application.
  • Cursor Control – automatically map players’ hands to cursors, including options to override the operating system cursor or implement multiple cursors (a rough sketch of this mapping appears below).
  • Player Management – determine how many players your application manages at once, and how to select players from among the people in the scene.

The Motion Toolkit is a Unity package. All you need to do is import it into your Unity project, then drag-and-drop its ready-to-use components into your application.
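To give a feel for what the Cursor Control component handles for you, here is a rough, engine-agnostic sketch of the underlying mapping from a tracked hand position to a screen cursor. The function and parameter names below are our own illustration, not the Beckon API, which exposes this as ready-made components you drop into your scene.

```python
def hand_to_cursor(hand_pos, interaction_box, screen_size):
    """Map a tracked 3D hand position to 2D screen-cursor coordinates.

    hand_pos:        (x, y, z) hand joint position in camera/world units.
    interaction_box: ((min_x, max_x), (min_y, max_y)) region of physical space
                     that should cover the whole screen.
    screen_size:     (width, height) in pixels.
    """
    (min_x, max_x), (min_y, max_y) = interaction_box
    w, h = screen_size
    # Normalize the hand position inside the interaction box to [0, 1].
    nx = (hand_pos[0] - min_x) / (max_x - min_x)
    ny = (hand_pos[1] - min_y) / (max_y - min_y)
    # Clamp so the cursor never leaves the screen, then scale to pixels.
    nx, ny = min(max(nx, 0.0), 1.0), min(max(ny, 0.0), 1.0)
    return nx * w, (1.0 - ny) * h   # flip y: camera "up" is screen "down"
```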

If you’re curious to see what you can achieve with the Motion Toolkit (and we hope you are!), you can check out two gesture-enabled games developed by Omek’s Game Studio using the Motion Toolkit for Unity3D:


Adventure Park for Eedoo by Omek’s Game Studio


Galactic Surfers for Eedoo by Omek’s Game Studio

New Release! Create Custom Gestures with Omek’s Gesture Authoring Tool

What is the Gesture Authoring Tool?
In our last post, we explored the question, “what is a gesture?”. In this follow-up post, we are excited to announce the availability of Omek’s Gesture Authoring Tool (GAT, for short). This represents just one of a set of tools that fall under the Beckon Usability Framework, all intended to help make development with Beckon that much easier and faster. In this post we will go into a bit of detail on how it works and why it’s so important for the field of gesture recognition. Look out for future posts for details on additional components of our Beckon Usability Framework such as our Beckon Motion Toolkit for Unity3D, C# Extension, and more!

GAT is an easy-to-use, yet highly sophisticated tool that dramatically speeds up your development cycle and makes creating custom gestures accessible to anyone, even without any knowledge of how to code.

How does it work?
To create a custom gesture, you record examples of different people performing the gesture; GAT applies its advanced machine-learning algorithms to automatically “learn” to recognize that gesture. This frees you from the burden of analyzing and coding gestures manually, while offering the additional benefit of producing more accurate gestures than you would get with manual coding.
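We won’t go into GAT’s internals here, but the general recipe, fitting a classifier to features extracted from labeled recordings, can be sketched in a few lines. The feature extraction and classifier choice below are illustrative assumptions only, not a description of GAT’s actual algorithms:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(joint_frames):
    """Turn a short window of skeleton frames into one flat feature vector.

    joint_frames: array of shape (frames, joints, 3) with joint positions.
    A real system would use richer features (velocities, angles, etc.);
    per-joint displacement from the first frame is enough for a sketch.
    """
    displacement = joint_frames - joint_frames[0]
    return displacement.reshape(-1)

def train_gesture_model(recordings, labels):
    """Fit a classifier on recorded examples.

    recordings: list of (frames, joints, 3) arrays, one per labeled example,
                assumed resampled to the same number of frames.
    labels:     gesture name per example ('wave', 'swipe_right', 'none', ...).
    """
    X = np.stack([window_features(r) for r in recordings])
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X, labels)
    return model
```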

The GAT is available as an immediate and free download from our support portal.

Why do we think GAT is so great?
Coding gestures can be a complicated undertaking but GAT simplifies the process:

  • Gestures are subjective. Ask three different people to wave hello and you are likely to get three different types of waves. Each person may hold their arm at a slightly different height or angle, or they may wave at a different speed. If you define your gesture too closely to one specific model (as often happens with manually coded gestures), your application will fail to recognize variations of that gesture when performed by different people.
    • GAT learns by example. The main idea of training gestures in GAT is to record several examples of a gesture, performed by different people. GAT then applies a machine learning algorithm that analyzes motion features and learns to detect a specific gesture from among other gestures and movements.
  • People are sized differently. For example, the lady writing this blog post is on the shorter side of the spectrum. The range of motion may vary widely when people of different heights perform the gesture (say, a child or a taller-than-average individual), so you risk the gesture not being identified correctly.
    • Normalized skeleton. Before GAT analyzes a gesture, it first normalizes the tracked skeleton to a standard set of dimensions. This eliminates most of the problems related to people of different sizes (see the sketch after this list).
  • Gestures are subtle. Small movements can have very different meanings, so we understand the importance of accurately recognizing a gesture for what it is.
    • Gesture Packs. GAT allows you to create a “pack” of several gestures that you want to create for your application. GAT can then train all these gestures together, enabling it to differentiate among them even if they are similar.
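The skeleton normalization mentioned above boils down to rebuilding the tracked skeleton with every bone scaled to a reference length. A rough sketch of that idea (the joint layout and reference lengths are placeholders, not GAT’s internal format):

```python
import numpy as np

def normalize_skeleton(joints, bones, reference_lengths, root='torso'):
    """Rebuild a tracked skeleton with every bone scaled to a reference length.

    joints:            dict of joint name -> np.array([x, y, z]) (tracked data).
    bones:             list of (parent, child) pairs ordered from the root
                       outwards, e.g. ('torso', 'shoulder'), ('shoulder', 'elbow').
    reference_lengths: dict of (parent, child) -> standard bone length.
    The result keeps each bone's direction but standardizes its length, so a
    child's skeleton and a tall adult's skeleton become directly comparable.
    """
    normalized = {root: joints[root].copy()}
    for parent, child in bones:
        direction = joints[child] - joints[parent]
        length = np.linalg.norm(direction)
        unit = direction / length if length > 0 else direction
        # Re-attach the child to the already-normalized parent position.
        normalized[child] = normalized[parent] + unit * reference_lengths[(parent, child)]
    return normalized
```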

A few other features that enhance GAT’s usability:

  • Reports and Iterative Improvement. When you train gestures in GAT, it will show you a statistical report of the results, detailing the accuracy of the gesture recognition and the instances where errors occurred. You can then modify or add to your examples, rerun the training process and see how you’ve improved.
  • Mirror Gestures. Once you’ve defined a gesture, you can create its “mirror gesture” by clicking a single check-box. For instance, if you’ve defined a “right swipe” gesture, you can automatically create a “left swipe” gesture without having to train it separately (a toy sketch of the idea follows this list).
  • Composite Gestures. You can create a “composite gesture” by defining a sequence of two or more basic gestures that you’ve already defined.
  • Live Test. Once you’ve trained your gestures, you can test them in Live Camera mode. While watching a person’s movements in the GAT viewing pane, you will see gestures detected in real-time.
  • Display Options. GAT supports a rich set of options for displaying person tracking information. These include: 2D and 3D skeleton tracking, color and depth images, joint and bone display overlays.
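Conceptually, mirroring a gesture just reflects the training data across the body’s vertical axis and swaps left/right joints, which the check-box does for you. A toy sketch of the idea (not GAT’s implementation):

```python
import numpy as np

def mirror_recording(joint_frames, left_right_pairs):
    """Reflect a recorded gesture across the body's vertical (x = 0) axis.

    joint_frames:     array of shape (frames, joints, 3), x-axis pointing right.
    left_right_pairs: list of (left_index, right_index) joints to swap, e.g.
                      the left and right hand, so a right-hand swipe becomes
                      a left-hand swipe rather than an anatomically odd one.
    """
    mirrored = joint_frames.copy()
    mirrored[..., 0] *= -1                      # flip the x coordinate
    for left, right in left_right_pairs:
        mirrored[:, [left, right]] = mirrored[:, [right, left]]
    return mirrored
```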

Gesture Recognition: What is a Gesture?

These days there is a lot of discussion about the transition from a GUI, or Graphical User Interface, to a NUI, or Natural User Interface. The entire experience of how we interact with devices is undergoing significant changes as we move towards more intuitive, gesture-based systems that are based on natural movements. But it also raises a number of related questions. Such as, what constitutes a gesture? How do I decide on which gesture to use for a given task? What makes a gesture natural and intuitive?

Over the coming weeks we will provide our thoughts on a number of these questions through our blog. For today, though, we’re going to lay the framework by defining what gesture recognition means in the context of NUI-based applications.

To provide more detail on the topic, I enlisted the help of Omek’s VP Products, Doron Houminer, to share his thoughts. Below, you’ll find the highlights of our conversation. As always, feel free to chime in – we’d love to hear your questions and comments on the subject.

Q: Let’s start off with defining, what is a gesture?
Doron: In the context of computer vision, a gesture is a unique body movement that can be identified as a discrete action. For example, a gesture could be a hand wave, a swipe of the arm, or a kick of the leg. In a NUI application, gestures are used to generate specific responses (i.e., turn on / off, change the volume or channel).

Q: What’s the difference between motion tracking and gesture recognition?
Doron: I think of motion tracking as a continuous experience whereas gesture recognition represents an event that triggers something to happen in response.
Motion tracking can be considered the “raw material” that enables gesture recognition. Motion tracking monitors body positions and locations, and represents them as a virtual skeleton. It lets you know the position of every joint at every point in time, representing them in real time.
Gesture recognition, on the other hand, is the process of identifying a defined gesture from a sequence of motion tracking frames and labeling it accordingly. It can be explained as “the mathematical interpretation of a human motion by a computing device”.

Q: Why do we need gestures?
Doron: In the context of an application, gesture recognition is used to detect actions and elicit a response. You can think of it like this: tracking is the “where” – where the player is located within the field of view – and gesture recognition is the “what” that happened. Gesture recognition represents a higher level of understanding of the scene being recorded. The legendary Bill Buxton sums it up quite well in his chapter on Gesture Based Interaction – in the context of machine vision, “Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs (graphical user interfaces), which still limit the majority of input to keyboard and mouse.”

Q: Are there different types of gestures?
Doron: Yes. We think of gestures as falling into one of three categories:

  1. Pose gestures: These are when the body (or a limb of the body) is in a specific, static posture. A pose gesture can be illustrated by a single snapshot.
  2. Single-motion gestures: In this case, the body or part of the body performs a specific motion, over a finite, usually brief period of time.
  3. Continuous-motion gestures: This refers to a repetitive action with no time limit, for example: a person running.

Q: So what are the technical means for generating gestures?
Doron: I’ll review a couple of different ways to create gestures, including Omek’s innovative Gesture Authoring Tool. We will be dedicating our next blog post to describing the benefits and reviewing best practices of using this tool:

  • Manual coding. If you go this route, it requires a skilled programmer to write custom code for each gesture. The programmer has to articulate and define all of the parameters of the motion they want to classify as a gesture. They need to specify where the joint positions should be in a consecutive series of frames. It helps if the programmer has a background in, and understanding of, human anatomy, physics, and algebra (a sketch of what such hand-written code looks like follows this list).
  • Omek’s Gesture Authoring Tool. The second option is to use Omek’s Gesture Authoring Tool (GAT). At Omek we’ve invested a lot of time and resources into developing this tool. With the GAT, gesture creation is based on “machine learning”, which means that the algorithms driving the tool learn to identify patterns by example. You provide varied examples of a specific gesture, and the GAT learns to identify this gesture from among other gestures and motions. Using Omek’s GAT you don’t have to write a single line of code. Instead, you record different users performing the gesture you want to create and “mark” in the video the frames where the user performed the gesture. With Omek’s GAT you can significantly cut the time it takes to create custom gestures – you can even create your own simple gesture in a matter of minutes with no prior experience!
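To make the contrast concrete, here is roughly what a manually coded gesture ends up looking like: a hand-written “right swipe” check over a buffer of recent frames, where every threshold is a number the programmer must choose and tune by hand. All values below are made-up placeholders:

```python
def is_right_swipe(hand_x_positions, frame_rate=30,
                   min_distance=0.30, max_duration_s=0.8):
    """Hand-coded 'right swipe': the hand must travel min_distance metres to
    the right, monotonically, within max_duration_s seconds.

    hand_x_positions: recent hand x-coordinates (metres), oldest first.
    Every constant here is a judgement call the programmer must make and tune
    per user group; exactly the burden a learning-based tool removes.
    """
    max_frames = int(max_duration_s * frame_rate)
    window = hand_x_positions[-max_frames:]
    if len(window) < 2:
        return False
    monotonic = all(b >= a for a, b in zip(window, window[1:]))
    return monotonic and (window[-1] - window[0]) >= min_distance
```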

Check out our next post for more details on our Gesture Authoring Tool — which you can  download and start working with for free now!  Click here to download.