New Release! Create Custom Gestures with Omek’s Gesture Authoring Tool

What is the Gesture Authoring Tool?
In our last post, we explored the question, “What is a gesture?” In this follow-up post, we are excited to announce the availability of Omek’s Gesture Authoring Tool (GAT, for short). GAT is just one of a set of tools that fall under the Beckon Usability Framework, all intended to make development with Beckon that much easier and faster. In this post we will go into a bit of detail on how it works and why it’s so important for the field of gesture recognition. Look out for future posts for details on additional components of our Beckon Usability Framework, such as our Beckon Motion Toolkit for Unity3D, C# Extension, and more!

GAT is an easy-to-use, yet highly sophisticated tool that dramatically speeds up your development cycle and makes creating custom gestures accessible to anyone, even those with no coding experience.

How does it work?
To create a custom gesture, you record examples of different people performing the gesture; GAT applies its advanced machine-learning algorithms to automatically “learn” to recognize that gesture. This frees you from the burden of analyzing and coding gestures manually, while offering the additional benefit of producing more accurate gestures than you would get with manual coding.
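To make this concrete, here is a minimal sketch of the “learn by example” idea, written against made-up joint data and an off-the-shelf classifier. GAT’s actual algorithms and data formats are proprietary, so every name, shape, and feature choice below is an assumption for illustration only, not the Beckon SDK.

```python
# Illustrative sketch only -- GAT's internals are proprietary and certainly more sophisticated.
# The point: record examples, turn each recording into features, and let a classifier learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def motion_features(recording):
    """Reduce a (frames x joints x 3) recording to a fixed-length feature vector:
    per-joint mean position plus total per-joint displacement."""
    mean_pos = recording.mean(axis=0).ravel()
    displacement = np.abs(np.diff(recording, axis=0)).sum(axis=0).ravel()
    return np.concatenate([mean_pos, displacement])

# Pretend we recorded 20 examples each of "wave" and "swipe":
# 30 frames per example, 15 tracked joints, (x, y, z) per joint.
rng = np.random.default_rng(0)
recordings = rng.normal(size=(40, 30, 15, 3))
labels = ["wave"] * 20 + ["swipe"] * 20

X = np.array([motion_features(r) for r in recordings])
clf = RandomForestClassifier(random_state=0).fit(X, labels)

# A new recording can now be labeled without any hand-written gesture rules.
new_recording = rng.normal(size=(30, 15, 3))
print(clf.predict([motion_features(new_recording)]))
```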

GAT is available now as a free download from our support portal.

Why do we think GAT is so great?
Coding gestures can be a complicated undertaking, but GAT simplifies the process:

  • Gestures are subjective. Ask three different people to wave hello and you are likely to get three different types of waves. Each person may hold their arm at a slightly different height or angle, or they may wave at different speeds. If you define your gesture too closely to one specific model (as often happens with manually coded gestures), your application will fail to recognize variations of that gesture when performed by different people.
    • GAT learns by example. The main idea of training gestures in GAT is to record several examples of a gesture, performed by different people. GAT then applies a machine learning algorithm that analyzes motion features, and learns to detect a specific gesture from among other gestures and movements.
  • People are sized differently. For example, the lady writing this blog post is on the shorter side of the spectrum. The range of motion can vary widely when people of different heights perform the same gesture (say, a child or a taller-than-average adult), and so you risk the gesture not being identified correctly.
    • Normalized skeleton. Before GAT analyzes a gesture, it first normalizes the tracked skeleton to a standard set of dimensions. This eliminates most of the problems related to people of different sizes (a sketch of this idea appears after this list).
  • Gestures are subtle. Small movements can have very different meanings, so we understand the importance of accurately recognizing a gesture for what it is.
    • Gesture Packs. GAT allows you to create a “pack” of the gestures you want to use in your application. GAT can then train all these gestures together, enabling it to differentiate among them even if they are similar.
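As a rough illustration of the skeleton normalization mentioned above, the sketch below rescales joints so that a reference dimension (here, spine length) matches a standard value. The joint layout and the choice of reference bone are assumptions; Omek’s actual normalization is not published.

```python
# Hedged sketch of skeleton normalization -- not Omek's actual code.
import numpy as np

STANDARD_SPINE_LENGTH = 0.6  # arbitrary reference length, in meters

def normalize_skeleton(joints, pelvis_idx=0, neck_idx=1):
    """joints: (num_joints, 3) array of positions for one tracked frame."""
    pelvis = joints[pelvis_idx]
    scale = STANDARD_SPINE_LENGTH / np.linalg.norm(joints[neck_idx] - pelvis)
    # Translate so the pelvis is the origin, then scale the whole body to the standard size.
    return (joints - pelvis) * scale

# A short and a tall person holding the same pose end up nearly identical:
short_person = np.array([[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.3, 0.5, 0.0]])
tall_person = short_person * 1.4
print(np.allclose(normalize_skeleton(short_person), normalize_skeleton(tall_person)))  # True
```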

A few other features that enhance GAT’s usability:

  • Reports and Iterative Improvement. When you train gestures in GAT, it will show you a statistical report of the results, detailing the accuracy of the gesture recognition and the instances where errors occurred. You can then modify or add to your examples, rerun the training process and see how you’ve improved.
  • Mirror Gestures. Once you’ve defined a gesture, you can create its “mirror gesture” by clicking a single check-box. For instance, if you’ve defined a “right swipe” gesture, you can automatically create a “left swipe” gesture without having to train the mirror gesture separately (a sketch of the idea appears after this list).
  • Composite Gestures. You can create a “composite gesture” by defining a sequence of two or more basic gestures that you’ve already defined.
  • Live Test. Once you’ve trained your gestures, you can test them in Live Camera mode. While watching a person’s movements in the GAT viewing pane, you will see gestures detected in real-time.
  • Display Options. GAT supports a rich set of options for displaying person tracking information. These include 2D and 3D skeleton tracking, color and depth images, and joint and bone display overlays.
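To give a feel for what mirroring a gesture involves geometrically, here is a toy sketch: flip the x-axis and swap left/right joints, so a right-hand recording describes the same motion performed by the left hand. The joint names and coordinate convention are assumptions, not the Beckon data format.

```python
# Hedged sketch of the "mirror gesture" idea -- illustrative only, not the GAT implementation.
SWAP = {"left_hand": "right_hand", "right_hand": "left_hand",
        "left_foot": "right_foot", "right_foot": "left_foot"}

def mirror_frame(frame):
    """frame: dict of joint name -> (x, y, z); returns the same pose mirrored left/right."""
    return {SWAP.get(name, name): (-x, y, z) for name, (x, y, z) in frame.items()}

right_swipe_frame = {"head": (0.0, 1.7, 2.0),
                     "left_hand": (-0.2, 1.0, 2.0),
                     "right_hand": (0.5, 1.2, 1.8)}
print(mirror_frame(right_swipe_frame))  # the same motion, as if performed with the other hand
```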

Gesture Recognition: What is a Gesture?

These days there is a lot of discussion about the transition from a GUI, or Graphical User Interface, to a NUI, or Natural User Interface. The entire experience of how we interact with devices is undergoing significant changes as we move towards more intuitive, gesture-based systems built around natural movements. But this transition also raises a number of related questions, such as: What constitutes a gesture? How do I decide which gesture to use for a given task? What makes a gesture natural and intuitive?

Over the coming weeks we will provide our thoughts on a number of these questions through our blog. For today, though, we’re going to lay the groundwork by defining what gesture recognition means in the context of NUI-based applications.

To provide more detail on the topic, I enlisted the help of Omek’s VP Products, Doron Houminer, to share his thoughts. Below, you’ll find the highlights of our conversation. As always, feel free to chime in – we’d love to hear your questions and comments on the subject.

Q: Let’s start off with a definition: what is a gesture?
Doron: In the context of computer vision, a gesture is a unique body movement that can be identified as a discrete action. For example, a gesture could be a hand wave, a swipe of the arm, or a kick of the leg. In a NUI application, gestures are used to generate specific responses (e.g., turning a device on or off, or changing the volume or channel).

Q: What’s the difference between motion tracking and gesture recognition?
Doron: I think of motion tracking as a continuous experience, whereas gesture recognition represents an event that triggers something to happen in response.
Motion tracking can be considered the “raw material” that enables gesture recognition. Motion tracking monitors body positions and locations and represents them as a virtual skeleton, telling you the position of every joint at every point in time, in real time.
Gesture recognition, on the other hand, is the process of identifying a defined gesture from a sequence of motion-tracking frames and labeling it accordingly. It can be explained as “the mathematical interpretation of a human motion by a computing device”.
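A minimal sketch of that distinction, under assumed data structures (none of this is the Beckon API): motion tracking produces a continuous stream of skeleton frames, while gesture recognition consumes that stream and emits discrete, labeled events.

```python
# Illustrative only: tracking = a stream of "where", recognition = discrete "what happened" events.
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple

@dataclass
class SkeletonFrame:
    timestamp: float
    joints: Dict[str, Tuple[float, float, float]]  # joint name -> (x, y, z)

def recognize_gestures(frames: Iterable[SkeletonFrame]) -> Iterator[str]:
    """Toy recognizer: emits an event the moment the right hand rises above the head."""
    hand_was_up = False
    for frame in frames:
        hand_up = frame.joints["right_hand"][1] > frame.joints["head"][1]
        if hand_up and not hand_was_up:
            yield f"hand_raised@{frame.timestamp:.2f}s"
        hand_was_up = hand_up

# 30 frames of tracking data in which the hand gradually rises -> exactly one event.
stream = [SkeletonFrame(t / 30, {"head": (0.0, 1.7, 2.0), "right_hand": (0.3, 1.0 + 0.03 * t, 2.0)})
          for t in range(30)]
print(list(recognize_gestures(stream)))
```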

Q: Why do we need gestures?
Doron: In the context of an application, gesture recognition is used to detect actions and elicit a response. You can think of it like this: tracking is the “where” – where the player is located within the field of view – and gesture recognition is the “what” that happened. Gesture recognition represents a higher level of understanding of the scene being recorded. The legendary Bill Buxton sums it up quite well in his chapter on Gesture Based Interaction – in the context of machine vision, “Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs (graphical user interfaces), which still limit the majority of input to keyboard and mouse.”

Q: Are there different types of gestures?
Doron: Yes. We think of gestures as falling into one of three categories:

  1. Pose gestures: The body (or a limb of the body) is held in a specific, static posture. A pose gesture can be illustrated by a single snapshot.
  2. Single-motion gestures: The body or part of the body performs a specific motion over a finite, usually brief, period of time.
  3. Continuous-motion gestures: A repetitive action with no time limit, for example a person running.
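One hedged way to see the practical difference between the three categories is how much of the tracking stream each one needs to examine. The predicates below are toy examples with assumed joint names, not gestures shipped with Beckon.

```python
# Illustration only: the three gesture categories consume different slices of the tracking stream.
from typing import Dict, Sequence, Tuple

Frame = Dict[str, Tuple[float, float, float]]  # joint name -> (x, y, z)

# 1. Pose gesture: a predicate over a single frame (a snapshot is enough).
def is_hands_together(frame: Frame) -> bool:
    lx, ly, _ = frame["left_hand"]
    rx, ry, _ = frame["right_hand"]
    return abs(lx - rx) < 0.1 and abs(ly - ry) < 0.1

# 2. Single-motion gesture: a predicate over a short, finite window of frames.
def is_right_swipe(window: Sequence[Frame]) -> bool:
    return window[-1]["right_hand"][0] - window[0]["right_hand"][0] > 0.4

# 3. Continuous-motion gesture: re-evaluated over a sliding window with no natural end,
#    so it reports an ongoing state (e.g., "running") rather than a one-off event.
def is_running(window: Sequence[Frame]) -> bool:
    heights = [f["right_foot"][1] for f in window]
    return max(heights) - min(heights) > 0.15
```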

Q: So what are the technical means for generating gestures?
Doron: I’ll review a couple of different ways to create gestures, including Omek’s innovative Gesture Authoring Tool. We will be dedicating our next blog post to describing the benefits and reviewing best practices of using this tool:

  • Manual coding. If you go this route, you need a skilled programmer to write custom code for each gesture. The programmer has to articulate and define all of the parameters of the motion they want to classify as a gesture, specifying where the joint positions should be across a consecutive series of frames. It helps if the programmer has a background in, and understanding of, human anatomy, physics, and algebra (a sketch of this approach appears after this list).
  • Omek’s Gesture Authoring Tool. The second option is to use Omek’s Gesture Authoring Tool (GAT). At Omek we’ve invested a lot of time and resources into developing this tool. Gesture creation in GAT is based on “machine learning”, which means that the algorithms driving the tool learn to identify patterns by example. You provide varied examples of a specific gesture, and GAT learns to identify this gesture from among other gestures and motions. You don’t have to write a single line of code. Instead, you record different users performing the gesture you want to create and mark in the video the frames in which the gesture was performed. GAT significantly cuts the time it takes to create custom gestures — you can even create your own simple gesture in a matter of minutes, with no prior experience!
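For contrast, here is a hedged sketch of what manually coding even one simple gesture can look like. Every constant is a guess the programmer has to tune by hand (and re-tune when it fails for a shorter, slower, or left-handed user), which is exactly the burden GAT is meant to remove. None of this is real Beckon code.

```python
# Illustrative hand-coded "right swipe" detector -- assumptions throughout, not the Beckon SDK.
from typing import Dict, List, Tuple

Frame = Dict[str, Tuple[float, float, float]]  # joint name -> (x, y, z), one tracked frame

MIN_TRAVEL_M = 0.40          # how far the hand must move to count as a swipe
MAX_DURATION_FRAMES = 15     # must complete within half a second at 30 fps
MAX_VERTICAL_DRIFT_M = 0.20  # the hand must stay roughly level while swiping

def detect_right_swipe(frames: List[Frame]) -> bool:
    """Hand-written rule: the right hand travels far enough to the right, quickly and level."""
    if not frames or len(frames) > MAX_DURATION_FRAMES:
        return False
    xs = [f["right_hand"][0] for f in frames]
    ys = [f["right_hand"][1] for f in frames]
    return (xs[-1] - xs[0] > MIN_TRAVEL_M) and (max(ys) - min(ys) < MAX_VERTICAL_DRIFT_M)
```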

Check out our next post for more details on our Gesture Authoring Tool — which you can download and start working with for free now! Click here to download.