The ability to determine what day-to-day activity (such as cooking pasta, taking a pill, or watching a video) a person is performing is of interest in many application domains. A system that can do this requires models of the activities of interest, but model construction does not scale well: humans must specify low-level details, such as segmentation and feature selection of sensor data, and high-level structure, such as spatio-temporal relations between states of the model, for each and every activity. As a result, previous practical activity recognition systems have been content to model a tiny fraction of the thousands of human activities that are potentially useful to detect. In this paper, we present an approach to sensing and modeling activities that scales to a much larger class of activities than before. We show how a new class of sensors, based on Radio Frequency Identification (RFID) tags, can directly yield semantic terms that describe the state of the physical world. These sensors allow us to formulate activity models by translating labeled activities, such as ‘cooking pasta’, into probabilistic collections of object terms, such as ‘pot’. Given this view of activity models as text translations, we show how to mine definitions of activities in an unsupervised manner from the web. We have used our technique to mine definitions for over 20,000 activities. We experimentally validate our approach using data gathered from actual human activity as well as simulated data.( permanent, local copy )
Published in WWW 2004
C.V.: CR-04