Today I’ve been looking for information about the origins of
Apple’s ‘data detector’ technology. Although it
was available in some form starting in the late 90s, it seems to have
disappeared at some point, resurfacing with the release of Leopard.
(See also.)
In Leopard, there’s a private (i.e. Apple-only)
framework, which can be found at
/System/Library/PrivateFrameworks/DataDetectorsCore.framework/.
The detectors are in plain-text format, and can be seen in the
Resources/Patterns subdirectory.
The system seems to have a loose library structure. There is a common.ddn
file that defines keywords in a manner similar to
POSIX’s character classes; this file is
often imported into more specific detectors. There also appears to be some
minor similarity to the syntax of Pascal, but I’m not a programmer so don’t
assume this means anything.
Anyhow, here are some historic mentions of the subject I’ve found, sorted
roughly by publication date:
This is presumably the first public mention of the technology Update:
it is not; see here, and the paper
was published in Communications of the
ACM
in March of 1998.
It seems that even from the start, the system was intended to be a plugin
architecture — there would perhaps be some included detectors, but users could
easily create their own detectors with the help of a fairly simple syntax that
loosely resembles regular expressions. The present syntax seems to have a
generally similar format, although it may be complexified somewhat for greater
power.
The goal of our research on intelligent agents was to create something useful
for our customers, but something with that sprinkling of pixie dust that
would make it seem “intelligent.”
Ben Shneiderman observed that claims
about intelligent software agents are vague, dreamy and unrealized. We
started from a simple but focused approach to agents, that they should have
the ability to infer appropriate high-level goals from user actions and
requests, and take action to achieve those goals. Further, based on a study
of reference librarians as exemplary human agents we wanted to build a system
in which the user would not have to state goals explicitly and in detail — we
learned from librarians that a large part of the value they provide to
clients is in working with imprecise requests. Beyond this, our general
design strategy was to keep the user’s question in front of us at all times:
Will this software do something useful for me, in an intelligent way that
makes me more productive? The system we describe here — Apple Data
Detectors — meets our criteria of being unobtrusive, having the ability to
infer user needs, and doing useful work. Apple Data Detectors will ship as a
product in 01997.
…
Past work on intelligent agents has been multi-faceted, to the point where it
is difficult to find consensus on exactly what constitutes an agent.
Researchers have used machine learning techniques to track user actions and
construct models of user preferences, created agents that employ user models,
consulting a set of parameters that describe the user, implemented planning
systems to make the leap from a user’s stated intention to the specific
actions that are required to achieve that intention, and built agents that
act as “eager assistants”. The locality of agents also varies across
different agent-based systems: some act only within one’s own machine, while
others autonomously crawl the Web, searching for interesting content, for
example. Apple Data Detectors works on the user’s own machine, and falls into
the eager assistant category, enabling rapid user action with minimal input
on the part of the user.
…
In an investigation of how people file information on their computer-based
“desktops”, we discovered that a common complaint of users is that they
cannot easily take action on the structured information found in everyday
documents. By structured information we mean data recognizable by a grammar.
Ordinary documents are full of such structured information: phone numbers,
fax numbers, street addresses, email addresses, email signatures, abstracts,
tables of contents, lists of references, tables, figures, captions, meeting
announcements, Web addresses, and so forth. In addition, there are countless
domain-specific structures such as ISBN
numbers, stock symbols, chemical structures, mathematical equations and so
forth. These structures are not only relevant to users, but are also
recognizable by present day parsing technologies. The type of a structure can
be used to identify appropriate actions that might be carried out on the
structure — place a meeting on a calendar, add an address to an address book,
dial a phone number, open a URL, find the
current price of a stock, file an ISBN number,
compile a list of abstracts, and so forth. The system we developed to enable
people to work more fluidly with structured information is called Apple Data
Detectors.
I’ve put together a comparison shot of the data detectors being used on an
email message. The top capture demonstrates the system in use in 01997, and is
included with the article; the lower one demonstrates the system as it appears
in Leopard.

The plan included the ability to build on other detectors, which created
something of a dependency jumble:
The solution for shared, compatible detectors requires developers to register
their detectors. Our registry is supported by
Component Integration Labs, a company
responsible for maintaining the Apple Event Suites (among other standards).
Apple Data Detectors will make use of the registry which contains definitions
for classes of data objects and for events that operate on the objects.
Classes of detectors will be defined as data objects and developers must
write detectors that detect required fields of the detector class. Detectors
can detect more information than the class requires, but they must detect at
least the data that the class requires.
This may still be an issue as a result of the import command; I don’t know
what happens if a detector attempts to pull an external detector that doesn’t
exist — and because this is a private framework there’s no documentation
available other than the included patterns.
This article was briefly mentioned in a paper on teaching machines how to
understand human language:
Recognizing textual entities
[…] These entities may be detected using various techniques. Regular
expressions and pattern matching are often used. Apple Data Detectors (Nardi,
Miller, & Wright, 01998) use context-free grammars.
Script Editor Controls/Commands
[…] In addition, when creating a script to use an Apple Data Detector
(ADD), use the description field to contain
the type of detector that will be referenced in the script and other values.
ADD is an intriguing Apple technology that
allows you to run scripts that respond to contextual menu selections. Chapter
20, Apple Data Detectors Extension, is devoted to
ADD.
Unfortunately, only two pages of chapter 20 are available in
Google’s online viewer, and
Amazon’s preview functionality won’t work with me.
With the examples given on the pages that are available, it seems as though
one would define detectors externally and ask for them via AppleScript. The
detector presumably gives a return value which is treated as any other
AppleScript object data. The sample code uses the detector patterns as a way to
create custom detector actions.
5.2.3 The Apple Data Detector
The Apple Data Detector Software Package was shipped as a product first in
01997. Later on, newer and more sophisticated versions were
introduced.
The essence of the product is in being able to extract semantics from
everyday documents, without asking the user to recreate the documents in a
new way. Documents are automatically redefined from a stream of characters,
and reformatted so that the specific requests stated by the user become
explicitly visible.
In other words, the document is changed so that the implicit goals of the
user (ambiguous from the initial document point of view) become explicit
(from the modified document point of view). The essence of the success is in
the application of sophisticated parsing and recognition techniques, as well
as in the application of appropriate response and collaboration methods.
Onwards
I don’t know anything about future OS X releases. Perhaps Snow Leopard will
promote this to a publicly-available technology; perhaps we’ll have to wait for
a release beyond that.
The technology involved is certainly an interesting one, and could integrate
nicely with microformats — for example, automatically detecting hCards
and offering to put them in one’s Address Book. I guess we’ll just have to see.