Early Indications April 2010 The Web of Opinion: Metadata as conversation
where. Data was kept in proprietary formats and physically located:
if the library was missing the Statistical Abstract for 1940, or some
other grad student had sequestered it, you had little chance to
determine corn production in Nebraska before World War II. Such
statistics were the exception: most data remained unpublished, in lab
notebooks and elsewhere.
Once data escaped from print into bits, it became potentially
ubiquitous, and once formats became less proprietary, more people
could gain access to more forms of data. The early history of the web
was built in part on a footing of public access to data: online
collections of maps, congressional votes, stock prices, phone numbers,
product catalogs, and other data proliferated.
Data has always required metadata: that table of corn production had a
title and probably a methodological footnote. Such metadata was
typically contributed by an expert in either the technical field or in
the practice of categorizing. Official taxonomies have continued the
tradition of creators and curators having cognitive authority in the
process of organizing. In addition, as Clay Shirky has pointed out in
"Ontology is Overrated," the heritage of physicality led to the need
for one answer being correct so that an asset could be found: a book
about Russian and American agricultural policy during the 1930s had to
live among books on Russian history, agricultural history, or U.S.
history: it was arguably about any or all of those things, but someone
(most likely at the Library of Congress) assigned it a catalog number
that finalized the discussion: the book in question was officially and
forever "about" this more than it was about that.
In the past decade, the so-called read-write web has allowed anyone to
become both a content creator and a metadata creator. Sometimes these
activities coincide, as when someone tags their own YouTube video.
More often, creations are submitted to a commons, and the
commoners (rather than a cognitive authority) determine what the
contribution "is" and what it is "about." Rather than editors or peer
reviewers judging an asset's quality before publication, in more and
more settings the default process is publication, then collaborative
filtering for definition, quality, and meaning.
Imagine a particular propane torch for sale on Amazon.com. So-called
social metadata has been nurtured and collected for years on the site.
If I appreciate the way the torch works for its intended use of
brazing copper pipe, I can submit a review with both a star rating and
prose. Amazon quickly allowed for more social metadata: you, the
reader of my review, can now rate my review, thus creating metadata
about metadata.
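The layering described above, a review about a product and a rating about that review, can be sketched as a small data structure. This is only an illustration: the names `Review` and `HelpfulnessVote` and their fields are hypothetical and are not meant to reflect Amazon's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HelpfulnessVote:
    """Metadata about a review: one reader's judgment of it (hypothetical)."""
    voter: str
    helpful: bool


@dataclass
class Review:
    """Metadata about a product: a star rating plus prose (hypothetical)."""
    product: str
    author: str
    stars: int
    text: str
    votes: List[HelpfulnessVote] = field(default_factory=list)

    def helpful_ratio(self) -> float:
        """Share of voters who found the review helpful; 0.0 if no votes."""
        if not self.votes:
            return 0.0
        return sum(v.helpful for v in self.votes) / len(self.votes)


# First layer: a review of the torch. Second layer: readers rate the review.
review = Review("propane torch", "me", 5, "Works well for brazing copper pipe.")
review.votes.append(HelpfulnessVote("reader_a", True))
review.votes.append(HelpfulnessVote("reader_b", False))
print(review.helpful_ratio())
```

Nothing stops the nesting from going deeper: a comment on a vote, or a rating of a commenter, would simply be another layer of the same shape.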
Here is where the discussion gets complicated and extremely
interesting. Suppose I say in my review that I use the Flamethrower
1000 for creme brulee even though the device is not rated (by whatever
safety or sanitation authority) for kitchen use. The comments about
my torch review can quickly become a foodie discussion thread: the
best creme brulee recipe, the best restaurants at which to order it,
regional variations in the naming or preparation of creme brulee, and
so forth. Amazon's moderators might truncate the discussion to the
extent it's not "about" the Flamethrower 1000 under review, but the
urge to digress has long been and will be demonstrated elsewhere.
Enter Facebook. The platform is in essence a gigantic metadata
generation and distribution system. ("I liked the concert." "The
person who liked the concert did not know what she was talking about."
"My friend was at the concert and said it was uneven." and so on.)
Strip Facebook of attribute data and there is little left: it's
essentially a mass of descriptors (including "complicated"), created
by amateurs and never claimed as authoritative, linked by a
21st-century kinship network. Facebook's announcement on April 21st
of the Open Graph institutionalizes this collection of conversations
as one vast, logged, searchable metadata repository. If I "like"
something, my social network can be alerted, and the website object of
my affection will know as well.
Back in November, Bruce Schneier laid out five categories of social
networking data:
1. Service data. Service data is the data you need to give to a social
networking site in order to use it. It might include your legal name,
your age, and your credit card number.
2. Disclosed data. This is what you post on your own pages: blog
entries, photographs, messages, comments, and so on.
3. Entrusted data. This is what you post on other people's pages. It's
basically the same stuff as disclosed data, but the difference is that
you don't have control over the data -- someone else does.
4. Incidental data. Incidental data is data the other people post
about you. Again, it's basically the same stuff as disclosed data, but
the difference is that 1) you don't have control over it, and 2) you
didn't create it in the first place.
5. Behavioral data. This is data that the site collects about your
habits by recording what you do and who you do it with.
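As a rough illustration of how the three "posted" categories in this list differ only by who wrote an item and whose page it sits on, here is a minimal sketch. The enum and the function `classify_post` are hypothetical names introduced for this example, and the function assumes the item in question is about the person named as `subject`. Service and behavioral data are not posts, so they appear in the enum but are never returned by the function.

```python
from enum import Enum


class DataCategory(Enum):
    """The five categories from the list above (names are illustrative)."""
    SERVICE = "service"
    DISCLOSED = "disclosed"
    ENTRUSTED = "entrusted"
    INCIDENTAL = "incidental"
    BEHAVIORAL = "behavioral"


def classify_post(subject: str, author: str, page_owner: str) -> DataCategory:
    """Classify a posted item about `subject` from the subject's point of view."""
    if author != subject:
        # Someone else wrote it: the subject neither created nor controls it.
        return DataCategory.INCIDENTAL
    if page_owner == subject:
        # The subject wrote it on the subject's own page.
        return DataCategory.DISCLOSED
    # The subject wrote it, but on a page someone else controls.
    return DataCategory.ENTRUSTED


print(classify_post("alice", "alice", "alice").value)
print(classify_post("alice", "alice", "bob").value)
print(classify_post("alice", "bob", "bob").value)
```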
What does that list look like today? A user's trail of "like" clicks
makes this list or her Netflix reviews and star ratings, themselves
the subject of privacy concerns, seem like merely the tip of the
iceberg. As Dan Frankowski said in his Google Talk on data mining,
people have been defined by their preferences for millennia --
sometimes to the point of dying for them.
With anything so new and so massive in scale (50,000 sites adopted the
"like" software toolkit in the first week), the unexpected
consequences will take months and more likely years to accumulate.
What will it mean when every opinion we express online, from the
passionate to the petty, gets logged in the Great Preference
Repository in the Sky, never to be erased and forever available to be
correlated, associated, regressed, and otherwise algorithmically
parsed?
Several questions follow: who will have either direct or indirect
access to the metadata conversation? What are the opt-in, opt-out,
and monitoring/correction provisions? If I once mistakenly clicked a
Budweiser button but have since publicly declared myself a Molson man,
can I see my preference library as if it's a credit score and remedy
any errors or misrepresentations? What will be the rewards for brand
monogamy versus the penalties for promiscuous "liking" of every
product with a prize or a coupon attached?
While this technology appears to build barriers to competitive entry
for Facebook, what happens if I establish a preference profile when
I'm 14, then decide I no longer like zoos, American Idol, or Gatorade?
Will people seek a fresh start at some point in an undefined network,
with no prehistory? What is the mechanism for "unliking" something,
and how far retrospectively will it apply?
Precisely because Facebook is networked, we've come a very long way
from that Statistical Abstract on the library shelf. What
happens to my social metadata once it traverses my network? How much
or how little control do I have over what my network associates
("friends" in Facebook-speak) do with my behavioral and opinion data
that comes their way? As both the Burger King "Whopper Sacrifice"
(defriend ten people, get a hamburger coupon) and a more recent
Ikea-spoofing scam have revealed, Facebook users will sell out their
friends for rewards large and small, whether real or fraudulent.
Finally, to the extent that Facebook is both free to use and expensive
to operate, the Open Graph model opens a fascinating array of revenue
streams. If beggars can't be choosers, users of a free system have
limited say in how that system survives. At the same time, the global
reach of Facebook exposes it to a broad swath of regulators, not the
least formidable of whom come out of the European Union's strict
privacy rights milieu. As both the uses and inevitable abuses of the
infinite metadata repository unfold, the reaction is sure to be
newsworthy.