February 28th, 2008

Mathew Ingram has decided that the problem with the semantic web is that it’s as boring as dry toast. Of course, by Mathew's standard, all the stuff that makes the web work is also boring as hell. It's probably a good thing, then, that some people looked beyond the need for immediate titillation when it comes to the tech underlying this environment, or Mathew's audience for his opinions would be his immediate family members, and perhaps those neighbors not quick enough to run away when seeing him approach.

He also writes:

It’s all about plumbing and widgets and data standards, all of which have names like FOAF and TOTP and SIOC and whatnot. It’s right off the dork-o-meter. The Lone Gunmen from The X-Files would have a hard time getting interested in this stuff, let alone anyone who isn’t married to their slide rule or their pocket protector.

Now, taking Mathew's complaints of "No glitter! No glitter! Mama, Mama, where's my glitter!" seriously, I decided to put my slide rule down for a sec and see if I couldn't respond to his one statement about no one knowing what this all means.

First, there was the web. The web was dumb, but it was hyperlinked.

Then, there was search. Search followed hyperlinks, scraped pages, massaged keywords and tested the strength of the links. The web was still dumb, but number crunching helped generate some smarts. Think of your favorite dog. Yeah, that smart.

Next, there was the semantic web. The semantic web says, You and I can derive understanding from this blob of text on this page, but applications can't. Applications can pull keywords and run algorithms, but can only approximate what this blob of text is all about. What if we add a little information to this blob of text so that applications don't have to crunch numbers or make guesses as to what we mean?

How do we add a little information? A hundred different ways. We can use microformats, or RDFa, or RDF, or whatever the HTML5 people cook up for us. With this little bit of extra information, applications can access a web page list that's created with UL/LI elements, but instead of having to look at the text in the list and try to guess what the list is all about, they can read that little bit of data and know that the list consists of recommended books. Perhaps they can take that little list of books and use another application to look up these books at Amazon. Or at their library. Or better yet, click a button and load all the books into our Kindle. (Assuming that Mathew doesn't subscribe to the Steve Jobs "We don't read, we ain't got no books, gimme the vids" school of thought.)
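For the slide rule crowd, here is a rough sketch of what "reading that little bit of data" could look like. It uses only Python's standard html.parser module. The class names "recommended-books" and "book-title" and the book titles are made up for illustration; they are not part of any published microformat, so treat this as one possible shape rather than how any real tool does it.

```python
from html.parser import HTMLParser

# Hypothetical markup: "recommended-books" and "book-title" are invented
# class names for this sketch, not a real microformat. Titles are placeholders.
PAGE = """
<ul class="recommended-books">
  <li><span class="book-title">Example Book One</span></li>
  <li><span class="book-title">Example Book Two</span></li>
</ul>
<ul>
  <li>Buy milk</li>
</ul>
"""


class BookListParser(HTMLParser):
    """Collects titles only from lists annotated as recommended books."""

    def __init__(self):
        super().__init__()
        self.in_book_list = False
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "recommended-books" in classes:
            self.in_book_list = True
        elif tag == "span" and self.in_book_list and "book-title" in classes:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_book_list = False
        elif tag == "span":
            self.in_title = False

    def handle_data(self, data):
        # Only text inside an annotated title is kept; no guessing needed.
        if self.in_title:
            self.titles.append(data.strip())


def extract_books(html):
    parser = BookListParser()
    parser.feed(html)
    return parser.titles


books = extract_books(PAGE)
```

The second list on the page is ignored because nothing marks it as a list of books, which is the whole point: the application reads the annotation instead of guessing from the text.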

The little bit of information might, instead, be an address for an event, triggering the browser to add that event information to a desktop calendar application.
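A minimal sketch of the event case, assuming the page uses the hCalendar class names vevent, summary, dtstart, and location. The event itself is invented, and handing the result to a desktop calendar is left out; this only shows the extraction step using Python's standard html.parser.

```python
from html.parser import HTMLParser

# hCalendar-style class names; the event details are invented for this sketch.
EVENT_HTML = """
<div class="vevent">
  <span class="summary">Semantic web meet-up</span> on
  <abbr class="dtstart" title="2008-03-15">March 15</abbr> at
  <span class="location">St. Louis</span>
</div>
"""

FIELDS = ("summary", "dtstart", "location")


class EventParser(HTMLParser):
    """Pulls the annotated event fields out of a page fragment."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self.current = None  # field whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        for field in FIELDS:
            if field in classes:
                if tag == "abbr" and attrs.get("title"):
                    # The machine-readable value sits in the title attribute.
                    self.event[field] = attrs["title"]
                else:
                    self.current = field
                    self.event[field] = ""

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current:
            self.event[self.current] += data.strip()


def parse_event(html):
    parser = EventParser()
    parser.feed(html)
    return parser.event


event = parse_event(EVENT_HTML)
```

The person reading the page sees "March 15"; the application gets the unambiguous date from the title attribute.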

It could be information about people we know and how we know them, so that when we move from Facebook, which is today's darling, to MyPowerBase, we can tell MyPowerBase to add all people who we have defined as friends, but not those defined as just contacts.

If the information is embedded in a photo–wow, information embedded in a photo, how dull–when we upload the photo to a site like Flickr, it could automatically be added to a map, with all the other photos from the same location. It can be pulled up on a search someday, when we ask the web to show us all photos for St. Louis, or for a certain block in St. Louis. Perhaps it can even help us find photos that are licensed Creative Commons so we can steal them.

I might write about a product or company, and the little bit of information I add to my post might help others who are thinking of doing business with the company, or buying that product. Sure, search engines can scrape the content and try and glean useful bits based on keywords such as the product or company name, but we've all had enough really strange search results to know how far search can go, no matter how brainy the algorithm.

Someday, I'll be able to write about movies and add just a little bit of extra information, and we can do the same for movies. Or music. Or cooking recipes ("give me all recipes on the web that use apricot jam and bourbon, but I don't want chicken"). Or even poetry, though don't mention poetry around Sir Tim–it makes him peevish.

Mathew is very addicted to FriendFeed, which allows him to pull in all the activities of his friends in various places. I bet if we scratched the surface of this application, a lot of the data that makes the application tick comes courtesy of the semantic web dorks.

I could go on and on, but I've already been away from my slide rule too long. Instead I'll end with the best for last: because all of these different ways of adding that tiny little bit of useful information to blocks of text or photos or video files or what have you are based on agreed upon specifications, we can use applications to merge this data and use it for something new; something we haven't thought of yet. See, now that's when it really gets exciting because rather than coming up with an idea and then taking five years to get enough data to test it, we'll already have the data, at no extra effort or cost.

Maybe I've been cooped up in my cube with my computers and code for too long, but that strikes me as kind of interesting. In a dorky sort of way.

Comments
1
Bud Gibson - 10:23 am February 28, 2008

Well Shelley, surely you can spice it up.

2
Laura - 2:42 pm February 28, 2008

In a meet-up a couple of months ago, we were discussing microformats and semantic markup and one guy got quite hot under the collar, challenging not that it's required — which it isn't, of course — but that it's worth the hassle for anybody to do. After trying to explain in various ways, we got nowhere. It was like explaining "blue" to a blind person.

3
Ian Hickson - 3:16 pm February 28, 2008

It's unlikely that HTML5 will include anything substantial in terms of "adding a little information". What we've found is that anything more fine grained than your basic block-level tag is too much for most authors. Most people don't even use things like <cite>, <em>, <strong>, etc, correctly. Asking most people to do things like RDF, RDFa, or Microformats is pretty much a lost cause, as far as I can tell (though of those, Microformats is the most likely to get any traction).

The fundamental problem is that most people don't care if the computer can understand the page. In fact we're lucky if they care if blind people can understand the page, and those are fellow people and potentially paying customers!

4
Shelley - 4:04 pm February 28, 2008

Bud, Mathew will just have to live in ignorance.

Laura, sounds like Ian is agreeing with your "can't see blue" person.

Ian, people won't annotate their data if they don't necessarily see any advantage. People also won't annotate if they're not aware that it can be useful for others if they do.

But give people positive feedback for annotating their data, and they'll go to town. A case in point is the data attached to photos uploaded to Flickr. People never used to associate 'metadata' with their photos before Flickr. Now, they're using the most interesting variations because they can see the data being used effectively.

I think the problem with cite is confusion about how it's used. As for em and strong, I see these used frequently, and correctly. Look how much blockquote is used now, and it's also a 'meaningful' HTML element. The problem is that we're making incremental improvements in a world that wants applications that, to quote most of the people who end up on Techmeme, "change everything we know about ________!"

There is no killer app for RDF. Oh no, oh god! Quick kill it and bury it before it stinks up the place!

Geez, before I even got to a second edition on my book. I think I'm going to wait just a little while longer before I kiss the semantic web good-bye.

We are in such a race to the bottom. If we can't see it now, taste it now, or hold it in our hands, we're so quick to say, "Well, it failed. Let's scrap it."

The web is still a raw teenager; many of the semantic web technologies are five years old or less; and we don't put any premium on tools creating good markup–in fact, attempting to get tools to deliver decent markup is considered quaint by some, foolhardy by others.

We want the semantic web and we want it now. If we don't get it now, move on! There is no concept of nurturing the environment, planting the seeds, and letting ideas grow in this teflon-coated speed race. In our haste and our impatience, we're going to end up defining the future of the web based on the lowest common denominator: what will Google give us. No offense to Google, but that's not my idea of an overly bright future.

No, people aren't going to ask for cite just because of cite. But cite will be there for people to use, and some people will recommend its use (after first clarifying its purpose), and eventually some product will use it for something cool, and there will be some data that the product can use in place. Then other people will see this product and think it's cool and they'll start using cite, and other tools will see the product and the data and then they'll create something and next thing you know, cite hits its momentum.

So it takes one or two or five years. Who cares? You know, data isn't a banana, it doesn't rot.

You know something else? I think you're selling web authors short. Look at how far we've come in the last ten years. I may seem critical of the next best thing, but I never underestimate the interest and the passion of people on the web.

5
Mathew Ingram - 4:12 pm February 28, 2008

Point taken, Shelley. I didn't mean it was just about sex appeal, really — just that it would help to have some applications that regular people might want to use (although FriendFeed is getting close).

And I have nothing against geeks (or dorks, for that matter). Some of my best friends are geeks. As for the running away thing — that's why I like the fact that all my neighbours are old and can't run as fast, so if the blog stops working I still have that :-)

6
Shelley - 4:34 pm February 28, 2008

One point back for sense of humor, Mathew.

Orange. Semantic web applications will use the color orange, Mathew. Now you know. We all knew, but we were keeping it from outsiders.

Seriously, Mathew, the application of semantic web technologies you're looking for probably won't have SEMANTIC WEB, in yellow sticker tape draped across it. The semantic web technologies will be like the web technologies, forming the basis for many of the applications that will have you going, "Cool."

If you think on it, this stuff is all amazing. Every day, you should get up and when you log on to the web for the first time that day, you should think about how far we've come in such a short period of time. You can type in something like "burningbird.net" and immediately get served my web page, wherever it is in the world. And then you can click a link, and go anywhere else in the world. You should stop and savor the experience, because it wasn't all that long ago that this stuff was like rocket science and space travel combined–pure science fiction.

Today, I subscribed to a delivery of eco-friendly toilet paper from Amazon, tonight I'll either watch something on Hulu or Joost, or maybe rent a movie at iTunes. While I'm watching the movie, what I write today is read by my friends in Australia (G'day!)–friends I've never physically met, but cherish all the same.

When I go to bed, I'll read a book I just downloaded five minutes before. No actually, tonight, I'm reading a report I found at High Beam for something I'm putting together.

All of this is like, wow! I mean, wow! Think about it. I don't know how old you are, but when you were a kid, I bet you didn't have this. I bet you didn't even dream of this. Of course, all of it is familiar, taken for granted, and therefore not exciting stuff now. But if you really stop to think about it…wow!

The semantic web technologies are like the web technologies–there isn't going to be a sudden burst of sunshine through the clouds as the choirs of heaven open up. Nor will Steve Jobs trot it out, wrapped in white or black plastic. The technologies will creep in, on little kitten paws, like the fog in the bay. Someday they'll just be there.

7
Ian Hickson - 5:19 pm February 28, 2008

Don't get me wrong. I think the world would be awesome if it had reliable metadata everywhere. I just don't see it happening. We've given it 10 years already, and it's gone nowhere, and nobody (except us theorists) cares.

Flickr is a good example. The reliable metadata there is the user-visible stuff that matters, and the automated stuff that the user doesn't have to do anything with (e.g. camera settings). The same principle applies to the Web. People don't mark up postal addresses with hAddr, they don't mark up calendar appointments with hCalendar… they might do both, if there was something that happened when you did it, but nothing does.

Specific metadata might become common. I don't think we'll ever have author-provided general metadata to the level we'd need for a truly Semantic Web.

8
Shelley - 6:42 pm February 28, 2008

Ian, I don't want to downplay your views, but people do care. Ask Mathew, he's not a theorist, and he cares. Actually, I'm not a theorist either, and I care.

The thing is, even if we don't care, the technologies that can be loosely bundled as "semantic web" will happen. We're not going to be stuck in amber, waiting for the big companies to provide centralized services and then tell us how we're going to use the web, forever. To me the semantic web is an independent web, as much as it is an intelligent web.

You seem to be unhappy that the photo metadata is either visible or camera generated. But the photo metadata is visible because there was a way to add this data to photos, and then there's a way to extract the data, and now there are applications that do both. For the longest time, all that metadata was hiding behind a pretty face, just waiting to be appreciated.

People aren't using hAddr. I have actually seen sites that do use microformats, but typically they do so because it's seen as a service. Others don't because they either don't know about it–yet–or they don't see it as part of a service they need to be in. However, someday the same thing will happen to hAddr as will happen to cite. Or maybe it won't, and hAddr will become the blink of microformats. I believe the semantic web will happen, but I don't know if all the pieces we have today will become part of it. Or if we even have all the pieces.

What puzzles me, though, is why you're so heavily involved with HTML5 if, from what I'm reading in these comments, you don't have a lot of faith in it. I don't want to put you on the spot, but I am confused. Isn't "semantics" one of the reasons behind HTML5?

9
Danny Ayers - 7:44 am February 29, 2008

Shelley, good post!

Ian, bear in mind that "most people" isn't "all people" - as in any networked system, small contributions can make a lot of difference. A significant proportion (probably most) of the data already on the Semantic Web hasn't been extracted from HTML-based markup, rather it comes directly from databases (e.g. check the Linking Open Data work), and the benefits of any data available from traditional markup will be amplified by that network.

When it comes to the document metadata subset of the Semantic Web, because docs have URIs, that metadata doesn't have to appear within the original doc - consider things like del.icio.us. Atom/RSS feeds are also usually about doc content, providing perfectly good machine-readable data. Just because a particular document markup doesn't (or if things take a wrong turn, can't) contain much machine-friendly data it isn't a blocker.

It's also worth remembering that links are useful Semantic Web data too - even if they aren't typed like the links (properties) of RDF, they still allow an agent to follow its nose to further information. If you consider that the semantics of a regular link is something like dc:related (or rdfs:seeAlso), the HTML Web is effectively a major contributor of http://en.wikipedia.org/wiki/Linked_Data .

So while I personally think the lowest-common-denominator approach that seems to currently underlie HTML5 is selling developers short (what proportion of people hand-author pages in the blog era?), as long as HTML comes over HTTP featuring URIs and links, it also has highest-common-factor value on the Semantic Web. In short, the Web is already bigger than HTML.

Re. "Think of your favorite dog" - well Basildog never really got the idea of fetch, when he was younger he'd run to the stick and then run away with it. Nowadays he usually just ignores you. Sashadog on the other hand is very keen to fetch a stick, although she's very reluctant to give it back to you. Not unlike a few search APIs I could mention. I guess we need improved StickPortability, though I'm not holding my breath.

10
Shelley - 1:19 pm March 1, 2008

Danny, would we need versioning on that StickPortability? Such as a Basildog version, as compared to a Sashadog version? After all, we wouldn't want to break fetching.
