February 28th, 2008

Mathew Ingram has decided that the problem with the semantic web is that it’s as boring as dry toast. Of course, by Mathew's standard, all the stuff that makes the web work is also boring as hell. It's probably a good thing, then, that some people looked beyond the need for immediate titillation when it comes to the tech underlying this environment, or Mathew's audience for his opinions would be his immediate family members, and perhaps those neighbors not quick enough to run away when seeing him approach.

He also writes:

It’s all about plumbing and widgets and data standards, all of which have names like FOAF and TOTP and SIOC and whatnot. It’s right off the dork-o-meter. The Lone Gunmen from The X-Files would have a hard time getting interested in this stuff, let alone anyone who isn’t married to their slide rule or their pocket protector.

Now, taking Mathew's complaints of "No glitter! No glitter! Mama, Mama, where's my glitter!" seriously, I decided to put my slide rule down for a sec and see if I couldn't respond to his one statement about no one knowing what this all means.

First, there was the web. The web was dumb, but it was hyperlinked.

Then, there was search. Search followed hyperlinks, scraped pages, massaged keywords and tested the strength of the links. The web was still dumb, but number crunching helped generate some smarts. Think of your favorite dog. Yeah, that smart.

Next, there was the semantic web. The semantic web says, You and I can derive understanding from this blob of text on this page, but applications can't. Applications can pull keywords and run algorithms, but can only approximate what this blob of text is all about. What if we add a little information to this blob of text so that applications don't have to crunch numbers or make guesses as to what we mean?

How do we add a little information? A hundred different ways. We can use microformats, or RDFa, or RDF, or whatever the HTML5 people cook up for us. With this little bit of extra information, applications can access a web page list that's created with UL/LI elements, but instead of having to look at the text in the list and guess what the list is all about, they can read that little bit of data and know that the list consists of recommended books. Perhaps they can take that little list of books and use another application to look up these books at Amazon. Or at their library. Or better yet, click a button and load all the books into our Kindle. (Assuming that Mathew doesn't subscribe to the Steve Jobs "We don't read, we ain't got no books, gimme the vids" school of thought.)
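Here's a rough sketch of what that could look like from the application's side, in Python. The markup is invented for illustration: the class names `recommended-books` and `book-title` are assumptions, not part of any published microformat or RDFa vocabulary. The point is only that the application reads the annotation instead of guessing from the text.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names are made up for this sketch,
# not taken from any published microformat or RDFa vocabulary.
PAGE = """
<ul class="recommended-books">
  <li><span class="book-title">Unix Power Tools</span></li>
  <li><span class="book-title">Painting the Web</span></li>
</ul>
<ul>
  <li>Buy milk</li>
</ul>
"""


class BookListParser(HTMLParser):
    """Collects text from book-title spans inside a recommended-books list."""

    def __init__(self):
        super().__init__()
        self.in_book_list = False
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "recommended-books" in classes:
            self.in_book_list = True
        elif tag == "span" and self.in_book_list and "book-title" in classes:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_book_list = False
        elif tag == "span":
            self.in_title = False

    def handle_data(self, data):
        # Only text inside an annotated title span counts as a book.
        if self.in_title:
            self.titles.append(data.strip())


def find_recommended_books(html):
    parser = BookListParser()
    parser.feed(html)
    return parser.titles


print(find_recommended_books(PAGE))
```

The second list on the page is ignored, not because of what its text says, but because nothing marks it as a list of books.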

The little bit of information might, instead, be an address for an event, triggering the browser to add that event information to a desktop calendar application.
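As a sketch of that hand-off, assume the event details have already been pulled out of the page into a simple dictionary (the event itself, its address, and the `uid` are all made up here). Turning it into an iCalendar entry that a desktop calendar can import takes little more than string assembly. This is deliberately simplified: real calendar files also have rules for time zones and line folding that are skipped here.

```python
from datetime import datetime, timezone


def escape_text(value):
    """Escape characters that iCalendar TEXT values treat specially."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def event_to_ics(event, stamp=None):
    """Build a minimal single-event iCalendar document as a string."""
    stamp = stamp or datetime.now(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//semantic-sketch//EN",  # placeholder product id
        "BEGIN:VEVENT",
        "UID:" + event["uid"],
        "DTSTAMP:" + stamp.strftime("%Y%m%dT%H%M%SZ"),
        # No time zone given, so this is a "floating" local time.
        "DTSTART:" + event["start"].strftime("%Y%m%dT%H%M%S"),
        "SUMMARY:" + escape_text(event["summary"]),
        "LOCATION:" + escape_text(event["location"]),
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"


# Hypothetical event data, as it might be extracted from an annotated page.
EVENT = {
    "uid": "event-1@example.com",
    "summary": "Semantic Web Dorks Meetup",
    "location": "123 Example St, St. Louis, MO",
    "start": datetime(2008, 3, 15, 19, 0),
}

ics = event_to_ics(EVENT)
print(ics)
```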

It could be information about people we know and how we know them, so that when we move from Facebook, which is today's darling, to MyPowerBase, we can tell MyPowerBase to add all people who we have defined as friends, but not those defined as just contacts.
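In code, that move is not much more than a filter. The sketch below assumes the old site can export people along with the relationships we declared for them. The `friend`, `contact`, and `met` labels echo XFN's rel values, but the record layout and the names are invented for this example.

```python
# Hypothetical export: each record names a person and the relationships
# we declared for them. The record layout is invented for this sketch.
EXPORTED = [
    {"name": "Alice", "url": "http://example.com/alice", "rel": {"friend", "met"}},
    {"name": "Bob", "url": "http://example.com/bob", "rel": {"contact"}},
    {"name": "Carol", "url": "http://example.com/carol", "rel": {"friend"}},
]


def people_with_rel(records, wanted):
    """Return the names of everyone whose relationship set includes `wanted`."""
    return [r["name"] for r in records if wanted in r["rel"]]


print(people_with_rel(EXPORTED, "friend"))
```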

If the information is embedded in a photo–wow, information embedded in a photo, how dull–when we upload the photo to a site like Flickr, it could automatically be added to a map, with all the other photos from the same location. It can be pulled up on a search someday, when we ask the web to show us all photos for St. Louis, or for a certain block in St. Louis. Perhaps it can even help us find photos that are licensed Creative Commons so we can steal them.

I might write about a product or company, and the little bit of information I add to my post might help others who are thinking of doing business with the company, or buying that product. Sure, search engines can scrape the content and try to glean useful bits based on keywords such as the product or company name, but we've all had enough really strange search results to know how far search can go, no matter how brainy the algorithm.

Someday, I'll be able to write about movies and add just a little bit of extra information. We can do the same for music. Or cooking recipes ("give me all recipes on the web that use apricot jam and bourbon, but I don't want chicken"). Or even poetry, though don't mention poetry around Sir Tim–it makes him peevish.
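Once recipes carry their ingredients as data rather than as prose, the apricot jam query stops being a search problem and becomes a set comparison. A minimal sketch, with made-up recipes and a made-up record layout:

```python
# Hypothetical recipe records, as they might look after being read
# from annotated pages. Titles and ingredients are invented.
RECIPES = [
    {"title": "Glazed Ham", "ingredients": {"ham", "apricot jam", "bourbon"}},
    {"title": "Bourbon Chicken", "ingredients": {"chicken", "apricot jam", "bourbon"}},
    {"title": "Jam Tart", "ingredients": {"flour", "butter", "apricot jam"}},
]


def find_recipes(recipes, include, exclude):
    """Titles of recipes that use every `include` item and no `exclude` item."""
    include, exclude = set(include), set(exclude)
    return [
        r["title"]
        for r in recipes
        if include <= r["ingredients"] and not (exclude & r["ingredients"])
    ]


print(find_recipes(RECIPES, ["apricot jam", "bourbon"], ["chicken"]))
```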

Mathew is very addicted to FriendFeed, which allows him to pull in all the activities of his friends in various places. I bet if we scratched the surface of this application, a lot of the data that makes the application tick comes courtesy of the semantic web dorks.

I could go on and on, but I've already been away from my slide rule too long. Instead, I'll save the best for last: because all of these different ways of adding that tiny little bit of useful information to blocks of text or photos or video files or what have you are based on agreed-upon specifications, we can use applications to merge this data and use it for something new; something we haven't thought of yet. See, now that's when it really gets exciting, because rather than coming up with an idea and then taking five years to get enough data to test it, we'll already have the data, at no extra effort or cost.
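A small sketch of that merging step, assuming two unrelated sources happen to identify the same things the same way. The sources, the fields, and the `urn:example:` identifiers are all invented here; the shared identifier stands in for whatever the agreed-upon specification provides.

```python
from collections import defaultdict

# Two hypothetical, independently published data sets that share identifiers.
REVIEWS = [
    {"id": "urn:example:book-1", "rating": 5},
    {"id": "urn:example:book-2", "rating": 3},
]
LIBRARY = [
    {"id": "urn:example:book-1", "available": True},
    {"id": "urn:example:book-3", "available": False},
]


def merge_by_id(*sources):
    """Combine records from every source that describe the same identifier."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record["id"]].update(record)
    return dict(merged)


combined = merge_by_id(REVIEWS, LIBRARY)

# A question neither source could answer alone:
# which well-reviewed books can be borrowed right now?
borrowable = [
    book["id"]
    for book in combined.values()
    if book.get("rating", 0) >= 4 and book.get("available")
]
print(borrowable)
```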

Maybe I've been cooped up in my cube with my computers and code for too long, but that strikes me as kind of interesting. In a dorky sort of way.

February 10th, 2008

On today's tenth anniversary of the birth of XML, Norm Walsh writes:

I joined O'Reilly on the very first day of an unprecedented two-week period during which the production department, the folks who actually turn finished manuscripts into books, was closed. The department was undergoing a two-week training period during which they would learn SGML and, henceforth, all books would be done in SGML…My job, I learned on that first day, would be to write the publishing system that would turn SGML into Troff so that sqtroff could turn it into PostScript. “SGML”, I recall thinking, “well, at least I know how to spell it.”

Ah yes. "Unix Power Tools" was formatted as SGML, the one and only book at O'Reilly I worked on that wasn't in a Word format. I must express a partiality to my NeoOffice, though the SGML system was ideal for cross-referencing and indexing. OpenOffice ODT, or OpenDocument text, will be the most likely format for the next UPT. Just another example of the permanent/impermanence of web trends.

Norm also mentions HTML5 possibly being the nail in the coffin of this child of SGML, but as I wrote recently, the folks behind HTML5 have solemnly assured us this specification also includes XHTML5. I'd hate to think we're giving up on the benefits of XHTML just when they're finally being realized by a more general audience.

Of course, I'm also fond of RDF/XML, which seems to cause others a great deal of pain, the pansies. And I've never hidden my SVG fandom, and SVG is based on XML. I must also confess to preferring XML over JSON–you know, good enough for granddad, good enough for me. Atom rules. Or is that, Atom rocks? I'm also sure XML has squeezed between the joints of many of my other applications, and I just don't know it.

February 8th, 2008

I really kick myself now for not including a mention of gnuplot in "Painting the Web". I had one chapter on graphics and data, and it would have been a nice fit. However, it does need a nice installation environment for the Mac, and that was one of the criteria for including mention of tools.

We're told that a Mac-specific installation of gnuplot is coming. When it does, I'll include a link in the graphics tools section of the book's supplementary site.

Another handy graphical tool is svgfig, which allows you to draw mathematical figures in SVG using Python. This tool should be very simple to install if you have Python installed. Using it, though, does require an understanding of math. Of course.
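To give a feel for the general idea, here's a sketch that draws one period of a sine curve as SVG using nothing but the Python standard library. This is not svgfig's API, which isn't shown here; the function name and its parameters are made up for the example.

```python
import math


def sine_svg(width=400, height=200, samples=50):
    """Return an SVG document, as a string, plotting one period of sin(x)."""
    points = []
    for i in range(samples + 1):
        x = width * i / samples
        # Scale sin() to fit the drawing with a 10-unit margin;
        # SVG's y axis points down, hence the subtraction.
        y = height / 2 - (height / 2 - 10) * math.sin(2 * math.pi * i / samples)
        points.append(f"{x:.1f},{y:.1f}")
    path = " ".join(points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{path}"/>'
        "</svg>"
    )


svg = sine_svg()
print(svg[:80])
```

Saving the string to a file with an `.svg` extension should be enough to view it in a browser that supports SVG.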

I would say that 2008 is the year of SVG in addition to the year of semantics. Works for me, though perhaps I should have called my book, "Painting the Semantic Web".

(Thanks to Michael Bernstein for the mention of svgfig.)