Useful and Beautiful

Have nothing in your houses that you do not know to be useful or believe to be beautiful. ~ William Morris

This is a principle I try to live by in a never-ending struggle to combat clutter and the accumulation of too much "stuff." Some things (often too many!) are chosen, others gifted, still others sent unwanted. It is easy to recycle or reuse junk mail, flowers past their bloom, clothes that are no longer worn. The utility and beauty of things physical and notional change over time, the great modifier: we mature and no longer desire everything to be purple; we move and no longer need information about our former local community; we end relationships and make new ones. We refer to people who keep all of those bits of paper - coupons, newsletters, address books, calendars - and tchotchkes as packrats. They often bemoan their own behavior and occasionally toss, sell or donate rooms full of stuff. We have to learn how to accept our loved ones' sense of utility and beauty, even if we don't agree, and learn to compromise.

These same principles apply to building knowledge organization systems. Is your taxonomy or ontology looking like the junk room of a packrat? Is each concept, each relationship in your model useful or beautiful in its context? Does every unique concept in your model answer a question, define a concept, or support the authority of the source? Do you really need to define entities for all the different kinds of salt when you're building a data model for the Fannie Farmer Cookbook? No, not really. "Salt" will do. If you are a true salt connoisseur, then create a model of salt for use in your gastronomic explorations - it is beautiful and useful to you. Do you need to model every language code in ISO 639 if your company only does business in the USA? No, you can model perhaps half a dozen, or better yet - link to someone else's model. That link is more useful and just as beautiful as the codes you would re-create yourself.
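The linking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LanguageRef` class, the `SUPPORTED` table and the choice of six languages are hypothetical, and the base URI follows the pattern used by the Library of Congress linked-data service for ISO 639-1, which you should verify against whatever source you actually link to.

```python
from dataclasses import dataclass

# Assumed base URI of an externally maintained ISO 639-1 vocabulary.
# Swap in the authority your organization actually trusts.
EXTERNAL_BASE = "http://id.loc.gov/vocabulary/iso639-1/"


@dataclass(frozen=True)
class LanguageRef:
    """A local pointer to a language defined in someone else's model."""
    code: str  # two-letter ISO 639-1 code

    @property
    def uri(self) -> str:
        # We store no labels or definitions ourselves, only the link.
        return EXTERNAL_BASE + self.code


# Hypothetical scope: the half dozen languages this business needs.
SUPPORTED = {code: LanguageRef(code) for code in ("en", "es", "zh", "vi", "fr", "ko")}


def language_link(code: str) -> str:
    """Return the external URI for an in-scope language code."""
    try:
        return SUPPORTED[code.lower()].uri
    except KeyError:
        raise ValueError(f"language {code!r} is out of scope for this model") from None


print(language_link("es"))
```

Anything outside the six codes raises an error instead of quietly growing the model, which keeps the scope decision visible.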

Thinking on the utility and beauty of each entity in your models will help you keep out the clutter, ease maintenance, and focus on what adds value. It is another tactic for those looking to live more by the KISS principle: Keep It Simple and Scoped.

Curate the Content, not Just the Container

I've been using this line a great deal over the last couple of years in presentations: "Curate the content, not the container." It resonates with people who want to get at the meat of a book or article or video and not just the clever wrappings that entice you to consume content. It stems from a paper I wrote in graduate school in the late '90s, wherein I analyzed the ability of a searcher to find a known piece of content in a library containing anthologies and other such collections of small works.

Let's say you were trying to find a poem that had meaning for you. It was popular enough to be included in various collections of "The Greatest Sentimental Poems of the 19th Century," but not quite popular enough to be the shining star of such a collection. Current library practices (continuing to this day) dictate that the collection be curated with its collection title, editor, dates and so forth, and only a select few of the individual poems' titles get cataloged. It simply isn't efficient, nor does the system 'allow' for more. When you go to the library's catalog, you search on what you know - the title of the poem you are looking for - and you will likely not get any results. Frustrating. You ask the librarian, but s/he cannot recall seeing it in any books in their collection. You take a few random books off the shelf and scan the contents to no avail. Then you give up and hope that Google can help you. How much more annoying to learn that the poem you wanted is in a book on the shelf in your library, but you couldn't find it in the library's catalog? Technology is no longer a limiting factor. Publishers have sophisticated content management systems that should be configured to pass along this metadata. So why are we still failing to use that metadata?
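The failed search above can be sketched in a few lines of Python. Everything here is hypothetical: the `Book` record, the editor name and the poem titles are placeholders, and a real catalog would use a proper schema and search engine. The point is only the difference between searching container metadata and searching an index of the works inside.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Book:
    """A container record: collection title and editor, plus the works inside."""
    title: str
    editor: str
    contents: List[str] = field(default_factory=list)


def search_container(catalog: List[Book], query: str) -> List[Book]:
    """Container-only search: matches the query against collection titles."""
    q = query.casefold()
    return [book for book in catalog if q in book.title.casefold()]


def build_content_index(catalog: List[Book]) -> Dict[str, List[Book]]:
    """Map each contained work's title to the books that include it."""
    index: Dict[str, List[Book]] = {}
    for book in catalog:
        for work in book.contents:
            index.setdefault(work.casefold(), []).append(book)
    return index


def search_content(index: Dict[str, List[Book]], query: str) -> List[Book]:
    """Content search: exact (case-insensitive) match on a work's title."""
    return index.get(query.casefold(), [])


# Placeholder data, not a real bibliographic record.
catalog = [
    Book(
        title="The Greatest Sentimental Poems of the 19th Century",
        editor="A. Editor",
        contents=["A Well-Known Poem", "The Poem You Remember"],
    )
]

print(search_container(catalog, "The Poem You Remember"))  # nothing found
index = build_content_index(catalog)
print([b.title for b in search_content(index, "The Poem You Remember")])
```

The container search comes back empty even though the book is on the shelf; the content index finds it because the poem titles were cataloged too.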

The key part is "curate the content." The average consumer does not demand that the information inside the book be curated. Business analysts have long demanded it of their research portals - I know from having worked at Dow Jones that Factiva puts a great deal of effort into tagging people, companies and subjects in articles. Why aren't more content publishers doing the same? Do students want their textbooks to contain back-of-the-book indexes only? Does a hobbyist not want to be able to find examples of a challenge they face in their learning?

Curating the content means indexing not just the title, creator, publisher, dates and a few key words of an object as we do in libraries, but tagging all of the core concepts - the nouns, the verbs. I can't help but think of Schoolhouse Rock, "a noun is a person, place or thing." (But don't tag all of the conjunctions; as much fun as "Conjunction Junction" was, it would be overkill for 99% of digital content!)

I've also modified the line a bit. I didn't want to imply that we shouldn't curate the container! During my librarian days I had many a patron who asked for something "which had a green cover." I appreciate the value that level of cataloging brings.

Many efforts in commercial and academic research organizations are attempting to address this problem: social tagging, browser plugins, more granular content management systems, new SEO methods, NLP methods. They are all providing some value. We each need to find the tools that are right for us, and do our part in tagging our own content to bring greater value to the whole. So I ask you:

Curate the content AND the container.

Random Thoughts in the Machine

I am often asked if I am afraid of making machines too smart with all of the work going into modeling human knowledge - the concepts, the relationships, the explicit and implicit rules.

I am not.

System checks and balances aside for now, we're not trying to create HAL.

A machine will not stop to consider random things on its own, as I did today as I was making myself a sandwich for lunch: is there a dip in pastrami sales in the summer months? I mean really, completely random. I thought it as I was trying to decide whether I should cook the pastrami or eat it cool from the fridge. I usually cook it, but it's been incredibly warm here. So I just put it on the bread. As I did so, I wondered how many others might face the same choice. And then I thought, it's really better warm. I wonder if other people like it better warm? If they do, and don't want to eat it cool, do they buy less of it in the summer months? I think overall we do. Hmm. And then I proceeded to sit and eat.

Even if we detailed all of my own personal connections and 'rules' about life, a computer would never randomly ask itself about pastrami. Ever. That is still reserved for the machinery of the human brain.

If some day in the future a computer does ask itself about pastrami, then is it still a machine? We'll have to dig up Gene Roddenberry's thoughts on the matter in countless Star Trek tomes and contend with greater societal issues.

Either way, it's all good. And exciting. And not fearful.
