The thorny issue of archival content on the intranet

Archiving is tricky. How have you dealt with it on your intranet? Have you ever performed a giant web harvest snapshot and backup of your intranet?

March 30, 2010

Yes, it's true. Those are almost verbatim the words that were uttered by an intranet manager I once met. I've used it many times in conversations with clients about slimming down their intranets.

Now, you're thinking, I'd love to delete a lot of old content on my intranet, but I just can't. I know that Chris told me to blow it up, but I've got some pretty big reasons why I can't. Like regulatory requirements. Or records management. Or just plain old paranoia.

And now that disk space is so cheap, there's really no cost to simply keeping a copy of everything my organization has published to the intranet for the past 15 years. Besides, someone might need it one day. And then it will be here for them.

Of course, we know that there are issues with that. While disk space is cheap, the cognitive load experienced by intranet users is high. Ever tried finding a particular document or collection of documents amongst 100,000 others on an intranet with a sub-par search engine, questionable information design, and content of highly varying reliability? And our time is precious and expensive. Managing and maintaining 100,000 pages (and growing) on the intranet can cost a lot.

A noble goal for many of our customers is a smaller, more relevant intranet. People like James Robertson have been calling for this for years. But it too is hard to do.

What help is there?

Recognizing the characteristics of your content is a great first step. This classic from Paul Chin in the Intranet Journal from 2004 is one of the few articles I've ever found that tackles the time-based dimensions of your content. What's your content's lifespan?

Once you recognize the short-term / long-term orientation of your content, how do you design your site for it? One of the best metaphors that comes to my mind is the IA community's adaptation of Stewart Brand's notion of scaffolding in his book How Buildings Learn. Peter Merholz and Jesse James Garrett blogged about this in 2002. I still think it's a powerful concept.

And it reminds me a great deal of how we deal with each other through physical space (which I've blogged about before via Edward T. Hall's notion of proxemics).

What's the "stuff" of your intranet? The "skin" or the "structure" -- how do you assemble your content based on its temporality or permanence?

And finally, what patterns can we use from the wiki body of knowledge to help reinforce editorial activities that will keep the intranet a cleaner and tidier place?

Mike Briggs of Sun had this important point in his post on stale content: keep the authors tied to their content as much as possible. The publish-and-forget anti-pattern of intranet publishing and the "orphaned content" anti-pattern are both harder to fall into if you keep the connection alive between content and author.

You created this page, so it's your responsibility to keep tabs on it, remove it when you see fit, or pass the ownership on to someone who will. That's a design pattern that we baked into ThoughtFarmer from the start: there is no anonymous page ownership. Page and author are always coupled together.
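To make the idea concrete, here is a minimal sketch of what "no anonymous page ownership" could look like in code. This is not how ThoughtFarmer implements it; the `Page` class, its fields, the `stale_pages_by_owner` helper, and the one-year review window are all hypothetical names and values I've made up for illustration. The point is simply that a page can't exist without a named owner, and that stale pages can always be traced back to a person.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Page:
    """Hypothetical intranet page that must always have a named owner."""
    title: str
    owner: str            # assumed to be a user name; never empty
    last_reviewed: date   # assumed: the date the owner last checked the page

    def __post_init__(self):
        # Refuse to create an orphaned page.
        if not self.owner.strip():
            raise ValueError("every page needs a named owner")

    def transfer(self, new_owner: str) -> None:
        # Ownership can move, but only to another named person.
        if not new_owner.strip():
            raise ValueError("ownership must pass to a named person")
        self.owner = new_owner


def stale_pages_by_owner(pages, today, max_age_days=365):
    """Group titles of pages not reviewed within max_age_days by their owner."""
    cutoff = today - timedelta(days=max_age_days)
    reminders = {}
    for page in pages:
        if page.last_reviewed < cutoff:
            reminders.setdefault(page.owner, []).append(page.title)
    return reminders
```

Run something like `stale_pages_by_owner(pages, date.today())` on a schedule and you have a list of who to nudge about which pages. What counts as "stale" depends on the content's lifespan, so a single fixed window is an assumption that a real site would likely vary by content type.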

Archiving is tricky. How have you dealt with it on your intranet? Have you ever performed a giant web harvest snapshot and backup of your intranet, like the US government did with its federal sites in the past few years?

Has anyone ever come asking for one of those 70,000 deleted pages? What would your intranet look like if you could time travel to the future?
