Moving Large Documents to the Web | WebReference

Moving Large Documents to the Web


By Rick Diehl

Taming your monster documents to live on the Web takes special tools and skills. Unfortunately, many Web designers, developers and information architects cringe at the idea of converting large-scale documents or paper-based materials to the Web. Yet more and more Web developers are faced with knowledge repositories that are critical to their organizations: reference books, product specifications, legal documents and industry-specific codes. It's good information, but it isn't Web-based. Many Web development teams do not have the budget or infrastructure to move to XML-driven Content Management Systems. An alternative might be the approach used by government agencies that e-publish large numbers of documents using HTML, SGML and Portable Document Format (PDF).

Our team of Web developers, instructional designers and graphic artists has built the infrastructure, Web tools and methods to move large-scale documents to intranets or the Web. Our experiences, and the obstacles we ran into, might benefit other Web developers facing the same challenges.

This article covers the following stages of development:

Project Parameters

Having large amounts of ready-to-use content is unusual in most Web development. Typically, organizations or businesses have ideas or graphics but no solid text content. For large document conversions, the first task is to prioritize the conversion based on organizational or business needs. This might sound obvious, but it means you can roll out segments of large-scale documents in phases. A phased rollout releases critical information to a usable Web site early, then adds further content in later rollouts.
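As a rough illustration of phased prioritization, the sketch below groups document sections into rollout phases, most critical first, with a page budget per phase. It is only a minimal sketch under stated assumptions: the `Section` fields, the 1-to-3 priority scale, the page budget and the sample section names are all hypothetical and are not taken from any particular project.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    priority: int  # hypothetical scale: 1 = most critical to the business
    pages: int


def plan_rollouts(sections, pages_per_phase):
    """Group sections into phases, most critical first.

    A new phase starts whenever adding the next section would exceed
    the page budget of the current phase.
    """
    phases, current, used = [], [], 0
    for section in sorted(sections, key=lambda s: s.priority):
        if current and used + section.pages > pages_per_phase:
            phases.append(current)
            current, used = [], 0
        current.append(section)
        used += section.pages
    if current:
        phases.append(current)
    return phases


# Made-up sample data for illustration only.
sections = [
    Section("Appendices", 3, 120),
    Section("Safety codes", 1, 200),
    Section("Product specifications", 2, 150),
    Section("Glossary", 3, 40),
]

phases = plan_rollouts(sections, pages_per_phase=250)
for number, phase in enumerate(phases, start=1):
    print(f"Phase {number}: " + ", ".join(s.title for s in phase))
```

In practice the priority values would come from the organization's own ranking of business needs, and the budget might be measured in staff hours rather than pages.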

With luck, the materials come in some type of digital format. Web developers are handed a variety of documents and publications, only some of which are electronic. If the material is large and not in a digital format, look for a conversion service that specializes in scanning, copy writing or whatever makes sense for the original format. Conversion services have the technology and turnaround times that make the cost worthwhile.

A few inherent problems with document conversion are:

  1. Planning and designing for the non-linear Web, not for linear printed documents.
  2. Determining who will maintain the site. Subject matter experts (SMEs) can be valuable content editors, sparing the Web developers from having to become experts on the content (more on this later).
  3. Handling dynamic content (original material that changes frequently), which can hinder deployment and the future integrity of the site's information.

An analogy for "overly" dynamic content is the information kiosk in supermarkets, similar to the systems used in bookstores. Several years ago, supermarkets started experimenting with kiosks that would tell you where to find items in the store. The problem was that the database of available foods is constantly changing. Couple that with the fact that most grocery stores do not keep all food items in the same place over time. As a result, the supermarket kiosks became an example of misused technology. Now compare that to large bookstores. The database of published materials has become an industry standard. Take that publication database, dynamically compare it to your store inventory and store map, and you have a great online resource for locating books. That is why many large bookstores use search kiosks. Avoid content that is so dynamic that the maintenance or upkeep outweighs the business benefits.
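The bookstore lookup described above can be sketched as a join between a stable catalog and two store-specific tables. This is only an illustrative sketch, not how any real kiosk is built: the dictionaries, keys, titles and shelf locations are all made-up assumptions.

```python
# All data below is hypothetical and exists only to illustrate the idea.

# Stable, industry-wide publication data keyed by an identifier.
catalog = {
    "id-001": "Example Title A",
    "id-002": "Example Title B",
}

# Store-specific data that changes, but far less chaotically than groceries.
inventory = {"id-001": 3, "id-002": 0}        # copies on hand
store_map = {"id-001": "Aisle 4, Shelf B"}    # shelf location


def locate(title_query):
    """Return (title, in_stock, location) for every matching catalog entry."""
    results = []
    for item_id, title in catalog.items():
        if title_query.lower() in title.lower():
            in_stock = inventory.get(item_id, 0) > 0
            location = store_map.get(item_id, "unknown") if in_stock else None
            results.append((title, in_stock, location))
    return results


print(locate("title a"))
print(locate("title b"))
```

The catalog rarely needs local maintenance, which is the point of the analogy: only the inventory and map tables carry an upkeep cost for the store.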

One last issue with large document projects is inconsistency in the look, feel, graphics use and accuracy of the original material. This brings us to the next phase: information design.


Created: February 2, 2001
Revised: February 5, 2001