Special Edition Using SGML: Automatic versus Manual Tagging

Chapter 15
Automatic versus Manual Tagging

This chapter explains how to get SGML markup into documents. Chapter 5, “Two Scenarios,” discussed how you build an SGML installation: you either create an SGML environment from scratch or filter a legacy of existing documents. Often, though, you need to do both. This chapter discusses the tools required for each approach.

In this chapter, you learn:

• The best way to get tags into documents

• What types of conversion tools you can use to get documents marked up with SGML

• What types of structured authoring tools you can use to get documents marked up with SGML

Deciding Which Markup Method Is for You

It all comes down to your document collection. If you have a backlog of diverse documents, you need document conversion. If you’re starting with a clean slate, you emphasize structured authoring. To decide where you fall between these extremes, consider these questions:

• How big will the SGML installation be? How many documents will it have?

• How complex are the document types?

• Are there many legacy documents that must be converted?

• Are you committed to software or hardware platforms that dictate how you create or process documents?

• Can the people in your group learn to author or edit documents differently?

• Are you constrained by the types of file formats that you receive from information providers or provide to clients?

• What turnaround times must you meet when processing your documents?

How you answer these questions determines which markup method you should choose.

Big Installation versus One-Person Hobbyist

A large installation will most likely need much more document conversion than a one-person hobbyist operation, where the work is primarily structured SGML authoring. The hobbyist also has fewer time restrictions than the large installation, which is most likely a commercial enterprise, and so can afford to experiment and search for the most cost-effective ways of doing things. The large enterprise usually needs to just get the job done, because downtime costs it more money. The two also differ in tooling: the one-person installation will probably use less expensive tools and as many public domain programs as possible, whereas the large enterprise will buy expensive suites of software from commercial SGML software vendors.

Complex versus Simple Hardware and Software

Large corporations sometimes have a substantial commitment to a specific hardware and software platform that dictates their requirements, whereas smaller companies and hobbyists have much more flexibility. If the platform is unusual enough that no SGML support exists for it, the enterprise must either insert tags manually or hire someone to build an SGML processing system for that platform. The SGML standard was designed so that tags could be inserted using simple text editors, but doing so is very laborious. Public domain parsers exist for many different platforms, even many unusual ones. Still, building validating parsers, as well as authoring and document conversion tools, is sometimes necessary for unusual platforms.

Simple Documents or Complex Document Collection

Collections of simple documents are easier to manage than collections of complex documents. Simple requirements can be met with fewer and less expensive tools for document conversion and structured authoring. A hobbyist with a simple document collection can spend $400 on an MS Word add-on program and meet the entire requirement. A corporation could easily spend 100 times that amount on a team of analysts working full-time to build DTDs for all the documents to be authored, on industrial-strength tools from the finest SGML firms, and on handling the many exceptional legacy data challenges that arise during document conversion.

Short or Long Timeline

Delivering documentation late to customers costs large companies a great deal of money, whereas if a hobbyist doesn’t get an SGML Web site operational until next month, no harm is done. If you have time to experiment, you can iron out many potential problems before they happen, educate yourself about the various conversion and authoring tools, and get your system up and working without having to learn as you go. Large companies are sometimes in such a hurry that they don’t have time to test their authoring and conversion solutions before using them, and this causes problems. The longer your timeline, the better your authoring and document conversion solutions will be.

There are essentially three approaches to marking up documents. You must choose one of them, or combine several, when you introduce document markup. You can:

• Insert tags manually with a simple text editor

• Do structured authoring by using an SGML authoring and editing tool

• Convert existing electronic or paper documents into SGML documents
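As a rough illustration of the third approach, here is a minimal Python sketch of a conversion filter. Everything in it is an assumption made for the example: the element names doc, title, and para are hypothetical and not taken from any particular DTD, the input is assumed to be plain text with blank lines between blocks, and the entity references &amp;, &lt;, and &gt; are assumed to be declared by whatever DTD you target. Real conversion tools deal with far messier legacy data than this.

```python
from html import escape


def text_to_sgml(text: str) -> str:
    """Wrap blank-line-separated text blocks in hypothetical SGML tags.

    The first block becomes <title>; every later block becomes <para>.
    The tag names are illustrative only and belong to no real DTD.
    """
    # Split on blank lines and discard empty blocks.
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    if not blocks:
        return "<doc>\n</doc>"

    def clean(block: str) -> str:
        # Collapse line breaks and runs of spaces, then replace markup
        # characters (&, <, >) with entity references so they are not
        # mistaken for tags. Assumes the DTD declares those entities.
        return escape(" ".join(block.split()), quote=False)

    lines = ["<doc>", f"<title>{clean(blocks[0])}</title>"]
    for block in blocks[1:]:
        lines.append(f"<para>{clean(block)}</para>")
    lines.append("</doc>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Quarterly Report\n\nSales rose in the\nfirst quarter.\n\nR & D costs fell."
    print(text_to_sgml(sample))
```

Output like this still has to be checked with a validating parser against your DTD. A filter only guesses at structure, which is why converted documents usually need some manual cleanup afterward.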
