

Chapter 21
Integrating SGML and HTML Environments

This chapter surveys what is happening with SGML on the Web, how you can find that material, and what you need to get started delivering SGML on the Web yourself. It doesn’t cover every detail, but it gives an overview of several tools so that you can decide which ones fit your needs for Internet delivery of SGML information. It also points you to many URLs that provide examples.

The discussion is divided into parts that cover:

  Some successful uses of SGML on the Web
  Current solutions and tradeoffs
  Tools and how to combine them

Can It Be Done?

Many people and companies have put SGML data on the Web, and several companies have introduced products that help. Here are a few of their success stories, along with URLs so you can check them out directly.


Note:  
See Appendix B for a longer list of Internet sites that have SGML tools, information, and data content. This section focuses on locations that deliver SGML through the World Wide Web itself.


• See “Internet Resources,” p. 569

Novell

www.novell.com

Novell delivers its NetWare documentation in multiple languages on CD-ROM using SGML, and has also made this documentation available on the Web. The documents total more than 100,000 pages in five languages and are marked up using the DocBook DTD, which is popular in the computer industry. Various other materials on Novell’s site are in HTML, but the manuals are kept in SGML and converted on the fly to HTML using EBT’s DynaWeb system (see fig. 21.1).


Fig. 21.1  Novell delivers NetWare documentation to the Web directly from SGML.
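
To make the idea of on-the-fly conversion concrete, here is a minimal sketch in Python. It is not how DynaWeb works internally; it only illustrates the general technique of mapping SGML element names to HTML tags at delivery time. The mapping table, the class name `SgmlToHtml`, and the function `convert` are hypothetical, and the sketch assumes fully tagged markup with no omitted end tags and no DTD processing.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from a few DocBook-style element names to HTML tags.
# A real system would drive this from a stylesheet, not a hard-coded table.
TAG_MAP = {"chapter": "div", "para": "p", "emphasis": "em", "literal": "code"}


class SgmlToHtml(HTMLParser):
    """Rewrite mapped elements as HTML; drop tags that have no mapping."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <PARA> and <para> both match.
        if tag in TAG_MAP:
            self.out.append(f"<{TAG_MAP[tag]}>")

    def handle_endtag(self, tag):
        if tag in TAG_MAP:
            self.out.append(f"</{TAG_MAP[tag]}>")

    def handle_data(self, data):
        # Re-escape text so characters such as & and < stay valid in HTML.
        self.out.append(escape(data, quote=False))


def convert(sgml_text):
    """Return an HTML rendering of a fully tagged SGML fragment."""
    parser = SgmlToHtml()
    parser.feed(sgml_text)
    parser.close()
    return "".join(parser.out)


sample = (
    "<chapter><para>Insert the <emphasis>first</emphasis> disk."
    "</para></chapter>"
)
print(convert(sample))
```

Because the source stays in SGML and the HTML is generated only when a page is requested, changing the mapping table changes how every document is presented without touching the documents themselves.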

Text Encoding Initiative (TEI)

www.ebt.com/usrbooks/teip3 and etext.virginia.edu/TEI

Several universities are leaders in creating interesting and useful SGML data. Some of them have large literary SGML collections and make them available in various forms on the Web. The most easily available materials are significant works of English literature. Poetry, literature of other languages, and less-famous works of all kinds are also going online very quickly.

Many of the projects working with literature, poetry, drama, and other related areas use the TEI “Guidelines for Electronic Text Encoding and Interchange.” You might wonder whether those guidelines are themselves available in SGML. They are: you can get them through the Open Text server running at the University of Virginia, or through EBT. If you’re thinking of preparing any type of literary work in SGML, you can’t do without these guidelines.

CD-ROM and paper versions also are available and are produced from the same SGML source files. The project maintains FTP archives at:

ftp-tei.uic.edu/pub/tei (for users in North America)
ftp.ifi.uio.no/pub/SGML/TEI (for users in Europe)
TEI.IPC.Chiba-u.ac.jp/TEI/P3 (for users in Asia)

The Oxford University Text Archive (OTA)

ftp.ota.ox.ac.uk

The Oxford Text Archive collects and distributes electronic texts and helps develop the TEI guidelines. Authors in the archive include Charles Dickens, Mark Twain, H.G. Wells, and E.R. Burroughs. Many of these books are marked up with the TEI DTD and ready to drop into an electronic document system. You can retrieve the texts with applications such as Fetch or Telnet, or with a Web client that supports the FTP protocol. You can also browse and search many of them by pointing your Web browser to www.ebt.com/literature (see fig. 21.2).


Fig. 21.2  The Oxford Text Archive has marked up many texts using the TEI DTD. This view shows one by Mark Twain, accessed via the DynaWeb SGML Web server.

University of Virginia

www.lib.virginia.edu/etext/ETC.html

The University of Virginia has a major electronic text center that works with a wide variety of literature and mirrors a lot of information from other sites. Much of the information is in SGML and available on the Web. Start at the URL shown above to explore this collection (see fig. 21.3).


Fig. 21.3  A sample of Doyle’s “The Red-Headed League” from the University of Virginia collection of SGML documents.

Center for Electronic Texts in the Humanities (CETH)

cethmac.princeton.edu

Rutgers and Princeton Universities together sponsor the Center for Electronic Texts in the Humanities, or CETH. CETH focuses on providing information on how to catalog, maintain, and distribute electronic documents; it also provides courses and periodicals to help in this effort.

Summer Institute of Linguistics (SIL)

www.sil.org

SIL has been working with highly structured electronic documents in many languages for many years, since before SGML became the standard way to do it. Through its Web site you can get a great deal of information about SGML, including a bibliography by Robin Cover that points to almost everything ever published on SGML. It does not cover all of the most recent material; for the last couple of years, there has been so much activity that no one can keep up with it all. You can find the SGML Bibliography at www.sil.org/sgml/biblio.html.

University of California at Berkeley

www.lib.berkeley.edu/AboutLibrary/Projects/BFAP

Through the Berkeley Finding Aids Project (BFAP), the UC Berkeley Library and others have developed a DTD for managing information about archives, personal papers, manuscripts, and other materials found in libraries and special collections. Some of this information is becoming available on the Web (see fig. 21.4). For more information, see the URL above; there are some especially nice pictures of San Francisco from before the 1906 earthquake and fires. Duke University also participates actively in this project. See odyssey.lib.duke.edu/findaid/, a site that lets you choose whether to view the data using the BFAP DTD or HTML.


Fig. 21.4  Thumbnails of photos from UCB special collections accessible on the Web. You can click the thumbnails to download higher-resolution versions.

SGML Open

www.sgmlopen.org/index.sgml

The companies that develop SGML products have a cooperative organization called SGML Open. On the SGML Open Web site, you can find a wealth of information about all aspects of SGML, and pointers to many companies with SGML products. Many of those companies have an interest in HTML as well, and provide products that apply in both worlds.

