How do we do it
Problem is not unique: it is faced by all information providers
Old approach -- SGML
-
Information all stored in common format
-
Same information produced in different formats
-
originally designed for legal materials (IBM -- GML in the 1970s)
-
Only caught on with big publishers (Lexis, DOD, most textbook publishers)
because it was complicated and expensive.
-
leading application program cost $15,000 per seat
SGML
-
Core idea: separate materials into two parts and control them with two functions.
-
First part --- information
-
Most frequently text but can be anything
-
Second part -- information about information
-
"Joseph Conrad" on book binding of "Heart of Darkness" is information
but also is information about information
-
he is author
-
it is part of binding
-
should be emphasized
-
show information about information with tags:
-
<author>Joseph Conrad</author>
-
<emphasis><author>Joseph Conrad</author></emphasis>
-
<binding><emphasis>Heart of Darkness<author>Joseph Conrad</author></emphasis></binding>
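A minimal sketch of how a program can pull the two parts apart, using the last tagged example above. It assumes the fragment is treated as well-formed XML and read with Python's standard xml.etree.ElementTree parser; real SGML needs an SGML parser, and the function name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

# The tagged example from the slide, treated here as well-formed XML (an assumption).
SAMPLE = (
    "<binding><emphasis>Heart of Darkness"
    "<author>Joseph Conrad</author></emphasis></binding>"
)


def describe(markup):
    """Return (text, enclosing tags) pairs for every piece of text in the markup."""
    root = ET.fromstring(markup)
    found = []

    def walk(element, path):
        path = path + [element.tag]
        # The text is the information; the path of tags is the information about it.
        if element.text and element.text.strip():
            found.append((element.text.strip(), path))
        for child in element:
            walk(child, path)
            # Text that follows a child's closing tag belongs to the parent.
            if child.tail and child.tail.strip():
                found.append((child.tail.strip(), path))

    walk(root, [])
    return found


pairs = describe(SAMPLE)
for text, tags in pairs:
    print(f"{text!r} is tagged as {' > '.join(tags)}")
```

Each piece of text comes back with the list of tags that enclose it, so "Joseph Conrad" is reported as part of the binding, as emphasized, and as the author.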
-
First Control Function -- Document Type Definition (DTD)
-
defines tags and rules for using them ("no section 2 without a section 1")
-
industry specific -- developing a DTD is moderately difficult, so standard
DTDs are important
-
Second Control Function -- Style Sheet
-
defines how tags are presented ("for emphasis use purple and make it blink")
-
specific to output device (may be a printing press, a browser, a
Palm Pilot, voice, etc.)
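The two control functions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not real DTD or style sheet syntax: the ALLOWED_CHILDREN table stands in for a DTD and checks only which tags may nest inside which (a simpler rule than the section-ordering example above), and STYLE_SHEETS holds two made-up style sheets for two output devices. All names and rules here are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a DTD: which child tags each tag may contain.
ALLOWED_CHILDREN = {
    "binding": {"emphasis", "author"},
    "emphasis": {"author"},
    "author": set(),
}

# Hypothetical style sheets, one per output device.
STYLE_SHEETS = {
    "browser": {"emphasis": "*{}*", "author": "by {}"},
    "voice": {"emphasis": "[stress] {}", "author": "written by {}"},
}


def validate(element):
    """Return a list of rule violations; an empty list means the document conforms."""
    if element.tag not in ALLOWED_CHILDREN:
        return [f"unknown tag <{element.tag}>"]
    errors = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        errors.extend(validate(child))
    return errors


def render(element, style):
    """Apply one style sheet to the element and everything inside it."""
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, style))
        parts.append(child.tail or "")
    content = " ".join(part.strip() for part in parts if part.strip())
    # Tags with no entry in the style sheet pass their content through unchanged.
    return style.get(element.tag, "{}").format(content)


doc = ET.fromstring(
    "<binding><emphasis>Heart of Darkness"
    "<author>Joseph Conrad</author></emphasis></binding>"
)
print(validate(doc))
for device, style in STYLE_SHEETS.items():
    print(device, "->", render(doc, style))
```

The same tagged document is checked once against the rules and then presented differently for each device, without the stored information changing.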
-