CSE134A LECTURE NOTES

November 21, 2001
 
 

ANNOUNCEMENTS

There are four separate evaluations for you to fill out today: CAPE, and one for each of the teaching assistants.

For good short XML tutorials see here.  Parts of today's lecture are based on this site.
 
 

INCLUDING AN EXTERNAL DTD

A DTD can be external, internal, or both.  The syntax for an external DTD is
<!DOCTYPE NAME SYSTEM "file">
where NAME is the name of the root element of the document and "file" is the location of the DTD.

To have an external part and an internal part, write
<!DOCTYPE NAME SYSTEM "file" [ ... ]>
where [ ... ] indicates the internal DTD.
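
As a rough illustration of the DOCTYPE syntax, the following Python sketch parses a small document whose DOCTYPE has both an external part and an internal part, and reads the declaration back.  The document, the file name note.dtd, and the function name describe_doctype are made up for this example.  The parser in the Python standard library is non-validating and does not fetch the external file, so note.dtd does not need to exist.

```python
from xml.dom import minidom

# Hypothetical document: root element "note", external DTD "note.dtd",
# plus an internal part that declares one attribute.
DOCUMENT = """<!DOCTYPE note SYSTEM "note.dtd" [
  <!ATTLIST note priority CDATA "normal">
]>
<note>Evaluations are due today.</note>
"""


def describe_doctype(xml_text):
    """Return (root element name, external DTD location) from the DOCTYPE."""
    doctype = minidom.parseString(xml_text).doctype
    return doctype.name, doctype.systemId


if __name__ == "__main__":
    print(describe_doctype(DOCUMENT))
```

The first value returned is the NAME from the declaration and the second is the "file" string, exactly as written in the document.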

One limitation of DTDs is that there is no convenient way to make them modular by combining several DTDs into one.  For example, the DTD for postal addresses below contains the declarations for personal names.  These have to be repeated explicitly, when it would be better to include a separate DTD for names by referring to it in some way.
 
 

A DTD FOR POSTAL ADDRESSES

Here is the DTD for postal addresses developed by the HR-XML consortium, from page 15 of this PDF document:
<!-- Copyright 2000 The HR-XML Consortium (TM) -->
<!-- version 1.0  October 17 2000 -->
<!-- 11/05/2000 -->

<!ELEMENT PostalAddress  (CountryCode , PostalCode? , Region* , Municipality? , DeliveryAddress? , Recipient* )>
<!ATTLIST PostalAddress  type  (postOfficeBoxAddress | streetAddress | undefined )  'undefined' >
<!ELEMENT PostalCode  (#PCDATA )>
<!ELEMENT CountryCode  (#PCDATA )>
<!ELEMENT Region  (#PCDATA )>
<!ELEMENT Municipality  (#PCDATA )>
<!ELEMENT DeliveryAddress  (AddressLine* )>
<!ELEMENT AddressLine  (#PCDATA )>
<!ELEMENT Recipient  (PersonName? , AdditionalText* , Organization? )>
<!ELEMENT PersonName  (FormattedName* , GivenName* , PreferredGivenName? , MiddleName? , FamilyName* , Affix* )>
<!ELEMENT FormattedName  (#PCDATA )>
<!ATTLIST FormattedName  type  (presentation | legal | sortOrder )  'presentation' >
<!ELEMENT GivenName  (#PCDATA )>
<!ELEMENT PreferredGivenName  (#PCDATA )>
<!ELEMENT MiddleName  (#PCDATA )>
<!ELEMENT FamilyName  (#PCDATA )>
<!ATTLIST FamilyName  primary  (true | false | undefined )  'undefined' >
<!ELEMENT Affix  (#PCDATA )>
<!ATTLIST Affix  type  (academicGrade |
                        aristocraticPrefix |
                        aristocraticTitle|
                        familyNamePrefix |
                        familyNameSuffix |
                        formOfAddress |
                        generation )  #REQUIRED >
<!ELEMENT AdditionalText  (#PCDATA )>
<!ELEMENT Organization  (#PCDATA )>

You may use this DTD instead of the newer XML schema mentioned in the project description.  For an explanation of XML schemas see here.
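
A content model such as the one for PostalAddress is essentially a regular expression over child element names.  The sketch below, which is only an illustration and not a real validator, transcribes that one content model by hand into a Python regular expression and checks the order of the direct children of a PostalAddress element.  The sample addresses and the function name children_match_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The PostalAddress content model, transcribed by hand into a regular
# expression over child element names (each name followed by one space).
POSTAL_ADDRESS_MODEL = re.compile(
    r"CountryCode (PostalCode )?(Region )*(Municipality )?"
    r"(DeliveryAddress )?(Recipient )*"
)

# Hypothetical sample data, not taken from the HR-XML document.
GOOD = """<PostalAddress type="streetAddress">
  <CountryCode>US</CountryCode>
  <PostalCode>00000</PostalCode>
  <Region>CA</Region>
  <Municipality>Sampletown</Municipality>
  <DeliveryAddress><AddressLine>1 Example Street</AddressLine></DeliveryAddress>
  <Recipient>
    <PersonName><GivenName>Pat</GivenName><FamilyName>Doe</FamilyName></PersonName>
  </Recipient>
</PostalAddress>"""

# Same kind of data, but PostalCode comes before CountryCode.
BAD = """<PostalAddress>
  <PostalCode>00000</PostalCode>
  <CountryCode>US</CountryCode>
</PostalAddress>"""


def children_match_model(xml_text):
    """Check only the order of PostalAddress's direct children."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return POSTAL_ADDRESS_MODEL.fullmatch(child_names) is not None


if __name__ == "__main__":
    print(children_match_model(GOOD), children_match_model(BAD))
```

The second document is rejected because the DTD requires CountryCode to come first.  A validating parser would also check the nested elements and the attribute declarations, which this sketch ignores.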
 
 

ATTRIBUTE DECLARATIONS

Each element named in a DTD can have one or more ATTLIST declarations.  An empty element can still have attributes.  For example
<!ATTLIST image source     CDATA       #REQUIRED
                width      NMTOKEN     #IMPLIED
                height     NMTOKEN     #IMPLIED
                format     CDATA       #FIXED "jpeg"
                alt        CDATA       "No caption provided."
                catalogno  ID          #REQUIRED
                owner      IDREF       "Unknown_owner"
>
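#REQUIRED means that every occurrence of the element must give the attribute a value.  #IMPLIED means the attribute is optional, and no default value is provided.  #FIXED means the attribute always has the given value.  A literal value on its own is the default used when the document does not give the attribute a value.

CDATA means that an attribute value can be arbitrary text inside quotation marks, while NMTOKEN means the value must be a name token: letters, digits, and a few punctuation characters such as hyphens and periods, with no spaces.  ID means the value must be a name that is unique within the document, and IDREF means the value must match the ID of some element in the document.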
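
To see the defaults in action, here is a minimal Python sketch that puts a trimmed-down copy of the image declaration into an internal DTD and asks the standard library parser which attributes it reports.  The file name photo.jpg and the function name image_attributes are made up.  The sketch assumes the expat-based parser that ships with Python, which reads attribute defaults from the internal DTD but does not validate, so a missing #REQUIRED attribute would go unreported.

```python
import xml.etree.ElementTree as ET

# A trimmed-down copy of the image declaration, placed in an internal DTD.
DOCUMENT = """<!DOCTYPE image [
  <!ELEMENT image EMPTY>
  <!ATTLIST image source  CDATA    #REQUIRED
                  width   NMTOKEN  #IMPLIED
                  format  CDATA    #FIXED "jpeg"
                  alt     CDATA    "No caption provided.">
]>
<image source="photo.jpg"/>
"""


def image_attributes(xml_text):
    """Return the attributes of the root element as the parser reports them."""
    return dict(ET.fromstring(xml_text).attrib)


if __name__ == "__main__":
    for name, value in sorted(image_attributes(DOCUMENT).items()):
        print(name, "=", value)
```

The document only writes source, but the parser should also report format and alt with the values from the DTD.  The #IMPLIED attribute width has no default, so it stays absent.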
 
 

ENTITIES IN A DTD

An internal entity is simply an abbreviation that can be used in an XML document.  For example in the DTD you can write:
<!ENTITY notice "Copyright Regents of the University of California, 2001.  All rights reserved.">
Then in every document using this DTD you can just write
<header>&notice;</header>
External entities are useful for including non-XML data in an XML document.  You do this indirectly, by declaring the external data to be an "entity" in the DTD for the document, for example
<!ENTITY pic  SYSTEM "http://www.w3schools.com/entities/photo.gif">
Then in the document you can write
<author>&pic;</author>
It is the job of the software that parses the XML document to refer back to the DTD and to do something with the URL it finds there.
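
The internal entity example can be tried out with a short Python sketch.  It declares the notice entity in an internal DTD and lets the standard library parser expand the reference.  The function name header_text is made up for the example.

```python
import xml.etree.ElementTree as ET

# The notice entity from the text, declared in an internal DTD.
DOCUMENT = """<!DOCTYPE header [
  <!ENTITY notice "Copyright Regents of the University of California, 2001.  All rights reserved.">
]>
<header>&notice;</header>
"""


def header_text(xml_text):
    """Return the text of the root element after entity expansion."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(header_text(DOCUMENT))
```

By the time the program sees the header element, the reference &notice; has been replaced by the full copyright text.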
 
 

NAMESPACES

Conflicts between the names of XML elements are resolved using a prefix.  A namespace attribute declares a prefix for an element and all elements nested inside it, for example:
<f:table xmlns:f="http://www.w3schools.com/furniture">
The URL that identifies the namespace is just a placeholder.  No corresponding file has to exist, and no information is looked up at this URL.  (Technically it is a URI, not a URL.)
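
The following Python sketch illustrates that it is the URI, not the prefix, that identifies a namespace.  It parses two documents that use different prefixes for the same URI and compares the element names the parser reports.  The element content, the second prefix g, and the function name qualified_names are made up for this example.

```python
import xml.etree.ElementTree as ET

FURNITURE = "http://www.w3schools.com/furniture"

# Two hypothetical documents: same namespace URI, different prefixes.
WITH_F = '<f:table xmlns:f="%s"><f:name>Coffee table</f:name></f:table>' % FURNITURE
WITH_G = '<g:table xmlns:g="%s"><g:name>Coffee table</g:name></g:table>' % FURNITURE


def qualified_names(xml_text):
    """Return the names of all elements, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


if __name__ == "__main__":
    print(qualified_names(WITH_F))
    print(qualified_names(WITH_G) == qualified_names(WITH_F))
```

ElementTree writes each name as {URI}localname, so both documents produce the same list.  Nothing is fetched from the URI at any point.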

In an XSL document, the root element is

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
All tags that are XSL commands then start with the prefix xsl.

Unfortunately DTDs know nothing about namespaces.  If you have a DTD for a document that uses namespaces, the DTD has to use the same prefixes as the document.  Ideally each namespace would be associated with its own DTD, which would be included automatically when the namespace is used.  But this is not the case.
 
 



Copyright (c) by Charles Elkan, 2001.