MSDN Magazine

A Quick Guide to XML Schema-Part 2


Download the code for this article: TheXMLFiles0207.exe (40KB)

In the April 2002 issue I covered the basics of XML Schema, including how you define elements, attributes, and complex types, and how to use the built-in data types. In this month's column I'll introduce the more powerful features of the language, which are related to defining custom types and type hierarchies.
      Last time, I covered the process of describing a document's structure through xsd:element, xsd:attribute, and xsd:complexType definitions. As I mentioned in that earlier column, XML Schema defines a set of built-in data types that may be used to constrain the text content of elements and attributes.
      Picking up where I left off, each XML Schema data type has an explicitly defined value space as well as an explicitly defined lexical space (in other words, the set of possible string formats used in the XML document). For example, the double value 4200 can be represented lexically in a variety of ways (see Figure 1).

Figure 1 Lexical Representation
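
As a rough illustration of the idea, each of the following elements carries a different lexical form that maps to the same xsd:double value, 4200. The amount element name is hypothetical and is assumed to be declared with type xsd:double:

```xml
<!-- assumption: amount is declared as type="xsd:double" -->
<amount>4200</amount>
<amount>4200.0</amount>
<amount>+4200</amount>
<amount>4.2E3</amount>
<amount>42e2</amount>
```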

      Hence, when you specify that an element/attribute is of a given built-in type, the type's lexical space effectively controls what string formats are allowed in the document. For example, consider a schema that maps several local element declarations to XML Schema built-in types, as shown in Figure 2. When a schema processor validates an instance of this schema, it will ensure that the text contained within each of the elements/attributes conforms to a legal lexical representation of its defined type. The following instance is considered valid according to the schema:

<tns:employee xmlns:tns="http://example.org/employee/"
  id="555-12-3434">
  <name>Monica</name>
  <hiredate>1997-12-02</hiredate>
  <salary>42000.00</salary>
</tns:employee>

The following instance would be considered invalid because the hiredate and salary element values aren't valid representations of their corresponding types:

<tns:employee xmlns:tns="http://example.org/employee/"
  id="555-12-3434">
  <name>Monica</name>
  <hiredate>Dec 12, 2002</hiredate>
  <salary>42,000.00</salary>
</tns:employee>

The hiredate should be in the form of CCYY-MM-DD, while the salary shouldn't contain a comma. Both the value and lexical space details for a given type are defined in Part 2 of the XML Schema specification as well as in my book Essential XML Quick Reference (Addison-Wesley, 2001).
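
For reference, a schema consistent with the two instances above might look like the following sketch. It is not a reproduction of Figure 2. The use of xsd:double for salary and the exact shape of the employee declaration are assumptions. The string and date types follow from the surrounding text.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:tns="http://example.org/employee/"
  targetNamespace="http://example.org/employee/">
  <xsd:element name="employee">
    <xsd:complexType>
      <xsd:sequence>
        <!-- local element declarations mapped to built-in types -->
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="hiredate" type="xsd:date"/>
        <!-- assumption: salary is an xsd:double -->
        <xsd:element name="salary" type="xsd:double"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```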
      Although it seems that the set of built-in types should be sufficient, you'll surely encounter situations in which the most appropriate built-in data type won't do the job. For example, in the previous schema definition, the id attribute is defined to be of type string, but it must actually be in the format of a Social Security number for things to work properly. Obviously, the U.S. Social Security number format isn't universal enough to make it one of XML Schema's built-in types. To deal with situations like this without having to write your own validation code at the application level, XML Schema makes it possible to define custom simple types.

Deriving New Simple Types

      XML Schema makes it possible to derive new simple types from existing simple types in a variety of ways. This is accomplished by using an xsd:simpleType element in the schema:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:tns="http://example.org/employee/"
  targetNamespace="http://example.org/employee/">
  <xsd:simpleType name="socialSecurityNumber">
     <!-- define characteristics of new simple type here -->
  </xsd:simpleType>
  <xsd:attribute name="id" type="tns:socialSecurityNumber"/>
</xsd:schema>

      Just as with complex types, you can either name simple types (through the name attribute) or define them anonymously within text-only element and attribute declarations. Typically, developers choose to name their new simple types so they can refer to them from multiple element/attribute declarations.
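
A minimal sketch of the anonymous form is shown here. The simple type has no name and is nested directly inside the attribute declaration, so no other declaration can refer to it. The pattern facet is only a placeholder for the type's characteristics (facets are covered later in this column).

```xml
<xsd:attribute name="id">
  <!-- anonymous simple type: usable only by this declaration -->
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:attribute>
```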
      In this case, the name of the new simple type is socialSecurityNumber and it's automatically associated with the schema's targetNamespace as I discussed in the April issue. Hence, when referring to this type, you must use a qualified name, as shown in the id attribute declaration.
      You can base new simple types on existing types using three techniques: restricting a base type, creating a list of a given type, or defining a union of types. There is an element that represents each of these different derivation techniques, which can be nested directly within the xsd:simpleType element. These elements are xsd:restriction, xsd:list, and xsd:union (see Figure 3).
      Deriving a new simple type by restriction is the most common case. This makes it possible to restrict the value space of an existing simple type to better meet your needs. Any instance of a restricted type is also a valid instance of the base type since the restricted type's value space is a subset of the base type's value space. Restricting the base type's value space also restricts the allowed lexical representations because there are fewer values to represent.
      You specify the base type you want to restrict through the base attribute on xsd:restriction, as shown here:

<xsd:simpleType name="age">
  <xsd:restriction base="xsd:unsignedShort">
    <!-- restrict the base type's value space here -->
  </xsd:restriction>
</xsd:simpleType>

      You can restrict the base type's characteristics in a variety of ways. XML Schema makes this possible through type facets.

Type Facets

      XML Schema defines a set of type characteristics, or facets, which can be used to restrict certain aspects of the base type. There are a total of 12 facets, but not all of them can be used on a given built-in type. Figure 4 lists facet descriptions.
      Each facet makes it possible to restrict the value space of the base type in a different way. For example, you can use a combination of xsd:minExclusive, xsd:minInclusive, xsd:maxExclusive, and xsd:maxInclusive to define the allowed value range of number-based types. You can use a combination of xsd:length, xsd:maxLength, and xsd:minLength to control the length of string-based, binary-based, and list-based types. You can also use multiple xsd:enumeration elements to specify a fixed set of valid enumerated values. And for the Perl aficionados of the world, you've got xsd:pattern, which allows you to explicitly control the lexical pattern of a value through regular expressions.
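
As a quick sketch of these facet families, the following definitions use range, length, and enumeration facets. The type names (percentage, shortName, shirtSize) and the specific bounds are hypothetical and are not taken from the figures in this column.

```xml
<!-- range facets on a number-based type -->
<xsd:simpleType name="percentage">
  <xsd:restriction base="xsd:decimal">
    <xsd:minInclusive value="0"/>
    <xsd:maxInclusive value="100"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- length facets on a string-based type -->
<xsd:simpleType name="shortName">
  <xsd:restriction base="xsd:string">
    <xsd:minLength value="1"/>
    <xsd:maxLength value="20"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- enumeration: a fixed set of allowed values -->
<xsd:simpleType name="shirtSize">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="small"/>
    <xsd:enumeration value="medium"/>
    <xsd:enumeration value="large"/>
  </xsd:restriction>
</xsd:simpleType>
```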
      Let's look at some examples. The schema in Figure 5 defines some new simple types based on existing types (including some defined in the same schema). The new xsd:simpleType definitions extend the XML Schema type system with new custom types that are more appropriate for the application at hand (see Figure 6).

Figure 6 The XML Schema Type System

      Now the application can choose to use a more specific value space for a given element/attribute. Figure 7 illustrates the value spaces for each of the newly defined age types that derive from xsd:unsignedByte. As you can see, 18 is a valid value of teenAge, adultAge, normalAge, age, and unsignedByte, but it's not a valid value of infantAge or recordAge.

Figure 7 Age-derived Value Spaces
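
To make the value spaces concrete, here is a sketch of how two of these types could be defined. It is an approximation under stated assumptions, not the Figure 5 schema itself. Both types are assumed to derive directly from xsd:unsignedByte, infantAge is assumed to allow 0 through 3, and recordAge uses the 101 through 122 range described later in this column.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:tns="http://example.org/ages/"
  targetNamespace="http://example.org/ages/">
  <!-- assumption: infants are 0 through 3 years old -->
  <xsd:simpleType name="infantAge">
    <xsd:restriction base="xsd:unsignedByte">
      <xsd:maxInclusive value="3"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- ages beyond 100, up to the oldest age on record -->
  <xsd:simpleType name="recordAge">
    <xsd:restriction base="xsd:unsignedByte">
      <xsd:minInclusive value="101"/>
      <xsd:maxInclusive value="122"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```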

      You can experiment with this using the validation utility I provided earlier. Just create an XML instance document like this

<age xmlns:tns="http://example.org/ages/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:type="tns:infantAge">2
</age>

and run it through the validation utility, as shown here:

c:> validate age.xml -s ages.xsd

Then you can experiment with the different value spaces by changing the type name specified in xsi:type along with the value embedded in the start/end tags. I've provided the validate.js utility, ages.xsd, and age.xml for you to download from the link at the top of this article.
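
For instance, the following two variations of that document illustrate the experiment. They rely only on the value spaces described above, where 18 is a valid teenAge value but not a valid infantAge value.

```xml
<!-- expected to validate: 18 is within the teenAge value space -->
<age xmlns:tns="http://example.org/ages/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:type="tns:teenAge">18</age>

<!-- expected to fail: 18 is outside the infantAge value space -->
<age xmlns:tns="http://example.org/ages/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:type="tns:infantAge">18</age>
```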
      It's also worth mentioning again that you can use these simple types within xsd:complexType definitions as with any other built-in type. For example, you can represent people who have lived longer than anyone on record through the following complex type definition that uses the recordAge type from Figure 5:

<xsd:complexType name="oldPerson">
  <xsd:sequence>
    <xsd:element name="name"
       type="xsd:string"/>
    <xsd:element name="age"
       type="tns:recordAge"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:element name="oldie"
   type="tns:oldPerson"/>

      The following two elements are valid instances of the oldPerson type and happen to represent the oldest man and woman in recorded history (according to the Guinness Book of World Records):

   <tns:oldie xmlns:tns="...">
     <name>Antonio Todde</name>
     <age>112</age>
   </tns:oldie>

   <tns:oldie xmlns:tns="...">
     <name>Jeanne-Louise Calment</name>
     <age>122</age>
   </tns:oldie>

Now any instance of oldPerson that has an age value less than 101 or greater than 122 is invalid according to the schema.
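
By way of contrast, an instance like the following sketch would be rejected, because 100 falls just below the recordAge range. The name is a made-up placeholder.

```xml
<!-- invalid: 100 is less than the recordAge minimum of 101 -->
<tns:oldie xmlns:tns="...">
  <name>Example Person</name>
  <age>100</age>
</tns:oldie>
```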
      It's also possible to restrict date ranges just as with numbers. The following schema fragment defines a new simple type that restricts the base type xsd:date to include only the date values of the 2002 Olympic events:

<xsd:simpleType name="olympicDates">
  <xsd:restriction base="xsd:date">
    <xsd:minInclusive value="2002-02-08"/>
    <xsd:maxInclusive value="2002-02-24"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:element name="date" type="tns:olympicDates"/>

The following element is a valid instance of the olympicDates type:

<tns:date xmlns:tns="...">2002-02-13</tns:date>

Anything outside of the specified date range (2/8/2002 through 2/24/2002) is invalid according to this type.
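
The two ways an instance can fail are worth distinguishing. In this sketch, the first element holds a legal xsd:date that falls outside the restricted range, while the second isn't a legal lexical form of xsd:date in the first place:

```xml
<!-- invalid: a legal xsd:date, but after the 2002-02-24 upper bound -->
<tns:date xmlns:tns="...">2002-03-01</tns:date>

<!-- invalid: not in the CCYY-MM-DD lexical form required by xsd:date -->
<tns:date xmlns:tns="...">02/13/2002</tns:date>
```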
      As a final example of restricting simple types, go back to the Social Security number problem from the first schema. In that schema, the id attribute was defined to be of type xsd:string, as shown here:

<xsd:attribute name="id" type="xsd:string"/>

      But the values should really be in the format of a U.S. Social Security number, which looks like this: 555-12-3434.
      You can define a new type derived from xsd:string that restricts the value space to this specific lexical pattern. The pattern can be described by the regular expression \d{3}\-\d{2}\-\d{4}, as shown in the following xsd:simpleType definition:

<xsd:simpleType name="socialSecurityNumber">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:attribute name="id"
   type="tns:socialSecurityNumber"/>

Now, since I defined the id attribute to be of this new type instead of the base type, xsd:string, the schema processor can take over the process of validating the value's lexical format.
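
As a sketch of the effect, assume the employee declaration from the first schema with its id attribute now typed as tns:socialSecurityNumber (child elements are omitted here for brevity). Note that an XML Schema pattern must match the entire value, so there's no need for explicit anchors such as ^ and $.

```xml
<!-- id matches the pattern: three digits, two digits, four digits -->
<tns:employee xmlns:tns="http://example.org/employee/"
  id="555-12-3434"/>

<!-- id is rejected: the hyphens are missing -->
<tns:employee xmlns:tns="http://example.org/employee/"
  id="555123434"/>

<!-- id is rejected: extra text around an otherwise matching value -->
<tns:employee xmlns:tns="http://example.org/employee/"
  id="SSN 555-12-3434"/>
```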
      In addition to deriving simple types by restriction, you can also derive lists and unions from base types.

Lists and Unions of Simple Types

      The concept of deriving lists and unions from base types is much simpler than xsd:restriction because you're not actually changing the value spaces of the base types. Instead, you're defining a new type that is either a list of values for a single type (xsd:list) or a single value from one of several types (xsd:union).
      The following schema fragment defines a new simple type that is a list of recordAge values:

<xsd:simpleType name="listOfAges">
  <xsd:list itemType="tns:recordAge" />
</xsd:simpleType>

<xsd:element name="ages" type="tns:listOfAges"/>

      A valid instance of this new type must contain a whitespace-delimited list of valid recordAge values, as shown here:

<tns:ages xmlns:tns="...">112 122 119</tns:ages>
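
Each item in the list is checked against the item type, and the list type itself can be restricted further with the length facets mentioned earlier, which then count list items rather than characters. The following sketch shows both points. The threeAges type name is hypothetical.

```xml
<!-- invalid: 18 is not a valid recordAge value -->
<tns:ages xmlns:tns="...">112 18 119</tns:ages>

<!-- a list restricted to exactly three recordAge items -->
<xsd:simpleType name="threeAges">
  <xsd:restriction base="tns:listOfAges">
    <xsd:length value="3"/>
  </xsd:restriction>
</xsd:simpleType>
```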

      The following schema fragment defines a new simple type that is a union of two existing types, namely infantAge and recordAge:

<xsd:simpleType name="extremeAge">
  <xsd:union memberTypes="tns:infantAge tns:recordAge" />
</xsd:simpleType>

<xsd:element name="extremeAgeRanges"
             type="tns:extremeAge"/>

This means that an instance of extremeAge must contain a value from within either the infantAge or recordAge value spaces. For example, the following extremeAgeRanges element is valid because it contains the value of 2, which is within the infantAge value space defined as up to three years old:

<tns:extremeAgeRanges xmlns:tns="...">2</tns:extremeAgeRanges>

      It would also be valid if it had a value between 101 and 122, on the other end of the spectrum.
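
To sketch both ends of the union, the first element below is accepted through the recordAge member type, while the second is rejected because 50 belongs to neither member type's value space:

```xml
<!-- valid: 110 is within the recordAge value space -->
<tns:extremeAgeRanges xmlns:tns="...">110</tns:extremeAgeRanges>

<!-- invalid: 50 is neither an infantAge nor a recordAge value -->
<tns:extremeAgeRanges xmlns:tns="...">50</tns:extremeAgeRanges>
```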
      That wraps up most of what you can do when deriving new simple types from existing simple types.

Deriving New Complex Types

      In a similar fashion, you can continue extending your type hierarchy by deriving new complex types from existing simple/complex types. This is accomplished through a new xsd:complexType element that has either an xsd:simpleContent or xsd:complexContent element child.
      xsd:simpleContent indicates that the new complex type is being derived from an existing type with simple content (just text content) while xsd:complexContent indicates that it's being derived from an existing type with complex content:

<xsd:complexType name="newDerivedComplexType">
  <!-- use xsd:simpleContent  to derive from simple  type -->
  <!-- use xsd:complexContent to derive from complex type -->
</xsd:complexType>

      Unlike simple types, you can derive complex types by either restriction or extension. You indicate the type of derivation by nesting either the xsd:restriction or xsd:extension elements within the xsd:simpleContent or xsd:complexContent elements, again depending on the base type.
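
As a minimal sketch of derivation by extension, the first type below extends a simple type by adding an attribute to its text content, and the second extends the oldPerson complex type by appending an element to its content model. The names verifiedAge, source, oldPersonWithCountry, and country are hypothetical.

```xml
<!-- simple content: recordAge text plus a new attribute -->
<xsd:complexType name="verifiedAge">
  <xsd:simpleContent>
    <xsd:extension base="tns:recordAge">
      <xsd:attribute name="source" type="xsd:string"/>
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>

<!-- complex content: oldPerson's elements followed by a new one -->
<xsd:complexType name="oldPersonWithCountry">
  <xsd:complexContent>
    <xsd:extension base="tns:oldPerson">
      <xsd:sequence>
        <xsd:element name="country" type="xsd:string"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
```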
      When deriving by restriction, an instance of the derived type is always a valid instance of the base type. This means that each member of the derived type must have the same, or narrower, value space and occurrence constraints as that of the base type. You can restrict the facets or occurrence constraints of any complex type member using the techniques that I described earlier in this column for simple types.
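
One way this could look is sketched below, assuming the oldPerson type defined earlier. The derived type repeats the base type's content model and narrows the age member with an anonymous type restricted from recordAge, so every instance of the derived type remains a valid oldPerson. The name veryOldPerson and the lower bound of 120 are hypothetical.

```xml
<xsd:complexType name="veryOldPerson">
  <xsd:complexContent>
    <xsd:restriction base="tns:oldPerson">
      <!-- the content model is restated with narrower constraints -->
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="age">
          <xsd:simpleType>
            <xsd:restriction base="tns:recordAge">
              <xsd:minInclusive value="120"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
```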