XHTML Introduction

[Author's Note: It's been many years since I penned this page, and time has moved on, leaving XHTML in the dust. For a brief summary, see my blog post http://mitchfincher.blogspot.com/2011/12/html5-is-not-xml-time-to-get-over-it.html]. I leave this page here for historical research. You should be learning HTML5, not XHTML. Sigh.

  1. Some of my favorite online references for XHTML:
    1. w3.org's overview of XHTML
    2. w3.org's markup validator, which checks whether a document really is valid XHTML
    3. w3.org's CSS validator
    4. w3.org's Cascading Style Sheets pages
  2. What is XHTML?

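    XHTML is a stricter, more formal version of HTML: XHTML 1.0 reformulates HTML 4 as an application of XML. Because it is defined by an XML DTD, any XML parser or tool can process it, which makes it much easier to handle than HTML.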

  3. Advantages of using XHTML instead of HTML
    1. Documents can be validated much more easily
    2. Documents can be transformed via tools like XSLT into other documents for consumption by devices like handhelds
    3. Fragments of documents can be retrieved faster
    4. Text can be stored more efficiently in object-oriented databases
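    To make the XSLT point concrete, here is a minimal sketch using the JDK's built-in javax.xml.transform API. The stylesheet is a made-up example that pulls every h2 heading out of an XHTML page as plain text, a crude outline of the kind a small device might show. It has to match elements in the XHTML namespace (http://www.w3.org/1999/xhtml), which is why it declares the h prefix. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltOutline {
    // Hypothetical stylesheet: one line of plain text per XHTML <h2>.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:h=\"http://www.w3.org/1999/xhtml\">"      // XHTML elements live in this namespace
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//h:h2\">"
        + "<xsl:value-of select=\"normalize-space(.)\"/>"
        + "<xsl:text>&#10;</xsl:text>"                       // newline after each heading
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XHTML string and returns the text output.
    static String outline(String xhtml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<h2>Meerkats of Africa</h2><p>...</p><h2>Lemurs</h2></body></html>";
        System.out.print(outline(page));
    }
}
```

    None of this works on ordinary HTML, because XSLT needs well-formed XML as input; that is the whole advantage.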
  4. XHTML Versions

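    XHTML 1.0 (like Gaul) is divided into three parts (or flavors): Transitional, Strict, and Frameset. Transitional still allows presentational markup such as bgcolor, Strict does not, and Frameset is for pages built from frames.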
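    Each flavor is selected by its DOCTYPE line. The small sketch below just records which line goes with which flavor; the public and system identifiers are the ones W3C publishes for XHTML 1.0, while the enum name XhtmlFlavor and its doctype() method are hypothetical.

```java
// Hypothetical helper: maps each XHTML 1.0 flavor to the DOCTYPE line it needs.
enum XhtmlFlavor {
    TRANSITIONAL("-//W3C//DTD XHTML 1.0 Transitional//EN",
                 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"),
    STRICT("-//W3C//DTD XHTML 1.0 Strict//EN",
           "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"),
    FRAMESET("-//W3C//DTD XHTML 1.0 Frameset//EN",
             "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd");

    final String publicId;   // formal public identifier of the DTD
    final String systemId;   // where a validating parser can fetch the DTD

    XhtmlFlavor(String publicId, String systemId) {
        this.publicId = publicId;
        this.systemId = systemId;
    }

    // Builds the <!DOCTYPE ...> declaration for this flavor.
    String doctype() {
        return "<!DOCTYPE html PUBLIC \"" + publicId + "\" \"" + systemId + "\">";
    }

    public static void main(String[] args) {
        for (XhtmlFlavor flavor : values()) {
            System.out.println(flavor + ": " + flavor.doctype());
        }
    }
}
```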

  5. How to convert most HTML pages to XHTML
    1. Heading lines at top

      At the beginning of documents we need to include a few lines:

      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

      The location of the DTD lets validating parsers check the document. Browsers do not validate against the DTD, so they largely ignore these lines.
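      One way to see what an XML parser makes of these header lines is to read them back with Java's built-in JAXP parser. This is a minimal sketch (the class name CheckProlog is hypothetical): it checks well-formedness only, and the EntityResolver hands the parser an empty DTD so it never tries to download the real one from w3.org.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CheckProlog {
    // A tiny page that starts with the three header lines.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n"
        + "<head><title>Test</title></head><body><p>Hello</p></body></html>\n";

    // Parses XHTML for well-formedness only; the DTD is never downloaded.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // honor xmlns="http://www.w3.org/1999/xhtml"
        factory.setValidating(false);      // no DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Substitute an empty DTD for the external one instead of fetching it.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println("DOCTYPE public id: " + doc.getDoctype().getPublicId());
        System.out.println("root namespace:    " + doc.getDocumentElement().getNamespaceURI());
    }
}
```

      Because the DTD is replaced by an empty one, entity references such as &nbsp; would be reported as errors by this sketch; a real validating setup would need a local copy of the DTD.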

    2. Downcase HTML tags, attributes, and HTML-defined values

      <BODY BGCOLOR="RED">

      becomes

      <body bgcolor="red">

      (Capitals are fine in user-defined attribute values, like <img src="..." alt="My Favorite Picture" />.)
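      A rough Java sketch of the mechanical part of this step, assuming plain regular expressions are good enough for simple hand-written pages (they are not a real HTML parser). The hypothetical lowercaseNames method lowercases element and attribute names but deliberately leaves attribute values alone, since a regex can't tell an HTML-defined value like RED from user text like an alt caption.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LowercaseTags {
    // A start or end tag: group 1 is "/" for end tags, group 2 the name, group 3 the rest.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>");
    // An attribute name followed by '=' inside a tag.
    private static final Pattern ATTR = Pattern.compile("(\\s)([A-Za-z][A-Za-z0-9-]*)(\\s*=)");

    static String lowercaseNames(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            // Lowercase each attribute name; the values after '=' are copied unchanged.
            Matcher attr = ATTR.matcher(tag.group(3));
            StringBuffer attrs = new StringBuffer();
            while (attr.find()) {
                attr.appendReplacement(attrs, Matcher.quoteReplacement(
                    attr.group(1) + attr.group(2).toLowerCase(Locale.ROOT) + attr.group(3)));
            }
            attr.appendTail(attrs);
            String rebuilt = "<" + tag.group(1) + tag.group(2).toLowerCase(Locale.ROOT) + attrs + ">";
            tag.appendReplacement(out, Matcher.quoteReplacement(rebuilt));
        }
        tag.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseNames("<BODY BGCOLOR=\"RED\">"));   // <body bgcolor="RED">
    }
}
```

      Values such as RED still have to be lowercased by hand.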

    3. Attribute values must be enclosed in double or single quotes

      <ol type=1>

      becomes

      <ol type="1">

      or

      <ol type='1'>
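      Adding the quotes can also be sketched with a regular expression, again assuming simple hand-written HTML. The hypothetical quoteValues method only touches name=value pairs inside start tags whose value is unquoted; it does not cope with quoted values that themselves contain name=value text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAttributes {
    // A start tag such as <ol type=1 start=3>.
    private static final Pattern TAG = Pattern.compile("<[A-Za-z][^>]*>");
    // name=value where the value is not already quoted: group 1 is "name=", group 2 the bare value.
    private static final Pattern UNQUOTED =
        Pattern.compile("(\\s[A-Za-z][A-Za-z0-9-]*\\s*=\\s*)([^\"'\\s>]+)");

    static String quoteValues(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            // Wrap each bare value in double quotes.
            String fixed = UNQUOTED.matcher(tag.group()).replaceAll("$1\"$2\"");
            tag.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        tag.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteValues("<ol type=1>"));   // <ol type="1">
    }
}
```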

    4. Every element must have an end tag, even when it doesn't really matter.

      <br>
      <input type="text" value="Amazon.com" size="20" >

      becomes

      <br />
      <input type="text" value="Amazon.com" size="20" />

      For compatibility with older browsers it's best to put a single space before the '/'. Some browsers have trouble with "<br></br>", so it's best to use "<br />".
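      A regex sketch of this rewrite for a handful of common empty elements. The element list is a partial one chosen for illustration, not the full HTML 4 set, and the class name is hypothetical. It emits the space-before-slash form, and running it twice changes nothing.

```java
import java.util.regex.Pattern;

class CloseEmptyElements {
    // Some HTML 4 elements that never have content; <br>, <br/> and <br /> all match.
    private static final Pattern EMPTY = Pattern.compile(
        "<(br|hr|img|input|meta|link|area|base|col|param)\\b([^>]*?)\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    // Rewrites each empty element as <name attributes />.
    static String close(String html) {
        return EMPTY.matcher(html).replaceAll("<$1$2 />");
    }

    public static void main(String[] args) {
        System.out.println(close("<br>"));   // <br />
        System.out.println(close("<input type=\"text\" value=\"Amazon.com\" size=\"20\" >"));
    }
}
```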

    5. Every attribute must have a value

      <ol compact>
      <input type="radio" name="title" value="decline" checked>decline

      becomes

      <ol compact="compact">
      <input type="radio" name="title" value="decline" checked="checked" />decline

      (input is an empty element, so it takes no end tag; the label text simply follows it.)
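      The minimized attributes can be expanded the same way. This is a sketch under the same simple-markup assumption; the attribute list is assumed to be HTML 4's boolean attributes, and the class name is hypothetical. An attribute that already has a value is left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpandBooleanAttributes {
    private static final Pattern TAG = Pattern.compile("<[A-Za-z][^>]*>");
    // A bare boolean attribute name that is not followed by '='.
    private static final Pattern BARE = Pattern.compile(
        "(\\s)(checked|compact|declare|defer|disabled|ismap|multiple|nohref"
        + "|noresize|noshade|nowrap|readonly|selected)(?=[\\s/>])",
        Pattern.CASE_INSENSITIVE);

    static String expand(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            // checked -> checked="checked", compact -> compact="compact", ...
            String fixed = BARE.matcher(tag.group()).replaceAll("$1$2=\"$2\"");
            tag.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        tag.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("<ol compact>"));   // <ol compact="compact">
    }
}
```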

    6. Tags may not overlap

      This is <em> emphasized text and <b>bold </em>text</b>

      becomes

      This is <em>emphasized text</em> and <b>bold text</b>
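      Overlap is easy to detect mechanically with a stack: each end tag must close the most recently opened element. The sketch below (hypothetical class name, regex-based tag scanning, no DTD) reports whether a fragment nests properly. It treats tags written as <br /> as self-closing, so a bare <br> still counts as an unclosed element.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestingChecker {
    // group 1: "/" for end tags; group 2: element name; group 3: "/" for <br /> style tags.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>");

    // True if every end tag closes the most recently opened element and nothing is left open.
    static boolean isProperlyNested(String xhtml) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(xhtml);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (!m.group(3).isEmpty()) {
                continue;                        // self-closing, e.g. <br />
            }
            if (m.group(1).isEmpty()) {
                open.push(name);                 // start tag
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false;                    // end tag doesn't match the innermost open element
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isProperlyNested("This is <em> emphasized text and <b>bold </em>text</b>"));   // false
    }
}
```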

    7. Only certain tags may nest inside other tags

      Looking at the XHTML 1.0 Transitional DTD, the definition of the "ol" element is:

      <!ELEMENT ol (li)+>
      <!ATTLIST ol
        %attrs;
        type        %OLStyle;      #IMPLIED
        compact     (compact)      #IMPLIED
        start       %Number;       #IMPLIED
        >
      

      This means that an ordered list ("ol") element must contain one or more list items ("li") and nothing else: no paragraphs and no bare body text.

      <ol>
      These are some of my favorite animals:
      <li>octopus</li>
      <li>shrew</li>
      <li>lemur</li>
      and my most favorite
      <li>meerkats</li>
      </ol>

      becomes


      <p>These are some of my favorite animals:</p>
      <ol>
      <li>octopus</li>
      <li>shrew</li>
      <li>lemur</li>
      <li>meerkats</li>
      </ol>

      What do we do with the phrase "and my most favorite"?
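      A parser can find these problems for you. The sketch below uses the JDK's DOM parser (the class name OlContentCheck is hypothetical) to list every element or non-blank text directly inside an ol that is not an li. It checks only this one rule, and not that at least one li is present; a validating parser with the DTD would check all of them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OlContentCheck {
    // Lists everything directly inside an <ol> that the DTD rule (li)+ does not allow.
    static List<String> problems(String fragment) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(fragment)));
        List<String> found = new ArrayList<>();
        NodeList lists = doc.getElementsByTagName("ol");
        for (int i = 0; i < lists.getLength(); i++) {
            for (Node child = lists.item(i).getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    String name = ((Element) child).getTagName();
                    if (!name.equals("li")) {
                        found.add("element <" + name + ">");
                    }
                } else if (child.getNodeType() == Node.TEXT_NODE) {
                    String text = child.getNodeValue().trim();
                    if (!text.isEmpty()) {
                        found.add("text \"" + text + "\"");   // bare text is not allowed either
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String list = "<ol>\nThese are some of my favorite animals:\n<li>octopus</li>\n"
            + "<li>shrew</li>\n<li>lemur</li>\nand my most favorite\n<li>meerkats</li>\n</ol>";
        problems(list).forEach(System.out::println);
    }
}
```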

    8. Ampersands in hrefs must be written as "&amp;" instead of "&"

      <a href="http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&Password=test&action=View">Sample List</a>

      becomes

      <a href="http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&amp;Password=test&amp;action=View">Sample List</a>
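      This escaping can be automated as long as existing entity and character references are left alone. A minimal sketch (hypothetical class name): the regular expression escapes only an & that does not already begin something like &amp;, &#38; or &#x26;, so running it twice is harmless.

```java
import java.util.regex.Pattern;

class EscapeAmpersands {
    // An '&' that does not already start an entity or character reference.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)");

    static String escape(String href) {
        return BARE_AMP.matcher(href).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escape(
            "http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&Password=test&action=View"));
    }
}
```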

    9. The attribute "name" becomes "id" when used for a locator inside a document

      For example, to reference a section within a document with a URI, we usually write something like

      <a href="favoriteAnimals.html#meerkats">Meerkats</a>

      Inside the referenced section,

      <h2><a name="meerkats">Meerkats of Africa</a></h2>

      becomes

      <h2><a id="meerkats">Meerkats of Africa</a></h2>

      or, better yet, for backwards compatibility with older browsers:

      <h2><a id="meerkats" name="meerkats">Meerkats of Africa</a></h2>

      (The anchor goes inside the heading because "a" is an inline element and may not contain block elements such as "h2".)
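      A sketch of the backwards-compatible version of this change, assuming name is the first attribute on the anchor and that no id is present yet (the class name is hypothetical). Keep in mind that an id must be a valid XML name, so a value such as "1st" that starts with a digit has to be renamed by hand.

```java
import java.util.regex.Pattern;

class NameToId {
    // <a name="x" ...> gets a matching id="x" added in front of name="x".
    private static final Pattern NAMED_ANCHOR =
        Pattern.compile("<a(\\s+)name=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static String addIds(String html) {
        return NAMED_ANCHOR.matcher(html).replaceAll("<a$1id=\"$2\" name=\"$2\"");
    }

    public static void main(String[] args) {
        System.out.println(addIds("<a name=\"meerkats\">Meerkats of Africa</a>"));
    }
}
```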

    10. Tidy

      Tidy is a tool that automatically converts HTML to XHTML. You can find it at http://www.w3.org/People/Raggett/tidy/. Run it with the -asxhtml option (for example, tidy -asxhtml page.html) to get XHTML output.

    Java Section

    1. GetWebPage.java

      Example of using Java to fetch web pages over HTTP. Usage:
      GetWebPage http://www.cnn.com
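      The original GetWebPage.java is not reproduced on this page. As a stand-in, here is a minimal sketch of the same idea using the java.net.http client that arrived in Java 11, long after this page was written. It is not the author's code, and the fetch method name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class GetWebPage {
    // Downloads a page and returns its body as text.
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)          // plain HTTP/1.1
            .followRedirects(HttpClient.Redirect.NORMAL)   // follow redirects, except https -> http
            .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: GetWebPage <url>");
            System.exit(1);
        }
        System.out.println(fetch(args[0]));
    }
}
```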

    2. ValidateXML.java

      Example of using Java to validate XML pages. Usage:
      ValidateXML http://www.w3.org
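      The original ValidateXML.java isn't shown either. Here is a minimal sketch with the JDK's SAX parser that checks only well-formedness, not validity against the DTD; true validation would need the DTD, which this sketch deliberately never downloads. The method name checkWellFormed is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidateXML {
    // Returns null if the input is well-formed XML, otherwise "line N: message".
    static String checkWellFormed(InputSource source) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Serve an empty DTD instead of fetching it, so only well-formedness is checked.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        reader.setErrorHandler(new DefaultHandler());   // fatal errors are thrown, nothing printed
        try {
            reader.parse(source);
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ValidateXML <url>");
            System.exit(1);
        }
        // new InputSource(url) lets the parser open the URL itself.
        String problem = checkWellFormed(new InputSource(args[0]));
        System.out.println(problem == null ? "well-formed" : problem);
    }
}
```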

    3. SurveyTaker.java

      Example of using Java to punch through surveys

    4. Misc:

      www.NetMechanic.com

      One of many web tuning sites

  6. Differences between XML and HTML

    Since XML and HTML are both derived from SGML, they are similar, but they differ in the following ways:

    1. XML is case-sensitive.
    2. XML requires quotes (single or double) around attribute values.
    3. Most HTML parsers are very forgiving about missing end tags; XML parsers are not.
    4. Comments start with <!-- and end with -->. Inside a comment, "--" may not appear. Browsers tolerate it in HTML, but XML parsers reject it.
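    Before handing a page to an XML parser, it can help to find the comments that break the last rule. A regex sketch (the class name is hypothetical) that returns the body of every comment containing "--":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentCheck {
    // <!-- ... --> with the body captured; DOTALL lets a comment span several lines.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    // Returns the bodies of comments that contain "--", which XML forbids.
    static List<String> badComments(String xhtml) {
        List<String> bad = new ArrayList<>();
        Matcher m = COMMENT.matcher(xhtml);
        while (m.find()) {
            if (m.group(1).contains("--")) {
                bad.add(m.group(1));
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(badComments("<!-- fine --><p>x</p><!-- not -- fine -->"));
    }
}
```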

It's a good idea to check your source XHTML pages against the w3.org validators listed at the top of this page.