[Author's note: It has been many years since I wrote this page, and time has moved on, leaving XHTML in the dust. For a brief summary, see my blog post http://mitchfincher.blogspot.com/2011/12/html5-is-not-xml-time-to-get-over-it.html. I leave this page here for historical research reasons. You should be learning HTML5, not XHTML. Sigh.]
- Some of my favorite Online References for XHTML:
- What is XHTML?
XHTML is a more formal, stricter version of HTML. XHTML is defined by an XML DTD, which makes it much easier for software to parse and process.
- Advantages of using XHTML instead of HTML
- Documents can be validated much more easily
- Documents can be transformed via tools like XSLT into other documents for consumption by devices like handhelds
- Fragments of documents can be retrieved faster
- Text can be stored more efficiently in object-oriented databases
- XHTML Versions
- How to convert most HTML pages to XHTML
- Heading lines at top
At the beginning of documents we need to include a few lines:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
The location of the DTD allows validating parsers to check the document. Browsers generally do not download the DTD or validate against it.
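As a rough check that the heading lines above are well-formed XML, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The head, title, and body content are placeholders I added only to make the document complete; they are not from the original page. The parser does not fetch the DTD or validate against it. It only confirms well-formedness and shows how the xmlns declaration becomes part of each element name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The three heading lines, followed by a minimal head and body.
# The title and paragraph text are placeholders (assumptions for illustration).
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Placeholder title</title></head>
<body><p>Placeholder text</p></body>
</html>
"""

# Parsing succeeds only if the document is well-formed XML.
root = ET.fromstring(DOCUMENT)

# The namespace from xmlns is folded into the tag name.
print(root.tag)
print(root.get("lang"))
```

Because of the namespace, later lookups have to spell out the qualified name, for example "{http://www.w3.org/1999/xhtml}body".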
- Lowercase HTML tags, attributes, and HTML-defined values
(Capitals are OK in user-defined attribute values, as in <img src="..." alt="My Favorite Picture" />.)
- Attribute values must be in double or single quotes
- Every element must have an end tag, even when it doesn't really matter.
<input type="text" value="Amazon.com" size="20" >becomes
<input type="text" value="Amazon.com" size="20" />
For compatibility with older browsers, it's best to put a single space before the '/'. Some browsers have trouble with "<br></br>", so it's best to use "<br />".
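To see why the end tag matters to an XML parser, here is a small sketch using Python's standard xml.etree.ElementTree. The helper name is_well_formed and the surrounding form element are my own assumptions, added so each fragment has a single root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# HTML style: the input element is never closed.
html_style = '<form><input type="text" value="Amazon.com" size="20" ></form>'
# XHTML style: the input element closes itself with " />".
xhtml_style = '<form><input type="text" value="Amazon.com" size="20" /></form>'

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

Both "<br></br>" and "<br />" are well-formed as far as the parser is concerned; the preference for "<br />" is only about browser compatibility.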
- Every attribute must have a value
<input type="radio" name="title" value="decline" checked>decline
<ol compact>becomes
<input type="radio" name="title" value="decline" checked="checked" />decline
<ol compact="compact">
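Expanding minimized attributes can be automated. The sketch below leans on Python's standard html.parser module, which reports a minimized attribute with a value of None. The class StartTagReader and the function expand_minimized are hypothetical names of my own, and the function only rewrites attribute values; it does not add the closing " /" that empty elements also need.

```python
from html import escape
from html.parser import HTMLParser


class StartTagReader(HTMLParser):
    """Collect (tag, attributes) pairs for every start tag fed to it."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # A minimized attribute such as "checked" arrives with value None.
        self.tags.append((tag, attrs))


def expand_minimized(start_tag):
    """Rewrite one start tag so that every attribute has a quoted value."""
    reader = StartTagReader()
    reader.feed(start_tag)
    reader.close()
    tag, attrs = reader.tags[0]
    parts = [tag]
    for name, value in attrs:
        if value is None:
            value = name  # checked becomes checked="checked"
        parts.append('%s="%s"' % (name, escape(value, quote=True)))
    return "<" + " ".join(parts) + ">"


print(expand_minimized("<ol compact>"))
```

As a side effect, the parser lowercases tag and attribute names, which also takes care of the lowercase rule for those two.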
- Tags may not overlap
This is <em> emphasized text and <b>bold </em>text</b>becomes
This is <em>emphasized text</em> and <b>bold text</b>
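Overlapping tags can be spotted with a simple stack: every end tag must match the most recently opened element. Here is a minimal sketch built on Python's standard html.parser; the names NestingChecker and overlap_errors are my own, and the sketch assumes every element is explicitly closed, as XHTML requires.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Record end tags that do not close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open element names
        self.errors = []     # (end tag seen, tag that was expected)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            expected = self.open_tags[-1] if self.open_tags else None
            self.errors.append((tag, expected))


def overlap_errors(fragment):
    """Return a list of (found, expected) pairs for badly nested end tags."""
    checker = NestingChecker()
    checker.feed(fragment)
    checker.close()
    return checker.errors


overlapping = "This is <em> emphasized text and <b>bold </em>text</b>"
nested = "This is <em>emphasized text</em> and <b>bold text</b>"

print(overlap_errors(overlapping))
print(overlap_errors(nested))
```

For the overlapping fragment the checker reports that it saw the end of "em" while "b" was still open, which is exactly the overlap described above.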
- Only certain tags may nest inside other tags
Looking at the DTD for XHTML, the definition of the "ol" element is:
<!ELEMENT ol (li)+>
<!ATTLIST ol
  %attrs;
  type        %OLStyle;      #IMPLIED
  compact     (compact)      #IMPLIED
  start       %Number;       #IMPLIED
  >
This implies that an ordered list element, "ol", may contain only list items, not paragraph tags or body text.
These are some of my favorite animals:
and my most favorite
<p>These are some of my favorite animals:</p>
What do we do with the phrase, "and my most favorite"?
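The content model for "ol" can be checked by hand with a few lines of Python. This is only a sketch of the "(li)+" rule, not a real DTD validator: the function name ol_violations is my own, the list items in the sample are made up for illustration, and the code assumes the fragment carries no XML namespace.

```python
import xml.etree.ElementTree as ET


def ol_violations(xml_text):
    """List the ways each ol in the fragment breaks the (li)+ content model."""
    root = ET.fromstring(xml_text)
    problems = []
    # iter("ol") also yields the root element itself when it is an ol.
    for ol in root.iter("ol"):
        children = list(ol)
        if not children:
            problems.append("ol has no li children")
        # Text before the first child is stored on the ol itself.
        if ol.text and ol.text.strip():
            problems.append("text directly inside ol: %r" % ol.text.strip())
        for child in children:
            if child.tag != "li":
                problems.append("unexpected <%s> inside ol" % child.tag)
            # Text after a child, but still inside the ol, is the child's tail.
            if child.tail and child.tail.strip():
                problems.append("text directly inside ol: %r" % child.tail.strip())
    return problems


# Hypothetical list items, used only to exercise the check.
bad = ("<ol>These are some of my favorite animals:<li>meerkats</li>"
       "and my most favorite<li>otters</li></ol>")
good = ("<div><p>These are some of my favorite animals:</p>"
        "<ol><li>meerkats</li><li>otters</li></ol></div>")

for problem in ol_violations(bad):
    print(problem)
print(ol_violations(good))
```

The sketch flags both stray phrases in the bad sample, including "and my most favorite", which has to move into a list item or out of the list entirely.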
- Ampersands in hrefs must convert "&" to "&amp;" in the URI
<a href="http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&Password=test&action=View">Sample List</a>becomes
<a href="http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&amp;Password=test&amp;action=View">Sample List</a>
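When links are generated by a program, the escaping is easy to forget. Here is a minimal sketch in Python that writes each "&" in the sample URI as "&amp;" with the standard html.escape function, then confirms with xml.etree.ElementTree that the href decodes back to the original URI. The variable names are my own.

```python
from html import escape
import xml.etree.ElementTree as ET

url = "http://www.phonelists.com/cgi-bin/Handler.pl?ListID=Test&Password=test&action=View"

# escape() rewrites & as &amp; (and also escapes <, > and quotes).
link = '<a href="%s">Sample List</a>' % escape(url, quote=True)
print(link)

# An XML parser decodes the entity again, so the href round-trips.
decoded = ET.fromstring(link).get("href")
print(decoded == url)

# The unescaped form is rejected outright by the XML parser.
raw_link = '<a href="%s">Sample List</a>' % url
try:
    ET.fromstring(raw_link)
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)
```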
- The attribute "name" becomes "id" when used for a locator inside a document
For example, to reference a section within a document with a URI, we usually do something like
Inside the referenced section,
<h2><a name="meerkats">Meerkats of Africa</a></h2>becomes
<h2><a id="meerkats">Meerkats of Africa</a></h2>or, better yet, for backwards compatibility:
<h2><a id="meerkats" name="meerkats">Meerkats of Africa</a></h2>
tidy is a tool to automatically convert HTML to XHTML. You can find it at http://www.w3.org/People/Raggett/tidy/.
Example of using Java to get web pages via http
Example of using Java to validate XML pages
Example of using Java to punch through surveys
One of many web tuning sites
- Differences between XML and HTML
Since XML and HTML are both derived from SGML, they are similar, but they differ in the following ways:
- XML is case-sensitive
- XML must have quotes (single or double) around attribute values
- Most interpreters of HTML are very forgiving about missing end tags; XML parsers are not.
- Comments start with <!-- and end with -->. Inside a comment, "--" may not appear. Although this is fine in HTML, it confuses XML parsers.
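The differences listed above are easy to demonstrate by handing HTML-style fragments to an XML parser. Here is a small sketch using Python's standard xml.etree.ElementTree; the helper name xml_error and the sample fragments are my own, chosen to trip one rule each.

```python
import xml.etree.ElementTree as ET


def xml_error(fragment):
    """Return the parser's error message, or None if the fragment is well-formed."""
    try:
        ET.fromstring(fragment)
        return None
    except ET.ParseError as exc:
        return str(exc)


# One hypothetical fragment per difference, plus a clean one for comparison.
samples = {
    "mixed-case tags": "<P>Meerkats</p>",
    "unquoted attribute": "<ol start=3><li>Meerkats</li></ol>",
    "missing end tag": "<ol><li>Meerkats</ol>",
    "double hyphen in comment": "<p><!-- meerkats -- and more --></p>",
    "clean XHTML": '<ol start="3"><li>Meerkats</li></ol>',
}

for label, fragment in samples.items():
    print(label, "->", xml_error(fragment) or "well-formed")
```

A forgiving HTML parser would accept all five fragments; the XML parser accepts only the last.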
It's a good idea to check your source XHTML pages against the validators at the W3C: