Published: Wednesday, December 03, 2003 By Vlad Alexander
Introduction
A Brief History
HTML 4
The problem stems from the fact that HTML itself does not impose any formatting or structuring guidelines. Add to this the fact that browsers will gleefully render sloppy and malformed HTML, and you have yourself a recipe for disaster. Instead of tackling the problem at its source and making sure that the markup these editors generated was clean, tools like HTML Tidy were used to clean up dirty markup after the fact.
XHTML 1.0
Confusingly, XHTML 1.0 came in three flavors, and you could specify which flavor of the language you were using by inserting a line at the beginning of the document.
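The three flavors are Strict, Transitional and Frameset, and the line in question is the DOCTYPE declaration. As a rough sketch, the Python below maps the public identifiers published in the W3C XHTML 1.0 Recommendation to the flavor each one selects. The function name `xhtml10_flavor` and the sample page are made up for illustration.

```python
import re

# Public identifiers defined by the W3C XHTML 1.0 Recommendation.
XHTML10_FLAVORS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "Strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "Transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "Frameset",
}

# Captures the public identifier inside a DOCTYPE declaration.
_DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"')


def xhtml10_flavor(document):
    """Return the XHTML 1.0 flavor named by the DOCTYPE, or None if unknown."""
    match = _DOCTYPE_RE.search(document)
    if match is None:
        return None
    return XHTML10_FLAVORS.get(match.group(1))


# Hypothetical sample page; only the first two lines matter here.
page = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">...</html>'
)
print(xhtml10_flavor(page))
```

A page with no DOCTYPE, or with one that is not in the table, simply yields `None` in this sketch.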
One advantage of XHTML 1.0 was that it displayed pages in Web browsers much faster than HTML 4 pages, the difference being most apparent in very long documents. This was because XHTML 1.0 followed the rules of XML, so parsing Web pages became much easier and required fewer CPU resources. Also, browsers did not need to clean up the structure of the code before displaying the Web page, because Web pages written in XHTML 1.0 were well formed. Some WYSIWYG editors that natively generated HTML 4 were able to convert their code to XHTML 1.0 using clean-up tools like HTML Tidy.
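To make "well formed" concrete, here is a minimal sketch that feeds two fragments to an ordinary XML parser (Python's standard `xml.etree.ElementTree`). The helper name `is_well_formed` and both fragments are invented for this example. It says nothing about rendering speed; it only shows that markup written to XML rules needs no repair step before it can be parsed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Sloppy HTML 4 style: overlapping <b>/<i>, unclosed <p> and <br>.
sloppy = "<html><body><p><b>Hello<i> world</b></i><br></body></html>"

# The same content written to XML rules: properly nested, every element closed.
clean = "<html><body><p><b>Hello<i> world</i></b><br /></p></body></html>"

print(is_well_formed(sloppy))
print(is_well_formed(clean))
```

A browser would happily display the first fragment after guessing what the author meant. The XML parser refuses it outright, which is exactly the strictness XHTML inherits.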
XHTML 1.1
For developers of medium to large Web sites, the benefits of separating data from formatting are huge. First, in its "raw" state data becomes immediately more available to a wide range of devices and applications. Second, separating data from formatting has significant advantages for Web design. For instance, if you have ever maintained a Web site with many contributing authors, you know that some can't tell the difference between Arial and Times Roman. Some like 11-point font while others prefer putting everything in 14-point. And if you give a non-technical user a color-picker, you can be sure that no color on the palette will go unused. Since XHTML 1.1 does not permit random inline formatting of this type, but regulates presentation through external or embedded CSS, it is much easier to maintain the common look and feel of Web sites. Modifying the look and feel of entire Web pages or Web sites is also much simpler. Both can be achieved by making a few simple changes to one or more CSS files.
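If you manage contributed content, one way to see how much inline formatting has crept in is to scan for it. The sketch below is a minimal example under stated assumptions: the lists of "presentational" elements and attributes are a hypothetical house rule chosen for illustration, not a transcription of the XHTML 1.1 DTD, and the function name and sample fragment are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical house rule for this sketch: markup that carries presentation
# rather than data. This is NOT the XHTML 1.1 DTD.
PRESENTATIONAL_ELEMENTS = {"font", "center"}
PRESENTATIONAL_ATTRIBUTES = {"style", "color", "bgcolor", "face", "size"}


def find_inline_formatting(fragment):
    """List the presentational elements and attributes found in a fragment."""
    problems = []
    for element in ET.fromstring(fragment).iter():
        if element.tag in PRESENTATIONAL_ELEMENTS:
            problems.append("element <%s>" % element.tag)
        for name in element.attrib:
            if name in PRESENTATIONAL_ATTRIBUTES:
                problems.append("attribute %s on <%s>" % (name, element.tag))
    return problems


# Invented sample of content from three different authors.
contributed = (
    '<div>'
    '<p><font face="Arial" size="3">Quarterly results</font></p>'
    '<p style="font-size: 14pt">Sales rose.</p>'
    '<p class="summary">Costs fell.</p>'
    '</div>'
)
for problem in find_inline_formatting(contributed):
    print(problem)
```

Only the last paragraph passes untouched: its `class` attribute names what the content is and leaves the look to a style sheet.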
True, XHTML 1.1 requires a change in the way that Web pages are served, but the change is slight. It involves the "media type" information that is normally returned to the browser by the Web server when a page is requested.
For HTML Web pages, the media type is
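As background for the media type discussion, `text/html` is the registered media type for HTML and `application/xhtml+xml` is the one registered for XHTML. The sketch below shows one possible way a server-side script might choose between them from the browser's `Accept` request header. The function name is made up, quality values (`q=`) are ignored for brevity, and falling back to `text/html` is an assumption of this example, not a recommendation.

```python
XHTML_MEDIA_TYPE = "application/xhtml+xml"
HTML_MEDIA_TYPE = "text/html"


def choose_media_type(accept_header):
    """Pick a Content-Type value based on the request's Accept header."""
    # Each comma-separated entry is "type/subtype", optionally followed by
    # parameters such as ";q=0.9", which this sketch simply discards.
    accepted = [
        entry.split(";")[0].strip().lower()
        for entry in accept_header.split(",")
    ]
    if XHTML_MEDIA_TYPE in accepted:
        return XHTML_MEDIA_TYPE
    # Assumption: browsers that do not announce XHTML support get text/html.
    return HTML_MEDIA_TYPE


# Invented Accept headers for two kinds of browser.
announces_xhtml = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
html_only = "text/html, image/gif, */*"

print(choose_media_type(announces_xhtml))
print(choose_media_type(html_only))
```

The value returned would be sent to the browser in the `Content-Type` response header.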
Content-managed Web Sites
This is a solid and time-tested approach and virtually all content-managed sites are built in this way. Some store content in the database, others store it in XML documents on the file system or in plain text files, but the approach is essentially the same. However, over time, content usually needs to be re-purposed, syndicated and inserted into different page layouts. So while the way in which data is presented will change over time, content itself needs to remain highly available to any layout that needs it. The diagram below demonstrates this point.
Only content that is free from formatting can be easily re-used in this way. In theory, every version from HTML 4 right through to XHTML 1.1 supports the separation of data from formatting, but only XHTML 1.1 actually enforces it. In practice, therefore, most WYSIWYG editors still generate code that fuses data and formatting together. This makes data more difficult to parse and reuse. Take this simple illustration. Let's say that an author decides to present people's names, within a news article, in the color green. This will generate the following code:
or
Problem: what if another Web site's policy is to display people's names in blue? On the surface, the solution seems easy: a simple "search and replace" on the word "green" within a color or a style attribute. But what if green is also being used to colorize something else? How confident would you be that your search and replace has not mistakenly replaced something it was not supposed to? A far better approach is to author content in such a way that the data is not compromised by inline formatting, by using an external or embedded style sheet. For example:
Each Web site that uses the data "John Smith" is now free to define the CSS rule that formats the
Taking this one step further, what if a Web site for some reason wants to revert to using the
This example reveals one self-evident truth: it is possible to convert semantically rich markup to semantically barren markup, but not vice versa. Fortunately, there are XHTML 1.1-compliant WYSIWYG editors, ones that enforce the separation of content and style. In Part 2 we'll look at one XHTML WYSIWYG editor in particular, and look at some general rules you can apply to your HTML markup today to help prepare it for a future of XHTML.
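The one-way nature of that conversion can be sketched in a few lines of Python. Everything here is illustrative: the class name `person`, the helper `class_to_font` and the sample sentence are assumptions made for this example, since the article's original code samples are not reproduced in this copy.

```python
import xml.etree.ElementTree as ET


def class_to_font(fragment, class_name, color):
    """Rewrite <span class="class_name"> elements as <font color="color">."""
    root = ET.fromstring(fragment)
    # Take a snapshot of the matches first, since the tags are renamed below.
    for element in list(root.iter("span")):
        if element.get("class") == class_name:
            element.tag = "font"
            element.attrib.clear()
            element.set("color", color)
    return ET.tostring(root, encoding="unicode")


# Hypothetical semantically rich markup: the class says what the data is.
rich = '<p>Today <span class="person">John Smith</span> announced a green policy.</p>'
print(class_to_font(rich, "person", "green"))
```

Going from rich to barren is mechanical because the class says exactly which spans are names. Going the other way, a program would have to guess which green `font` elements were people and which were something else, and the word "green" in the sentence itself shows why a blind search and replace is risky.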