Tolerating broken HTML writers
The constructs below are illegal according to SGML, but they're so
prevalent that the sample implementation supports them.
Please stop generating HTML in this style!
The BODY element must start with some element. See: an example document where this rule
Paragraph breaks are not allowed in headers, lists, etc. They may be
ignored or treated intelligently.
Tags that aren't known to the parser are treated as data by, for
example, the MidasWWW-1.0 implementation. They should be ignored.
There should be no tags around the word foo: foo.
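
As a rough illustration of the "ignore unknown tags" behavior, here is a minimal sketch using Python's standard html.parser module. The KNOWN_TAGS set, the class name and the extract() helper are assumptions made up for this example; they do not come from the sample implementation or from MidasWWW.

```python
from html.parser import HTMLParser

# Assumption: a tiny, illustrative set of tags the parser "knows".
KNOWN_TAGS = {"p", "a", "h1", "em"}


class TolerantTextExtractor(HTMLParser):
    """Collects text and known start tags; silently drops unknown tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []
        self.seen_tags = []

    def handle_starttag(self, tag, attrs):
        # Unknown start tags are ignored; their content still arrives as data.
        if tag in KNOWN_TAGS:
            self.seen_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def extract(markup):
    """Return (text, known_tags_seen) for a fragment of markup."""
    parser = TolerantTextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text), parser.seen_tags
```

With this sketch, the unknown foo tags vanish from the output while the word between them is kept, which is the behavior the text asks for.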
Note that conforming SGML parsers will treat "&", "<", "</",
and "<!" as normal text characters when they are not followed by a
letter. HTML producers are discouraged from taking advantage of this.
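
The rule can be sketched as a small scanner. This follows the rule only as stated here (delimiter followed by a letter); full SGML recognizes further contexts, such as "&#" and "<!--", which this simplification leaves out. The names MARKUP_START and markup_positions are hypothetical.

```python
import re

# A delimiter counts as markup only when a letter follows it immediately.
# Longer delimiters are listed first so "</" and "<!" win over plain "<".
MARKUP_START = re.compile(r"(?:</|<!|<|&)(?=[A-Za-z])")


def markup_positions(text):
    """Return (offset, delimiter) pairs where markup would be recognized."""
    return [(m.start(), m.group()) for m in MARKUP_START.finditer(text)]
```

Under this rule "a < b" contains no markup, but the "&" in "AT&T" is recognized as the start of an entity reference, which is one reason producers should not rely on the leniency.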
This anchor's name starts with a digit, which is not a
name start character.
Unquoted attribute literals: NeXT and html-mode.el
This anchor's href contains a '#', which is not a name
character. It should lead to the NeXT implementation reference below.
This anchor's href contains ':' and '/', which are not name
characters. It should lead to the SLAC MidasWWW doc anyway.
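
The name rules behind these test cases can be sketched as follows, assuming the SGML reference concrete syntax (names start with a letter and continue with letters, digits, "." or "-") and ignoring name length limits. The function names and the example.org URL in the usage note are hypothetical.

```python
import string

# Reference concrete syntax: letters start a name; letters, digits,
# "." and "-" may continue it.
NAME_START = set(string.ascii_letters)
NAME_CHARS = NAME_START | set(string.digits) | {".", "-"}


def is_name(value):
    """True if value is a valid SGML name (length limits ignored)."""
    return (
        bool(value)
        and value[0] in NAME_START
        and all(c in NAME_CHARS for c in value)
    )


def may_be_unquoted(value):
    """True if every character is a name character, so quotes may be omitted."""
    return bool(value) and all(c in NAME_CHARS for c in value)
```

For example, is_name("1abc") is false because of the leading digit, and may_be_unquoted("#next") and may_be_unquoted("http://example.org/doc") are both false because of the '#', ':' and '/' characters. A tolerant parser accepts such values anyway; a writer should quote them.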
Literal Text Elements
The original semantics of the XMP and LISTING elements is not
representable in SGML. From Tags used in
- The text may contain any ISO Latin printable characters, including
the tag opener, so long as it does not contain the closing tag in
The XMP and LISTING elements are deprecated in favor of the TYPEWRITER
But in section 7.6 of the SGML standard:
- The content of an element declared to be character data or
replaceable character data is terminated only by an etago
delimiter-in-context (which need not open a valid end-tag) ... .
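
The difference between the two rules can be sketched with two small functions: one stops at the first etago delimiter-in-context (here taken to be "</" followed by a letter), the other stops only at the full closing tag, as the original XMP and LISTING semantics describe. Both function names are hypothetical, and the sketch is not taken from any of the implementations listed in this document.

```python
import re

# etago "</" is recognized in context when a name start character follows.
ETAGO_IN_CONTEXT = re.compile(r"</(?=[A-Za-z])")


def cdata_end_sgml(text):
    """Offset where CDATA content ends under the SGML rule, or len(text)."""
    m = ETAGO_IN_CONTEXT.search(text)
    return m.start() if m else len(text)


def cdata_end_full_tag(text, element):
    """Offset where content ends if only the full closing tag terminates it."""
    pattern = r"</" + re.escape(element) + r"\s*>"
    m = re.search(pattern, text, re.IGNORECASE)
    return m.start() if m else len(text)
```

For content such as "a <b>bold</b> text</xmp>rest", the SGML rule ends the content at "</b>", while the full-tag rule ends it at "</xmp>". That mismatch is why the original semantics cannot be expressed in SGML.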
Non-standard CDATA parsing: LineMode, MidasWWW, etc.
This example section ends here:
Just in case the foo close tag above wasn't recognized:
The following systems are known to read and/or write HTML. They all
- Linemode Browser 1.3c
- MidasWWW 1.0
MidasWWW parses HTML into its internal data structures, and
then offers the option to extract the data and write it to a file.
It doesn't get it right all the time.
- NeXT editor
- From firstname.lastname@example.org
- from marca@@@
From Pei Wei @ O'Reilly (@@email address). Any known problems? I hear
it's going to use SGMLs.
@@Go get The
latest version -- it should be current with this spec.
- perl client
Just heard about it. Haven't tried it. I don't think it supports