People sometimes ask about using HTML in titles and descriptions.

The spec says that "entity-encoded HTML is allowed" in descriptions. It does not say that it's allowed for any other elements.
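As a rough illustration of what "entity-encoded HTML" can look like in practice, here is a minimal Python sketch. The sample HTML, the variable names, and the bare `<item>` wrapper are assumptions made up for this example; they do not come from the spec.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical HTML that a publisher wants to ship in a description.
html_body = '<p>Read the <a href="http://example.com/">full story</a> &amp; more.</p>'

# Entity-encode the HTML so it travels as plain character data
# inside <description> instead of as child elements.
item_xml = "<item><description>" + escape(html_body) + "</description></item>"

# An XML parser undoes exactly one level of escaping, so the
# aggregator gets the original HTML string back.
decoded = ET.fromstring(item_xml).findtext("description")

print(item_xml)
print(decoded)
```

The point of the sketch is only that the encoding is reversible: whatever HTML went in is what the parser hands back, and it is then up to the aggregator to render it.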

Most aggregators will render descriptions as HTML, though an aggregator might not have access to an HTML renderer, or might have only a very basic one available. (Consider a PDA, for example.)

Titles, however, should not contain HTML. The spec doesn't allow for it. The behavior of an aggregator when encountering HTML is undefined: some aggregators strip the HTML, others might display it with the HTML code visible.
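The two behaviors mentioned above can be sketched in a few lines of Python. Both functions are hypothetical stand-ins, not code from any real aggregator, and the regular expression is a deliberately naive guess at what "stripping" might mean.

```python
import html
import re

def strip_markup(title: str) -> str:
    """Hypothetical 'stripping' aggregator: drop anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", title)

def show_literally(title: str) -> str:
    """Hypothetical 'literal' aggregator: escape the title so an HTML view shows it as typed."""
    return html.escape(title, quote=False)

# Title text as it might look after the feed's XML has been parsed.
title = "Breaking: <b>big</b> news"

print(strip_markup(title))    # the tags are removed
print(show_literally(title))  # the tags stay visible to the reader
```

Because the spec does not say which of these is correct, a publisher who puts HTML in a title cannot predict which result readers will see.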

You can think of titles as like the titles of Web pages. When someone puts HTML in a Web page title, browsers often display the tags in the window title bar. This is because, according to the HTML 4 spec, "Titles may contain character entities (for accented characters, special characters, etc.), but may not contain other markup (including comments)."

You can also think of titles as similar to subjects of email messages. Though an email message may be HTML, the subject may not.

Comments

The issue isn't *intentional* markup in titles, or a strong desire to add it; I think that's been widely misunderstood. No markup in titles, as a decision and specced requirement, is fine.

The issue is "not-markup", that aggregators should display titles "literally" (as some do, that is to say, end users see less-than, greater-than, and ampersands as they were entered) and not strip "what <looks> like markup" (as an intentional example).

Regarding descriptions, previous specs did not allow markup. Markup in description was added to the spec in 0.92 "to reflect actual practice".

The issue is that how to use markup or *not* use markup was ambiguous then and continues to be ambiguous.

NB, I have not tried to suggest potential solutions at this point. Discussing potential solutions while one is still trying to figure out if there's a problem tends to overshadow the discussion of the problems themselves. Although, sometimes discussing potential solutions helps clarify the problem space. Up to you.

Actually we've been waiting a pretty long time on this. Basically we're going to publish a guideline saying please don't put markup in titles, and ask that validator developers flag it, and that aggregator developers are free to strip it, as are tool developers. Most interested in feedback from people who do validators, content tools and aggregators.

My suggestion in that case would be that aggregators should display it literally. "Strip it" would involve having the spec describe precisely what "it" is that should be stripped.

Good, you've got your suggestion, and Brent, Jon and I have ours.

As a content tool and aggregator developer (and of necessity to some extent a validator developer), I want to see a clear recommendation of what should be stripped and what should be left verbatim. Ken's suggestion sounds like an easy route to removing the ambiguity.

Speaking as a validator developer, I will state unequivocally that my validator will not flag anything that is not in the official spec. I have been reamed in the past for attempting to validate "past the spec" (outputting warnings that are not supported by official spec text), and I have been reamed in the past for suggesting that anything outside the official spec is official. My New Year's resolution to you is, once the holidays are over and I have some time to devote to it, I will be reviewing the test cases and associated code in my validator and removing all errors/warnings that are not explicitly supported by spec text.

In other words, you can publish all the recommendations and best practices documents you want, but until it shows up at blogs.law.harvard.edu, I'm ignoring it.

Mark, I assume you're Mark Pilgrim.

I understand what you're saying, although I don't agree with what you got "reamed" about.

Anyway, the stuff we're talking about here is in the spec.

Let's try to get along, what do you say?

If HTML is not allowed in titles, that implies any (XML-escaped) angle brackets contained in an RSS title should be displayed to the user. However if some aggregators are stripping "markup", anything in angle brackets will be lost.

The question that we all need answered isn't so much whether or not markup is allowed, but what an RSS generator should do when it has an angle bracket or ampersand character to insert into a title or description. Assuming that the character is meant to be seen by the end user rather than being markup, what exact characters should end up in the XML?
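To make the question concrete, here is one possible reading sketched in Python. It assumes, and this is only an assumption, since the thread has not settled it, that titles are treated as plain text and descriptions as HTML. The function names and the sample string are invented for the illustration.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def title_chars(text: str) -> str:
    # Assumed reading: a title is plain text, so XML-escape it once.
    return escape(text)

def description_chars(text: str) -> str:
    # Assumed reading: a description is HTML, so make the text safe as
    # HTML first, then XML-escape the result for the feed.
    return escape(html.escape(text, quote=False))

text = "AT&T <rocks>"  # characters the end user is meant to see

item_xml = (
    "<item><title>" + title_chars(text) + "</title>"
    "<description>" + description_chars(text) + "</description></item>"
)
item = ET.fromstring(item_xml)

print(title_chars(text))
print(description_chars(text))
# Round trip: the title is usable straight from the parser, while the
# description needs one more decoding step, as an HTML renderer would do.
print(item.findtext("title"))
print(html.unescape(item.findtext("description")))
```

Under this reading the ampersand ends up as `&amp;` in the title but `&amp;amp;` in the description. An aggregator that strips tag-like text from titles would still lose `<rocks>`, which is exactly the ambiguity being discussed.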

(me = aggregator developer and amateur RSS publisher)
