Here's a revised proposal for a clarification to the RSS 2.0 spec.
Under Elements of <item>, replace the lead paragraph with the following.
A channel may contain any number of <item>s. An item may represent a "story" -- much like a story in a newspaper or magazine; if so its description is a synopsis of the story, and the link points to the full story. An item may also be complete in itself, if so, the description contains the text (entity-encoded HTML is allowed; see examples), and the link and title may be omitted. All elements of an item are optional, however at least one of title or description must be present.
Notes: The new text is in bold. The word examples links to a page of examples of encodings. We'd like to make the changes to the spec early next week. If you have concerns about this change, please post a comment below, and we will review them before making the change. Thanks to everyone who has participated in the discussion thus far.
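To make the proposed wording concrete, here is a minimal Python sketch of an item that is "complete in itself": no title or link, with HTML carried in the description as entity-encoded text. The helper name `build_item` and the sample HTML are assumptions made for illustration; they are not part of the spec or the proposal. The sketch relies on the standard library's ElementTree, which escapes `<`, `>` and `&` in element text when serializing.

```python
import xml.etree.ElementTree as ET


def build_item(title=None, description=None, link=None):
    """Build an RSS <item>; at least one of title or description is required."""
    if title is None and description is None:
        raise ValueError("an item needs at least one of title or description")
    item = ET.Element("item")
    for tag, value in (("title", title), ("link", link), ("description", description)):
        if value is not None:
            # Assigning to .text stores the raw string; escaping happens on output.
            ET.SubElement(item, tag).text = value
    return item


# A self-contained item: the description holds the HTML, title and link are omitted.
source_html = "<p>Hello, <b>world</b> &amp; friends</p>"
item = build_item(description=source_html)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

In the serialized output the markup appears as `&lt;p&gt;...` rather than as child elements of the description, which is what "entity-encoded HTML" means in practice; parsing the item again returns the original HTML string.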
Could you also please be clearer in the spec about whether or not multiple enclosures are allowed? It would make sense to allow multiple enclosures, since you don't want to make a weblog post for every audio file, holiday picture, etc. you're enclosing. Also, when multiple enclosures are allowed, perhaps it makes sense to be able to give them short descriptions in a 'title' attribute, just as you'd do when you're linking to a file from a webpage. It wouldn't make sense to link to foo.bar.com and have to click it to find out that it's something I don't want to see, right?
Also, I find it really frustrating that I have to include my email address in the author field. But undoubtedly you have heard that before ;)
Great stuff. I particularly like the examples.
Let's keep the discussion on topic folks. This section is for comments on a specific proposal and not the RSS specification in general.
Thanks for the feedback, Randy. I started my comment before you posted. It wasn't in response to yours.
Is non-encoded HTML allowed? Arbitrary XML?
Does this really clear things up? Wouldn't it be better to say that "the description is entity-encoded HTML" rather than saying that it's allowed?
I think you should say it MUST be entity-encoded HTML, but shouldn't you also apply the same rules to the title element?
Ian, Mick: saying "must" would technically be a change, so I think the idea here was to just offer a clarification (two words, and some examples), that helps people implement RSS feeds.
Joshua: OK. But I still think the same rules should be applied to the title element. The "clarification" doesn't address the title element.
I would change "entity-encoded HTML is allowed" to "as entity-encoded HTML" just to be perfectly clear.
Unfortunately, this "clarification" clarifies nothing actionable from a feedvalidator perspective. The spec continues to point to howtos which provide conflicting advice. Even if such howtos were disregarded, the question still remains as to whether descriptions which were valid UserLand RSS 0.91 are still to be considered valid RSS 2.0.
Until I see a clarification in the spec that says otherwise (for example, Aaron Swartz's suggestion above), I plan to continue to assume that plain text is equally as valid as escaped HTML in both titles and descriptions.
Aaron - so you're agreeing that the description and title elements MUST be entity-encoded HTML?
Sam: If the MUST were to be included, would your UL RSS 0.91 concerns be alleviated or would you need a version upgrade to RSS 2.1 (or similar) ?
I have been using numeric entities in my RSS feeds; is that also valid? So for < (less-than, &lt;) I would use &#60;, and for > (greater-than, &gt;) I would use &#62;. Maybe there should be an example where numeric entities are used. Not &copy; but &#169;
Ok this will not work. What I meant was: not &copy; but &#169;
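The numeric-entity question can be checked against an XML parser. The sketch below, in Python with the standard library's ElementTree, assumes a feed with no DTD that declares extra entities; the sample titles are made up for illustration. It shows that numeric character references decode to the same text as the predefined XML entities, while the HTML entity `&copy;` is not defined in plain XML and is rejected.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Predefined XML entities and numeric character references decode identically.
named = ET.fromstring("<title>1 &lt; 2 &gt; 0</title>").text
numeric = ET.fromstring("<title>1 &#60; 2 &#62; 0</title>").text

# The numeric reference for the copyright sign works without any DTD.
copyright_numeric = ET.fromstring("<title>&#169; Example</title>").text

# &copy; is an HTML entity, not one of XML's five predefined entities.
copy_named_ok = parses("<title>&copy; Example</title>")

print(named == numeric, copyright_numeric, copy_named_ok)
```

On this reading, numeric references are the safer choice at the XML level; whether the decoded text is then treated as plain text or as HTML is the separate question being discussed in this thread.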
Mick: MUST would address my concern.
Ed: here is a feed with a variety of techniques used to express "unusual" characters in RSS 2.0 titles. I am interested in clarifications as to which ones are legal in RSS 2.0. At the moment, my assumption is that all of them are.
I don't have the data on how many feed aggregators support HTML in the title, but I think it is very few. It wouldn't do much good to say "can be HTML", when it wouldn't work in the aggregators anyway. I agree it's an issue, but one that should be driven by aggregator vendors arriving at a consensus and convention rather than the specs trying to push the vendors.
Joshua: two examples: Radio UserLand's aggregator treats titles as entity-encoded HTML and displays them as such. RssBandit treats titles as entity-encoded HTML, but chooses to strip HTML tags prior to display. Here's a test case which can be used to explore this.
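As a rough illustration of the second behavior (treat the title as HTML, then strip tags before display), here is a minimal Python sketch. It is not RssBandit's actual code; the class `_TextOnly`, the function `strip_tags`, and the sample title are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text content of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Character references are already decoded because convert_charrefs=True.
        self.parts.append(data)


def strip_tags(html_text):
    """Return html_text with tags removed and entities decoded."""
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# The string an aggregator would hold after XML parsing of the title element.
title_html = "Big <b>news</b> &amp; more"
print(strip_tags(title_html))
```

An aggregator following the first behavior would instead hand `title_html` to its HTML renderer unchanged.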
Aaron, Sam: Just to be clear, are you guys saying you would support the text as it was originally clarified (MUST be entity-encoded HTML)?
Yes, I kind of assumed that aggregators which use "newspaper" style within a web browser would honor HTML in the tags, while the "rich client" ones would not. I know that RSS Bandit and NewsGator do not, and AFAIK that is a large percentage of the market :-)
There is no need to change anything. If you say that entity-encoded HTML is allowed then there could be valid non-HTML feeds that don't display properly. Requiring HTML breaks old feeds.
If you are using HTML in your descriptions and like the simplicity of RSS, then use it and hope for the best. Most aggregators will correctly display HTML descriptions. If you use plain text descriptions or want to make sure that your feed works in all aggregators, use Atom.
The two formats each have their purposes. Just leave RSS as it is so it can fill its role without breaking old feeds, and people will use it to make simple HTML feeds. Do not try to cover all areas that Atom does with RSS, because it is, by its spec, a very loose vocabulary.
"Unfortunately, this 'clarification' clarifies nothing actionable from a feedvalidator perspective."
Perhaps not, but it does provide more assistance to aggregator developers who need guidance on what it means when a description element allows entity-encoded HTML.
There are two issues here. I think this proposal solves the easier one and ought to be implemented. We can always return to the harder problem.
Joshua: if the text were changed to "MUST be entity-encoded HTML", I would make a corresponding change to the validator.
Re: «Yes, I kind of assumed that aggregators which use "newspaper" style within a web browser would honor HTML in the tags, while the "rich client" ones would not. I know that RSS Bandit and NewsGator do not», it is not a question of "honoring" HTML in the tags, it is a question of *detecting* HTML in tags. To illustrate the point: how should the following be expressed as a title?
An ode to the <blink> tag
Rogers, re: "it does provide more assistance to aggregator developers who need guidance on what it means when a description element allows entity-encoded HTML", I am not aware of an aggregator developer who is confused on this point. There do, however, seem to be considerable questions as to whether titles and descriptions should contain entity-encoded HTML.
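The <blink> title question can be made concrete with a short Python sketch. It encodes the same intended title under the two competing readings (plain text versus entity-encoded HTML) and shows what an aggregator gets back from its XML parser in each case. The variable names are illustrative assumptions; only the title string comes from the comment above.

```python
import html
import xml.etree.ElementTree as ET

intended = "An ode to the <blink> tag"

# Reading A: the title is plain text, so it is XML-escaped once.
as_plain_text = "<title>" + html.escape(intended, quote=False) + "</title>"

# Reading B: the title is entity-encoded HTML. The HTML source for the
# intended text is "An ode to the &lt;blink&gt; tag", which is escaped again.
html_source = html.escape(intended, quote=False)
as_encoded_html = "<title>" + html.escape(html_source, quote=False) + "</title>"

# What the aggregator holds after XML parsing in each case.
parsed_a = ET.fromstring(as_plain_text).text
parsed_b = ET.fromstring(as_encoded_html).text

print(as_plain_text)
print(as_encoded_html)
print(parsed_a)
print(parsed_b)
```

The difficulty is that an aggregator receiving `parsed_a` cannot tell from the string alone whether the publisher meant literal text that mentions a tag or HTML that contains a blink element; that is the detection problem, and it only goes away if the spec fixes one reading.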
This doesn't solve any real-world presentation problems from what I can tell, as Sam and others have implied. In order to embellish titles and descriptions with appropriate CSS styles, it is important to know whether these fields conform to valid HTML containers, e.g. paragraphs have P wrappers, BLOCKQUOTE must have containing P or block containers, etc. With this "clarification" you still have to guess at what the author/publisher was intending, and make up your own wrappings as appropriate.
Better but unworkable "clarifications" would be either MUST (as suggested previously), or for these fields to be typed, e.g. <item type="text">.
But like Lenny says, don't change anything; vagueness is one of RSS' selling points.
Richard, I haven't seen much demand for the ability to put CSS styles in titles.
Sam, it's not so much the originating data, but the display which some aggregators call the "newspaper" view. Different items from different feeds all need to display the same, or at least need to be normalised before being given their display/CSS wrapping. In this sense the problem is the same as that for descriptions. How are titles wrapped? Valid HTML containers? Optional embellished HTMLisms like <strong>? Or plain text? To define appropriate CSS, this needs to be clarified. But sure, like you say, the description is the bigger problem.
"Optional HTMLisms like <strong>?"
Selective tag rendering, amusing.
Are you suggesting that RSS include style tags, or just well-composed, reliable standard HTML tags that stylesheets can use when the feed is displayed by an aggregator?
Aggregators should strip style tags because they can contain malicious payloads, as Mark Pilgrim demonstrated with the platypus prank.
I don't think a syndication format can accept HTML and impose rules that it be valid HTML, any more than Web page composition tools should require the same. Valid markup is great, but there's a place for looser standards, as the success of both HTML and RSS demonstrates. I think RSS should treat HTML as opaque data, carrying it from the sender to the receiver without trying to parse it. People who need to impose a stronger content model could use a namespaced element for that purpose.
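A small Python sketch of the "opaque data" idea, under the assumption that the sender entity-encodes the payload and the receiver only runs an XML parser: even deliberately sloppy HTML survives the trip unchanged, because nothing in between tries to interpret it as markup. The sample payload is made up for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Deliberately sloppy HTML: unclosed tags and a bare ampersand.
payload = "<p>unclosed <b>bold & a bare ampersand"

# Sender side: escape once so the payload is ordinary XML character data.
feed_fragment = "<description>" + html.escape(payload, quote=False) + "</description>"

# Receiver side: the XML parser hands back the exact original string.
received = ET.fromstring(feed_fragment).text

print(received == payload)
```

Whether the receiver then renders, sanitizes, or strips the HTML is left to the aggregator, which matches the division of labor the comment describes.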
A comment about author and managingEditor: the spec says each must contain an email address (RFC 822). I have a use case where a link to a business card (an http URL) is more appropriate than an email address...