Proposed Clarification for RSS 2.0 Spec

Requesting comments on a couple of proposed clarifications to the RSS 2.0 spec.

1. Under Elements of <item>, replace the lead paragraph with the following. The new text is highlighted in green.

A channel may contain any number of items. An item may represent a "story" -- much like a story in a newspaper or magazine; if so its description is a synopsis of the story, and the link points to the full story. An item may also be complete in itself, if so, the description contains the text, and the link and title may be omitted. Either way, description contains entity-encoded HTML. All elements of an item are optional, however at least one of title or description must be present.

2. Immediately following that section, a link to a page of examples, authored by Nick Bradbury, author of the FeedDemon aggregator.

Notes: We believe aggregators already assume the item-level description contains entity-encoded HTML. We'd like to make the changes to the spec early next week and if there are no deal-stoppers, we will. Please comment below. Off-topic and personal comments will be deleted.
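
As a rough illustration of the proposed wording, here is a minimal JavaScript sketch of a producer that entity-encodes the description and enforces the "at least one of title or description" rule. The names escapeXml and buildItem are hypothetical and not part of the spec, and treating title as plain text is an assumption of this sketch.

```javascript
// Hypothetical helpers for illustration; nothing here is defined by the RSS 2.0 spec.

// Escape the three characters that matter inside XML text content.
// "&" is handled first so the entities produced below are not escaped again.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build an <item> from optional fields. Per the proposed text, every element
// is optional, but at least one of title or description must be present.
function buildItem({ title, link, description }) {
  if (!title && !description) {
    throw new Error("item needs at least one of title or description");
  }
  const parts = [];
  // Assumption: title is treated as plain text and only XML-escaped.
  if (title) parts.push("<title>" + escapeXml(title) + "</title>");
  if (link) parts.push("<link>" + escapeXml(link) + "</link>");
  if (description) {
    // description holds HTML, so its markup ends up entity-encoded.
    parts.push("<description>" + escapeXml(description) + "</description>");
  }
  return "<item>" + parts.join("") + "</item>";
}
```

Calling buildItem({ description: "<p>Hi & bye</p>" }) yields an item whose description reads &lt;p&gt;Hi &amp;amp; bye&lt;/p&gt; in the feed source, while buildItem({ link: "..." }) alone is rejected.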

Thanks to: Nick Bradbury, Brent Simmons, Greg Reinacker, Jake Savin, Dare Obasanjo, Matt Mullenweg for their help working out this proposed clarification.

Posted by Dave Winer at 2004/06/04 03:59 PM | 79 COMMENTS

Comments

So no one complains later, off-topic and personal comments will be deleted. This text also appears in the body of the post.

Posted by Dave Winer at 2004-06-04 04:23 PM

It's *really* nice to see that the spec is actively being reformed. Item number 1 was a major showstopper in many recent test cases. I'm glad to see that you guys have fixed this. Excellent work!

Posted by Scott Johnson at 2004-06-04 04:51 PM

Dave: +1

Posted by Roger Benningfield at 2004-06-04 05:12 PM

Awesome!

Posted by Randy Charles Morin at 2004-06-04 05:19 PM

Would it help to understand, by saying a little more about what this clarifies? And I was just wondering, is entity-encoded HTML basically HTML? Certainly I'm glad to see examples of the CDATA enclosing mechanism at work here.

Just one more question. The first time I viewed this clarification page, it had something about an (S) tag (replace paren w/ angle there) and a closing 'S' tag if it was a story. Then, did that go away? Or am I going crazy (or both?!) Just wondering.

cheers!

Posted by george girton at 2004-06-04 05:29 PM

In the "Extending RSS" section, you'll also need to remove or change the phrase "a version 0.91 or 0.92 file is also a valid 2.0 file", since this will no longer be true. In RSS 0.91, description was plain text.

Posted by Mark at 2004-06-04 05:31 PM

This makes perfect sense, but I can't help but wonder if the

"Either way, <description> contains entity-encoded HTML."

implies that the entry must be valid HTML, i.e. it would pass the W3C checker if copied and pasted inside the body tags of a valid document. This would require the text be contained within block tags (which I do not think is a bad thing).

Posted by Joseph Palmer at 2004-06-04 05:38 PM

Scott, Roger and Randy -- thanks for the good vibes. You guys are awesome!!

George, you're not going crazy, the editor started, at some point in the editing process, neutering the strike tag, so rather than explain that, I just nuked the striked text and changed the legend.

Mark, you're wrong, the compatibility is still there, in this case, both ways. Plain text is also HTML, and in 0.91 and 0.92 people were putting entity-encoded HTML in descriptions.

Posted by Dave Winer at 2004-06-04 05:39 PM

Excellent.

Posted by Smug Canadian at 2004-06-04 05:49 PM

Dave - but plain text can't be interpreted the same way as HTML, as plain text relies on line breaks for spacing and paragraphs, while HTML would use p and br tags.

I have a 0.91 feed displaying incorrectly in bloglines *because* bloglines is interpreting it as escaped HTML, it would appear - even though it's plain text with plenty of returns and whitespace.

(The feed in question is blogs.salon.com )

Posted by Dan Dickinson at 2004-06-04 05:49 PM

Sorry, it's a 0.92. My mistake.

Posted by Dan Dickinson at 2004-06-04 05:52 PM
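
For a producer in the position described above, whose plain-text descriptions lose their line breaks when an aggregator renders them as HTML, one possible approach is to convert the text to simple HTML before it goes into the feed. The sketch below assumes blank lines mark paragraphs and single line breaks become br tags; plainTextToHtml and escapeHtml are hypothetical names.

```javascript
// Escape characters that would otherwise be read as HTML markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Turn plain text into minimal HTML: blank lines separate paragraphs,
// single line breaks inside a paragraph become <br>.
function plainTextToHtml(text) {
  return text
    .split(/\r?\n\s*\r?\n/)
    .map(function (p) { return p.trim(); })
    .filter(function (p) { return p.length > 0; })
    .map(function (p) {
      return "<p>" + escapeHtml(p).replace(/\r?\n/g, "<br>") + "</p>";
    })
    .join("");
}
```

The result is HTML, so it still has to be entity-encoded (or wrapped in CDATA) once more when it is written into the description element.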

what about titles?

Posted by J. Random RSS User at 2004-06-04 05:53 PM

> the compatibility is still there, in this case, both ways. Plain text is also HTML

If that were true, then this change would be unnecessary. Think about it.

Posted by Mark at 2004-06-04 05:54 PM

Dan, all the aggregator developers we've talked to report that they work that way, we've yet to find an exception. So it's up to the content provider to format the text appropriately. I'd take it up on one of the user support lists for your content tool (which I see is Radio), this is not something we can deal with here, we're just clarifying a bit of spec text (since Mark is here, proposing to clarify a bit of spec text, I hate lawyers!). It sounds like Bloglines is doing the right thing, in other words.

To Mark, it was a mistake on my part to try to discuss your issue in a professional manner. I deleted the exchange, and let your initial comment stand. You can make a point once, that's enough. If you want to make a case out of it, try it in your own space. And please don't post anymore here, we all know you're a competitor, and you want to hurt RSS. We're trying to do something small here to improve things for developers and users, and don't appreciate your interference. Thanks.

Posted by Dave Winer at 2004-06-04 05:55 PM

Thanks to all for all. <>

However, I can't help but ask what is wrong with the Reuters solution (as I understand it).

<whatever-tag><![CDATA[5 < 8, ticker symbol <BIGCO>]]></whatever-tag>

5 < 8, ticker symbol <BIGCO>

It's "easy on the eyes", and easier to machine-parse, both.

Not saying current suggestions are NOT a huge improvement, btw. And agree it's good to see the spec improve, and is it really, Really, REALLY crucial whether the spec says this is (or is not) compatible with .91?? (I dunno.)

I think a bigger question is if CDATA omitted, do aggregators still treat the stuff as HTML in most (all) elements, in general??

Posted by J. Toran at 2004-06-04 06:26 PM

Haha! First line said "<No sarcasm...;->" within the brackets.

Posted by J. Toran at 2004-06-04 06:28 PM

Thanks to Nick Bradbury, Brent Simmons, Greg Reinacker, Jake Savin, Dare Obasanjo, Matt Mullenweg for their help working out this proposed clarification.

This is great work, guys. Keep the ball rolling.

Posted by Steve Gillmor at 2004-06-04 06:38 PM

Dave, I think there is a problem with Example 4. Everything between "<![CDATA[" and "]]>" is treated as character data, meaning that no markup may be present. Entities (numeric or otherwise) are considered markup.

Posted by Don Park at 2004-06-04 06:53 PM

Arrgh... I probably should-a put :-> rather than a wink, because I meant seriously that I'm not being sarcastic when I say thanks to ALL. And I noticed the example is squirrelly, to boot:

Example 3: Encoding angle brackets in text

<whatever-tag>5 << 8, ticker symbol <<BIGCO>>

5 < 8, ticker symbol <BIGCO>

Dunno above example any better.

Posted by J. Toran at 2004-06-04 06:54 PM

Oh, I see now. They are treated as literal but 'literally HTML'. Sheesh, no wonder you had to clarify that.

Posted by Don Park at 2004-06-04 07:00 PM
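
The "literal, but literally HTML" point can be sketched in JavaScript. The function xmlCharacterData below is a hypothetical, very small stand-in for what an XML parser reports as an element's text; it only understands CDATA sections and the five predefined XML entities, and is not a real parser.

```javascript
// Decode the five predefined XML entities in ordinary (non-CDATA) text.
// A single pass is used, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
function decodeXmlEntities(text) {
  const map = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };
  return text.replace(/&(lt|gt|amp|quot|apos);/g, function (match, name) {
    return map[name];
  });
}

// CDATA sections are copied through untouched; everything else is entity-decoded.
function xmlCharacterData(raw) {
  const cdata = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  let out = "";
  let last = 0;
  let m;
  while ((m = cdata.exec(raw)) !== null) {
    out += decodeXmlEntities(raw.slice(last, m.index)) + m[1];
    last = m.index + m[0].length;
  }
  return out + decodeXmlEntities(raw.slice(last));
}
```

Under these assumptions, the CDATA form "<![CDATA[5 &lt; 8]]>" and the entity-encoded form "5 &amp;lt; 8" produce the same character data, "5 &lt; 8". An aggregator that then treats that string as HTML would display 5 < 8.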

Any particular reason for this approach over an XML-oriented one? (i.e. use XHTML in descriptions).

I'd strongly urge that this isn't rushed through - at least give it a few weeks. There may be issues that may not be obvious. I suspect there may still be character sequences that could cause systematic breakage - have you fully checked around the ]]> string?

Posted by Danny at 2004-06-04 07:34 PM

Agree that it would be generally helpful to at least recommend a particular version of HTML; this would be particularly helpful for non-browser-based aggregators, where libraries as robust with bad HTML as multi-million-dollar browsers aren't as easy to come by.

Posted by Jeremy Bowers at 2004-06-04 07:51 PM

As always, we're just documenting current practice. No one is putting XHTML in there. And people are using all versions of HTML. If people want, they can include Atom bits in their feeds. Not many aggregators will tune into it, but they can still do it. No need to reinvent the wheel.

Posted by Dave Winer at 2004-06-04 07:57 PM

(cough) I am putting XHTML in there; admittedly it is a one-off home-grown implementation. (It does verify through rss.scripting.com)

It just made sense for me to implement it in that way, because I generate my homepage blog and RSS from the same source.

Is it permitted to use XHTML?

Posted by Joseph Palmer at 2004-06-04 08:11 PM

1) It would be useful to show examples of entity-encoded HTML for entities other than left and right angle-brackets (i.e. < and >).

2) It should be clarified what entities are to be encoded, e.g. are all non-ascii entities to be entity-encoded? Everything over 128? Etc.?

3) As the w3c seems not to define "entity-encoded HTML" anywhere on its site, a pointer to the definition of "entity-encoded HTML" might be in order.

Thanks,-- Frank Leahy

Posted by Frank Leahy at 2004-06-04 08:12 PM

The reason I posted is that feedvalidator doesn't like ’ in my rss feed. See

www.feedvalidator.org

I assume this error will go away?

Posted by Frank Leahy at 2004-06-04 08:16 PM

Btw, iirc, there's no element described to specify WHAT version of HTML/whatever the content is in, as well as whether this item of the feed has ACTUALLY BEEN validated, and specifically by what brand/version of validator. (This would also be a useful place to put CCSID or code-page, imo.)

Would this not help??

For one thing, feeders could post which validators they'll use, and aggregators could post which validators they can handle. And I would guess that'd be a way to avoid some no-small corn-fusion that way.

(I know this is probably NOT current practice "in the wild", but dunno why it would be hard to implement for the confusion it could remove. This would seem to also be an easy way to surface incompatibilities between various validators (on assumption that nobody'd be brain-dead enough to force everything through 1 person's validator).)

Posted by J. Toran at 2004-06-04 08:16 PM

I read the RFC and the comments here and I have to say as a user:

1) Thanks to Dave for being flexible and willing to clarify the spec. This will help improve my overall experience.

2) Thanks to the developers involved. Their participation ensures that users and developers are working together for a common good.

3) I'm not qualified to comment on the examples from a technical perspective, but the four listed make it clearer. If I'm trying to implement the spec and have a question about < item > tags and encoding, the examples will make it clear. They are appropriately "simple".

Posted by Steve Kirks at 2004-06-04 08:17 PM

Just a thought, but I'm guessing a few people reading this could quickly create some JavaScript to convert a snippet of HTML to entity-encoded text. If so, this would be a nice addition to the examples page.

Posted by Nick Bradbury at 2004-06-04 08:25 PM

FYI: "Character entity references for ISO 8859-1 characters" is at: www.w3.org

Posted by Joseph Palmer at 2004-06-04 08:49 PM

Nick, here you go:

var str = "some valid html & a little more";
// Escape "&" first so the entities added below are not escaped again.
str = str.replace(/&/g, "&amp;");
str = str.replace(/</g, "&lt;");
str = str.replace(/>/g, "&gt;");

Also, you need to add this as a test case: diveintomark.org -- as posted by Mark Pilgrim earlier, and deleted for some reason by Dave Winer.

Posted by Isofarro at 2004-06-04 09:15 PM

What happened to Detente?

www.intertwingly.net

Posted by Randy Charles Morin at 2004-06-04 09:56 PM

Danny,

I think entity-encoded markup is the best solution. Specifying XHTML would shut out a large number of posting tools which still produce legacy HTML, while specifying entity-encoded allows for XHTML to be used if desired (just entity-encode it, like any other HTML).

Regarding the CDATA close sequence, I would just point out that the XML spec does allow for people to store CDATA inside CDATA, and there are explicit rules for what to do here (I think you add a space between ]] on each successive pass or something). It should be totally possible to nest CDATA sections and round-trip. However, you have a point in that very few XML parsers (AFAIK) properly implement this. So if someone is wanting to nest CDATA, I would recommend that they use character entity encoding rather than CDATA, which is much easier to understand anyway, IMO.

Posted by Joshua Allen at 2004-06-04 10:02 PM
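
On the "]]>" question raised above: XML itself does not allow CDATA sections to nest, so the usual workaround is to end the section in the middle of any "]]>" that occurs in the content and immediately open a new one. A minimal JavaScript sketch, with toCdata as a hypothetical helper name:

```javascript
// Wrap arbitrary text in CDATA. Any "]]>" inside the text is split across
// two adjacent CDATA sections so that it never appears whole inside one.
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}
```

For the input a]]>b this gives <![CDATA[a]]]]><![CDATA[>b]]>, which a parser reads back as the two pieces "a]]" and ">b", i.e. the original text. Plain entity-encoding avoids the problem entirely, which is the recommendation made in the comment above.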

Frank: This particular issue only affects the entity codes used to escape text in XML: the codes for "<", ">" and "&". Your issue relates to the character set of your feed, which does not appear to support the character giving you trouble.

Posted by Rogers Cadenhead at 2004-06-04 10:06 PM
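
If switching the feed's character set is not an option, one workaround for characters like the curly apostrophe is to write them as numeric character references, which XML parsers resolve regardless of the declared encoding. A small JavaScript sketch under that assumption (toNumericReferences is a hypothetical name):

```javascript
// Replace every non-ASCII character with a decimal numeric character reference.
// Array.from iterates by code point, so characters outside the BMP stay whole.
function toNumericReferences(text) {
  return Array.from(text)
    .map(function (ch) {
      const codePoint = ch.codePointAt(0);
      return codePoint > 127 ? "&#" + codePoint + ";" : ch;
    })
    .join("");
}
```

This leaves "&", "<" and ">" alone, so it is meant to sit alongside the usual entity-encoding of those three characters rather than replace it.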

Frank,

I agree it would be nice to have an ampersand in the example too. &, <, and > are the only characters that matter. Your issue with quotes is an encoding issue. Try utf-8 or something.

Jeremy,

It might be a "best practice" to refrain from using anything beyond

, , , in the HTML. But I'm not sure if that belongs in this specific spec clarification note, or if it reflects current practice.

Randy,

This is a great example of what happens when detente is observed. Real progress.

Posted by Joshua Allen at 2004-06-04 10:25 PM

Bah, I meant "b", "p", "i", "a" tags in HTML.

Posted by Joshua Allen at 2004-06-04 10:26 PM

Dave, I haven't seen you address Mark's complaint. The original specifications allow description elements to contain plain text. This alteration would disallow that. A feed which had previously been valid might now not be. Mark even supplied you with a test-case, but you just deleted his comment and complained he wasn't being professional.

As you state, a common consensus amongst aggregators has been to always treat description element contents as HTML and not as plain text. So it's not a bad thing that the specification is being refined, it removes unpredictability. Unless a change like this is made, breakage can occur even when people are adhering to the specification. This is a good change in my opinion.

But that doesn't mean this doesn't break compatibility with previous versions of the specification. Previous versions allowed plain text. After this change is made, it will not. Instead of blowing off Mark's legitimate complaint because you two don't get on, how about simply acknowledging this fact so that the change can go through nice and smoothly without all the bellyaching so commonly associated with RSS?

Is it a corner-case? Sure. Is it easily fixed? Sure. But that doesn't mean you should pretend this doesn't break compatibility with previous versions when it clearly does. Apart from noting this incompatibility, it'll give Mark and others less reason to moan at you. It's also more professional than deleting comments from people you dislike that point out flaws in your specification. That doesn't help anyone.

Just to re-iterate: this is a *good* change. I'm all for it. Sweeping the break in compatibility under the carpet is *not* a good thing, and all you need do to appease me is make a note of it in the updated specification.

Posted by Jim at 2004-06-04 10:28 PM

Jim, it's not for me to address it. This was a group project, the proposed clarification is what we came up with. We put it up for comment, and now you're commenting. I was very clear to the group about this, and now I'll be clear with you, I want out of the hot seat. So if you've got an issue, state it, and move on. I'll ask the others what they think, and then we'll decide if there's an issue or not. This is no kind of environment for such a discussion. We learned this the hard way over many years. Some people don't have good intentions. Many of those people are here. Can't have this discussion with them involved. Sorry.

About giving Mark a reason to moan at me and your other emotional arguments, read the red text at the top of this post. You're over the line. I'm going to let it stand, for now, assuming you move on.

Posted by Dave Winer at 2004-06-04 10:38 PM

Cool.

Posted by Robert Sayre at 2004-06-04 10:58 PM

> I want out of the hot seat. So if you've got an issue, state it, and move on.

Okay, fair enough. Please read my previous comment as "I would like to see this issue addressed" rather than "please address it, Dave". I don't have a problem with changing specifications as long as incompatible changes are clearly marked and the version number incremented to reflect it. My issue is that this seems to be being treated as a clarification rather than a change. As far as I am concerned, this is definitely a change and needs to be labelled as such.

Apologies for including the comments about being professional, I was led to believe this was within acceptable limits as it was intended to be constructive and you made a similar comment above.

Anyway, now my position is clear, I'll drop the matter.

Posted by Jim at 2004-06-04 11:01 PM

Looking at backend.userland.com it does state "any 0.91 source is also a valid 2.0 source." Perhaps to be consistent with today's proposal, this should be updated to note that 0.91 descriptions are assumed to be text/plain whereas 2.0 descriptions are assumed to be text/html.

This is a minor change, of course, and one that will have little impact other than meeting everyone's goal of clarity. When the time comes to consider this idea, I'd be willing to take the hot seat.

In the meantime, the consensus here seems to be that today's proposal is a welcome one, which is certainly a relief to me. BTW, for the sake of politics, I should make it clear that my role in this should not be interpreted as a vote for any particular format. As is the case with all wars, my concerns lie more with the collateral damage than with the battle itself.

Posted by Nick Bradbury at 2004-06-04 11:26 PM

The proposed change is:

Either way, contains entity-encoded HTML.

This is probably too wordy, but today's discussion has led me to think the intent might be something closer to:

The contents of shall be entity-encoded per HTML 4.01, part 24.

But I'm worried that that locks the spec to ISO 8859-1, and what about other encodings? Is there a better way to say "per HTML 4.01, part 24." that is friendly to other encodings?

Posted by Joseph Palmer at 2004-06-04 11:59 PM

Sorry, I forgot to encode the tags. My bad. Please substitute:

Either way, <description> contains entity-encoded HTML.

The contents of <description> shall be entity-encoded per HTML 4.01, part 24.

Posted by Joseph Palmer at 2004-06-05 12:03 AM

Dave uses a lot of run-on sentences in his blog, which I've always assumed was just some sort of attempt at creating a conversational feeling (at the risk of annoying English grammarians). But in a spec, shouldn't the punctuation be more standard? I'm referring to "An item may also be complete in itself, if so, the description contains the text, and the link and title may be omitted."

Posted by parle at 2004-06-05 12:36 AM

Nick: practically speaking, treating 0.91 feeds as plain text isn't going to make for a very pretty aggregator. It took me about twenty seconds at syndic8 to find a couple of purported 0.91 feeds with entity encoded HTML.

One way of possibly minimizing the pain, which I just discovered this afternoon that EffNews is doing: look at what's between encoded angle brackets. If there's anything that you recognize as HTML, then treat the whole description as HTML (even though it might not be intended that way), if there's no recognizable HTML then treat it as plain text (even though it might just be HTML that you thought was so unlikely that you don't look for it, or an HTML tag from a version you hadn't heard of at the time). You'll wind up being wrong on two edge cases, people talking about HTML tags in what they intend to be plain text and people whose only HTML is a fieldset, but you still should wind up right more often than either treating all 0.91 as plain text or treating all RSS no matter what version as encoded HTML. It's not *right*, but it ought to give you more happy users and more happy producers than any other way out of this trap.

Posted by Phil Ringnalda at 2004-06-05 02:06 AM
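
The sniffing heuristic described above might look roughly like this in JavaScript. This is only a sketch: the tag list is a deliberately short, made-up sample, and it is not how EffNews or any other aggregator actually implements it.

```javascript
// Illustrative sample of tag names; a real aggregator would recognize many more.
const KNOWN_TAGS = new Set([
  "a", "b", "i", "p", "br", "em", "strong", "img",
  "ul", "ol", "li", "blockquote", "div", "span"
]);

// Given a description after XML parsing (entities already decoded), report
// whether it contains at least one tag we recognize as HTML.
function looksLikeHtml(text) {
  const tag = /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  let m;
  while ((m = tag.exec(text)) !== null) {
    if (KNOWN_TAGS.has(m[1].toLowerCase())) return true;
  }
  return false;
}
```

With this list, "Hello <b>world</b>" is treated as HTML and "5 < 8, ticker symbol <BIGCO>" as plain text. The two edge cases mentioned above still apply: plain text that talks about a known tag is misread as HTML, and a description whose only markup is an unlisted tag such as fieldset is misread as plain text.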

Looks like great clarifications. It's great to see a spec move with what the community is actively doing.

Posted by Philip Miseldine at 2004-06-05 09:20 AM

Phil R: Yes, you're right that in practice many 0.91 feeds are not plain text, which is why I think changing the 0.91 spec will have zero *real* impact.

FWIW, the way FeedDemon "sniffs" HTML is very similar to what you describe - and yes, I've already run into those edge cases you mention :)

Posted by Nick Bradbury at 2004-06-05 11:00 AM

This is a clarification of the language in the RSS 2.0 spec, not a change to RSS itself. Escaped HTML was allowed in a description element; this new language provides more guidance on how implementors should deal with it.

This should be considered an example of how the RSS advisory process can work best. Aggregator developers with a strong stake in RSS approached us with a problem that they were having interpreting the spec. They recommended the language based on what they've been doing in practice, the subject has been opened for public comment and will be considered next by the RSS board.

Looking at the RSS 0.91 spec, I don't see any language that restricts the description element to text/plain. Where is that assumption coming from? Lacking that, I don't see how an aggregator developer could be assuming that RSS 0.91 description elements are text/plain.

Posted by Rogers Cadenhead at 2004-06-05 01:33 PM

> Escaped HTML was allowed in a description element

I know it was. That is not the issue, perhaps I didn't make myself clear enough. The issue is that plain text was allowed in a description element and will not be after this change. Did you examine Mark's testcase? Isofarro reposted the link above.

> Looking at the RSS 0.91 spec, I don't see any language that restricts the description element to text/plain.

I'm not stating that it does; in fact the informal de facto standard of entity-escaped HTML arose and was explicitly condoned as an alternative by RSS 2.0. I'm stating that RSS 0.91 doesn't require the element contents to be entity-escaped HTML. This "clarification" changes this.

I don't have a problem with this as long as it's clearly noted as a change and the version number incremented. I'd like the RSS 2.0 specification to remain stable rather than change the requirements it places upon documents. The only way I can see that RSS 2.0 can be kept stable and also accomplish this change is to keep RSS 2.0 unchanged, and publish RSS 2.1. I really don't see why there is resistance to this. Is there something I am missing?

Posted by Jim at 2004-06-05 03:15 PM

I would be surprised if there was a single aggregator that handled that test feed without treating the item's description element as HTML and presenting it improperly.

So if plain text was allowed, but the implications of permitting escaped HTML made it impractical if not impossible to use "&lt;" and "&gt;" to produce the characters "<" and ">", the notion that this clarification newly disallows plain text is meaningless in practice.

Let's keep this grounded in the real world. What RSS 0.91 publisher would be hurt by this clarification? My guess is that they are already using "&amp;lt;" and "&amp;gt;" to produce "<" and ">" in their feeds, because that's the only way to make them show up in most, perhaps all, aggregators.

Posted by Rogers Cadenhead at 2004-06-05 03:49 PM

On an aside, my impression is that a decision on this will be made by the board. Will a fifth member be added to the board to assure the odd number of votes, as described on the board FAQ page?

Posted by Dan Dickinson at 2004-06-05 08:40 PM

Rogers: sorry, I wasn't clear *which* 0.91 I meant. I was talking (mostly) about Netscape-RSS-0.91, which says "We also are not allowing any HTML markup beyond the commonly used entities" and describes the description element as "a plain text description of an item, channel, image, or textinput." However, I think you could make a case that Userland-RSS-0.92 saying "Further, 0.92 allows entity-encoded HTML in the <description> of an item, to reflect actual practice by bloggers, who are often proficient HTML coders." as a change from 0.91 means that Userland-RSS-0.91 didn't (formally) allow entity-encoded HTML.

Posted by Phil Ringnalda at 2004-06-05 09:53 PM

Yikes. Netscape's 0.91 spec was offline for more than a year, perhaps longer. I've been going by the RSS 0.91 spec hosted by UserLand, which has always been available.

The idea there are differing RSS 0.91s out there that need to be reconciled makes my head hurt. But I'm going to defer thinking any more about it until we unearth an RSS 0.91 producer counting on text/plain.

Posted by Rogers Cadenhead at 2004-06-05 11:13 PM

> I would be surprised if there was a single aggregator that handled that test feed without treating the item's description element as HTML and presenting it improperly.

What is supposed to define RSS - the specification, or the behaviour of popular aggregators? If this change is insubstantial because nobody (you don't count Reuters?) pays attention to this part of the specification, instead referring to popular aggregators, then why bother making this change at all?

The specification is changing. It is not being clarified. It is changing for the better, and changing to more accurately reflect real-world usage, but it is *changing*. All I'm asking for is for this to be acknowledged in the updated specification.

> What RSS 0.91 publisher would be hurt by this clarification?

You're still not understanding me! I am in *favour* of this change! I want it to happen! I'm not arguing against it!

I'm disturbed that you are still referring to this as a clarification. Can we at least agree that it is a change, and that the testcase provided is an example of a feed which started valid and is now becoming invalid due to this change, even if you don't think it's going to be a problem in the "real world"?

Posted by Jim at 2004-06-06 10:04 AM

Hey! My feed doesn't work any more!

Posted by Mr. Safe at 2004-06-06 12:49 PM

I understand what you're saying, Jim, but I don't think the "is this a change or is this a clarification?" question is as important as you do. Regardless of what we call it, the new language will be noted on a Change Notes page (if approved).

As for Reuters, if an RSS 2.0 producer needs the literal characters "<" and ">" in a description element, why can't they replace "<" and ">" with "&lt;" and "&gt;"?

Posted by Rogers Cadenhead at 2004-06-06 02:00 PM

"Jim" wrote, "What is supposed to define RSS - the specification, or the behaviour of popular aggregators?"

First of all, allowing anonymous input to a forum like this is an exceedingly bad practice.

Second of all, "Jim", you have already plainly and obviously decided what is "supposed to" define RSS, which is you.

Others may (iow, WILL), see things otherwise.

If anyone can definitively answer your question, "Jim", then they are full of BS.

Because a person that answers one way or the other has the benefit of being blissfully ignorant of the difference between theory and practice.

NOW, mebbe those who have (unfortunately...;-) had the differences between theory and practice and reality burned into their synapses by decades of actual real-world experience can move on and work towards some more real progress in these kinds-a discussions.

Any objections?

Posted by J. Toran at 2004-06-06 04:46 PM

> As for Reuters, if an RSS 2.0 producer needs the literal characters "<" and ">" in a description element, why can't they replace "<" and ">" with "&lt;" and "&gt;"?

I'm not saying that the new specification is difficult to change to. I agree it is easy once you know that you need to do it. Perhaps it would be better to give an example of why this concerns me so much.

We are developing a family of websites, one of which is the "main" one for our client, and the others use it as a data source. I would like to use RSS as the glue that ties these websites together for a number of reasons. Right now, being read directly by end-users using a popular aggregator isn't a priority.

What concerns me is that if you aren't treating changes properly (i.e. not issuing a new version of the specification), when a new website in the family is created in the future, I can't say "the feeds are here, and they are RSS 2.0" to whoever is implementing it. Who knows what else you've quietly changed under the guise of "clarifications" because of your limited perception of aggregators? I just want a static specification I can refer to.

Now, if you make changes and publish an RSS 2.1 specification, this isn't an issue, because valid RSS 2.0 feeds carry on being valid RSS 2.0 feeds, and when somebody looks at the RSS 2.0 specification, it means the same thing it meant yesterday.

I'm not saying that this is a common position to be in, it's just an example of an underlying principle - potentially pulling the rug out from under people by altering specifications that are in wide use can't be a good thing, can it?

What I can't understand is why there is resistance to what I am saying. Can I ask again what the problem is with issuing a new version number?

J. Toran, I don't get why you are so hostile. If you disagree with what I am saying, then by all means, point out my mistakes instead of ad hominem attacks. Right now, you're being a bit vague as to what your actual problem with what I am saying is. I'm certainly not trying to define RSS, and don't understand how you could have arrived at that conclusion from what I have been saying. Why do you want RSS 2.0 to be changed without the version number being incremented?

Posted by Jim at 2004-06-06 05:42 PM

Jim, J Toran is right, and I'm glad someone had the guts to call you on it.

In the future we will only take comments from people who use their full names and clearly state their affiliation. I get timid because we attract so many attacks, enough already. You've made some nasty accusations here. You know that we haven't made changes to the spec because we tell you we haven't. We're honest people. Now if you want to say otherwise, use your own space. This is a place to comment on the proposed clarification. You've made your point, repeatedly. We understand what your complaint is. Move on.

Posted by Dave Winer at 2004-06-06 06:06 PM

When you introduce a new version number, you tell the entire RSS-implementing world that they have fallen behind the times and need to catch up with you, because RSS 2.1 is now the current release.

It would slow the momentum of the format's adoption at a time when we're finally bringing a lot of foot-draggers and Mister Safe types into XML-based Web publishing.

It would also break the claim that RSS 2.0 is frozen and new developments should be made in namespaces and in differently named formats.

This proposed change in language wouldn't make any RSS 2.0 feeds different from what you'd like to call RSS 2.1.

Given all of that, I don't see how there's a benefit to renumbering RSS based on a language clarification that supports existing practice.

Posted by Rogers Cadenhead at 2004-06-06 06:19 PM

> You know that we haven't made changes to the spec because we tell you we haven't. We're honest people.

I never meant to imply that you weren't, and I apologise if you got that impression. The fact is, I don't read this blog a lot. I don't want to check the specification for changes every time I think somebody might be doing something new with one of our feeds (not that I would necessarily be aware of every use!). I just want to feel safe in telling people that the feeds are RSS 2.0.

Please don't change posting policy because of me. If people want to remain anonymous, they'll just use fake names and fake affiliations. I don't give my identity because I don't want my inbox flooded with flames, and I don't give my affiliation because I don't want to expose my company to a community that has some vocal and hostile members (that's just a general observation, I'm not pointing fingers at anybody). I hope you can appreciate that. It would have been nice if you held J. Toran to the same standards as others and called him on his personal attack though.

> When you introduce a new version number, you tell the entire RSS-implementing world that they have fallen behind the times and need to catch up with you, because RSS 2.1 is now the current release.

But as we have both agreed upon, the changes are non-existent for almost everyone, and it is trivial to change to the new version.

> It would slow the momentum of the format's adoption at a time when we're finally bringing a lot of foot-draggers and Mister Safe types into XML-based Web publishing.

I disagree, but obviously you are in a better position to judge that than me.

> It would also break the claim that RSS 2.0 is frozen and new developments should be made in namespaces and in differently named formats.

Surely the change itself does this and not the new version number? The RSS 2.0 specification allows new version numbers at the moment doesn't it?

> This proposed change in language wouldn't make any RSS 2.0 feeds different from what you'd like to call RSS 2.1.

I thought we had agreed that Mark's testcase was just such an example?

In any case, I appreciate that you took the time to answer my questions, Rogers, and the attitude that went along with it. It's nice to see I'm not banging my head against a brick wall, even if we end up on the opposite sides of the fence wrt. the version issue. It would be nice for you to acknowledge that this is a new requirement and not just a clarification of language though!

Posted by Jim at 2004-06-06 06:59 PM

Jim:

(quote)Therefore, the RSS spec is, for all practical purposes, frozen at version 2.0.1. We anticipate possible 2.0.2 or 2.0.3 versions, etc. only for the purpose of clarifying the specification, not for adding new features to the format. Subsequent work should happen in modules, using namespaces, and in completely new syndication formats, with new names.(unquote)

So, this clarification would effectively take RSS from 2.0.1 to 2.0.2. You go tell your developers that your feed is (still) 2.0.1, and everyone is happy, no?

Posted by Sander van de Donk at 2004-06-06 07:17 PM

blogs.law.harvard.edu

Posted by Sander van de Donk at 2004-06-06 07:20 PM

Jim, I use my real name and take all that comes with it.

How could I possibly object to what J Toran said about you when I have no idea who you are. See the problem? You've made yourself ad-hominem-proof. I'm sure not willing to stick my neck out for you if you aren't even willing to say who you are.

I know about the intimidation. I've had prospective employers ask me why Mark Pilgrim says I'm such a bad person (he makes ethical charges, accusing me of stealing and plagiarizing among other things). I can't explain it. I'm sure I've lost jobs because of him.

Net-net, I'm not willing to stand up for someone who won't even say who they are. Pretty selfish of you, whoever you are.

Posted by Dave Winer at 2004-06-06 07:37 PM

Thanks, Sander. As I have repeatedly stated, as long as the version number is updated, I am happy.

Dave, I've tried to find areas we can all agree on. I've tried to clarify my position. I've apologised for causing offense. I've tried to show my support by repeatedly stating I am in favour of this change. I'm trying damn hard to keep this on topic. Please understand I am doing my best to participate constructively.

> How could I possibly object to what J Toran said about you when I have no idea who you are.

Because you say right at the top of the page: "off-topic and personal comments will be deleted". If you can use that as the reason to delete Mark's on-topic, non-personal comment that merely pointed out a problem with the proposed change and even provided a test-case, it's hard to accept that you praise J. Toran for his abuse! I thought we were trying to have a constructive discussion here?

> I know about the intimidation.

So I'm sure you'll understand my desire to shield myself and others from it. I don't see how that makes me selfish.

In any case, this is getting way off-topic. I assume you'll delete this comment and your own of course? My other comments were all on-topic, but your last comment and this one are entirely off-topic. I'm sorry, but I couldn't let it go when you accuse me of being selfish and nasty and have demonstrated that entirely personal off-topic comments are in fact acceptable.

Posted by Jim at 2004-06-06 08:01 PM

Okay, got it. Now move on.

Posted by Dave Winer at 2004-06-06 08:12 PM

Sorry to see that such a minor issue concerning 0.91 has caused so much debate - it makes me wish I'd never given my opinion on the matter.

Getting back on topic, perhaps it would be more helpful if instead of continuing this, we simply stated whether or not we agree with the proposed RSS 2.0 clarification, followed by a summary of why we've taken this viewpoint.

For the record, I'm for this clarification. It removes the ambiguity of the existing wording by clearly stating that descriptions are assumed to be text/html, and it aids feed producers by providing a set of examples.

Posted by Nick Bradbury at 2004-06-06 08:46 PM

Nick, how would you feel if we omitted the clarification to the language of the spec, and just linked to the page of examples? Right now I don't think we have a consensus that the clarification is needed, but it doesn't seem that the page of examples can do any harm. My guess is that it will achieve the purpose you guys wanted to achieve, helping to resolve problems in problematic RSS 2.0 feeds, without giving certain people something to scream about.

Posted by Dave Winer at 2004-06-06 09:27 PM

Honestly, I think the examples are more valuable than the clarification itself since they provide feed producers with a clear idea of how to properly encode their descriptions. So while I'm in favor of the clarification, just having the examples accomplishes most of what I need.

Posted by Nick Bradbury at 2004-06-06 09:58 PM

My 2 cents:

1. I agree with the cause and intent of this clarification.

2. I am uncomfortable with the actual wording of the clarification, because I find "entity-encoded HTML" to be ambiguous when describing a field that might contain anything from very, very plain text to entity-encoded valid XHTML. (But... I'll get over it.)

3. Since verification software checks for more than just the <, >, and & characters, I suggest that the examples include at least one other character that must be represented by entity-encoding.
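To make point 3 concrete, here is a minimal Python sketch of what such an example might cover. The comment does not say which additional character is meant, so the non-ASCII letter below is an assumption chosen for illustration, and `encode_description` is a hypothetical helper name, not part of any RSS tool.

```python
from xml.sax.saxutils import escape


def encode_description(html_fragment: str) -> str:
    """Entity-encode an HTML fragment for use as RSS <description> text."""
    # escape() rewrites &, <, and > (handling & first, so the
    # references it produces are not escaped a second time).
    escaped = escape(html_fragment)
    # Assumption: the feed should stay pure ASCII, so anything else
    # becomes a numeric character reference such as &#233;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "<p>Fish & chips at the caf\u00e9</p>"
print(encode_description(sample))
# prints: &lt;p&gt;Fish &amp; chips at the caf&#233;&lt;/p&gt;
```

Whether the numeric reference is strictly required depends on the encoding the feed declares; the sketch simply takes the most conservative route.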

Posted by Joseph Palmer at 2004-06-07 12:43 AM

Echoing a previous comment - what about the title field? We'd like to be able to put ticker symbols in there too.

Posted by Roddy MacFarquhar at 2004-06-07 01:31 PM

I'm glad someone with Reuters has shown up here, because I'm curious about the ticker symbol markup. Are you putting it in RSS elements so that it will show up as the literal text (i.e. "<BEOS>") when displayed on a Web page, or are you putting it in as markup that should be hidden in all clients that don't do anything intelligent with the symbol?

If the former, is encoding it as "&lt;BEOS&gt;" a workable solution for you?

Posted by Rogers Cadenhead at 2004-06-07 02:07 PM

I hate to just say "me too" -- but, me too. I support the clarification for the same reasons Nick Bradbury stated:

"It removes the ambiguity of the existing wording by clearly stating that descriptions are assumed to be text/html, and it aids feed producers by providing a set of examples."

Posted by Brent Simmons at 2004-06-07 08:55 PM

I support this clarification.

My hope is that the RSS Advisory Board will vote in favor of it, and that after the dust has settled a bit, tool developers, aggregator developers, and most importantly users, will be able to work with RSS with less fear of being sucked into wasteful flame wars.

I would prefer that both the examples and the clarification to the language of the spec were adopted, but would still support the addition of the examples only.

I would also prefer that the RSS version number remain unchanged, though I would not object if the Advisory Board were to vote to change the version to 2.0.2, since as far as I can surmise, changing to 2.0.2 would not violate the roadmap.

Disclaimer: I work as a developer for UserLand, and work on tools which both produce and consume RSS. I participated, though not centrally, on working out this proposed clarification. My statement of support is mine, as a developer who works in this market, and should not be interpreted as having been made by UserLand.

Posted by Jake Savin at 2004-06-07 08:55 PM

First, let me say I would be happy to see the comment deletion policies enforced here. There are a number of comments of a personal nature which clutter up the thread. I also think that the discussion about "is it called a clarification or a change; even if it is a clarification, will you please change the version number" is really cluttering up the thread and should not be on-topic for this discussion. The decision of what to *call* this clarification (and what numbering scheme, etc.) is absolutely secondary to the decision about *what* the clarification should say. I think this thread should be concentrating on the *content* of the clarification, rather than what to *call* it. Please agree that the discussion about what to *call* it would be held in a different forum, and start deleting off-topic posts.

Additionally, some people have raised the issue that there could be some ambiguity about what impact this would have on RSS 0.91 feed providers. I think that this is a non-issue, and if anything, the clarification for RSS 2.0 simply documents what is current practice and would have been an issue for 0.91 providers anyway (Nick and Phil already made this point, I believe). The actual real-world issues that a provider of RSS 0.91 feeds would face are largely independent of the RSS 2.0 spec clarification under discussion (the connection is certainly debatable). For example, completely missing from the discussion is any hard data about what the actual shipping aggregators do with RSS 0.91 description elements (I suspect that most, if not all, handle HTML in the description element in direct contradiction to the "requirement" that was cited). In other words, if none of the aggregators would treat the HTML in description as plaintext, then you would be doing feed providers a grave disservice by encouraging them to treat this node as plaintext. Again, I think it would be best for concerned parties to collect the appropriate data, and if desired drive this separate issue on a separate thread.

Regardless, I completely support the current clarification. Even if it were not documented in the spec, it is exactly what I would have told Reuters (or any other provider) to do. So it passes the "common sense" and "best practice" test, and it belongs in the spec.

In addition, I would suggest that the examples be amended to show content with an ampersand, and perhaps some mention of using minimal HTML tags to support interop. However, these suggestions are not critical, and would not affect my support for the proposal.

Posted by Joshua Allen at 2004-06-07 09:57 PM

I've had 48 hours to absorb this since returning from vacation and continue to wrestle with it.

What I can easily support, and what I believe is consensus, is the point echoed in the last several posts that the examples are good and that encouraging plaintext is not.

What has been making me queasy is the potential for logical inconsistency raised by Mark. Is he correct in saying that descriptions are encoded as plaintext in RSS 0.91? Afaict the spec [1] doesn't say one way or the other, but I admittedly haven't been involved with RSS as long as many of you have.

Here's the thing. I'm in favor of documenting current practice, passing the common sense test, and making aggregator developers' lives better. And I agree with Joshua Allen that the potential impact on RSS 0.91 feed providers is a non-issue. But I'm not in favor of a spec that lies or tries to rewrite history. If plaintext descriptions are considered legal RSS 0.91 --- and I don't yet have much data here --- then it seems that we can't both adopt the new language and keep the statement about upward compatibility. I know that upward compatibility was an important consideration in writing the original 2.0 spec and this seems non-negotiable. If this is really where we are, I'd say the right thing is to keep the examples and either drop the wording or replace it with a strong encouragement to use entity-encoded HTML and perhaps a pointer to a document that straightforwardly lays out the ambiguity, its historical origins and best practices for feed producers and consumers. In such a case I'd also be in favor of adopting the rssHints proposal outlined in [2].

[1] backend.userland.com
[2] blogs.law.harvard.edu

Posted by Andrew Grumet at 2004-06-08 01:58 AM

In truth the ticker symbols are probably a combination of markup and literal text. If you look at the corresponding story to a RSS entry on reuters.com you'll see we convert the ticker symbol to the symbol and three links (in the future we might do the same thing for ticker symbols that appear in the description element of our RSS feeds). Yahoo does something similar.

However, in the absence of any intelligent recognition and handling of ticker symbols by the display application then it's preferable that the ticker symbol itself does appear (rather than nothing at all). Yes, entity encoded HTML is a workable solution - but can we apply that to the title as well in the cases where ticker symbols appear there?
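For illustration, here is a rough Python sketch of the round trip for a description, assuming the aggregator treats the description as entity-encoded HTML as the proposed clarification says. The function names are hypothetical, and `html.unescape` only stands in for a real HTML renderer. The title question is left open here, as it is in the thread.

```python
import html
import xml.etree.ElementTree as ET


def serialize_item(description_html: str) -> str:
    """Build an <item> whose description carries the given HTML."""
    item = ET.Element("item")
    # ElementTree entity-encodes the text when it serializes the item.
    ET.SubElement(item, "description").text = description_html
    return ET.tostring(item, encoding="unicode")


def displayed_text(item_xml: str) -> str:
    """Approximate what a reader sees once the HTML is rendered."""
    description_html = ET.fromstring(item_xml).findtext("description")
    # Stand-in for rendering: resolve the HTML character references.
    return html.unescape(description_html)


# HTML meant to render the ticker symbol as literal text.
feed_xml = serialize_item("Shares of &lt;BEOS&gt; rose")
print(feed_xml)
print(displayed_text(feed_xml))
```

Under that assumption the feed source ends up carrying `&amp;lt;BEOS&amp;gt;`: one level of encoding for XML and one for HTML, so that the symbol is still visible after both are undone.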

Posted by Roddy MacFarquhar at 2004-06-08 02:35 PM

While this debate happened awhile ago, I would have to say that the use of entity-encoded HTML in description text thwarts "good" content. Many of the RSS feeds that I have surveyed have content removed from its context that would not display correctly--even without the use of XHTML.

This renders XML processing tools almost completely useless. If this isn't "corrected" in RSS 2.0, when will it be the case that the contents of a description are well-formed? The lesson learned from HTML was that allowing bad content to be displayed was a really bad idea. Some baseline level of compliance with XML is needed here.

Why not allow a description element to have an "encoding" attribute? If the encoding is XML, the content is well-formed and could be XHTML. The default would be to assume it is entity-encoded HTML.
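As a rough sketch of how a consumer might act on such an attribute (in Python): the `encoding` attribute and its `xml` value are only this proposal and are not part of RSS 2.0, `description_markup` is a hypothetical name, and XHTML namespaces are ignored to keep the example short.

```python
import xml.etree.ElementTree as ET


def description_markup(item_xml: str) -> str:
    """Return the description's markup under the proposed scheme."""
    desc = ET.fromstring(item_xml).find("description")
    if desc is None:
        return ""
    if desc.get("encoding") == "xml":
        # Proposed case: the content is well-formed inline markup,
        # so re-serialize the leading text and the child elements.
        parts = [desc.text or ""]
        parts.extend(ET.tostring(child, encoding="unicode") for child in desc)
        return "".join(parts)
    # Default case: entity-encoded HTML. The XML parser has already
    # decoded the entities, so the text is the HTML itself.
    return desc.text or ""


inline = '<item><description encoding="xml"><p>Hello <b>world</b></p></description></item>'
encoded = "<item><description>&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;</description></item>"
print(description_markup(inline))
print(description_markup(encoded))
```

Both calls yield the same markup string. The difference is that only the first form lets generic XML tools see the structure without a second parsing step.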

Posted by Alex Milowski at 2005-01-05 07:48 PM