In today's News.Com article, Sam Ruby raises an issue, for the first time that I am aware of, about namespaces in the RSS directory. He says:
"Dave Winer has on a number of occasions pointed out namespaces and said that they break interoperability," said Ruby, the RSS alternative advocate, who is a senior technical staff member at IBM in Raleigh, N.C., and a director of the Apache Software Foundation. "His RSS spec points to a list of namespaces, and it's extremely selective. It includes certain ones and not others. It's extremely confusing. I don't know anyone who knows what is and is not acceptable."
We've tried to be very inclusive. Basically the rule is, if it works with RSS 2.0, then it's included. To see if it works with 2.0, we look at the page describing the namespace, and if it claims RSS 2.0 support, we believe them.
As far as I know I have never "pointed out namespaces and said that they break interoperability."
Postscript: I've gotten some email on this, and I guess he means that I asserted that core elements should be used over elements in namespaces when they do the same thing. I don't think there's any question that that approach increases interop. "Break" is far too strong a word. Also, a reporter might not understand the difference, but I'm pretty sure Sam does.
Please define "if it works with RSS 2.0" and "*we* look at the page"...
What does it mean to "work with RSS 2.0"? To validate?
Where I write... [[One might ask, "why make up a whole namespace all about jobs, then call it 'perljobs'? isn't the concept and vocabulary more broadly applicable?". The same issue arises with Job and locations and dates.]]
...a combination of typo and braino obscured my point. I meant that the same issue arises with RSS and Jobs, in that a Job-description namespace (being about employers, jobs, locations, dates etc.) is no more or less "about" RSS than it is "about" Perl. They just happen to have called it 'perljobs', but it appears from looking at their work to be much more widely applicable (which is great!). As a set of RDF properties, I could use them (with no further prior agreement) with any other RDF vocabulary. So RDF has this free'n'easy policy allowing for namespace mixing without 1-to-1 agreement by namespace creators. If RSS2 doesn't go that route, it needs to explain which route it has taken instead, so we can have some hope of telling which namespaces 'work' with it or not.
pps. this weblog seems to have hyperlinked my use of the word 'about' (maybe I emphasised it with underscores?). sorry for any confusion there.
ppps. fwiw all these comments also hold for what I've seen of the Echo/Necho/Pie/etc design too...
Dan, you read way too much into what I wrote. I'm just trying to say that Sam either made it up, or got it wrong, or there was a transcription error. I agree -- namespaces shouldn't be coded to one format or another.
On the other hand I don't feel bad about insisting that they point to this format if they want it to point back to them. That's how the Web works.
I agree a friendly link between complementary namespaces is a good and Webby thing to have. But I'm still having trouble understanding how RSS2 namespace extension works. From RSS2's point of view, are all namespaces OK to use, or are some (for some technical reason) inappropriate? Presumably the definitions of those other namespaces might rule out embedding in RSS2 (eg. RDF vocabularies one might expect to only appear in files that meet RDF's XML syntax; other vocabs might demand being the root element, or other things that can't co-exist with RSS2's requirements).
I guess namespace mixing (outside the RDF arena, which is to a certain extent architecturally in its own closed world) is pretty much a researchy area. We know it can be done, but concepts such as 'validation' when applied to mixes of XML namespaces can be tricky. I've heard RELAX NG is pretty good on that front, but haven't really investigated it.
To stick with examples, say I'm making an RSS feed for something to do with Music, and I find some XML namespaces that provide handy terms such as 'track', 'artist' etc., and perhaps an actual list of artist IDs. What procedure should I follow to determine whether it is OK to augment an RSS2 feed with markup that references this namespace? (same goes for movies, jobs, locations, and so on). Or say I'm the author of that namespace, how can I tell whether it would be appropriate for me to write "this namespace can be used with RSS2.0.*".
I don't know, Dan. Maybe we can pop the stack one more level and explain what Sam Ruby was trying to say? XML theorists seem to forget that there are confused users who have a vague understanding of what XML is. Sam's statement confused a fair number of people who I heard from. Now don't put me on the defensive for that. Okay?
I don't see Dan trying to put you on the defensive, Dave. He's simply asking you a legitimate question: If he writes a namespace that is NOT written explicitly for RSS (i.e., his Music namespace example) how does he know whether or not it is technically valid for RSS2? The point is, there is no way for it to be considered valid (and testably so) unless either (a) you say it is valid (what are your criteria? where are they documented? is it automatable?) or (b) the author says it is valid (again: criteria? documentation? automation?).
I don't think Dan is interested in what Sam had to say about you -- as you say, that's a personality issue, which shouldn't be an issue at all. Dan's question (and mine, and many others') is a *technical* question. Sam has nothing to do with the RSS2 spec.
This is the problem people have had with RSS2+namespaces. You prefer not to use them, which is fine, that's your preference, and a preference shared by many people. But many others do want to use them, and without a rigidly-defined spec it is virtually impossible to determine independently (in other words, without you or the namespace author) whether or not the RSS2+namespace combination is technically valid.
As my wife is fond of saying about the stock market, "It's all based on air." A good laugh, but it's inherently true here -- the RSS spec is notorious for being difficult to test in itself, and Mark and Sam had a hell of a time writing a validator for it because of the way the spec was written.
Just some thoughts...
Hmm, perhaps I should re-word the last paragraph of my previous comment. It's not my intent to slam the spec or the spec-writer, merely to point out that a spec written in loose prose is inherently difficult to test against. You've done a hell of a lot to get weblogs and RSS/syndication to its present state, and you do deserve that credit.
Recently on scripting.com you said that you fall into the "literary" mode of thinking about XML. That's fine, and as long as you are aware that there is an entirely different way of thinking about XML and are willing to consider arguments from that point of view, then great!
Dave, I haven't got any algorithms, automatable or otherwise. I like to take the broadest possible approach: unless something has been proven not to work with RSS 2.0, I'd like to include it in the list. I can't imagine why this is so confusing.
Also, I think Dan is looking for an argument, not enlightenment. He's flamed me so many times, called my integrity into question so many times, I'm just wary of dealing with him. There's never any way for me to win with him. He is putting me on the defense, trust me on that, there's no happy ending possible.
Dave Cantrell is correct to point out that I've no interest in Sam Ruby's comments. The reason I'm writing here is because your comments here regarding RSS2,
"To see if it works with 2.0, we look at the page describing the namespace, and if it claims RSS 2.0 support, we believe them."
...got me to thinking about how this would work in practice, for people creating and discovering RSS-friendly namespaces. I didn't come here to talk about your integrity, I came here because I wanted to hear, in your own words and in your own space, how you expect this to work. Namespaces are about control, and about devolving control to collaborative, complementary efforts. RDF is one architecture for putting that into practice. If you're not going to use RDF, and given your 'FUCK RDF!' views that seems likely, then I'm curious about what you're going to do instead. The namespaces spec is agnostic about so much, that additional technology is needed to tell the story about how independently managed namespaces can play well together. If RSS2 really does devolve control through namespaces, we need some guidance from the specifiers of RSS2 as to how we know when we've done it right.
Dave Cantrell writes,
"Dan's question (and mine, and many others') is a *technical* question"
If you don't answer it for me, answer it for the rest of the RSS2 user and developer community, and for the (far larger) community working to create XML and XML/RDF namespaces that can be used together. It's your spec, just tell us how it works!
My $0.02 is that there is no special requirement for use of a namespace in an RSS 2.0 feed. I have used Dublin Core, XHTML, mod_content. So far as I can see, none of these has -- or needs to have -- any explicit "OK to use with RSS 2.0" blessing.
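Reading such a feed needs nothing beyond ordinary XML namespace processing. As a minimal sketch (Python standard library only; the sample feed and the function name `extension_namespaces` are made up for illustration), a consumer can list which extension namespaces a feed draws on, with no registry of "approved" namespaces involved:

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed mixing in Dublin Core and XHTML. Core RSS 2.0
# elements are in no namespace, so any "{uri}" tag marks an extension.
FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <link>http://example.com/</link>
    <description>An example feed</description>
    <item>
      <title>Hello</title>
      <dc:creator>Jane Doe</dc:creator>
      <body xmlns="http://www.w3.org/1999/xhtml"><p>Hi there</p></body>
    </item>
  </channel>
</rss>"""

def extension_namespaces(xml_text):
    """Return the namespace URIs used by any element in the document."""
    found = set()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree spells namespaced tags as "{uri}localname".
        if elem.tag.startswith("{"):
            found.add(elem.tag[1:].split("}", 1)[0])
    return found

print(sorted(extension_namespaces(FEED)))
# ['http://purl.org/dc/elements/1.1/', 'http://www.w3.org/1999/xhtml']
```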
It further seems to me that modules in the RDF namespace can be included in an RSS 2.0 feed. I'm currently trying to come up with a test case, and would appreciate suggestions, because I want to do the experiment. I'd also love to hear your thoughts on the feasibility of doing such a thing.
If I have read correctly, Dave Winer says: "I like to take the broadest possible approach, unless something has been proven not to work with RSS 2.0, I'd like to include it in the list."
I guess this could work for RSS 2.0:
"""Namespace-based modularization affords for compartmentalized extensibility, allowing RSS 2.0 to be extended:
1. without need of iterative rewrites of the core specification
2. without need of consensus on each and every element
3. without bloating RSS with elements the majority of which won't be used in any particular arena or application
4. without namespace collisions"""
And with the addition of this:
""" RSS modules must not introduce conflicts by ad hoc modification of the content models of any other module or the core. Modular extensions may not be considered stand-ins for required core elements (eg. dc:description at the channel level does not obviate the need for including the required description element). """
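The second rule is easy to check mechanically, because a namespaced element never has the same qualified name as a core element. A minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the function name and sample feed are hypothetical:

```python
import xml.etree.ElementTree as ET

# RSS 2.0 lists title, link and description as required channel elements;
# core elements are in no namespace.
REQUIRED_CHANNEL = ("title", "link", "description")

def missing_core_elements(xml_text):
    """Return the required core channel elements that are absent.

    ElementTree names a namespaced element "{uri}local", so dc:description
    never matches the plain "description" searched for here.
    """
    channel = ET.fromstring(xml_text).find("channel")
    return [name for name in REQUIRED_CHANNEL if channel.find(name) is None]

# Hypothetical feed that offers only a Dublin Core stand-in.
FEED = (
    '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<channel><title>Example</title><link>http://example.com/</link>"
    "<dc:description>A DC description only</dc:description>"
    "</channel></rss>"
)

print(missing_core_elements(FEED))  # ['description']
```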
Thanks for the reply. Good to know that they don't need any blessing, and - yes - it makes sense to me for your spec not to attempt to specify a namespace-mixing architecture, combination rules etc. That's a big job.
As far as mixing in others' namespaces, one issue we have is the variety of technologies those parties might have employed to define their vocab (XML DTDs, various XML schema languages, RDFS, OWL etc). Each of those languages has different facilities for describing the syntactic constraints as well as aspects of the meaning of a vocabulary. DTDs don't know about namespace URIs; RDF schemas and OWL ontologies don't know about XML syntax rules.
It simply isn't clear to me how best to do this, ie. whether we consider the owner of the namespace always to have the final "say" as to where their content can be put, or whether the XML namespace mechanism means we can include anything, regardless of the original expectations of the namespace author.
RDF is an interesting sub-case. RDF vocabularies form a semi-closed community. The normal case for RDF vocabularies (such as FOAF, DC, RSS1 modules, MusicBrainz, WordNet, CC, etc) is to appear in an RDF syntactic context, ie. when we know the XML structures are arranged according to the RDF/XML syntax, ie. as a serialization of an RDF graph. If they're appearing in an XML syntax known to be an encoding of an RDF graph, everything is sweet, since we know for each XML element (whether from known or unknown namespace) what role it is playing, ie. whether it is encoding a 'node' or an 'arc' from the graph.
The tricky case is where RDF namespaces are mixed in an XML context where the XML is not generally being arranged as an encoding of an RDF graph. In such cases, all bets are off, we don't really know what is meant by an unknown XML element or attribute, since there is no over-arching set of cross-namespace rules that are being followed.
So, what to do? A stark line (one I don't believe anyone is taking) would be to assert that RDF vocabularies can only appear in XML documents that are structured according to the (current) RDF/XML syntax. So any use of FOAF in non-RDF XML would be illegal, for example. That strikes me as wrong for a couple of reasons. Firstly it increases the gap between RDF and XML when we should be working to reduce it. Secondly, it would create an upgrade hell if W3C (or anyone else) were trying to deploy an alternative XML encoding of RDF graphs (there are a few of these and one may catch on...). There may end up being other XML encodings of RDF graphs, and we won't want to go changing all our RDF vocab namespace URIs to celebrate that occasion.
So a less stark line, and one I'm currently inclined towards, is to explore some rules along the lines of "It is OK for XML markup to draw upon RDF namespaces even when the enclosing XML markup isn't in RDF syntax, so long as all elements below that RDF-namespaced element are structured in accordance with a regular RDF syntax."
You mention test cases, which sound like a great way forward. I guess we'd have something like:
(hmm can I use markup here? let's see...)
(assume default ns is rss2 and if this example seems dumb, we could redo using Jobs vocab)
<item xmlns:dc="purl.org">
  a photo of a dog called bob
  <dc:format>image/jpeg</dc:format>
  <foaf:depicts>
    <wn:Dog>
      <foaf:name>Bob</foaf:name>
      <foaf:homepage>
        <foaf:Document rdf:about="bobthedog.example.com">
          <dc:title>A Dog's Life</dc:title>
        </foaf:Document>
      </foaf:homepage>
    </wn:Dog>
  </foaf:depicts>
  <etc>more rss2 stuff here, and other XML namespace with no structure required</etc>
</item>
...so I'd be happy seeing that sort of thing. ie. a self-contained island of RDF, which internally is OK by RDF syntax rules (ie. matches a production from the RDF/XML grammar, in this case starting at a property element, 'depicts').
Another test case, which I'd be inclined to rule ill-advised, would be one which mixed in non-RDF namespaces and syntactic structures right alongside the RDF-oriented markup. For example:
oops my finger slipped and I hit submit. Hmm looks like markup was ok but isn't indented nicely.
2nd example was going to basically be anything like the first but containing random extra bits which (from the <foaf:depicts> inward) fails to parse as RDF. The problem there being that the things named in an RDF namespace map to either properties (relationship types, attributes) or classes (types of thing), and you can't tell which except through use of the RDF syntax rules. If the XML isn't sticking to those rules, and you encounter some unknown tag, eg '<aergkldjg>' or '<Abc>', it isn't clear what the intent was. So one stray non-rule-abiding tag would spoil it for everyone...
If we *know* that the above example (from my prev post) has been composed according to the rules of RDF syntax, then we know that it is telling us that the RSS2 item foaf:depicts a thing that is a wordnet:Dog and that the dog has a foaf:name and a foaf:homepage, and that the homepage is a foaf:Document with a dc:title etc etc.
If we have no guarantee that the markup was composed according to the RDF rules, then the intent could have been that, or could have been something else. We really can't be sure enough what was meant. While that might not matter when syndicating information about pictures of dogs and their homepages, it surely does matter if we're syndicating ecommerce info (see recent Amazon explorations...) or Job details (see rssjobs, perljobs etc. links above).
So the suggestion is that when RDF namespaces are used as extension namespaces in RSS2 (or imho any other non-RDF XML context) we make sure their sub-tree of XML does stick to RDF syntax rules.
Sorry for the verbosity, I've been trying to think this one through. People want to use FOAF markup within RSS2 feeds, and I didn't want to just say "nope, it ain't an RDF format so no-can-do", but at the same time, I didn't want to say "Sure, mix the tags freely with any random XML structures", since that'll make for huge confusion and unparse-able data.
Does something like the rule above make sense to you as a rule for RSS2 namespace extensions where the namespace being used is one originally defined as an RDF vocabulary?
ps. my above example is missing various namespace declarations, which would need adding for test case
Here's another example. It looks RDFish, but the syntax isn't RDF-compatible.
<item>..<foaf:depicts><wn:Cat foaf:name="Chris the cat"/><wn:Dog foaf:name="Dave the dog"/></foaf:depicts></item>
Here it is in its full RDF-happy glory:
(RDF requires 'striping' of node and edge elements)
<item>..<foaf:depicts><wn:Cat foaf:name="Chris the cat"/></foaf:depicts><foaf:depicts><wn:Dog foaf:name="Dave the dog"/></foaf:depicts></item>
So the question here isn't whether RDF's syntax is a bit verbose (it is -- fair cop!) but whether encouraging pseudo-RDF markup such as (eg3) will on balance be a force for progress or chaos. I fear the latter; it looks a bit like RDF, the element names appear meaningful, but it isn't sticking to the syntactic rules of RDF so its interpretation isn't clear.
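The striping rule can be checked mechanically. The sketch below (Python standard library only) treats each child of the RSS item as the root of an RDF island that starts at a property element, and requires node and property elements to alternate. It is a simplification, not an RDF/XML parser: it ignores rdf:parseType, rdf:li and other parts of the full grammar, and the wn: namespace URI is a placeholder.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML = "http://www.w3.org/XML/1998/namespace"

def _blank(text):
    return not (text and text.strip())

def _syntax_attr(key):
    # rdf:* and xml:* attributes are syntax, not property attributes.
    return key.startswith(("{%s}" % RDF, "{%s}" % XML))

def valid_node(elem):
    """Node element: no text of its own; every child is a property element."""
    if not _blank(elem.text):
        return False
    return all(_blank(c.tail) and valid_property(c) for c in elem)

def valid_property(elem):
    """Property element in one of three simplified RDF/XML forms."""
    children = list(elem)
    if not children:
        if _blank(elem.text):
            return True  # empty property element; attributes allowed
        # Literal property element: no property attributes beside the text.
        return all(_syntax_attr(k) for k in elem.attrib)
    # Resource property element: exactly one node child, no property attrs.
    if len(children) != 1 or not _blank(elem.text):
        return False
    if not all(_syntax_attr(k) for k in elem.attrib):
        return False
    return _blank(children[0].tail) and valid_node(children[0])

def item_islands_ok(item_xml):
    """Check each child of an RSS <item> as the root of an RDF island."""
    return all(valid_property(child) for child in ET.fromstring(item_xml))

NS = ('xmlns:foaf="http://xmlns.com/foaf/0.1/" '
      'xmlns:wn="http://example.org/wn#"')  # wn: URI is a placeholder
EG3 = ('<item %s><foaf:depicts><wn:Cat foaf:name="Chris the cat"/>'
       '<wn:Dog foaf:name="Dave the dog"/></foaf:depicts></item>' % NS)
STRIPED = ('<item %s><foaf:depicts><wn:Cat foaf:name="Chris the cat"/>'
           '</foaf:depicts><foaf:depicts><wn:Dog foaf:name="Dave the dog"/>'
           '</foaf:depicts></item>' % NS)

print(item_islands_ok(EG3), item_islands_ok(STRIPED))  # False True
```

The check fails the unstriped markup because one foaf:depicts property element holds two node elements; splitting it into two property elements makes it pass.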
This might seem like more of an RDF-ist's issue than an RSS2 concern, but since we're (as I understand it) trying to figure out how best we can use RDF vocabularies within an RSS2 (and Echo of course) context, I hope you'll be interested in continuing this conversation...
Dan - I suggest you try pushing the RSS2+FOAF through Morten or Sjoerd's RSSx->RSS1 XSLT. (In Sjoerd's version at least) stuff from other namespaces gets passed through verbatim, but in the same effective position in the structure. This is a deterministic process, so if the result is valid RDF (and it looks right ;-) then the RSS 2.0++ can be considered valid against the RDF model. This is potentially a way of machine-validating RSS 2.0 + extensions.
Dan, forgive me, but I truly don't understand what's going on here. Why do you and Jon want to include RDF in RSS? I really don't get it.
> And if you try it with my second example, the not-quite-RDF RDFesque markup, it should fail, right?
> unless the RSS2 feed came with some syntactic flag (rdf="on"?) to indicate that a sub-tree of markup was intended as RDF.
Or maybe a new <meta> element, where all RDF data is put. But I believe that currently there are no RSS2 extensions that would produce invalid RDF. They are either already an RSS 1.0 module, or elements with text-only content.
> Hmm I guess I need to say this all over again for the Echo discussion...
Don't worry, enough people here are also involved with Echo...
Sorry this got so long! I should probably copy it into a wiki or something. The RDF discussion is a sub-task of the bigger question we began with, regarding rules for including non-RSS2 namespaces in RSS2 feeds. Jon confirmed that, from an RSS2 perspective, anything goes. When you said "To see if it works with 2.0..." I was intrigued to find out whether there was a technical story to flesh out what "working with" amounted to.
To answer your question, some (but by no means all) of the namespaces people may want to augment RSS feeds with are defined as RDF vocabularies. The discussion above was an attempt to figure out a story we can tell those folks regarding best practice for embedding markup that uses RDF namespaces into non-RDF enclosing markup, such as RSS2 (and quite likely, Echo/Pie/etc). Why would someone want to include RDF inside an RSS feed? For the same reason one might want to use *any* extension namespace: to provide more info so that machines have something to match against or pass on to humans. Going back to the Jobs example, contrast the www.rssjobs.com approach to jobs.perl.org -- setting aside for a second the question of whether the extension vocab is RDF or plain XML, I hope the value of using namespaces for stuff like 'salary', 'employer', 'location' etc. is clear. The RSS2 spec does strike me as quite clear on that. For RSSish stuff I read it as saying 'don't bother using fancy extensions, let's keep some useful concepts in the core'; but for things like job-description markup, 'use extensions so RSS2 can be frozen'.
So assuming the value of (sometimes -- eg case of Job info) using extension namespaces is common ground, the answer to "why would you want to include RDF in RSS" is pretty straightforward. There is a significant body of work going on in the RDF world, vocabularies being created, datasets, tools. We have RDF vocabularies in use for describing documents (Dublin Core), people (FOAF), locations (basic lat/long/alt vocab), calendars/events/times (ical-in-rdf), rights (creative commons), plus 50,000 noun terms via wordnet-in-rdf and various other efforts. As I mentioned earlier, a recent crawl found 179 namespaces being used in RDF docs. The vocabs I mention cover the basics such as 'who/what/where/when' in various levels of detail. It would be great to be able to re-use rather than re-invent these vocabularies for RSS2 and EchoPie feeds. Not every feed will need them. But as the body of RDF vocabularies grows, more and more terms we're defining in RDF-land will prove useful. For eg. with jobs, we can describe locations, organisations, salaries etc. It might be that there is a just-fine non-RDF XML vocab you could use for this; if so, great. But it seems a pity to rule out even the possibility of using RDF work here. In which case, there's a modest bit of technical work that needs doing, for RSS2 and/or Echo, which is to specify some rules for including islands of RDF markup within a looser 'anything goes' namespace-extensible format.
Does that help explain my goals here?
Basically, re RSS, I'd like, for a given item in a feed to be able to use extension vocabs to say:
- for a picture, if it is a picture: who and what it depicts; where it was taken
- for a job advert: who is advertising it, where the job is, who the employer is, where the work is
- for a news article: what technologies, organisations and people it is about
- for a weblog article: who it was by, and what their homepage, weblog and foaf urls are
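The "where it was taken" part of the picture case can already be expressed with simple, text-only extension elements. A minimal sketch using Python's `xml.etree.ElementTree` and the W3C basic geo (WGS84 lat/long) vocabulary; the function name `photo_item`, the URL and the coordinates are made up for illustration. Because each extension element holds only text, it also reads as a simple literal-valued property from an RDF point of view.

```python
import xml.etree.ElementTree as ET

# W3C basic geo vocabulary (WGS84 lat/long).
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
ET.register_namespace("geo", GEO)

def photo_item(title, link, lat, lon):
    """Build an RSS 2.0 <item> for a photo, tagged with where it was taken."""
    item = ET.Element("item")  # core RSS 2.0 elements are in no namespace
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Extension elements: plain text content only.
    ET.SubElement(item, "{%s}lat" % GEO).text = str(lat)
    ET.SubElement(item, "{%s}long" % GEO).text = str(lon)
    return item

item = photo_item("Bob the dog", "http://example.com/bob.jpg", 51.45, -2.58)
print(ET.tostring(item, encoding="unicode"))
```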
Some of us have RDF vocabularies in the works which provide namespaces that can help with such things, we just need to figure out the rules for plugging them into the feed. Those with non-RDF XML vocabs that address these problems will also want to be thinking about namespace-mixing rules, since most XML schema languages don't make that super-easy...
re "Or maybe a new <meta> element, where all RDF data is put", that would need re-inventing for Echo, and any other XML 'host' format. Not to mention extending RSS2, which has already been declared frozen.
re "currently there are no RSS2 extensions that would produce invalid RDF", I'm not so sure: I thought we'd already established (Jon's comment) that RSS2 can be extended using any XML namespace at all. And there are 100s of those, some of which are bound to use markup that isn't RDF. It might well be that people don't want to use any of them and would prefer to use RDF namespaces, that'd be an interesting turnaround! But my expectation is that we'll need both approaches, ie. there will be XML markup in RSS2 feeds which draws from non-RDF namespaces and which shouldn't be treated as RDF-parsable, or mistaken for RDF.
> re "Or maybe a new <meta> element, where all RDF data is put", that would need re-inventing for Echo, and any other XML 'host' format. Not to mention extending RSS2, which has already been declared frozen.
I realized later that it probably would be possible to detect in the XSL if the subelement of an <item> is structurally valid RDF and only then let it through. Which means that the rss2rdf.xsl would always generate valid RDF, without changes in the RSS2 spec. Not all of it might be meant as RDF, but some extra meaningless triples don't do much harm, right?
(this blog seems to be ordering our responses strangely, but anyway...).
"some extra meaningless triples don't do much harm, right?"... I'm not so sure.
If we want to be able to treat data carried over RSS robustly, it's worth architecting things so that we don't get muddled up where we could have avoided it. Just because something happens to be structured in a form that parses as RDF, doesn't mean it was intended to say what an RDF consumer will treat it as saying. The rdfweb.org example was supposed to show that; one of RDF's syntactic variants that may be counter-intuitive from an XML perspective.
If we have:
this would parse as RDF if we decided to take x:foo as an RDF property.
So would (again beginning as a property) <x:foo><y:bar c="d"/></x:foo> and some other variations. <x:foo a="b"><y:bar c="d"/></x:foo>, by contrast, doesn't match any legal RDF syntax.
<x:foo c="d"/> would create a sub-graph of this. This isn't immediately intuitive if you only know XML syntax and not RDF's patterns for representing statements as elements and attributes.
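The difference between these three snippets comes down to which property-element form each one matches. A simplified sketch in Python (standard library only, not a full RDF/XML parser: parseType and nested validation are ignored) that classifies them; the x: and y: namespace URIs are placeholders. Following the examples above, unprefixed attributes are treated as property attributes, although real RDF/XML is stricter about unprefixed attribute names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML = "http://www.w3.org/XML/1998/namespace"

def property_form(elem):
    """Classify a property element by simplified RDF/XML productions.

    Returns "empty", "literal", "resource" or "invalid".
    """
    children = list(elem)
    has_text = bool(elem.text and elem.text.strip())
    # Attributes outside the rdf: and xml: namespaces are property attributes.
    prop_attrs = [k for k in elem.attrib
                  if not k.startswith(("{%s}" % RDF, "{%s}" % XML))]
    if not children and not has_text:
        # e.g. <x:foo c="d"/>: property attributes describe a blank node
        return "empty"
    if not children:
        return "invalid" if prop_attrs else "literal"
    if len(children) == 1 and not has_text and not prop_attrs:
        return "resource"  # one nested node element, e.g. <y:bar .../>
    return "invalid"  # e.g. property attributes alongside element content

# Placeholder namespace URIs for the x: and y: prefixes.
NS = 'xmlns:x="http://example.org/x#" xmlns:y="http://example.org/y#"'
SNIPPETS = [
    '<x:foo %s><y:bar c="d"/></x:foo>' % NS,
    '<x:foo %s a="b"><y:bar c="d"/></x:foo>' % NS,
    '<x:foo %s c="d"/>' % NS,
]
forms = [property_form(ET.fromstring(s)) for s in SNIPPETS]
print(forms)  # ['resource', 'invalid', 'empty']
```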
So long as these were only pictures of people's pets or annotations on weblog postings, a few mixed up properties might be harmless. But if we're syndicating ecommerce stuff, customer info etc., details of things for sale, it's worth getting this right.
I believe a syntactic flag (in a new namespace, maybe I could draft one at w3.org) for indicating RDF markup subtrees is worth the effort, and could help out w/ inclusion of RDF vocabularies in both RSS2 and Echo.
Elements with an rdf:resource attribute are obviously valid RDF. These are in use already, e.g. the admin module or the trackback module. It should not be required to add a syntactic flag to those. Same for any other attribute in the rdf namespace. If the syntactic flag were also in the rdf namespace the test would be simple. (<xsl:if test="@rdf:*">)
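Outside XSLT the same test is just as short. A rough Python analogue of `@rdf:*`, assuming the standard `xml.etree.ElementTree`; the `ex:` namespace and its `ping`/`note` elements are hypothetical stand-ins for modules like admin or trackback:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def has_rdf_attribute(elem):
    """Rough Python analogue of the XSLT test @rdf:* on one element."""
    # ElementTree spells a namespaced attribute as "{uri}local".
    return any(key.startswith("{%s}" % RDF) for key in elem.attrib)

# Hypothetical item: ex:ping mimics a module that points at a resource.
ITEM = (
    '<item xmlns:rdf="%s" xmlns:ex="http://example.org/ext#">'
    "<title>Hello</title>"
    '<ex:ping rdf:resource="http://example.com/trackback/1"/>'
    "<ex:note>plain text</ex:note>"
    "</item>" % RDF
)

flags = [(child.tag.split("}")[-1], has_rdf_attribute(child))
         for child in ET.fromstring(ITEM)]
print(flags)  # [('title', False), ('ping', True), ('note', False)]
```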
Another problem is which URI to use for the item. Sometimes the guid element is more appropriate than the link element. (In Echo this should always be the id element) Or must rdf:about be required on an item?
This is the url of the page I use to post:
There, the order of responses is correct.
Ah OK, I was using (for some reason) this URL: blogs.law.harvard.edu
danbri - "I believe a syntactic flag ..."
This was the idea of SSR, except done with a holistic approach (man ;-) using XSLT (Sjoerd's, as it happens)
Where did all the comments go?