RSS Advisory Board

Update: This proposal was withdrawn on 10/17/03.

A few weeks back, a question was raised by Jeremy Zawodny on behalf of Yahoo. They have a large number of RSS feeds that they want to make available to aggregators. They need a machine-readable format and a default location for the file. Further, this file should be able to contain links to other files in this format so that directories can be distributed.

A format and location are proposed in this document.

Proposal: An application that wishes to find public feeds for a Web site, say http://www.foobar.com/, should request http://www.foobar.com/myPublicFeeds.opml. If it can find this document, it should read the contents and look for a sequence of top-level <outline>s with at least the following attributes: type, text, description, xmlUrl. These attributes provide enough information so that the aggregator can provide a reasonable user interface for choosing one or more feeds from the site. Type must be rss; text is the name of the feed; description is a one or two sentence description of the feed; xmlUrl points to the feed. Even though the type is rss, the feed it points to may be in any common syndication format understood by aggregators.

An <outline> may contain a type attribute with the value link, and an xmlUrl attribute that points to an OPML file to be included, allowing directories to be distributed. Included files may also include links to other files, without limit. In general, the names of OPML files should end with ".opml".
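As a rough illustration of the reading side, the Python sketch below walks a feed list of the kind described above. It collects top-level outlines of type rss and follows outlines of type link into the included OPML files. The sample documents, the `fetch` callback, the name `collect_feeds` and the loop guard are all assumptions made for this example, not part of the proposal. A real aggregator would fetch over HTTP and would need its own policy for entries with missing attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents standing in for files served by a site.
SAMPLE_DOCS = {
    "http://www.foobar.com/myPublicFeeds.opml": """<opml version="1.0">
  <head><title>Public feeds</title></head>
  <body>
    <outline type="rss" text="News" description="Site news." xmlUrl="http://www.foobar.com/news.xml"/>
    <outline type="link" text="More feeds" xmlUrl="http://www.foobar.com/more.opml"/>
  </body>
</opml>""",
    "http://www.foobar.com/more.opml": """<opml version="1.0">
  <head><title>More feeds</title></head>
  <body>
    <outline type="rss" text="Sports" description="Scores and commentary." xmlUrl="http://www.foobar.com/sports.xml"/>
  </body>
</opml>""",
}

# Attributes the proposal asks for on every feed entry (besides type).
REQUIRED = ("text", "description", "xmlUrl")


def collect_feeds(url, fetch, seen=None):
    """Return feed dicts found at url and in any OPML files it includes."""
    seen = set() if seen is None else seen
    if url in seen:  # assumption: skip files already visited to avoid loops
        return []
    seen.add(url)
    body = ET.fromstring(fetch(url)).find("body")
    feeds = []
    if body is None:
        return feeds
    for outline in body.findall("outline"):  # top-level outlines only
        kind = outline.get("type")
        if kind == "rss" and all(outline.get(k) for k in REQUIRED):
            feeds.append({k: outline.get(k) for k in REQUIRED})
        elif kind == "link" and outline.get("xmlUrl"):
            feeds.extend(collect_feeds(outline.get("xmlUrl"), fetch, seen))
    return feeds


feeds = collect_feeds("http://www.foobar.com/myPublicFeeds.opml", SAMPLE_DOCS.__getitem__)
for feed in feeds:
    print(feed["text"], "->", feed["xmlUrl"])
```

Passing the fetch function in as a parameter keeps the traversal separate from the network code, so the same walk can be tried against local files or canned strings.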

Analysis: There seem to be two options: 1. a custom format, or 2. use OPML.

I like using OPML although I see advantages in using a custom format. A custom format would be like changes.xml for weblogs.com, a flat list of files, each with a name, description, and url, and perhaps some other data, as attributes, or as sub-elements. Lots of choices, all subject to taste, relatively easy to implement on both the production and consumption sides. Would require new code for aggregators. Would require a new convention for linking.

Most aggregators can already read OPML because it is the default format for collections of subscriptions, which is exactly what this application is. Further, OPML already has a convention for n-level inclusion, implemented in hierarchic web directories. It's also quite easy for content systems to produce OPML files.

I decided to go with OPML because of its familiarity to aggregator developers.

References: Robots Exclusion, OPML 1.0

Comments

Jeremy Zawodny is a sharp dude. Hope he does well on this.

Replying to the "which format to use"-question: I would go for OPML, because the format already exists and applications (aggregators, outliners) know how to read it. Why invent a new format when there's already one that does what it needs to do?

So OPML is the right choice, if you ask me ;)

Not much to comment on here. It makes sense, it's simple and it has a purpose that amplifies the user's experience.

I don't like the idea of discovering the file only by appending "/MyPublicFeeds.opml" to the domain. This smacks of the mess that is FavIcon (check your logs to see how many failed requests there are for favicon.ico if you don't have one). Rather, I would like to see a <link /> tag like that used to point to RSS feeds.

yep, specifying a URL assumes too much about the site. you may also not be able to put something in the root directory. a good list of the tradeoffs for well-known locations by mark nottingham. how about getting rid of the "My" business (which is overused i think, with all the "My Documents" etc) and just name it feeds.opml. but of course, if you use the link tag you name it as you please :) maybe now would be a good time to rally behind a single type for OPML in the link tag. Now that OPML has been in the wild for a while, it may also make sense to look at how it is commonly used and update the spec to reflect best practice.

I like OPML for this too, however I would request one small change to the <type> definition. Whilst I understand that the xml file we are pointing to may be in any format (not just rss), I think this is an opportunity to extend this proposal to provide a simple meta-feed protocol for several different messaging protocols, e.g. IM-based notifications, calendar reminders, tool-specific stuff (like, say, jabber). These types of messages may not yet have an agreed xml structure of their own, but may have in future, and as such I think we should allow the type attribute to hold more than "rss". This would be a good pointer to the aggregator to "load" the relevant parser for each meta-messaging protocol when it loads the file. We might even suggest a range of suggested values ("rss", "IM", "calendar", "custom" etc.)

Thanks Dave -- this is great.

First, it would be nice to have an optional value for the source. This would represent the location of the item that is being RSS-ified (i.e., the HTML page in most cases).

Second, say there are multiple feeds (e.g., RSS 0.91 and RSS 2.0 or a headline and a full-text feed). It would be nice to have a preferred flag so an aggregator can automatically pick the best feed from the set.

It's unfortunate that there are two places to comment on this. It fractures the discussion. There are some replies posted here as well.

Sounds good to me. I'm happy to deploy this on the Topic Exchange as soon as it's stable. OPML format is fine. I already have an OPML topic list, as does k-collector, so it won't be much work to turn that into a list of RSS feeds ...

It would be nice if there was some way to query the server for the URL, but I can't think of something that wouldn't depend on the existence of some _other_ URL ... so this sounds all right :-)

uh keep it simple ... why not just index.opml?

I like the idea of an autodetection tag (like already in use with RSS) instead of (or maybe in addition to?) a standard file location. This allows agents to find it without making a likely-to-fail request for myPublicFeeds.opml on every server.

<link rel="feedList" type="application/opml+xml" title="myPublicFeeds" href="www.scripting.com">

the relation, type, and title I totally pulled out of my ass and they probably require more thought, but I like the general form.
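For what it's worth, the agent side of this kind of autodiscovery can be sketched in a few lines of Python with the standard library's HTML parser. The rel value "feedList" is the one floated in the tag above and is not an agreed convention. The sample page, the href path and the names `FeedListLinkFinder` and `find_feed_lists` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedListLinkFinder(HTMLParser):
    """Collect href values of <link> tags whose rel contains the wanted value."""

    def __init__(self, rel="feedList"):
        super().__init__()
        self.rel = rel.lower()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively
        rels = (attributes.get("rel") or "").lower().split()
        if self.rel in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_feed_lists(page_url, html, rel="feedList"):
    """Return absolute URLs of the feed lists a page points to."""
    finder = FeedListLinkFinder(rel)
    finder.feed(html)
    # Resolve relative hrefs against the URL of the page that contained them
    return [urljoin(page_url, href) for href in finder.hrefs]


# Hypothetical page; the href is an invented path, not a proposed location.
PAGE = """<html><head>
<title>Example</title>
<link rel="feedList" type="application/opml+xml" title="myPublicFeeds" href="/feeds/list.opml">
</head><body></body></html>"""

found = find_feed_lists("http://www.foobar.com/blog/", PAGE)
print(found)
```

Because the href is resolved against the page that carries it, the list can live in any directory or on another host, which is the main thing the fixed-location approach cannot offer.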

If simplicity is required, why not just use a text file listing the feed URIs?

If further description is required, why not use a language designed for the purpose, such as RDF. There is already a compatible channel-listing vocabulary in the form of OCS.

What exactly is gained through using OPML?

Responses to some of the questions and comments:

blogs.law.harvard.edu$310

I agree that OPML is really close to what we need for this sort of function (if not there already). I also like David's idea of the link to find the list. So much so that I added it to my own page. :)

I also already had a list of feeds available on my site in OPML format, so it was easy enough to change the format slightly to match this proposal.

As a bonus, my MT Outliner plugin should be able to parse a standard OPML file without much difficulty, to make it trivially easy to pull this data in for display on a page somewhere.

www.cxliv.org

Hiya,

For what it's worth...

1. I would prefer not to place OPML files at a standard URL. Using a standard URL:
- makes it more difficult to find OPML files for subsites, which may have a home page in a directory
- makes it more difficult to generate OPML files using a script or process
- denotes a specific outline markup format (OPML) when the site owner may have decided to use a (perhaps not yet existing) alternative
- does not allow an owner of one site to point, for the sake of convenience, to an OPML file on another site

All servers designate a default page (usually, but not always, index.htm) to serve when only a directory is specified. This page is a logical location for a pointer to a site OPML page (it would also have been a logical place for a pointer to a robots page).

This is essentially the practice that has been used with great success for such things as CSS, remote Javascript, images, objects, and the like. Services and related files should, in general, be pointed to rather than assumed to be in a specific place.

2. Without commenting on the specifics of OPML, it will be important for my own purposes that any such file be extensible. This may or may not involve the use of namespaces, though in order to avoid confusion of vocabulary, I prefer to be able to point to a schema.

I have been involved in work using RSS files for the distribution of educational content. My experience has been that it is useful to extend RSS vocabularies to include education-specific information. This will be the case for OPML files as well.

3. 'Type' should allow for a variety of types. I would prefer to see something like "xml/rss 0.91" or "xml/dc" allowed; this allows me to preselect my parser. It also allows for the development of post-RSS syndication formats.

Thanks for the opportunity to contribute.

-- Stephen Downes ~ E-Learning Group ~ National Research Council Canada
stephen@downes.ca ~ www.downes.ca
For free daily news and information about e-learning and related technology, visit OLDaily at http:

I like it. I'm still with the camp that says it belongs as a LINK in the site's homepage, inefficient as fetching the entire index page to look for it may be.

Remember: not all the places you would want to put an OPML feed list are going to be at the "root" URI of a host. How would someone at the Manila-based Salon Blogs--which puts blogs at URLs in the form of "blogs.salon.com/BLOGID"--publish myPublicFeeds.opml and expect to get it picked up?

Still, if this is going to be the way it's done, I'd suggest a filename that's less prone to case mangling--maybe even an 8.3 one so the broadest range of HTTP-capable embedded devices can offer feeds.

I think it would be up to blogs.salon.com to manage the feed, not the individual sites.

I tend to agree with Joe Gregorio, see bitworking.org - what you really want to do is read a file from a well-known name on the "home page", only (sigh) the Web doesn't know about home pages, so you try to fake it by using the first directory after the host name, which works well on some sites, not on others. Also, bear in mind that we're only 1% in to the lifespan of the Web, it's just bad practice to start grabbing little pieces of the namespace. Joe's piece is worth reading end to end, he makes more than one good point.

OPML would be OK for this, it might also be worth looking at RDDL, the latest draft isn't up at rddl.org yet, check out a recent version at www.tbray.org - you wouldn't be able to do anything you can't do with OPML, but it has the noticeable advantage of being human-readable.

Re "Also, bear in mind that we're only 1% in to the lifespan of the Web"

Other people are quoting you on that, it's cute, but it's malarkey. You have no crystal ball that can read the future. People said the same thing about CP/M and the Apple II, and it didn't turn out to be true.

I read Joe's piece, I'm still not convinced. Every system has reserved names. For good reason, without them you don't get any extensibility. It's one of those tradeoff things you keep running into in software design.

The big picture is that there is no way to move forward without giving something up, usually it's flexibility in some manner. In this case it means that an application that presents a UI for choosing a few feeds to subscribe to might find something other than an OPML file at this location and it may contain something other than a list of feeds that can be subscribed to.

Seems like a trivially low cost to me.

What you're all saying I think is that you would like to be charged with designing this feature. If you had the power you'd do a much better job. But even that's not true. There are so many cooks here that the cost to innovate is so high, there's virtually no satisfaction in it. Read some of the flames on the Syndication list for a clue.

The best thing to do is to ask yourself "Can I live with this?" and if so, just let it go. Even Joe Gregorio would have to say he can live with it, because he's living with the heinous robots.txt. Even Tim Bray's ancestors can use it because they honestly won't know what the world wide web was any more than kids today know what visicalc was, or cp/m or PDP-11's, all of which loomed large in our youth.

Why not find some bigger fish to fry?

what are the valid values for type? can i list alternate feed types for the same thing? e.g. rss 0.91 for my blog, rss 2.0 for my blog, rss 2.0 with comments for my blog. How would the client tool pick up that they are alternates for the same thing, rather than different things?

Per the spec, the two types supported are rss and link.

The spec doesn't say whether you can list alternate feeds for the same content.

What you're all saying I think is that you would like to be charged with designing this feature.

That is not what I'm saying.

I am saying I need more flexibility than is contained in the original proposal.

The best thing to do is to ask yourself "Can I live with this?" and if so, just let it go.

What I'm saying is that I cannot use the current proposal. That's all. I need flexibility of file formats and reference, as described in my previous post. I believe these will be common requirements, but I will not speak beyond my own need.

If the current proposal is what is implemented then I will find myself in the position of standardizing on the (inevitable) alternative, and treating cases using the current proposal as an exception.

I will now, as per your suggestion, find other fish to fry.

-- Stephen

Stephen, can you provide a concise list of things you MUST have to be able to use this format, and also what you plan to use it for.

Sure. Here is a set of needs, with some use cases:

1. Format must allow for various file names (and specifically, various extensions, such as .php, .cgi, etc) to allow for auto-generation of feeds (lemma: feed must declare its own format (eg., as OPML) in the header) since the list of feeds available will vary as class participation varies.

2. Format must allow for multiple sets of feeds (for example, a school may provide a list of feeds for its student blogs in each of ten classes; for example, MERLOT or EdNA (learning object RSS feed providers) may provide lists of feeds in different subject areas for different types of learning).

3. Format must allow for feed lists to be located on different sites (for example, Altoona school board (www.altoona.ca) may provide a list of feeds, which would then be used by individual schools (www.medowlark.ca) to point to feeds relevant to their parents and students; for example, a university (www.ualberta.ca) may point to a list of feeds provided by a member institution on a different domain (www.albertalab.ca) or in the home directory of a subdomain (www.extension.ualberta.ca) or directory (www.ualberta.ca/pressoffice/)).

4. Format must be extensible (for example, a feed may be designated as subject=biology or grade=12 or content=learning objects).

5. Format must allow different XML feed types (for example, the chemistry department provides an OAI style feed which serves Dublin Core, the journalism department has opted for NewsML, philosophy uses RSS 0.91; it is preferable to identify these different types in the list of feeds).

-- Stephen

Okay, thanks for the list. But something is really puzzling, I think it does allow for many of those uses. Can you whittle your list down to things that you know for a fact that myPublicFeeds.opml spec disallows? And then we can deal with lack of clarity in the spec.

It's not clear to me that it does (but that's OK, I have been known to be wrong about numerous things over the years).

1. Because the proposal specifies a particular filename, it seems to me that different filenames (and in particular, different extensions) are excluded.

2. OPML allows multiple sets of feeds, because it creates an internal hierarchy. However, these lists of feeds must be in the same file. But there will be cases where these feeds need to be in different files, for example, for security reasons. But because there is only one filename, it is not possible to put different lists of feeds in different files.

3. The use of a single filename, in the top level directory, similarly precludes the use of files on different domain names or subdomains.

4. Though there appear to be no restrictions on the names of attributes, there appears to be no way to associate these names with a given vocabulary, making extensibility difficult. In educational metadata there are already different XML formats in use with the same name (e.g., IMS:rights and DC:rights; IMS:creator and DC:creator) and different extensions, and we need to be able to specify which of these we are using. If, however, I can do this:

<outline text="Major weblogs">
  <dc:creator>Joe Schmo</dc:creator>
  <ims:typicalagerange>7-12</ims:typicalagerange>
</outline>

then I will be fine.

5. The original proposal said Type must be rss. Although it allows the feed an entry points to to be in any common syndication format understood by aggregators, it would help me a lot to be able to sort by feed type at the outline level, rather than having to dereference each feed in order to determine its type. That is why I asked for "type=xml/rss0.91" or something that would be similar to (and work like) mime types.

The key thing, therefore, is this:

Most of my issues are caused by the static filename. If I can make the filename anything I please, generate more than one instance of it, and then point to it (them), I will be fine. I can add more tags into the file whether or not it is supported by the spec, which can be ignored by harvesters not depending on education-specific data. But I can't get around the filename thing, not if I want to dynamically generate lists of feeds across distributed environments.
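To make point 5 concrete, here is a small Python sketch of what sorting by feed type at the outline level could look like if the type attribute were allowed to carry more specific values. The values "xml/rss0.91" and "xml/dc" follow the suggestion above and are not defined by the proposal. The sample file, the example.edu URLs and the name `group_by_type` are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical feed list using the finer-grained type values suggested above.
OPML = """<opml version="1.0">
  <head><title>Department feeds</title></head>
  <body>
    <outline type="xml/rss0.91" text="Philosophy" xmlUrl="http://www.example.edu/philosophy.xml"/>
    <outline type="xml/dc" text="Chemistry" xmlUrl="http://www.example.edu/chemistry.xml"/>
    <outline type="rss" text="Campus news" xmlUrl="http://www.example.edu/news.xml"/>
  </body>
</opml>"""


def group_by_type(opml_text):
    """Map each type value to the feed URLs declared with it, without fetching any feed."""
    groups = defaultdict(list)
    for outline in ET.fromstring(opml_text).iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            groups[outline.get("type", "unknown")].append(url)
    return dict(groups)


groups = group_by_type(OPML)
for feed_type, urls in sorted(groups.items()):
    print(feed_type, urls)
```

A harvester could then hand each group to the matching parser and only dereference the feeds it knows how to read.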

That would mean that if someone at blogs.salon.com/fooblog wanted to have a couple of feeds--say, one with all of fooblog's recent stories and one with only the stories about whiskey--the system that hosts blogs.salon.com would have to have a mechanism for centrally maintaining this list of feeds, and users would have to wait for the gatekeepers to bestow the functionality on them. It means less personal control for individual publishers and more responsibility and power for the central provider. A newspaper columnist would have no easy way of getting her column's RSS feeds indexed unless they were created through a central mechanism sanctioned by her paper that included it in its one central OPML file.

Putting the pointers to the OPML in the <head> of any page a content creator deems an entry point means it will always be there to be picked up by indexers. Yes, it means more bandwidth spent on serving out blog pages just to get at the <link>, but it also means more power in the hands of individuals and less in the hands of central authorities. Isn't that what RSS is about? ;)

The idea of placing a file like myPublicFeeds.opml in your root directory is by itself ok, but making it a standard is not. People should be able to place the file anywhere they like, and people should ALWAYS link to it from their index page (or wherever they want to put it).

Encouraging standards like "Go look for www.foobar.com" when some program wants to find public feeds will cause the error logs of people who DON'T use this feature to be flooded with requests for this file. Bothering people who DON'T use this standard should never be the intention of a new standard.

I agree with making subscriptions available in OPML, just as it has been until now with mySubscriptions.opml. However, it is NO good for a standard to impose a URL for a specific file. Because of the different technical problems that might arise, and also to leave people the CHOICE, there should never be such "www.foobar.com" standardisation. Instead, the best idea would be to use autodiscovery just like for the RSS files. Therefore people should include in their index pages something like this, for example:

<link rel="alternate" type="application/opml+xml" title="subscriptions.opml" href="www.foo.bar"/>

Also, what bothers me, and no one seems to do anything about it, is that people set a refresh time for each feed in their aggregators. It is based on the RSS update time, but some may want to give a different update period. Whenever they migrate their feed list from aggregator to aggregator they have to manually change maybe hundreds of such update periods. In my opinion there should be a refreshTime attribute (in minutes) as part of the standard, for easy migration of feed lists from aggregator to aggregator, along with the other attributes specifying the feed address and description.

As the internet gets more global English will be a minority language, so I don't like the idea of a hard-coded English phrase.

I know there is so much English hard-coded into the fabric of the Internet, but let's not make it worse :-)
