The spec says about categories:

<category> sub-element of <item>

<category> is an optional sub-element of <item>.

It has one optional attribute, domain, a string that identifies a categorization taxonomy.

The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories. Two examples are provided below:

<category>Grateful Dead</category>

<category domain="http://www.fool.com/cusips">MSFT</category>

You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain.

But what if an element itself uses the forward slash, for example: Hydrogen/potassium ATPase, which -- it's been pointed out -- is a valid Library of Congress Subject Heading?

We think the simplest solution will be to escape the forward slash in that element, so for example:

Hydrogen%2fpotassium ATPase

We invite users of taxonomies in which this issue arises to comment on whether this will, in fact, work acceptably.
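
As a rough illustration of the proposal, here is a minimal Python sketch. The helper names (escape_segment, join_category, split_category) and the parent heading "Medicine" are made up for this example, and escaping a literal "%" as "%25" is an added assumption so that the encoding round-trips; the post itself only proposes escaping the slash.

```python
from urllib.parse import unquote

def escape_segment(segment):
    """Escape one taxonomy term so it can sit inside a slash-separated path."""
    # Assumption: '%' is escaped first so an existing "%2f" in a term survives a round trip.
    return segment.replace("%", "%25").replace("/", "%2f")

def join_category(segments):
    """Build the <category> value from a list of unescaped terms."""
    return "/".join(escape_segment(s) for s in segments)

def split_category(value):
    """Split a <category> value on '/' first, then undo the escaping per segment."""
    return [unquote(part) for part in value.split("/")]

# "Medicine" is a hypothetical parent term, used only to show a two-level path.
path = join_category(["Medicine", "Hydrogen/potassium ATPase"])
print(path)                  # Medicine/Hydrogen%2fpotassium ATPase
print(split_category(path))  # ['Medicine', 'Hydrogen/potassium ATPase']
```

The order matters on the reading side: split on the separator before decoding, otherwise the escaped slash is indistinguishable from a real one.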
Comments


I have the same problem both at work and at home. At work, Antarctica sells visual interfaces to complex category trees, and we have been working slowly to expunge the various places in the code that "just know" that '/' is the category separator. Same story at ongoing, made harder by the fact that there is a Unix directory structure corresponding to the category tree, so everything kind of "just works". Blame Thompson and Ritchie. And when you're done doing that, go ahead and escape it like Jon suggests.

%2f is URI encoding. Is there a reason that entity encoding isn't appropriate here? (It would save yet another encoding scheme from being used in the same text.)

URI vs entity encoding: I thought of that too. Either could work, and I actually have no strong preference one way or the other. I'd be interested to hear more perspectives on the pros and cons of doing it one way or the other.

I don't think entity encoding is a good idea, because this is already in XML, right, so if I say something like

<cat>foo/bar</cat>

Then after it's been through an XML parser the software sees

<cat>foo/bar</cat>

and everything goes on breaking the way it did. Alternatively, you could use a name foo&slash;bar - but I don't think that works either; either you have to declare the entity, in which case you're back to foo/bar, or you don't, in which case your XML isn't well-formed any longer.
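
Both halves of that argument can be checked with a short Python sketch using the standard library's ElementTree parser. The helper names parsed_text and is_well_formed are invented for this example; any conforming XML parser should behave the same way.

```python
import xml.etree.ElementTree as ET

def parsed_text(xml):
    """Return the element text as the application sees it after parsing."""
    return ET.fromstring(xml).text

def is_well_formed(xml):
    """Report whether the parser accepts the document."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False

# A literal slash and a numeric character reference are identical after parsing.
print(parsed_text("<cat>foo/bar</cat>"))        # foo/bar
print(parsed_text("<cat>foo&#x2f;bar</cat>"))   # foo/bar

# An undeclared named entity is rejected outright.
print(is_well_formed("<cat>foo&slash;bar</cat>"))  # False
```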

Hell, why not foo/bar? Two less characters.

Heh, in my first example I *used* the ampersand-#x2f; form and the comment-posting machinery just turned it into a '/', thus proving my point.

but what about double-encoding it to

<cat>foo&amp;#x2f;bar</cat>

this should be read by an xml parser as

foo&#x2f;bar

which would subsequently be rendered as

foo/bar

(I hope this comes out ok - it's always a guess as to what comment-systems do with embedded tags or entities)

I would think that is preferable to using URI encoding in an XML document
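
A small Python sketch of how a consumer might handle the double-encoded form, offered as one possible reading rather than an agreed convention. The function name category_segments and the extra "top" level are hypothetical, and html.unescape stands in here for whatever second decoding step a real processor would apply.

```python
import html
import xml.etree.ElementTree as ET

def category_segments(xml):
    """Parse a category element, split on '/', then decode references per segment."""
    # The XML parser turns &amp; into &, so the text still contains "&#x2f;".
    raw = ET.fromstring(xml).text
    # Split on real separators first; only then resolve the surviving references.
    return [html.unescape(segment) for segment in raw.split("/")]

doc = "<cat>top/foo&amp;#x2f;bar</cat>"
print(ET.fromstring(doc).text)   # top/foo&#x2f;bar
print(category_segments(doc))    # ['top', 'foo/bar']
```

As with URI escaping, this only works if every consumer agrees to do the second decoding step after splitting.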

phew - my triple encoded comment-entry did indeed display as a double-encoding like I planned it ;-)

I really wish there was a standard for what comment-systems do with embedded tags or entities, or at least have more sites post an explanation as to how they handle them.

If markup is allowed at all, you basically have to have a preview mechanism.
