I have been struggling with Bi-Directionality of text in RSS for some time now. No aggregator that I know of supports Bi-Directionality properly.
I didn't see any mention here of Bi-Directionality or of character sets.
I assume that the character set question is solved with the assumption that RSS is XML and XML text is Unicode by definition.
I would like to know where the directionality of all text elements should be specified in RSS 2.0.
Regarding a community process for testing aggregators to see if they can handle feeds with xmlns attributes, Mark Nottingham recalls that I started one on his RSS Profile wiki just over a month back, the RSS Profile Testbed, whose one test suite during that effort was a Top Level Namespace.
Thanks, Mark! Time flies!
Languages that are written from Right to Left (RTL), like Arabic and Hebrew, have two aspects: the direction in which the text is written (and read) and the alignment of the text.
(A short archeological explanation: Hebrew is a very old language. It was originally carved on stone tablets with a hammer and chisel. The chisel was held in the left hand and the hammer in the right hand. Hence, the easiest direction for writing was from right to left.)
Specifying the direction makes sure that the mixture of Hebrew (and other languages) and Latin text in the same sentence is displayed in the same order it has been written. Another example is punctuation. A full stop should be displayed at the end of a sentence, to the left of the last word.
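To make the direction question concrete, here is a minimal Python sketch of how a reader might guess the base direction of a piece of text when nothing specifies it: take the first character with a strong direction. The function name base_direction is my own, and this is only a simplified heuristic, not the full Unicode bidirectional algorithm.

```python
import unicodedata

def base_direction(text):
    """Guess the base direction from the first strongly directional character.

    Returns "rtl", "ltr", or None when the text has no strong characters
    (for example, only digits and punctuation).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi_class == "L":           # Latin and other left-to-right letters
            return "ltr"
    return None
```

A sentence that starts with a Latin word but is otherwise Hebrew would be guessed as "ltr" by this heuristic, which is exactly why an explicit direction is more reliable than guessing.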
The other issue is alignment. RTL languages should be displayed aligned to the right.
As I understand it, RSS does not specify how the content of the elements should be displayed.
On the other hand, not specifying the direction and alignment of the text prevents a standard way of writing and displaying RTL text in RSS.
The standard HTML method of dealing with the RTL issue is like this: <div dir="rtl" align="right">
Taking the same path will "break" the way RSS looks.
The other way is adding optional <direction> and <alignment> to all text elements.
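As a rough illustration of that second option, the Python sketch below reads a hypothetical <direction> child of an item and wraps the title in the HTML div shown above. Nothing here is part of RSS 2.0: the <direction> element, its "ltr"/"rtl" values, its placement at item level, and the render_title helper are all assumptions made for the example, and alignment is simply derived from direction to keep it short.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical item: <direction> is NOT defined by RSS 2.0.
ITEM_XML = """<item>
  <title>שלום עולם</title>
  <direction>rtl</direction>
</item>"""

def render_title(item_xml):
    """Wrap an item's title in a div carrying the declared direction."""
    item = ET.fromstring(item_xml)
    title = item.findtext("title", default="")
    direction = item.findtext("direction", default="ltr").strip().lower()
    if direction not in ("ltr", "rtl"):
        direction = "ltr"  # fall back on unknown values
    align = "right" if direction == "rtl" else "left"
    return '<div dir="%s" align="%s">%s</div>' % (direction, align, escape(title))
```

An item without the element falls back to left-to-right, so existing feeds would render as they do today.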
I hope that a standard solution for the RTL problem will be specified in the RSS specs. Let's do it fast, before de-facto "standards" settle in that will be hard to change in both writing and reading RSS software.
Contact me on any questions you have on this issue.
RSS feeds can specify their language. That combined with using the appropriate character encoding (if it's not UTF-8 or -16) should be enough to allow an aggregator to handle it.
There are about 40 different languages known to be of use in feeds. Whether or not aggregators properly handle RTL is anyone's guess. Given many readers make use of an HTML browser for their display it's possible this isn't an issue. The browsers do seem to handle this. Now, do various XSLT and HTML table layout tricks take this into account? Probably not.
Truly and completely internationalized software is a VERY big pain to create. Being able to intermingle different charsets and text display styles, while it seems like it ought to be easy, is a profoundly complex issue. That browsers handle this at all is a credit to them. But I'll venture a guess most don't handle intermingled data in the way an aggregator spanning across many languages might need.
Start with a test scenario. Look at how extant readers handle it. Then look at what their display rendering systems have available. I suspect making use of the existing standards for language tagging AND proper use of character encoding will solve this problem without adding anything new.
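One way such a test could start, sketched in Python: read the channel's <language> element (which RSS 2.0 does define) and map its primary subtag to a display direction. The RTL_LANGUAGES set and the feed_direction name are my own assumptions, and the set is not exhaustive.

```python
import xml.etree.ElementTree as ET

# Assumed, non-exhaustive set of primary language subtags written right to left.
RTL_LANGUAGES = {"ar", "fa", "he", "iw", "ur", "yi"}

def feed_direction(rss_xml):
    """Return "rtl" or "ltr" from the channel <language>, or None if untagged."""
    root = ET.fromstring(rss_xml)
    lang = root.findtext("channel/language", default="")
    primary = lang.strip().lower().split("-")[0]
    if not primary:
        return None  # feed did not declare a language
    return "rtl" if primary in RTL_LANGUAGES else "ltr"
```

A None result marks the feeds that carry no language tag at all, where an aggregator would have nothing to go on except the text itself.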
To get a handle on what feeds and languages are being used, visit the stats page on Syndic8: http://www.syndic8.com/stats.php?Section=feeds#FeedLang
There are some in Hebrew: http://www.syndic8.com/feedlist.php?ShowLanguage=he
Less fortunately, however, there are a great many that have failed to apply a language tag: http://www.syndic8.com/feedlist.php?ShowLanguage=