Posted to dev@abdera.apache.org by James M Snell <ja...@gmail.com> on 2006/09/14 01:53:29 UTC

IRI Support and ICU

The Atom specification allows IRIs to be used anywhere within
Atom documents.  Unfortunately, however, Java 1.5 and earlier do not
include support for converting IRIs to URIs as necessary in order to get
a dereferenceable URI.  Currently we fake it by parsing out to URI, but
that definitely has a number of problems.

For instance, consider the following feed:

  http://www.詹姆斯.com/feed   (James Holderness' weblog)

If I do:

  URI uri = new URI("http://www.詹姆斯.com/feed");

The URI will be created without throwing any errors, despite the fact
that the Unicode characters are not legal in a URI.  Calling
uri.toString() will return the URI.

However, calling uri.getHost() on this URI improperly returns null.
Calling uri.getAuthority() returns the host name, but if the URI also
has a port specified, getAuthority() also returns the port (e.g. for
"http://www.詹姆斯.com:80/feed" getAuthority() returns "www.詹姆斯.com:80").

Worse yet, if I call uri.toASCIIString() the output from URI is
http://www.%E8%A9%B9%E5%A7%86%E6%96%AF.com/feed, which is quite clearly
wrong.
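
The behavior described above can be reproduced with a small sketch like
the following.  The class name UriIdnDemo is made up for illustration,
and the host name is written with Unicode escapes only so that the
source file stays ASCII; it is the same 詹姆斯 host as in the example.

```java
import java.net.URI;

// Hypothetical demo class; only java.net.URI comes from the JDK.
class UriIdnDemo {
    // The example IRI with a port added. The three CJK characters of the
    // host are written as Unicode escapes to keep this file ASCII-only.
    static final String IRI = "http://www.\u8a79\u59c6\u65af.com:80/feed";

    public static void main(String[] args) {
        // Parsing succeeds even though the host is not legal in a URI.
        URI uri = URI.create(IRI);

        // The host is not recognized as a server-based authority,
        // so both host and port come back as "undefined".
        System.out.println("host      = " + uri.getHost());
        System.out.println("port      = " + uri.getPort());

        // The authority is returned as one opaque string, port included.
        System.out.println("authority = " + uri.getAuthority());

        // Non-ASCII characters are percent-encoded as UTF-8 rather than
        // converted with IDNA.
        System.out.println("ascii     = " + uri.toASCIIString());
    }
}
```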

Now, all of our (IBM's) implementations have ICU [1] available, which
includes proper IDN support.  It's a simple matter to write an IRI to
URI converter.
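
As a rough sketch of what such a converter might look like (this is not
the ICU-based code discussed here): the example below assumes
java.net.IDN, which only exists in Java 6 and later and implements
IDNA2003, in place of ICU's IDN support.  The class and method names
are hypothetical, and it makes no attempt to cover all of RFC 3987.

```java
import java.net.IDN;
import java.net.URI;

// Hypothetical helper; java.net.IDN stands in for ICU's IDN support.
class IriToUriSketch {

    /** Converts an IRI string to an ASCII-only URI string. */
    static String toAsciiUri(String iri) {
        URI parsed = URI.create(iri);
        String authority = parsed.getRawAuthority();
        // No authority, or java.net.URI already recognized the host:
        // percent-encoding the remaining characters is enough.
        if (authority == null || parsed.getHost() != null) {
            return parsed.toASCIIString();
        }

        // Split "userinfo@host:port" by hand, since getHost() was null.
        String userInfo = null;
        String hostPort = authority;
        int at = authority.lastIndexOf('@');
        if (at >= 0) {
            userInfo = authority.substring(0, at);
            hostPort = authority.substring(at + 1);
        }
        String host = hostPort;
        String port = null;
        int colon = hostPort.lastIndexOf(':');
        if (colon >= 0 && hostPort.substring(colon + 1).matches("\\d*")) {
            host = hostPort.substring(0, colon);
            port = hostPort.substring(colon + 1);
        }

        // Rebuild the URI with the host converted to its ASCII (IDNA) form.
        StringBuilder sb = new StringBuilder();
        if (parsed.getScheme() != null) {
            sb.append(parsed.getScheme()).append(':');
        }
        sb.append("//");
        if (userInfo != null) {
            sb.append(userInfo).append('@');
        }
        sb.append(IDN.toASCII(host));
        if (port != null && !port.isEmpty()) {
            sb.append(':').append(port);
        }
        if (parsed.getRawPath() != null) {
            sb.append(parsed.getRawPath());
        }
        if (parsed.getRawQuery() != null) {
            sb.append('?').append(parsed.getRawQuery());
        }
        if (parsed.getRawFragment() != null) {
            sb.append('#').append(parsed.getRawFragment());
        }

        // Re-parse so any non-ASCII left in the userinfo, path, query or
        // fragment gets percent-encoded.
        return URI.create(sb.toString()).toASCIIString();
    }
}
```

With this sketch, the host of the example feed would come out as an
"xn--" label instead of a percent-encoded one, while the path is still
percent-encoded as before.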

Unfortunately, this is *really* slow and ICU is a big package (3.08M for
the jar) and we really don't have need for the whole thing.  It's fine
for platforms that already have ICU, but requiring an additional 3.08M
download so we can slowly convert an IRI to a URI really bugs me.

That said, however, I'm not sure how we can get around it. Even the Jena
project's IRI implementation (generally considered by those more
knowledgeable about this than I to be pretty good) depends on ICU.

So, anyway, long story short: if we want proper support for IRIs (which
we need) then we're going to have to introduce a dependency on ICU.  I'm
not happy about it, but I don't see any other way around it.

Thoughts?

- James

[1] http://www-306.ibm.com/software/globalization/icu/index.jsp

Re: IRI Support and ICU

Posted by Charles Adkins <fo...@gmail.com>.
This seems like the best solution available.

I would suggest, however, that you consider making the default package
include the more correct and complete functionality provided by ICU.
That seems more aligned with an "it just works" sort of simplicity.

As a reference implementation, shouldn't the correctness of its behavior be
the primary factor?

Also, looking at a scenario of an "uninformed but interested party" looking
to come up to speed on the Atom spec, I don't see any benefit in offering a
default configuration that fails in such a significant case simply because
it is a faster or smaller implementation, or even both.

Making the default configuration fail because Java 1.5 fails to handle some
fundamental data object correctly seems, on the face of it, to be a
fundamentally bad default starting point.

I don't imagine that any ordinarily interested party would necessarily be
informed enough to digest a representative IRI-URI example and make
a decision up front about which options or package configuration to
download.

Why complicate the approach that the novice must take to an already rather
detail-rich tool?

just my two cents.

also, kudos and thanks for all your hard work to date.
Charles


On 9/13/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
>
> On 9/13/06, James M Snell <ja...@gmail.com> wrote:
>
> > So, anyway, long story short: if we want proper support for IRIs (which
> > we need) then we're going to have to introduce a dependency on ICU.  I'm
> > not happy about it, but I don't see any other way around it.
> >
> > Thoughts?
>
> James and I were just chatting about this, and the idea of making ICU
> an optional dependency came up.  We can treat it just like the digsig
> stuff: if it's there you get some extra capabilities; if not, then
> you're stuck with what we've got now, which works other than the IRI
> stuff.
>
> Would anyone have a problem with that?
>
> -garrett
>

Re: IRI Support and ICU

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 9/13/06, James M Snell <ja...@gmail.com> wrote:

> So, anyway, long story short: if we want proper support for IRIs (which
> we need) then we're going to have to introduce a dependency on ICU.  I'm
> not happy about it, but I don't see any other way around it.
>
> Thoughts?

James and I were just chatting about this, and the idea of making ICU
an optional dependency came up.  We can treat it just like the digsig
stuff: if it's there you get some extra capabilities; if not, then
you're stuck with what we've got now, which works other than the IRI
stuff.
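
One way the "if it's there" check could be sketched is a runtime class
lookup.  This is only an illustration, not how Abdera or the digsig
code does it; the class and method names are hypothetical, and the only
ICU detail assumed is that ICU4J ships a class named
com.ibm.icu.text.IDNA.

```java
// Hypothetical helper for detecting an optional dependency at runtime.
class OptionalIcuSketch {

    /** Returns true if the named class can be loaded, without initializing it. */
    static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, OptionalIcuSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing or unusable: treat the optional feature as absent.
            return false;
        }
    }

    /** True when ICU4J appears to be on the classpath. */
    static boolean icuAvailable() {
        return isClassPresent("com.ibm.icu.text.IDNA");
    }

    public static void main(String[] args) {
        System.out.println(icuAvailable()
                ? "ICU found: proper IRI conversion could be enabled"
                : "ICU not found: falling back to the current URI handling");
    }
}
```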

Would anyone have a problem with that?

-garrett