Posted to j-users@xerces.apache.org by ne...@ca.ibm.com on 2002/09/10 18:10:55 UTC

filenames versus URI's

Hi all,

There are a number of places where the parser has to interact with the file
system (e.g., in resolving systemIds, schemaLocation hints and Strings
supplied to our JAXP #parse methods).  To my knowledge, all of these
situations expect a URI (possibly relative) rather than a filename.

Historically (at least in recent history), we've been more and more
permissive in what we'll accept here.  We can usually figure out, for
instance, that "c:\myfile.xml" maps to file:///c:/myfile.xml.  But recently,
there has been a deluge of reports that we can't handle filenames with
spaces or other characters disallowed by the URI spec, or that non-ASCII
characters can't be processed.
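To make the problem concrete, here is a minimal Java sketch that uses java.net.URI (part of the JDK since 1.4) as the judge of URI syntax. The class UriCheck and its isLegalUri method are hypothetical names chosen for illustration; Xerces's own URI parsing may draw the line in slightly different places.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: asks java.net.URI whether a string is a legal
// URI reference (java.net.URI follows RFC 2396 with a few deviations).
final class UriCheck {

    private UriCheck() {}

    static boolean isLegalUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A well-formed file URI for a Windows path.
        System.out.println(isLegalUri("file:///c:/myfile.xml"));  // true
        // Backslashes are not legal URI characters.
        System.out.println(isLegalUri("c:\\myfile.xml"));         // false
        // Unescaped spaces are illegal; they must be written as %20.
        System.out.println(isLegalUri("my file.xml"));            // false
        System.out.println(isLegalUri("my%20file.xml"));          // true
        // Ambiguity: with forward slashes, "c:" parses as a scheme named "c".
        System.out.println(URI.create("c:/myfile.xml").getScheme()); // c
    }
}
```

The last line shows why guessing is fragile: "c:/myfile.xml" is a perfectly legal URI, just not the one the user meant.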

It would be possible, in principle, to keep on becoming more accommodating.
But it would make our code more complex, and for things like Chinese
characters that complexity could well be substantial.  Or, we
could change course and decide to allow only true URI's to be used
consistently, and restrict ourselves to making sure we can absolutize
relative URI's correctly in whatever context they're given.
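Absolutizing a relative URI against the context it appears in can be sketched with java.net.URI.resolve, which implements the RFC 2396 reference-resolution algorithm. The ResolveDemo class, its absolutize method and the example.com URIs are assumptions for illustration, not Xerces code.

```java
import java.net.URI;

// Hypothetical demo: resolve a (possibly relative) reference against the
// base URI of the document in which the reference appears.
final class ResolveDemo {

    private ResolveDemo() {}

    static String absolutize(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/main.xml";
        // Relative to the directory of the referencing document.
        System.out.println(absolutize(base, "schemas/po.xsd"));
        // -> http://example.com/docs/schemas/po.xsd
        // ".." segments are removed during resolution.
        System.out.println(absolutize(base, "../dtd/doc.dtd"));
        // -> http://example.com/dtd/doc.dtd
        // An absolute reference is returned unchanged.
        System.out.println(absolutize(base, "http://example.org/other.xml"));
        // -> http://example.org/other.xml
    }
}
```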

What do people think?  Is it too much to ask of applications to provide
URI's rather than platform-dependent filenames?  Do people think increasing
the complexity of our stream-processing code is worth whatever convenience
is gained?  Is it acceptable that, by allowing filenames, we're violating
the letter of many specifications and probably not aiding the cause of
platform/parser independence, since we're being more permissive than other
products are likely to be?

All thoughts appreciated!

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: filenames versus URI's

Posted by Elliotte Rusty Harold <el...@metalab.unc.edu>.
Could you be clearer about exactly which classes and methods you're 
talking about here?

If this is SAX's XMLReaderFactory, then discussions and resolution of 
this important issue should probably take place within the SAX 
community on sax-devel, since it affects more than just Xerces. If 
these are JAXP classes, I guess the JCP is the right place. Off the 
top of my head, DOMParser is the only Xerces-owned class that's 
likely to encounter this issue.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+


Re: filenames versus URI's

Posted by Milind Gadre <mi...@ecplatforms.com>.
As suggested by someone else in this newsgroup, why not have a set of
methods that localize the conversion of file names, mis-formed URLs etc
to well-formed URIs. All the 'laxity' or 'complexity' can be hidden in
these methods.

> What do people think?  Is it too much to ask of applications to provide
> URI's rather than platform-dependent filenames?  Do people think increasing
> the complexity of our stream-processing code is worth whatever convenience
> is gained?  Is it acceptable that, by allowing filenames, we're violating
> the letter of many specifications and probably not aiding the cause of
> platform/parser independence, since we're being more permissive than other
> products are likely to be?

The parser will then accept *only* well-formed URIs; it is the
responsibility of the programmer to use one of the above-mentioned
methods to convert filenames, etc. to well-formed URIs *before* calling
any parser methods.
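A minimal sketch of such a helper in Java, assuming the JDK's java.net.URI and java.io.File classes do the heavy lifting. The SystemIds class and toUri method are hypothetical, and so is the heuristic that a one-letter scheme is really a Windows drive letter.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: all the leniency lives here, so the parser itself
// only ever receives well-formed URIs.
final class SystemIds {

    private SystemIds() {}

    /**
     * If s already parses as a URI reference whose scheme (if any) is longer
     * than one character, returns it in ASCII form. Otherwise s is treated as
     * a platform-dependent filename and converted with File.toURI().
     * Assumption: a one-letter "scheme" such as "c:" is a Windows drive letter.
     */
    static String toUri(String s) {
        try {
            URI uri = new URI(s);
            String scheme = uri.getScheme();
            if (scheme == null || scheme.length() > 1) {
                // Legal URI reference (possibly relative); toASCIIString()
                // percent-encodes any non-ASCII characters as UTF-8.
                return uri.toASCIIString();
            }
        } catch (URISyntaxException e) {
            // Not a legal URI (unescaped spaces, backslashes, ...): fall through.
        }
        // File.toURI() makes the path absolute and escapes illegal characters
        // such as spaces; toASCIIString() also encodes non-ASCII characters.
        return new File(s).toURI().toASCIIString();
    }

    public static void main(String[] args) {
        System.out.println(toUri("http://example.com/a.xml")); // http://example.com/a.xml
        System.out.println(toUri("schemas/po.xsd"));           // schemas/po.xsd
        System.out.println(toUri("\u65e5\u672c.xml"));         // %E6%97%A5%E6%9C%AC.xml
        System.out.println(toUri("my file.xml"));              // a file: URI ending in /my%20file.xml
    }
}
```

Under this design the application would call something like SystemIds.toUri(name) and hand the result to the parser, which could then reject anything that is not a well-formed URI.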

Regards...

Milind Gadre
ecPlatforms, Inc
901 Mariner's Island Blvd, Suite 565
San Mateo, CA 94404
C: 510-919-0596
F: 815-352-0779
milind@ecplatforms.com






Re: filenames versus URI's

Posted by Joseph Kesselman <ke...@us.ibm.com>.
Personal opinion: Factor it out. Provide a filename-to-URI convenience 
method (which might have to be different on different operating systems, 
in fact) but have the APIs expect URIs.

You might even want to move the absolutizing of relative URIs into that
convenience method; let folks pass in the base URI at that time.
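One way to read this suggestion, sketched in Java under stated assumptions: the hypothetical toAbsoluteUri method takes the base URI as a parameter, turns a relative filename into a relative URI reference, and resolves it with java.net.URI.resolve. java.io.File supplies the operating-system-specific rules (separator character, what counts as an absolute path), so the method itself needs no per-OS branches.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical convenience method: filename in, absolute URI out, with the
// caller supplying the base URI used for relative filenames.
final class FileUris {

    private FileUris() {}

    static URI toAbsoluteUri(String filename, URI base) {
        File f = new File(filename);
        if (f.isAbsolute()) {
            // Absolute path: File.toURI() knows the platform's conventions.
            return f.toURI();
        }
        // Relative filename: switch to '/' separators. The leading "./" keeps
        // a colon in the first segment from being read as a URI scheme.
        String path = "./" + filename.replace(File.separatorChar, '/');
        URI relative;
        try {
            // The multi-argument constructor percent-encodes illegal
            // characters such as spaces.
            relative = new URI(null, null, path, null);
        } catch (URISyntaxException e) {
            // Rethrow unchecked, in the style of URI.create.
            throw new IllegalArgumentException("Cannot convert: " + filename, e);
        }
        // resolve() also normalizes away the "./" segment.
        return base.resolve(relative);
    }

    public static void main(String[] args) {
        URI base = URI.create("file:/data/docs/");
        System.out.println(toAbsoluteUri("my file.xml", base));
        // -> file:/data/docs/my%20file.xml
        // toASCIIString() percent-encodes non-ASCII characters as UTF-8.
        System.out.println(toAbsoluteUri("sub/\u65e5\u672c.xml", base).toASCIIString());
        // -> file:/data/docs/sub/%E6%97%A5%E6%9C%AC.xml
    }
}
```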

______________________________________
Joe Kesselman  / IBM Research
