Posted to dev@tika.apache.org by "Keith R. Bennett" <kb...@bbsinc.biz> on 2007/10/22 21:37:26 UTC
URLs as Primary Resource Identifiers
All -
I'd like to suggest that where our API requires an identifier for a
resource, we accept it as a URL, and only where necessary as a
File, String, etc. Accepting a URL gives maximum flexibility, since
it allows a resource to be loaded from outside the classpath. Reducing
the number of alternative methods would also simplify the API.
Adding a similar method with a String parameter could be confusing:
does the String represent a name to be passed to Class.getResource(),
a path with which to instantiate a File, or a spec with which to
instantiate a URL?
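To make the ambiguity concrete, here is a small sketch of the three ways the same String could be read. The class and method names are hypothetical (not Tika API); only Class.getResource(), File and URL are standard Java.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper (not Tika API) showing three incompatible readings of
// one String. Each method returns a URL, or null when that reading fails.
final class StringIdentifierAmbiguity {

    // Reading 1: a resource name for Class.getResource(); null if the
    // resource is not on the classpath.
    static URL asClasspathResource(String s) {
        return StringIdentifierAmbiguity.class.getResource(s);
    }

    // Reading 2: a file system path for new File(...); relative paths are
    // resolved against the current working directory.
    static URL asFilePath(String s) {
        try {
            return new File(s).toURI().toURL();
        } catch (MalformedURLException e) {
            // Not expected: File.toURI() always yields an absolute file: URI.
            throw new UncheckedIOException(e);
        }
    }

    // Reading 3: a complete URL spec for new URL(...); null if the String
    // has no protocol (or an unknown one), e.g. "mime/types.xml".
    static URL asUrlSpec(String s) {
        try {
            return new URL(s);
        } catch (MalformedURLException e) {
            return null;
        }
    }
}
```

For "mime/types.xml", the File reading succeeds (relative to the working directory), the URL reading fails for lack of a protocol, and the classpath reading depends on what happens to be on the classpath.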
I'm raising this because I cannot find any reasonably simple way to
load a MimeTypes object from a resource outside of the classpath.
Someone might want to do this, for example, to point to a resource on a
LAN or even on the public Internet. Since the String identifying the
resource could easily be converted to a URL, I think we should accept a
URL directly, and possibly remove the String method. In the case of
MimeTypes, this requires only trivial changes to MimeTypes and
MimeTypesReader.
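A minimal sketch of what a URL-first entry point could look like, assuming a hypothetical UrlResourceLoader class. This is not the real MimeTypes/MimeTypesReader API; it only reads the resource as text, to show that one URL method can serve classpath, file and network locations, since Class.getResource() itself already returns a URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical loader, NOT the MimeTypes/MimeTypesReader API: a single
// URL-based entry point that covers classpath, file and network resources.
final class UrlResourceLoader {

    // Reads the whole resource as UTF-8 text. Any scheme the JVM has a URL
    // handler for works here (jar:, file:, http:, https:, ...).
    static String load(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Classpath convenience: resolve the name to a URL, then delegate.
    static String loadFromClasspath(String name) throws IOException {
        URL url = UrlResourceLoader.class.getResource(name);
        if (url == null) {
            throw new IOException("Resource not found on classpath: " + name);
        }
        return load(url);
    }
}
```

With an entry point like this, pointing at a resource on a LAN or the Internet only means passing a different URL (a file: URL for a mounted share, or an http: URL).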
What do you think?
- Keith
--
View this message in context: http://www.nabble.com/URLs-as-Primary-Resource-Identifiers-tf4673177.html#a13350966
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
Re: URLs as Primary Resource Identifiers
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On 10/23/07, Keith R. Bennett <kb...@bbsinc.biz> wrote:
> > I would rather see InputStream as the primary source
> > of byte streams to be processed.
>
> I understand. That's what I meant by "where our API requires an
> *identifier*". I meant "identifier" in the precise sense ("textual tokens
> (also called symbols) which name language entities", as per Wikipedia), but
> I can see that that could be easily misinterpreted if it is used more
> casually.
I noticed that, but I don't see that many places where we'd really
need an identifier instead of a byte stream.
Of course there are occasions where you can extract stuff like a
resource name or a content type hint from a URL or a File, but I think
such cases are best handled in utility methods that associate those
hints with a resolved InputStream.
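One possible shape for such a utility, sketched under the assumption of a hypothetical HintedStream class (not part of Tika): the stream is resolved up front, and the resource name and content type travel alongside it as optional hints.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical utility type (not part of Tika): an already-resolved
// InputStream plus whatever hints could be extracted from its source.
final class HintedStream implements Closeable {
    final InputStream stream;
    final String resourceName; // hint: last path segment, or null if unknown
    final String contentType;  // hint: as reported by the source, may be null

    private HintedStream(InputStream stream, String resourceName, String contentType) {
        this.stream = stream;
        this.resourceName = resourceName;
        this.contentType = contentType;
    }

    // A File gives a name hint but no content type.
    static HintedStream fromFile(File file) throws IOException {
        return new HintedStream(new FileInputStream(file), file.getName(), null);
    }

    // A URL connection may also report a content type (for HTTP, from the
    // Content-Type header) in addition to the name at the end of its path.
    static HintedStream fromUrl(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        InputStream in = conn.getInputStream();
        String path = url.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        return new HintedStream(in, name.isEmpty() ? null : name, conn.getContentType());
    }

    @Override
    public void close() throws IOException {
        stream.close();
    }
}
```

The processing code itself would still only see an InputStream; the hints are extra information it may use or ignore.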
BR,
Jukka Zitting
Re: URLs as Primary Resource Identifiers
Posted by "Keith R. Bennett" <kb...@bbsinc.biz>.
Jukka -
> I would rather see InputStream as the primary source
> of byte streams to be processed.
I understand. That's what I meant by "where our API requires an
*identifier*". I meant "identifier" in the precise sense ("textual tokens
(also called symbols) which name language entities", as per Wikipedia), but
I can see that that could be easily misinterpreted if it is used more
casually.
I only mention this because there may be other times when I really do forget
something you already said, and I don't want to use up my goodwill. ;)
Regards,
Keith
Jukka Zitting wrote:
>
> Hi,
>
> On 10/22/07, Keith R. Bennett <kb...@bbsinc.biz> wrote:
>> I'd like to suggest that where our API requires an identifier for a
>> resource, we provide access for it as a URL, and only where necessary, as
>> a
>> File, String, etc.
>
> I would rather see InputStream as the primary source of byte streams
> to be processed.
Re: URLs as Primary Resource Identifiers
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On 10/22/07, Keith R. Bennett <kb...@bbsinc.biz> wrote:
> I'd like to suggest that where our API requires an identifier for a
> resource, we provide access for it as a URL, and only where necessary, as a
> File, String, etc.
I would rather see InputStream as the primary source of byte streams
to be processed.
We can of course have utility methods that take a URL, a File, a
String, or whatever as arguments, but such methods should just create
an InputStream based on the given information and pass it to a primary
method that takes an InputStream.
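A sketch of that layering, using a hypothetical ByteCountingProcessor whose "processing" is just counting bytes: all real logic lives in the InputStream method, and the URL, File and path variants only open a stream and delegate. Naming the String variant processPath() instead of overloading process(String) is an assumption of this sketch, chosen to sidestep the String ambiguity raised earlier in the thread.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Hypothetical API shape (not Tika's classes): one primary InputStream
// method plus thin utility methods that only resolve a stream.
final class ByteCountingProcessor {

    // Primary method: the only one with real logic. The caller owns the
    // stream, so it is read to the end but not closed here.
    static long process(InputStream in) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Utility: open the URL, delegate, close what we opened.
    static long process(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return process(in);
        }
    }

    // Utility: open the File, delegate, close what we opened.
    static long process(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return process(in);
        }
    }

    // Utility: the String is explicitly a file system path, never a URL
    // spec or classpath name, so the method name says so.
    static long processPath(String path) throws IOException {
        return process(new File(path));
    }
}
```

This implies a simple ownership rule: the primary method never closes the stream it is given, while each convenience method closes the stream it opened itself.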
BR,
Jukka Zitting