
Posted to dev@tika.apache.org by "Keith R. Bennett" <kb...@bbsinc.biz> on 2007/10/22 21:37:26 UTC

URLs as Primary Resource Identifiers

All -

I'd like to suggest that where our API requires an identifier for a
resource, we provide access for it as a URL, and only where necessary, as a
File, String, etc.  Accepting a URL gives the maximum flexibility, since
it allows a resource to be loaded from outside the classpath.  Reducing
the number of alternate methods would also simplify the API.

If we also add a similar method that takes a String parameter, that could be
confusing: does the String represent a name to be passed to
Class.getResource(), a path with which to instantiate a File, or a
string with which to instantiate a URL?

The reason I'm thinking of this is that I cannot find any
reasonably simple way to load a MimeTypes object from a resource outside of
the classpath.  Someone might want to do this, for example, to point to a
resource on a LAN or even on the public Internet.  Since the String
identifying the resource could easily be converted to a URL, I think we
should provide URL capability directly, and possibly remove the String
method.  In the case of MimeTypes, it requires trivial changes to MimeTypes
and MimeTypesReader.
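For illustration, a minimal Java sketch of the URL-first idea. The class `UrlFirstLoader` and its `load` methods are hypothetical (they are not MimeTypes, MimeTypesReader, or any other Tika API), and the code assumes Java 9 or later for `InputStream.readAllBytes()`. It shows that a local file and a classpath resource can both be expressed as URLs, so a single URL-taking method covers them; the String overload is documented as always meaning a URL, which sidesteps the ambiguity described above.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical URL-first loader; not part of the Tika API.
public class UrlFirstLoader {

    // Primary entry point: the resource is identified only by a URL,
    // so file:, jar:, and http: locations are all handled the same way.
    public static String load(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Convenience overload: the String is always interpreted as a URL,
    // never as a file path or a classpath resource name.
    public static String load(String url) throws IOException {
        return load(URI.create(url).toURL());
    }

    public static void main(String[] args) throws IOException {
        // A local file (or one on a mounted LAN share) becomes a file: URL.
        File file = File.createTempFile("custom-mimetypes", ".xml");
        file.deleteOnExit();
        Files.write(file.toPath(), "<mime-info/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(load(file.toURI().toURL()));
        System.out.println(load(file.toURI().toString()));

        // Class.getResource() already returns a URL for classpath resources.
        // It returns null if the resource cannot be found, e.g. when this
        // class was not loaded from a .class file on the classpath.
        URL self = UrlFirstLoader.class.getResource("UrlFirstLoader.class");
        System.out.println("classpath resource URL: " + self);
    }
}
```

An http: URL would go through the same `load(URL)` method, network access permitting.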

What do you think?

- Keith

-- 
View this message in context: http://www.nabble.com/URLs-as-Primary-Resource-Identifiers-tf4673177.html#a13350966
Sent from the Apache Tika - Development mailing list archive at Nabble.com.

Re: URLs as Primary Resource Identifiers

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 10/23/07, Keith R. Bennett <kb...@bbsinc.biz> wrote:
> > I would rather see InputStream as the primary source
> > of byte streams to be processed.
>
> I understand.  That's what I meant by "where our API requires an
> *identifier*".  I meant "identifier" in the precise sense ("textual tokens
> (also called symbols) which name language entities", as per Wikipedia), but
> I can see that that could be easily misinterpreted if it is used more
> casually.

I noticed that, but I don't see that many places where we'd really
need an identifier instead of a byte stream.

Of course there are occasions where you can extract stuff like a
resource name or a content type hint from a URL or a File, but I think
such cases are best handled in utility methods that associate those
hints with a resolved InputStream.
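One possible reading of such a utility method, as a Java sketch: a helper resolves a File or URL into an InputStream and carries along the resource name and a content-type hint derived from it. All names here (`StreamHints`, `HintedStream`, `fromFile`, `fromUrl`) are hypothetical; the hints come from standard JDK calls (`URLConnection.guessContentTypeFromName` and `URLConnection.getContentType`) and may be null.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper; not part of the Tika API.
public class StreamHints {

    // A resolved byte stream plus optional hints about where it came from.
    public static final class HintedStream implements Closeable {
        public final InputStream stream;
        public final String resourceName;    // e.g. "report.html"
        public final String contentTypeHint; // may be null

        HintedStream(InputStream stream, String resourceName, String contentTypeHint) {
            this.stream = stream;
            this.resourceName = resourceName;
            this.contentTypeHint = contentTypeHint;
        }

        @Override
        public void close() throws IOException {
            stream.close();
        }
    }

    // Name hint from the file name; type hint guessed from its extension.
    public static HintedStream fromFile(File file) throws IOException {
        String name = file.getName();
        return new HintedStream(new FileInputStream(file), name,
                URLConnection.guessContentTypeFromName(name));
    }

    // Name hint from the last path segment; type hint from the connection
    // (for http: URLs this is the Content-Type header, if any).
    public static HintedStream fromUrl(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        String path = url.getPath();
        String name = path.substring(path.lastIndexOf('/') + 1);
        InputStream in = connection.getInputStream();
        return new HintedStream(in, name, connection.getContentType());
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("report", ".html");
        file.deleteOnExit();
        try (HintedStream hs = fromFile(file)) {
            System.out.println(hs.resourceName + " " + hs.contentTypeHint);
        }
        try (HintedStream hs = fromUrl(file.toURI().toURL())) {
            System.out.println(hs.resourceName + " " + hs.contentTypeHint);
        }
    }
}
```

The downstream processing code would still see only `hs.stream`; the name and type travel alongside it as hints rather than as a second way of identifying the input.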

BR,

Jukka Zitting

Re: URLs as Primary Resource Identifiers

Posted by "Keith R. Bennett" <kb...@bbsinc.biz>.

Jukka -

> I would rather see InputStream as the primary source 
> of byte streams to be processed.

I understand.  That's what I meant by "where our API requires an
*identifier*".  I meant "identifier" in the precise sense ("textual tokens
(also called symbols) which name language entities", as per Wikipedia), but
I can see that that could be easily misinterpreted if it is used more
casually.

I only mention this because there may be other times when I really do forget
something you already said, and I don't want to use up my goodwill. ;)

Regards,
Keith

Jukka Zitting wrote:
> 
> Hi,
> 
> On 10/22/07, Keith R. Bennett <kb...@bbsinc.biz> wrote:
>> I'd like to suggest that where our API requires an identifier for a
>> resource, we provide access for it as a URL, and only where necessary, as a
>> File, String, etc.
> 
> I would rather see InputStream as the primary source of byte streams
> to be processed.

Re: URLs as Primary Resource Identifiers

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 10/22/07, Keith R. Bennett <kb...@bbsinc.biz> wrote:
> I'd like to suggest that where our API requires an identifier for a
> resource, we provide access for it as a URL, and only where necessary, as a
> File, String, etc.

I would rather see InputStream as the primary source of byte streams
to be processed.

We can of course have utility methods that take a URL, a File, a
String, or whatever as arguments, but such methods should just create
an InputStream based on the given information and pass it to a primary
method that takes an InputStream.
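A minimal Java sketch of this layering, assuming for illustration that the processing step simply decodes the bytes as UTF-8 text (Java 9 or later for `readAllBytes()`). `StreamFirstExtractor` and `extractText` are hypothetical names, not Tika API. The primary method takes an InputStream and leaves closing it to the caller; the URL, File, and String overloads only open a stream, delegate, and close what they opened. Here the String overload is documented as a file system path, which is one way to avoid the ambiguity Keith raised.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical API illustrating "InputStream first"; not part of Tika.
public class StreamFirstExtractor {

    // Primary method: all processing happens on the byte stream.
    // The caller owns the stream and is responsible for closing it.
    public static String extractText(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    // Utility: open the URL, delegate, and close the stream we opened.
    public static String extractText(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return extractText(in);
        }
    }

    // Utility: same pattern for a File.
    public static String extractText(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return extractText(in);
        }
    }

    // Utility: the String is documented as a file system path.
    public static String extractText(String path) throws IOException {
        return extractText(new File(path));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractText(new ByteArrayInputStream(data)));

        File file = File.createTempFile("sample", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), data);
        System.out.println(extractText(file));
        System.out.println(extractText(file.getPath()));
        System.out.println(extractText(file.toURI().toURL()));
    }
}
```

One benefit of this shape is that a new input source (an in-memory buffer, an HTTP request body) needs at most a thin adapter that produces an InputStream, rather than a new processing path.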

BR,

Jukka Zitting