Posted to user@tika.apache.org by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2013/03/04 06:41:02 UTC

Re: Improvement in Metadata Class

Hey Lewis,

RE: #3 — it would be great to get Nutch using Tika's metadata container — I don't think we have anything special in Nutch that prevents it.
RE: #2 — I committed your Tika doc patch during ApacheCon NA 2013 so thanks!

Thanks!

Cheers,
Chris


From: Lewis John Mcgibbney <le...@gmail.com>
Reply-To: "user@tika.apache.org" <us...@tika.apache.org>
Date: Tuesday, February 26, 2013 3:25 PM
To: "user@tika.apache.org" <us...@tika.apache.org>
Subject: Improvement in Metadata Class

Hi,
(This is maybe traffic for dev@ but I hope it is OK here on user@)

1.
In Apache Nutch we are using the Metadata class [0] as follows
if (tikaMDName.equalsIgnoreCase(Metadata.TITLE)) continue;
TITLE value is deprecated and I want to upgrade API usage.
What should I be using?

2.
I would like to contribute to the Tika Java documentation for this as I am not happy with the current Java documentation for this class.

3.
We also currently maintain a legacy Metadata package [1] within Nutch. This is a multi-valued Metadata container including sets of constant fields for Nutch webpage and host metadata.
How much of this stuff do we actually need (to be maintaining)? Should we not be leveraging more of the stuff available within Apache Tika for Metadata fields? Is this a case of the more the merrier here?

Thank you very much in advance. I look forward to hearing back from anyone on this. I am at ApacheCon just now and will cook up a patch based on the feedback.

Lewis

[0] http://tika.apache.org/1.3/api/index.html?org/apache/tika/metadata/Metadata.html
[1] http://svn.apache.org/repos/asf/nutch/trunk/src/java/org/apache/nutch/metadata/
--
Lewis

Re: Improvement in Metadata Class

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Oh and thanks for taking the patch into Tika. I hope it will be a *bit*
clearer for folks in a similar position as us (in Nutch) to see exactly
what should be pulled from Tika.
Lewis

On Wed, Mar 6, 2013 at 10:49 AM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Chris,
> Thanks for the input.
> RE#3 Yeah, me and Sebastien are now discussing this and will address it
> within NUTCH-1537
> Thanks
> Lewis



-- 
*Lewis*

Re: Improvement in Metadata Class

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Chris,
Thanks for the input.
RE#3 Yeah, me and Sebastien are now discussing this and will address it
within NUTCH-1537
Thanks
Lewis




-- 
*Lewis*