Posted to user@nutch.apache.org by Paul Rogers <pa...@gmail.com> on 2011/03/15 12:43:49 UTC

Using latest version of Tika with nutch

Dear All

I'm currently having great difficulty building nutch from trunk.  The
reason that I'm attempting this is that I wish to use the latest
version of Tika and I thought there might be an alternative.
Therefore:

Is it possible to determine the version of Tika included with the
binary version (1.2) of nutch?

Is it possible to add the latest version of Tika (0.9 I believe) to
the binary version (1.2) of nutch? Or is it all bound up in the
executable?

Many thanks


Paul

Re: Using latest version of Tika with nutch

Posted by Paul Rogers <pa...@gmail.com>.
Andrzej

That's great - many thanks again for all your help.

regards

Paul

On 15 March 2011 12:44, Andrzej Bialecki <ab...@getopt.org> wrote:
> On 3/15/11 1:31 PM, Paul Rogers wrote:
>>
>> Hi Andrzej
>>
>> Sorry to bother you again.
>>
>> I have checked the trunk version of tika out from svn.  I notice in
>> $NUTCH_HOME/runtime/local/lib the tika jar is also 0.7.  Is it
>> possible to upgrade this in the trunk or is this also affected by API
>> change?
>
> It hasn't been upgraded yet - this JIRA issue tracks the status of the
> upgrade: https://issues.apache.org/jira/browse/NUTCH-967
>
> And yes, the API is different, but not much - if you're a Java programmer
> you should be able to do the upgrade yourself, and then you can submit a
> patch to make it work for everyone else :)
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>

Re: Using latest version of Tika with nutch

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 3/15/11 1:31 PM, Paul Rogers wrote:
> Hi Andrzej
>
> Sorry to bother you again.
>
> I have checked the trunk version of tika out from svn.  I notice in
> $NUTCH_HOME/runtime/local/lib the tika jar is also 0.7.  Is it
> possible to upgrade this in the trunk or is this also affected by API
> change?

It hasn't been upgraded yet - this JIRA issue tracks the status of the
upgrade: https://issues.apache.org/jira/browse/NUTCH-967

And yes, the API is different, but not much - if you're a Java 
programmer you should be able to do the upgrade yourself, and then you 
can submit a patch to make it work for everyone else :)

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Using latest version of Tika with nutch

Posted by Paul Rogers <pa...@gmail.com>.
Hi Andrzej

Sorry to bother you again.

I have checked the trunk version of tika out from svn.  I notice in
$NUTCH_HOME/runtime/local/lib the tika jar is also 0.7.  Is it
possible to upgrade this in the trunk or is this also affected by API
change?

many thanks


Paul

On 15 March 2011 11:53, Andrzej Bialecki <ab...@getopt.org> wrote:
> On 3/15/11 12:43 PM, Paul Rogers wrote:
>>
>> Dear All
>>
>> I'm currently having great difficulty building nutch from trunk.  The
>> reason that I'm attempting this is that I wish to use the latest
>> version of Tika and I thought there might be an alternative.
>> Therefore:
>>
>> Is it possible to determine the version of Tika included with the
>> binary version (1.2) of nutch?
>
> Yes, take a look at the lib/tika-*.jar. The 1.2 release shipped with Tika
> 0.7.
>
>>
>> Is it possible to add the latest version of Tika (0.9 I believe) to
>> the binary version (1.2) of nutch? Or is it all bound up in the
>> executable?
>
> It's not possible - the Tika API has changed. Your best bet is to encourage
> Nutch devs to work on NUTCH-967 and then wait a couple weeks more and
> upgrade to Nutch 1.3.
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>

Re: Using latest version of Tika with nutch

Posted by Paul Rogers <pa...@gmail.com>.
Hi Andrzej

Many thanks for the prompt reply.  I'll do as you suggest.

Regards

Paul

On 15 March 2011 11:53, Andrzej Bialecki <ab...@getopt.org> wrote:
> On 3/15/11 12:43 PM, Paul Rogers wrote:
>>
>> Dear All
>>
>> I'm currently having great difficulty building nutch from trunk.  The
>> reason that I'm attempting this is that I wish to use the latest
>> version of Tika and I thought there might be an alternative.
>> Therefore:
>>
>> Is it possible to determine the version of Tika included with the
>> binary version (1.2) of nutch?
>
> Yes, take a look at the lib/tika-*.jar. The 1.2 release shipped with Tika
> 0.7.
>
>>
>> Is it possible to add the latest version of Tika (0.9 I believe) to
>> the binary version (1.2) of nutch? Or is it all bound up in the
>> executable?
>
> It's not possible - the Tika API has changed. Your best bet is to encourage
> Nutch devs to work on NUTCH-967 and then wait a couple weeks more and
> upgrade to Nutch 1.3.
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>

Re: Using latest version of Tika with nutch

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 3/15/11 12:43 PM, Paul Rogers wrote:
> Dear All
>
> I'm currently having great difficulty building nutch from trunk.  The
> reason that I'm attempting this is that I wish to use the latest
> version of Tika and I thought there might be an alternative.
> Therefore:
>
> Is it possible to determine the version of Tika included with the
> binary version (1.2) of nutch?

Yes, take a look at the lib/tika-*.jar. The 1.2 release shipped with 
Tika 0.7.
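
For anyone who wants to script that check, here is a minimal sketch in
Python. It assumes the jars follow the usual name-version.jar pattern
(for example tika-core-0.7.jar); the exact file names in a given Nutch
install are an assumption, and the helper name tika_versions is
hypothetical. Jars with a qualifier such as -SNAPSHOT are simply
skipped by this pattern.

```python
import re
import sys
from pathlib import Path

# Assumed naming scheme: tika-0.7.jar, tika-core-0.7.jar, tika-parsers-0.7.jar, ...
TIKA_JAR = re.compile(r"^(tika(?:-[a-z]+)*)-(\d+(?:\.\d+)*)\.jar$")


def tika_versions(lib_dir):
    """Return {artifact_name: version} for every matching tika-*.jar in lib_dir."""
    versions = {}
    for jar in sorted(Path(lib_dir).glob("tika-*.jar")):
        match = TIKA_JAR.match(jar.name)
        if match:
            # group(1) is the artifact name, group(2) the dotted version number
            versions[match.group(1)] = match.group(2)
    return versions


if __name__ == "__main__":
    # Pass the lib directory as the first argument; defaults to ./lib
    lib = sys.argv[1] if len(sys.argv) > 1 else "lib"
    for name, version in tika_versions(lib).items():
        print(f"{name}: {version}")
```

Run it from the Nutch install directory, or pass the path to the lib
directory as the first argument. It only reads file names, so it will
not notice a jar that has been renamed.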

>
> Is it possible to add the latest version of Tika (0.9 I believe) to
> the binary version (1.2) of nutch? Or is it all bound up in the
> executable?

It's not possible - the Tika API has changed. Your best bet is to 
encourage Nutch devs to work on NUTCH-967 and then wait a couple weeks 
more and upgrade to Nutch 1.3.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com