Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2009/08/23 13:08:32 UTC

Can we update the version of Tika in the TikaAnnotator to 0.4

The current dependency is specified as 0.2, but the Tika project is now
at 0.4 release. I don't know if changing to 0.4 requires any concurrent
modifications to the annotator.

-Marshall

Re: Can we update the version of Tika in the TikaAnnotator to 0.4

Posted by Marshall Schor <ms...@schor.com>.

Julien Nioche wrote:
> Hi guys,
>
>
>   
>> One final issue:  The README for the TikaAnnotator says ...
>>
>>    COMPILATION
>>
>>    You can use the ANT script to compile the sources.
>>    ....
>>
>> But there is no ant script in the TikaAnnotator project.  A check of the
>> history shows it wasn't included in the initial Jira UIMA-1095 patch,
>> from Julien Nioche
>>     
>
>
> Actually my initial patch from 2008-09-22 01:33 AM *did* have an ANT script.
> I am not a Maven user so I assume that the ANT script has been replaced with
> a pom.xml by joern when he made the changes to the code in January.
>
> I suppose that the comment about ANT in the readme can now be removed.
>
>   
Right you are - I had foolishly searched for "ant" to see if the script
was there, and found nothing; it only showed up when I searched for
"build" (build.xml).

I think you're correct, it was removed in favor of a consistent build
approach using maven (less to maintain).
>   
>> Julien - can you take a look at this issue, and also see if everything
>> is OK for version 0.4?
>>
>>     
>
> Won't have time to look into this in the short term but can try at some
> point next week. Maybe someone with more experience of Maven can give it a
> try first and see if it compiles OK?
>   
I tried this and it compiled OK.
I had success with the following (requires maven to be installed):

    cd TikaAnnotator
    mvn install

That builds it, builds any docbook documentation (there is none) and
runs the test cases (there are none, I think).

Thanks. -Marshall

> Julien
>
>   

Re: Can we update the version of Tika in the TikaAnnotator to 0.4

Posted by Julien Nioche <li...@gmail.com>.
Hi guys,


> One final issue:  The README for the TikaAnnotator says ...
>
>    COMPILATION
>
>    You can use the ANT script to compile the sources.
>    ....
>
> But there is no ant script in the TikaAnnotator project.  A check of the
> history shows it wasn't included in the initial Jira UIMA-1095 patch,
> from Julien Nioche


Actually my initial patch from 2008-09-22 01:33 AM *did* have an ANT script.
I am not a Maven user so I assume that the ANT script has been replaced with
a pom.xml by joern when he made the changes to the code in January.

I suppose that the comment about ANT in the readme can now be removed.


>
> Julien - can you take a look at this issue, and also see if everything
> is OK for version 0.4?
>

Won't have time to look into this in the short term but can try at some
point next week. Maybe someone with more experience of Maven can give it a
try first and see if it compiles OK?

Julien

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

Re: Can we update the version of Tika in the TikaAnnotator to 0.4

Posted by Marshall Schor <ms...@schor.com>.

Jukka Zitting wrote:
> Hi,
>
> On Tue, Aug 25, 2009 at 3:24 AM, Marshall Schor<ms...@schor.com> wrote:
>   
>> However, I notice that there are no test cases for this annotator, and
>> also that there is another tika artifact at the 0.4 level, called
>> tika-parsers.  Is this other artifact needed?  If so, how does it need
>> to be incorporated?
>>     
>
> The tika-core jar contains only the core client-visible classes and
> interfaces and has zero dependencies beyond Java 5. All the actual
> parser implementations and external parser dependencies are in the
> tika-parsers jar. This split is new in Tika 0.4 and was done to better
> support users who only need the core functionality.
>
> For UIMA, I suppose you'll want support for all the document types, so
> the correct dependency settings would be:
>
>   <dependency>
>     <groupId>org.apache.tika</groupId>
>     <artifactId>tika-core</artifactId>
>     <version>0.4</version>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.tika</groupId>
>     <artifactId>tika-parsers</artifactId>
>     <version>0.4</version>
>   </dependency>
>
> See the Maven section in
> http://lucene.apache.org/tika/gettingstarted.html for the full
> details.
>
>   
Thanks.

One final issue:  The README for the TikaAnnotator says ...

    COMPILATION

    You can use the ANT script to compile the sources.
    ....

But there is no ant script in the TikaAnnotator project.  A check of the
history shows it wasn't included in the initial Jira UIMA-1095 patch,
from Julien Nioche

Julien - can you take a look at this issue, and also see if everything
is OK for version 0.4?

> BR,
>
> Jukka Zitting
>
>
>   

Re: Can we update the version of Tika in the TikaAnnotator to 0.4

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Aug 25, 2009 at 3:24 AM, Marshall Schor<ms...@schor.com> wrote:
> However, I notice that there are no test cases for this annotator, and
> also that there is another tika artifact at the 0.4 level, called
> tika-parsers.  Is this other artifact needed?  If so, how does it need
> to be incorporated?

The tika-core jar contains only the core client-visible classes and
interfaces and has zero dependencies beyond Java 5. All the actual
parser implementations and external parser dependencies are in the
tika-parsers jar. This split is new in Tika 0.4 and was done to better
support users who only need the core functionality.

For UIMA, I suppose you'll want support for all the document types, so
the correct dependency settings would be:

  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>0.4</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>0.4</version>
  </dependency>

See the Maven section in
http://lucene.apache.org/tika/gettingstarted.html for the full
details.
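
As a rough way to confirm that a pom.xml declares both halves of the
split, the sketch below lists the org.apache.tika dependencies found in a
POM and reports which of the two required artifacts are missing. It is
only an illustration: the POM_FRAGMENT string and the function names
tika_dependencies and missing_tika_artifacts are made up for this
example, and only the two dependency entries themselves come from the
settings above.

```python
import xml.etree.ElementTree as ET

# Hypothetical POM fragment for illustration; only the two <dependency>
# entries are taken from the thread.
POM_FRAGMENT = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-core</artifactId>
      <version>0.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.4</version>
    </dependency>
  </dependencies>
</project>
"""


def _local(tag):
    """Strip an XML namespace prefix such as '{http://...}dependency'."""
    return tag.rsplit("}", 1)[-1]


def tika_dependencies(pom_xml):
    """Return {artifactId: version} for every org.apache.tika dependency."""
    root = ET.fromstring(pom_xml)
    found = {}
    for element in root.iter():
        if _local(element.tag) != "dependency":
            continue
        # Collect the direct children (groupId, artifactId, version, ...).
        fields = {_local(c.tag): (c.text or "").strip() for c in element}
        if fields.get("groupId") == "org.apache.tika":
            found[fields.get("artifactId")] = fields.get("version")
    return found


def missing_tika_artifacts(pom_xml, required=("tika-core", "tika-parsers")):
    """List the required Tika artifacts that the POM does not declare."""
    declared = tika_dependencies(pom_xml)
    return [name for name in required if name not in declared]


if __name__ == "__main__":
    print(tika_dependencies(POM_FRAGMENT))
    print(missing_tika_artifacts(POM_FRAGMENT))
```

Run against a POM that still declares only the old single tika artifact,
missing_tika_artifacts would report both tika-core and tika-parsers as
absent. The check only reads what is written in the file; it does not
resolve transitive or inherited dependencies the way Maven does.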

BR,

Jukka Zitting

Re: Can we update the version of Tika in the TikaAnnotator to 0.4

Posted by Marshall Schor <ms...@schor.com>.

Jukka Zitting wrote:
> Hi,
>
> On Sun, Aug 23, 2009 at 1:08 PM, Marshall Schor<ms...@schor.com> wrote:
>   
>> The current dependency is specified as 0.2, but the Tika project is now
>> at 0.4 release. I don't know if changing to 0.4 requires any concurrent
>> modifications to the annotator.
>>     
>
> The main Tika APIs are pretty much the same in 0.4 as they were in
> 0.2, so no major client changes should be needed. The 0.4 release just
> has more and better file format support than 0.2. :-)
>   
OK, replacing the dependency on
org.apache.tika:tika:0.2 with
org.apache.tika:tika-core:0.4

resulted in a successful build.

However, I notice that there are no test cases for this annotator, and
also that there is another tika artifact at the 0.4 level, called
tika-parsers.  Is this other artifact needed?  If so, how does it need
to be incorporated?

Thanks. -Marshall

> BR,
>
> Jukka Zitting
>
>
>   

Re: Can we update the version of Tika in the TikaAnnotator to 0.4

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Sun, Aug 23, 2009 at 1:08 PM, Marshall Schor<ms...@schor.com> wrote:
> The current dependency is specified as 0.2, but the Tika project is now
> at 0.4 release. I don't know if changing to 0.4 requires any concurrent
> modifications to the annotator.

The main Tika APIs are pretty much the same in 0.4 as they were in
0.2, so no major client changes should be needed. The 0.4 release just
has more and better file format support than 0.2. :-)

BR,

Jukka Zitting