Posted to user@tika.apache.org by David Pilato <da...@pilato.fr> on 2021/07/21 14:09:52 UTC

Apache Tika 2.0.0 parsers jars

Hey team


I'm trying to upgrade my project to 2.0.0.
I'm confused. The doc says to include:
<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parsers</artifactId>
   <version>2.0.0</version>
</dependency>

But the release notes say to include modules like:
<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parsers-standard</artifactId>
   <version>2.0.0</version>
</dependency>
<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parsers-extended</artifactId>
   <version>2.0.0</version>
</dependency>
<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parser-scientific-module</artifactId>
   <version>2.0.0</version>
</dependency>
<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parser-sqlite3-module</artifactId>
   <version>2.0.0</version>
</dependency>


But AFAICS all those modules are packaged as pom, not as jar, so Maven fails when I try to use them.

What am I missing here?


David

Re: Apache Tika 2.0.0 parsers jars

Posted by David Pilato <da...@pilato.fr>.
Thanks, Tim!


For whatever reason, this thread went into my spam folder. :(
I'll definitely look at async/pipes, as it could help me add parallelism to FSCrawler.
But I'll most likely consider it for FSCrawler v3, where I wanted to redesign everything.

Now that async/pipes comes with fetchers and emitters (which is basically what I wanted to implement in v3), I have to think about it.


Best

David
On 21 Jul 2021 at 16:43 +0200, Tim Allison <ta...@apache.org> wrote:
> Hi David,
> W00t! You should definitely also look into the async/pipes option
> for FSCrawler once I get the documentation in order. I'm in the
> process of putting together the minimal config files for
> fileshare->fileshare, and then I'll put together an example of
> fileshare->OpenSearch, which, um, should work for a bit at least with
> Elasticsearch. If it doesn't work with Elasticsearch, it should be
> fairly easy to write your own emitter.
> The benefit of the pipes package is that all of the parsing is done
> in isolated jvms so that catastrophic problems aren't catastrophic for
> the indexing process or the indexer. :D The other benefit is that we
> have fetchers for fileshare, S3 and http so that you can easily add
> new data sources.
> The new pipes module takes a bit of explanation (in lieu of tbd
> documentation), but not much. I'm always happy to chat.
>
> Cheers,
>
> Tim

Re: Apache Tika 2.0.0 parsers jars

Posted by Maxim Solodovnik <so...@gmail.com>.
Thanks for this thread!
It saved me time :)

On Wed, 21 Jul 2021 at 21:43, Tim Allison <ta...@apache.org> wrote:
>
> Hi David,
>   W00t!  You should definitely also look into the async/pipes option
> for FSCrawler once I get the documentation in order.  I'm in the
> process of putting together the minimal config files for
> fileshare->fileshare, and then I'll put together an example of
> fileshare->OpenSearch, which, um, should work for a bit at least with
> Elasticsearch.  If it doesn't work with Elasticsearch, it should be
> fairly easy to write your own emitter.
>    The benefit of the pipes package is that all of the parsing is done
> in isolated jvms so that catastrophic problems aren't catastrophic for
> the indexing process or the indexer. :D The other benefit is that we
> have fetchers for fileshare, S3 and http so that you can easily add
> new data sources.
>    The new pipes module takes a bit of explanation (in lieu of tbd
> documentation), but not much.  I'm always happy to chat.
>
>         Cheers,
>
>                  Tim



-- 
Best regards,
Maxim

Re: Apache Tika 2.0.0 parsers jars

Posted by Tim Allison <ta...@apache.org>.
Hi David,
  W00t!  You should definitely also look into the async/pipes option
for FSCrawler once I get the documentation in order.  I'm in the
process of putting together the minimal config files for
fileshare->fileshare, and then I'll put together an example of
fileshare->OpenSearch, which, um, should work for a bit at least with
Elasticsearch.  If it doesn't work with Elasticsearch, it should be
fairly easy to write your own emitter.
   The benefit of the pipes package is that all of the parsing is done
in isolated jvms so that catastrophic problems aren't catastrophic for
the indexing process or the indexer. :D The other benefit is that we
have fetchers for fileshare, S3 and http so that you can easily add
new data sources.
   The new pipes module takes a bit of explanation (in lieu of tbd
documentation), but not much.  I'm always happy to chat.

        Cheers,

                 Tim


On Wed, Jul 21, 2021 at 10:16 AM David Pilato <da...@pilato.fr> wrote:
>
> Ha. Found it...
>
> <dependency>
>     <groupId>org.apache.tika</groupId>
>     <artifactId>tika-parsers-standard-package</artifactId>
> </dependency>
> <dependency>
>     <groupId>org.apache.tika</groupId>
>     <artifactId>tika-parser-scientific-module</artifactId>
> </dependency>
> <dependency>
>     <groupId>org.apache.tika</groupId>
>     <artifactId>tika-parser-sqlite3-module</artifactId>
> </dependency>
>
>
>
> I guess we just need to update the documentation?
>
> David

Re: Apache Tika 2.0.0 parsers jars

Posted by David Pilato <da...@pilato.fr>.
Ha. Found it...
<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parsers-standard-package</artifactId>
</dependency>
<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parser-scientific-module</artifactId>
</dependency>
<dependency>
   <groupId>org.apache.tika</groupId>
   <artifactId>tika-parser-sqlite3-module</artifactId>
</dependency>


I guess we just need to update the documentation?

David
On 21 Jul 2021 at 16:10 +0200, David Pilato <da...@pilato.fr> wrote:
> Hey team
>
>
> I'm trying to upgrade my project to 2.0.0.
> I'm confused. The doc says to include:
> <dependency>
>    <groupId>org.apache.tika</groupId>
>    <artifactId>tika-parsers</artifactId>
>    <version>2.0.0</version>
> </dependency>
>
> But the release notes say to include modules like:
> <dependency>
>    <groupId>org.apache.tika</groupId>
>    <artifactId>tika-parsers-standard</artifactId>
>    <version>2.0.0</version>
> </dependency>
> <dependency>
>    <groupId>org.apache.tika</groupId>
>    <artifactId>tika-parsers-extended</artifactId>
>    <version>2.0.0</version>
> </dependency>
> <dependency>
>    <groupId>org.apache.tika</groupId>
>    <artifactId>tika-parser-scientific-module</artifactId>
>    <version>2.0.0</version>
> </dependency>
> <dependency>
>    <groupId>org.apache.tika</groupId>
>    <artifactId>tika-parser-sqlite3-module</artifactId>
>    <version>2.0.0</version>
> </dependency>
>
>
> But AFAICS all those modules are packaged as pom, not as jar, so Maven fails when I try to use them.
>
> What am I missing here?
>
>
> David
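
For anyone landing on this thread later, here is a minimal sketch of the dependency set David ended up with, written out with explicit versions. His snippet omits <version>, which suggests the versions are managed elsewhere in his build (for example in <dependencyManagement>); the tika.version property below is only an assumed name for illustration. The failure in the original post is consistent with Maven's default dependency type being jar: a module that David reports as packaged as pom, such as tika-parsers-standard, has no jar for Maven to resolve.

```xml
<properties>
  <!-- Hypothetical property name: one place to pin the Tika version. -->
  <tika.version>2.0.0</tika.version>
</properties>

<dependencies>
  <!-- Per this thread, the resolvable artifact for the standard parsers
       (used here in place of tika-parsers / tika-parsers-standard). -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers-standard-package</artifactId>
    <version>${tika.version}</version>
  </dependency>
  <!-- Optional modules from the thread; drop them if you do not need
       scientific formats or SQLite3 parsing. -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parser-scientific-module</artifactId>
    <version>${tika.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parser-sqlite3-module</artifactId>
    <version>${tika.version}</version>
  </dependency>
</dependencies>
```

Pinning the version in a single property keeps the three artifacts in step when upgrading; putting the same entries under <dependencyManagement> would work equally well.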