Posted to user@tika.apache.org by Chetan Bikire <ch...@gmail.com> on 2022/10/14 18:04:38 UTC

Apache Tika Server

Hi,

Can Apache Tika Server endpoints such as tika/form or rmeta/form accept
multiple files in a single request?

I am trying to call these endpoints from .NET 6 using HttpClient, sending
multiple files in a single request as multipart data.
I also tried Postman with two files attached, but the endpoint parsed only
the first attached file.

Please assist.

Thank you

Re: Apache Tika Server

Posted by Chetan Bikire <ch...@gmail.com>.
Thank you for your response.
I will certainly look into it.


Re: Apache Tika Server

Posted by Nicholas DiPiazza <ni...@gmail.com>.
Sounds like you are doing big batch processing from your .NET app.

When you are doing large amounts of batch processing, you should use Tika
Pipes if possible. Have you looked at it yet?

An example flow would be:

Your .NET app writes Fetch Request objects to a topic (a Kafka topic in this
example). Example Fetch Requests would be {"url": fileUrl1},
{"url": fileUrl2}, etc.
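As a rough illustration of the Fetch Request objects described above, here is a minimal Python sketch that builds one JSON message per file. The {"url": ...} shape is taken from the example in this message; the function name, the file URLs, and the idea that each message is published as-is are assumptions for illustration, and the schema a real pipe iterator expects may differ.

```python
import json


def build_fetch_requests(file_urls):
    """Return one JSON-encoded fetch request per file URL."""
    # The {"url": ...} shape mirrors the example in this message; the schema
    # a particular pipe iterator expects may differ.
    return [json.dumps({"url": url}) for url in file_urls]


if __name__ == "__main__":
    # Hypothetical file locations, used only for illustration.
    urls = ["file:///data/docs/report.pdf", "file:///data/docs/notes.docx"]
    for message in build_fetch_requests(urls):
        # A real producer would publish each message to the topic here.
        print(message)
```

The point of the sketch is that the producer sends one small message per file rather than bundling several files into a single request.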

Tika Pipes has a "pipe iterator" that listens to that topic.

Tika Pipes then takes each Fetch Request and sends it to a Tika Pipes
Fetcher, which pulls the file contents and parses them with Tika.

Tika Emitters then take the parsed body and metadata and emit them to a
target destination.

So let's say your fetcher is FileSystem and your emitter is an Apache Solr
index.

Tika Pipes would run the pipe iterator on Kafka and fetch every document
referenced in the Kafka topic.

It would use the file system to obtain the bytes and parse them with the
rmeta parser.

Finally, the result of the parse is emitted as a Solr document to
Apache Solr.
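To make the iterate, fetch, parse, emit sequence concrete, the following Python sketch models it entirely in memory. It is a conceptual model only, not the Tika Pipes API: every function name here is hypothetical, a list stands in for the Kafka topic, a dict stands in for the file system, a placeholder stands in for the Tika parse step, and a list stands in for the Solr index.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List


def pipe_iterator(topic: Iterable[str]) -> Iterator[dict]:
    """Yield fetch requests from the topic (a list stands in for Kafka)."""
    for message in topic:
        yield json.loads(message)


def fetch(request: dict, storage: Dict[str, bytes]) -> bytes:
    """Return the bytes for a request (a dict stands in for the file system)."""
    return storage[request["url"]]


def parse(data: bytes) -> dict:
    """Placeholder for the Tika parse step; returns body text plus metadata."""
    return {"body": data.decode("utf-8", errors="replace"),
            "metadata": {"length": len(data)}}


def run_pipeline(topic, storage, emit: Callable[[dict], None]) -> None:
    """Iterate, fetch, parse, and emit one document per fetch request."""
    for request in pipe_iterator(topic):
        parsed = parse(fetch(request, storage))
        emit({"id": request["url"], **parsed})


if __name__ == "__main__":
    storage = {"doc1.txt": b"hello", "doc2.txt": b"world"}
    topic = [json.dumps({"url": name}) for name in storage]
    index: List[dict] = []  # stands in for the Solr index
    run_pipeline(topic, storage, index.append)
    print([doc["id"] for doc in index])
```

Each stage only sees the output of the previous one, which is why the fetcher and the emitter can be swapped independently in the flow described above.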

Does that make sense?
See https://cwiki.apache.org/confluence/display/TIKA/tika-pipes for more
info.

-Nicholas


