Posted to user@tika.apache.org by Cyrus Cheng <cy...@gmail.com> on 2019/11/25 20:49:14 UTC

Parsing files on a remote server

Hi, I'm currently developing a project. I would like to use Tika to parse
files that are stored on a remote server from a local server, then ingest
them into an Elasticsearch cluster without transferring the files over to the
local server at all. Is this possible? Thanks in advance.

Re: Parsing files on a remote server

Posted by Cyrus Cheng <cy...@gmail.com>.
Thank you! This was very helpful and answered my question.

Best wishes

On Tue, 26 Nov 2019, 14:44 Tim Allison, <ta...@apache.org> wrote:

> Thank you, David!  I heartily second this recommendation: please do not
> reinvent the wheel!

Re: Parsing files on a remote server

Posted by Tim Allison <ta...@apache.org>.
Thank you, David!  I heartily second this recommendation: please do not
reinvent the wheel!

On Tue, Nov 26, 2019 at 6:13 AM David Pilato <da...@pilato.fr> wrote:

> By the way, you could have a look at the FSCrawler project, which supports
> indexing local files and files over SSH.
>
> https://fscrawler.readthedocs.io/en/latest/
>
> It uses Tika behind the scenes.
>
> HTH

Re: Parsing files on a remote server

Posted by David Pilato <da...@pilato.fr>.
By the way, you could have a look at the FSCrawler project, which supports indexing local files and files over SSH.

https://fscrawler.readthedocs.io/en/latest/

It uses Tika behind the scenes.

HTH
On 26 Nov 2019 at 12:07 +0100, Tim Allison <ta...@apache.org> wrote:
> You won't be able to parse the files without reading the bytes from the remote server, so you have to transfer the bytes somehow. Once you have done that and parsed the files, you can send whatever you want over to Elasticsearch.
>
> Let me know if I misunderstood the question.
>
> Cheers,
>
>       Tim

Re: Parsing files on a remote server

Posted by Tim Allison <ta...@apache.org>.
You won't be able to parse the files without reading the bytes from the
remote server, so you have to transfer the bytes somehow. Once you have done
that and parsed the files, you can send whatever you want over to Elasticsearch.

Let me know if I misunderstood the question.

Cheers,

      Tim

On Mon, Nov 25, 2019 at 3:49 PM Cyrus Cheng <cy...@gmail.com>
wrote:

> Hi, I'm currently developing a project. I would like to use Tika to parse
> files that are stored on a remote server from a local server, then ingest
> them into an Elasticsearch cluster without transferring the files over to the
> local server at all. Is this possible? Thanks in advance.
>
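Tim's point is that the bytes must reach whichever machine runs Tika, although they never have to be written to that machine's disk. The minimal Python sketch below illustrates this using only the standard library. It assumes a tika-server instance on its default port (9998), reached through its `PUT /tika` endpoint with `Accept: text/plain`, and Elasticsearch on localhost:9200. It also assumes key-based SSH access to a POSIX remote host. Every helper name, the index name and the document fields are hypothetical.

```python
import json
import shlex
import subprocess
import urllib.parse
import urllib.request

# Hypothetical endpoints: tika-server's default port is 9998 and Elasticsearch's
# is 9200; adjust both for your own setup.
TIKA_URL = "http://localhost:9998/tika"
ES_URL = "http://localhost:9200"


def ssh_cat_command(host, remote_path):
    """Command that streams a remote file to stdout over SSH (assumes a POSIX remote)."""
    # ssh joins its trailing arguments into one remote shell command,
    # so the path must be shell-quoted.
    return ["ssh", host, "cat -- " + shlex.quote(remote_path)]


def fetch_remote_bytes(host, remote_path):
    """Read the remote file's bytes into memory; nothing is written to local disk."""
    result = subprocess.run(ssh_cat_command(host, remote_path),
                            capture_output=True, check=True)
    return result.stdout


def tika_request(data):
    """Build a PUT of the raw bytes to tika-server, asking for plain text back."""
    return urllib.request.Request(TIKA_URL, data=data, method="PUT",
                                  headers={"Accept": "text/plain"})


def es_index_request(index, doc_id, source, text):
    """Build an index-document request: PUT /<index>/_doc/<id> (Elasticsearch 7+ style)."""
    body = json.dumps({"source": source, "content": text}).encode("utf-8")
    url = f"{ES_URL}/{index}/_doc/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": "application/json"})


def parse_and_index(host, remote_path, index):
    # 1. The bytes still have to cross the network, but only into memory.
    data = fetch_remote_bytes(host, remote_path)
    # 2. Tika extracts the text from those in-memory bytes.
    with urllib.request.urlopen(tika_request(data)) as resp:
        text = resp.read().decode("utf-8")
    # 3. Only the extracted text (plus a pointer to the original) goes to Elasticsearch.
    source = f"{host}:{remote_path}"
    with urllib.request.urlopen(es_index_request(index, source, source, text)) as resp:
        resp.read()
```

For example, `parse_and_index("files.example.com", "/data/report.pdf", "documents")` (hypothetical host, path and index) would fetch one file over SSH, extract its text and index it. Buffering each file in memory keeps the sketch short but may not suit very large files. For a real deployment, FSCrawler, which the thread recommends, already combines crawling, SSH access and Tika.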