Posted to user@tika.apache.org by Cyrus Cheng <cy...@gmail.com> on 2019/11/25 20:49:14 UTC
Parsing files on a remote server
Hi, I'm currently developing a project. I would like to use Tika to parse
files that are stored on a remote server from a local server, then ingest
them into an elastic cluster without transferring the files over to the
local server at all. Is this possible? Thanks in advance.
Re: Parsing files on a remote server
Posted by Cyrus Cheng <cy...@gmail.com>.
Thank you! This was very helpful and answered my question.
Best wishes
Re: Parsing files on a remote server
Posted by Tim Allison <ta...@apache.org>.
Thank you, David! I heartily second this recommendation: please do not
reinvent the wheel!
Re: Parsing files on a remote server
Posted by David Pilato <da...@pilato.fr>.
You could have a look at the FSCrawler project, BTW, which supports indexing local files and files over SSH.
https://fscrawler.readthedocs.io/en/latest/
It uses Tika behind the scenes.
HTH
Re: Parsing files on a remote server
Posted by Tim Allison <ta...@apache.org>.
You won't be able to parse the files without reading the bytes from the
remote server, so you have to transfer the bytes somehow. Once you do
that and parse the files, you can send what you want over to Elastic.
Let me know if I misunderstood the question.
Cheers,
Tim
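
To make the point above concrete, here is a minimal Python sketch of "transfer the bytes, parse, then send to Elastic" that never writes the file to local disk. It rests on several assumptions that are not stated in the thread: the remote files are reachable over HTTP, a tika-server instance is listening on its default port 9998, and Elasticsearch is listening on port 9200. The function names and the document fields ("content", "source") are hypothetical and chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints (not from the thread); adjust for your deployment.
TIKA_URL = "http://localhost:9998/tika"  # tika-server text-extraction endpoint
ES_URL = "http://localhost:9200"         # Elasticsearch HTTP endpoint


def tika_request(data, tika_url=TIKA_URL):
    """Build a PUT request that asks tika-server for plain-text output."""
    return urllib.request.Request(
        tika_url,
        data=data,
        method="PUT",
        headers={
            "Accept": "text/plain",
            # Generic type so Tika detects the real format from the bytes.
            "Content-Type": "application/octet-stream",
        },
    )


def index_request(index, doc_id, text, source_url, es_url=ES_URL):
    """Build a PUT request that stores the extracted text as one document."""
    payload = json.dumps({"content": text, "source": source_url}).encode("utf-8")
    safe_id = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(
        f"{es_url}/{index}/_doc/{safe_id}",
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def parse_and_index(source_url, index, doc_id):
    """Read remote bytes into memory, extract text with Tika, index the text."""
    # The bytes still cross the network; they are only held in memory here.
    with urllib.request.urlopen(source_url) as remote:
        data = remote.read()
    with urllib.request.urlopen(tika_request(data)) as resp:
        text = resp.read().decode("utf-8")
    with urllib.request.urlopen(index_request(index, doc_id, text, source_url)) as resp:
        return json.load(resp)
```

With all three services running, a call such as parse_and_index("http://files.example/report.pdf", "docs", "report-1") would fetch, parse, and index one file; the URL, index name, and id are placeholders. Reading the whole file into memory keeps the sketch short but will not suit very large files.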