Posted to dev@tika.apache.org by Peter Kronenberg <pe...@torch.ai> on 2021/01/14 01:17:17 UTC

Fwd: Python dependency


Any thoughts on this?  Wondering if I can totally remove the Python dependency or if we still need it?

________________________________
From: Peter Kronenberg <pe...@torch.ai>
Sent: Wednesday, January 13, 2021, 11:20 AM
To: tallison@apache.org
Subject: Python dependency

So I see that there are other Python scripts.  I have no idea what these are used for.  But does this mean that Tika still needs the dependency on Python for some cases?  I.e., do we still need the Python path in the config?  I don’t see any other hasPython() method or calls to getPythonPath() anywhere, so I’m not sure how these work.



RE: Python dependency

Posted by Peter Kronenberg <pe...@torch.ai>.
Ok, thanks.  So for now, it’s gone.

From: Tim Allison <ta...@apache.org>
Sent: Wednesday, January 13, 2021 8:29 PM
To: Peter Kronenberg <pe...@torch.ai>
Cc: dev@tika.apache.org
Subject: Re: Python dependency

IMHO, we should remove it entirely from the tesseract module.  The advancedmedia module can handle finding it/configuring it/executing it.  Or, longer term, as Nick proposed, we can have a centralized "common external commands" configuration somehow through TikaConfig...but that is for later.

As I've been reflecting on this a bit, I'm not sure we should allow runtime configuration of paths to executables.  That opens the way to path attacks, and I'm not convinced of the utility. That, also, is for later.

On Wed, Jan 13, 2021 at 8:17 PM Peter Kronenberg <pe...@torch.ai> wrote:


Any thoughts on this?  Wondering if I can totally remove the Python dependency or if we still need it?

________________________________
From: Peter Kronenberg <pe...@torch.ai>
Sent: Wednesday, January 13, 2021, 11:20 AM
To: tallison@apache.org
Subject: Python dependency

So I see that there are other Python scripts.  I have no idea what these are used for.  But does this mean that Tika still needs the dependency on Python for some cases?  I.e., do we still need the Python path in the config?  I don’t see any other hasPython() method or calls to getPythonPath() anywhere, so I’m not sure how these work.



Re: Python dependency

Posted by Tim Allison <ta...@apache.org>.
IMHO, we should remove it entirely from the tesseract module.  The
advancedmedia module can handle finding it/configuring it/executing it.
Or, longer term, as Nick proposed, we can have a centralized "common
external commands" configuration somehow through TikaConfig...but that is
for later.

As I've been reflecting on this a bit, I'm not sure we should allow runtime
configuration of paths to executables.  That opens the way to path
attacks, and I'm not convinced of the utility. That, also, is for later.

On Wed, Jan 13, 2021 at 8:17 PM Peter Kronenberg <pe...@torch.ai>
wrote:

>
>
> Any thoughts on this?  Wondering if I can totally remove the Python
> dependency or if we still need it?
>
> ------------------------------
> *From:* Peter Kronenberg <pe...@torch.ai>
> *Sent:* Wednesday, January 13, 2021, 11:20 AM
> *To:* tallison@apache.org
> *Subject:* Python dependency
>
> So I see that there are other Python scripts.  I have no idea what these
> are used for.  But does this mean that Tika still needs the dependency on
> Python for some cases?  I.e., do we still need the Python path in the config?
> I don’t see any other hasPython() method or calls to getPythonPath()
> anywhere, so I’m not sure how these work.
>