Posted to dev@tika.apache.org by Chris Mattmann <ma...@apache.org> on 2021/03/15 15:49:50 UTC

Re: Python-tika: issues related to memory consumption

Hi Manish, I think you should ask this one upstream on the Tika Dev lists. I’ve cc’ed them for you.

From: manish mathur <ma...@gmail.com>
Date: Monday, March 15, 2021 at 4:41 AM
To: <ch...@gmail.com>
Subject: Re: Python-tika: issues related to memory consumption

Hi Chris,

I am using the python-tika library to extract content from PDFs, but a lot of junk text comes through from tables, graphs, and similar elements. Is there any way to ignore these while parsing a PDF to get the content?

Thanks in advance,

Manish Mathur

On Mon, Feb 1, 2021 at 4:18 PM manish mathur <ma...@gmail.com> wrote:

Hi Chris,

I am using the python-tika library to read PDFs from URLs, but memory consumption keeps growing over time. Is there any way to release the memory after reading each PDF URL? Please let me know.

Thanks in advance,

Manish Mathur

Re: Python-tika: issues related to memory consumption

Posted by Tim Allison <ta...@apache.org>.
Hi Manish,
  Lots of things can go wrong in parsing PDFs.  Can you share links to
files showing specific problems?
