Posted to dev@tika.apache.org by Chris Mattmann <ma...@apache.org> on 2021/03/15 15:49:50 UTC
Re: Python-tika: issues related to memory consumption
Hi Manish, I think you should ask this one upstream on the Tika Dev lists. I’ve cc’ed them for you.
From: manish mathur <ma...@gmail.com>
Date: Monday, March 15, 2021 at 4:41 AM
To: <ch...@gmail.com>
Subject: Re: Python-tika: issues related to memory consumption
Hi Chris,
I am using the python-tika library to extract content from PDFs, but a lot of junk text comes through from tables, graphs, etc. Is there any way to ignore these elements while parsing a PDF so that I get only the main content?
Thanks in advance
Thanks
Manish Mathur
On Mon, Feb 1, 2021 at 4:18 PM manish mathur <ma...@gmail.com> wrote:
Hi Chris,
I am using the python-tika library to read PDFs from URLs, but memory consumption keeps increasing as more files are processed. Is there any way to release the memory after reading each PDF URL? Please let me know.
Thanks in advance
Thanks
Manish Mathur
Re: Python-tika: issues related to memory consumption
Posted by Tim Allison <ta...@apache.org>.
Hi Manish,
Lots of things can go wrong in parsing PDFs. Can you share links to
files showing specific problems?
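
In the meantime, one rough way to cut down on table and chart residue is
to post-filter the extracted text on the Python side. The sketch below is
only a heuristic under stated assumptions: drop_table_like_lines and its
thresholds are hypothetical names and values chosen for illustration, not
part of Tika or python-tika, and the filter will also discard short
headings or prose lines that are heavy on numbers.

```python
def drop_table_like_lines(text, min_alpha_ratio=0.6, min_words=3):
    """Keep only lines that look like prose.

    Heuristic (an assumption, not a Tika feature): a line is kept when at
    least `min_alpha_ratio` of its non-space characters are letters and it
    has at least `min_words` whitespace-separated tokens.
    """
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines entirely
        chars = [c for c in stripped if not c.isspace()]
        alpha_ratio = sum(c.isalpha() for c in chars) / len(chars)
        if alpha_ratio >= min_alpha_ratio and len(stripped.split()) >= min_words:
            kept.append(stripped)
    return "\n".join(kept)


# Made-up sample standing in for text extracted from a PDF.
sample = """Quarterly revenue grew in all regions.
12.4  13.1  15.0  17.2
Q1  Q2  Q3  Q4
The increase was driven by new customers."""

cleaned = drop_table_like_lines(sample)
print(cleaned)
```

With python-tika, the input would typically be the "content" value of
the dict returned by parser.from_file(...); that value can be None for
some files, so check it before filtering.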