Posted to users@pdfbox.apache.org by José Rodolfo Carrijo de Freitas <jo...@softplan.com.br> on 2011/08/29 21:56:10 UTC

how to find why a pdf is so big in pdfdebugger

Hey guys,

I have a PDF with one page that has some text and a 71 KB image.
How can this PDF be 1337 KB?

I'd like to find out which objects are bloating the size of this PDF in
PDFDebugger, but I couldn't find a way to check that.
I was hoping you could give me some tips.

This PDF is actually one page extracted from a 22-page PDF, and the
original file (with all 22 pages) is 1345 KB,
so I'm guessing that for some reason this page is holding references to
all the resources of the original PDF.
I'd like to be sure of that. Is there a way to check this through
PDFDebugger?
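As a rough complement to browsing the object tree in PDFDebugger, a byte-level scan can rank objects by size. The sketch below uses only the JDK (no PDFBox; the class name ObjectSizeReport is invented for illustration). It finds "N G obj" headers and treats the distance to the next header as that object's size. This is a heuristic that assumes a classic, uncompressed cross-reference layout: objects packed inside PDF 1.5 object streams are not listed individually, and the last entry also absorbs the xref table and trailer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObjectSizeReport {
    // One entry per indirect object: its "N G" label and approximate byte span.
    record ObjSize(String id, int bytes) {}

    // Matches an indirect object header such as "12 0 obj" at the start of a line.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?m)^\\s*(\\d+)\\s+(\\d+)\\s+obj\\b");

    static List<ObjSize> objectSizes(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices are byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = OBJ_HEADER.matcher(text);
        List<Integer> starts = new ArrayList<>();
        List<String> ids = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
            ids.add(m.group(1) + " " + m.group(2));
        }
        List<ObjSize> result = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            // An object runs until the next header (or end of file for the last one).
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : text.length();
            result.add(new ObjSize(ids.get(i), end - starts.get(i)));
        }
        // Largest objects first.
        result.sort((a, b) -> Integer.compare(b.bytes(), a.bytes()));
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<ObjSize> sizes = objectSizes(Files.readAllBytes(Path.of(args[0])));
        // Print the ten largest; look those object numbers up in PDFDebugger.
        for (ObjSize o : sizes.subList(0, Math.min(10, sizes.size()))) {
            System.out.printf("%8d bytes  object %s%n", o.bytes(), o.id());
        }
    }
}
```

With Java 16 or later it can be run directly as "java ObjectSizeReport.java page1.pdf"; if the top entries are images that the page never draws, that supports the shared-resources theory.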


Thanks.
José Freitas.



>>-----Original Message-----
>>From: Andreas Lehmkuehler [mailto:andreas@lehmi.de]
>>Sent: Sunday, August 28, 2011 09:12
>>To: users@pdfbox.apache.org
>>Subject: Re: pdfbox for .NET compilation and use example request
>>
>>Hi,
>>
>>On 27.08.2011 19:41, Z W wrote:
>>> Hi
>>>
>>> I was reading the section on compiling PDFBox to a DLL for .NET use,
>>> which assumes Ant is used.
>>> Is there a simple command-line way to compile PDFBox without
>>> using Ant?
>>> A detailed example would be helpful.
>>> I apologize; I need more help to get this to work.
>>If you are looking for a precompiled .NET version, have a look at [1]
>>
>>> Thanks
>>
>>
>>BR
>>Andreas Lehmkühler
>>
>>[1] http://pdfbox.lehmi.de/


Re: how to find why a pdf is so big in pdfdebugger

Posted by José Rodolfo Carrijo de Freitas <jo...@softplan.com.br>.
Hi Mehdi,

I'll try to get permission to share this PDF.
By the way, how can I ask PDFBox to save using Flate compression?
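For reference, the PDF /FlateDecode filter is zlib/deflate compression, which is the format java.util.zip.Deflater produces by default. The sketch below does not show PDFBox's own stream-writing API; it only illustrates, with the JDK alone, how much a repetitive, uncompressed page content stream shrinks under Flate, and that the result round-trips. The class name FlateDemo and the sample content are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {
    // Compress with zlib framing (Deflater's default), the format /FlateDecode expects.
    static byte[] flate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse operation: what a viewer does when it reads a /FlateDecode stream.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent Flate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A repetitive text-drawing content stream, as an uncompressed page might contain.
        String content = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n".repeat(1000);
        byte[] raw = content.getBytes(StandardCharsets.US_ASCII);
        byte[] packed = flate(raw);
        System.out.println("raw=" + raw.length + " bytes, flate=" + packed.length + " bytes");
    }
}
```

Real page content compresses less evenly than this artificial sample, and images that are already JPEG-encoded (/DCTDecode) gain little from an extra Flate pass.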

Thanks.




Re: how to find why a pdf is so big in pdfdebugger

Posted by mehdi houshmand <me...@gmail.com>.
Hi Jose,

There are several factors that could affect the size of a PDF. One
factor we found that made a significant difference was accessibility
features within a PDF. These are features that assist text-to-speech
clients for the visually impaired, and a PDF that has accessibility
features will contain a structure tree. Another factor is compression:
if your PDF isn't compressed with a Flate encoder, that also has a big
impact on file size. However, without seeing the PDF, we can only
speculate.
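Both factors can be checked roughly on a file that can't be shared. The sketch below (JDK only; the class name SizeSuspects is invented) reports whether the file mentions /StructTreeRoot, the catalog key for the structure tree, and how many stream objects have no /Filter entry, i.e. are stored uncompressed. It is a text-search heuristic: if the catalog and other dictionaries live inside compressed object streams (PDF 1.5+), they won't be visible to it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SizeSuspects {
    // Summary of the two suspects: a tagged structure tree and unfiltered streams.
    record Report(boolean hasStructTree, int streams, int unfilteredStreams) {}

    static Report scan(byte[] pdf) {
        // ISO-8859-1 keeps a one-to-one mapping between bytes and chars.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        boolean hasStructTree = text.contains("/StructTreeRoot");
        int streams = 0;
        int unfiltered = 0;
        int from = 0;
        while (true) {
            int s = text.indexOf("stream", from);
            if (s < 0) break;
            if (s >= 3 && text.startsWith("end", s - 3)) { // "endstream", not a stream start
                from = s + 6;
                continue;
            }
            streams++;
            // The stream dictionary sits between the object header's "obj" and "stream".
            int header = text.lastIndexOf("obj", s);
            String dict = header < 0 ? "" : text.substring(header, s);
            if (!dict.contains("/Filter")) unfiltered++;
            // Jump past the (possibly binary) stream data so it is not searched.
            int end = text.indexOf("endstream", s + 6);
            from = end < 0 ? text.length() : end + 9;
        }
        return new Report(hasStructTree, streams, unfiltered);
    }

    public static void main(String[] args) throws IOException {
        Report r = scan(Files.readAllBytes(Path.of(args[0])));
        System.out.println("structure tree present: " + r.hasStructTree());
        System.out.println("streams: " + r.streams() + ", without /Filter: " + r.unfilteredStreams());
    }
}
```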

Hope that helps

Mehdi


RE: how to find why a pdf is so big in pdfdebugger

Posted by José Rodolfo Carrijo de Freitas <jo...@softplan.com.br>.
Hi Eric,

1) I'd like to, but I don't know what else to say to be more specific.
2) I believe I cannot make the PDF public, since it's a court document.
3) Basically, I list all the pages of a PDF and iterate over them, saving
each one separately in a PDF with pdfWithOnePage.addPage() and
pdfWithOnePage.save(..)

My problem is that I need to find out why a PDF is so big (1337 KB). What
could possibly make a PDF bigger? I can only think of resources, like
images, and this PDF has only one image of 71 KB.
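One common way a single extracted page ends up nearly as large as the whole document is that the page's /Resources dictionary is shared by every page (or inherited from the page tree) and names every image. A writer that saves everything reachable from the new document then copies all of those images too. The sketch below models that with an invented in-memory object graph; it is not PDFBox code, and the class, record and byte sizes are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReachableSize {
    // Hypothetical object model: serialized size plus the object numbers it references.
    record PdfObj(int bytes, List<Integer> refs) {}

    // Sum the sizes of all objects reachable from root (each counted once).
    static int reachableBytes(Map<Integer, PdfObj> objects, int root) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        todo.push(root);
        int total = 0;
        while (!todo.isEmpty()) {
            int id = todo.pop();
            if (!seen.add(id)) continue;      // already counted
            PdfObj obj = objects.get(id);
            if (obj == null) continue;        // dangling reference: nothing to write
            total += obj.bytes();
            for (int ref : obj.refs()) todo.push(ref);
        }
        return total;
    }

    // Invented document: 22 images, one shared /Resources, and two kinds of page.
    static Map<Integer, PdfObj> sampleDocument() {
        Map<Integer, PdfObj> objs = new HashMap<>();
        List<Integer> images = new ArrayList<>();
        for (int i = 0; i < 22; i++) {
            objs.put(100 + i, new PdfObj(58_000, List.of())); // made-up image size
            images.add(100 + i);
        }
        objs.put(10, new PdfObj(500, images));       // shared /Resources naming all images
        objs.put(1, new PdfObj(300, List.of(10)));   // page pointing at the shared dictionary
        objs.put(11, new PdfObj(60, List.of(100)));  // page-specific /Resources, one image
        objs.put(2, new PdfObj(300, List.of(11)));   // page pointing at its own dictionary
        return objs;
    }

    public static void main(String[] args) {
        Map<Integer, PdfObj> doc = sampleDocument();
        System.out.println("page with shared resources: " + reachableBytes(doc, 1) + " bytes");
        System.out.println("page with own resources:    " + reachableBytes(doc, 2) + " bytes");
    }
}
```

If that is what is happening, the fix would be to give each extracted page a resources dictionary that lists only what its content stream actually uses.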




Best regards,

José Rodolfo Carrijo de Freitas
Systems Analyst
Research and Development
Softplan/Poligraph
+ 55 48 3027-8000
www.softplan.com.br




RE: how to find why a pdf is so big in pdfdebugger

Posted by Eric Douglas <ed...@blockhouse.com>.
Can you be more specific?
Do you have test samples?
How did you extract the page?
 
