Posted to users@pdfbox.apache.org by Jesse Long <je...@gmail.com> on 2015/06/02 16:15:11 UTC

Scratch files - too many files open

Hi All,

Regarding PDFBOX-2301, and the use of scratch files: right now, each 
COSStream uses one or two scratch files.

I recently ran into the problem on Linux where the max number of open 
files allowed to the JVM by the OS was reached because of this.
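
To see how close a running JVM is to that limit, the descriptor counts can be read from inside the process. This is a minimal sketch and not part of PDFBox: it assumes an OpenJDK/HotSpot-style JVM on a Unix-like system, where the platform bean implements com.sun.management.UnixOperatingSystemMXBean. The class name OpenFileCheck is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class OpenFileCheck {

    /**
     * Returns {open, max} file descriptor counts for this JVM, or null when
     * the platform bean does not expose them (for example on Windows).
     */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("Descriptor counts are not available on this JVM.");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

Logging these two numbers before and after the import loop should show whether the open count grows with the number of streams.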

Is there a plan around this?

Is it maybe that my use case is not expected?

My use case is:
Open PDDocument 1
Open PDDocument 2
for a few hundred times
         import page 1 of PDDocument 1 into PDDocument 2 and overlay 
some stuff on top.
save PDDocument 2.

I have written a patch to use one single java.io.RandomAccessFile as a 
scratch file per COSDocument, using pages in a doubly linked list to 
separate streams in the same file. Would you be interested in adding 
this to PDFBox?
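
The idea can be sketched roughly as follows. This is an illustration of the general technique only, not the attached patch and not PDFBox code: the names PagedScratchFile and ScratchStream, the 64-byte page size, and the 8-byte page header (previous and next page index) are all assumptions chosen to keep the example small. Each logical stream owns a chain of fixed-size pages inside one java.io.RandomAccessFile, so interleaved writes to different streams never share a page.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class PagedScratchFile implements AutoCloseable {
    static final int PAGE_SIZE = 64;               // tiny on purpose, to force page chaining
    static final int HEADER = 8;                   // prev index (int) + next index (int)
    static final int PAYLOAD = PAGE_SIZE - HEADER;
    static final int NONE = -1;                    // marks "no neighbouring page"

    private final File file;
    private final RandomAccessFile raf;
    private int pageCount;

    PagedScratchFile() throws IOException {
        file = File.createTempFile("scratch", ".bin");
        raf = new RandomAccessFile(file, "rw");
    }

    ScratchStream newStream() {
        return new ScratchStream();
    }

    /** Appends a new page to the file, linked behind prev, and returns its index. */
    private int allocatePage(int prev) throws IOException {
        int page = pageCount++;
        raf.seek((long) page * PAGE_SIZE);
        raf.writeInt(prev);
        raf.writeInt(NONE);
        raf.write(new byte[PAYLOAD]);              // reserve the payload area
        if (prev != NONE) {
            raf.seek((long) prev * PAGE_SIZE + 4); // patch the "next" field of the previous page
            raf.writeInt(page);
        }
        return page;
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    /** One logical stream: a chain of pages plus its total length. */
    final class ScratchStream {
        private int first = NONE;
        private int last = NONE;
        private long length;

        void write(byte[] data) throws IOException {
            int off = 0;
            while (off < data.length) {
                int used = (int) (length % PAYLOAD);
                if (used == 0) {                   // no page yet, or the last page is full
                    last = allocatePage(last);
                    if (first == NONE) {
                        first = last;
                    }
                }
                int n = Math.min(PAYLOAD - used, data.length - off);
                raf.seek((long) last * PAGE_SIZE + HEADER + used);
                raf.write(data, off, n);
                off += n;
                length += n;
            }
        }

        byte[] readAll() throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[PAYLOAD];
            long remaining = length;
            int page = first;
            while (remaining > 0) {
                raf.seek((long) page * PAGE_SIZE + 4);
                int next = raf.readInt();          // file pointer now sits at the payload
                int n = (int) Math.min(PAYLOAD, remaining);
                raf.readFully(buf, 0, n);
                out.write(buf, 0, n);
                remaining -= n;
                page = next;
            }
            return out.toByteArray();
        }
    }
}
```

In this sketch the "prev" link is written but never read; a fuller version could use it to walk a stream backwards or to return pages to a free list when a stream is discarded.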

Thanks,
Jesse

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Scratch files - too many files open

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 03.06.2015 at 13:20, Jesse Long wrote:
> Please see patch attached.
Looks promising, I'll have a deeper look later.


> Thanks,
> Jesse

Thanks,
Andreas




Re: Scratch files - too many files open

Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,

> Jesse Long <je...@gmail.com> wrote on 3 June 2015 at 13:20:
> Please see patch attached.
I've attached your patch to PDFBOX-2301 so that it can't get lost.

> 
> Thanks,
> Jesse

BR
Andreas



Re: Scratch files - too many files open

Posted by Jesse Long <je...@gmail.com>.
On 06/06/2015 18:44, Andreas Lehmkuehler wrote:
> I've added the patch in r1683929 to the trunk with some slight 
> modifications.

Thank you.



Re: Scratch files - too many files open

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 03.06.2015 at 13:20, Jesse Long wrote:
> Please see patch attached.
I've added the patch in r1683929 to the trunk with some slight modifications.

> Thanks,
> Jesse

Thanks for the contribution!!

BR
Andreas



Re: Scratch files - too many files open

Posted by Jesse Long <je...@gmail.com>.
On 03/06/2015 12:46, Andreas Lehmkühler wrote:
> At least it sounds interesting and I'm happy to look at it.
>

Please see patch attached.

Thanks,
Jesse

Re: Scratch files - too many files open

Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,

> Jesse Long <je...@gmail.com> wrote on 3 June 2015 at 08:45:
> 
> 
> On 02/06/2015 17:48, Andreas Lehmkuehler wrote:
> > Hi,
> >
> > On 02.06.2015 at 16:15, Jesse Long wrote:
> >> Hi All,
> >>
> >> Regarding PDFBOX-2301, and the use of scratch files: right now, each 
> >> COSStream
> >> uses one or two scratch files.
> >>
> >> I recently ran into the problem on Linux where the max number of open 
> >> files
> >> allowed to the JVM by the OS was reached because of this.
> >>
> >> Is there a plan around this?
> >>
> >> Is it maybe that my use case is not expected?
> > I'm aware of that. The refactoring is still in progress. I expect to 
> > reduce the number of open files.
> >
> >> My use case is:
> >> Open PDDocument 1
> >> Open PDDocument 2
> >> for a few hundred times
> >>          import page 1 of PDDocument 1 into PDDocument 2 and overlay 
> >> some stuff
> >> on top.
> >> save PDDocument 2.
> >>
> >> I have written a patch to use one single java.io.RandomAccessFile as 
> >> a scratch
> >> file per COSDocument, using pages in a doubly linked list to separate 
> >> streams in
> >> the same file. Would you be interested in adding this to PDFBox?
> > Using only one file led to problems when creating PDFs from scratch: 
> > it is possible to write to two COSStreams at the same time, which 
> > corrupts the PDF.
> 
> Hi Andreas,
> 
> Do you mean at the same time, as in multiple threads, or single thread 
> writing a bit to this stream and then a bit to another stream back and 
> forth?
It's about the second case. You can't add fonts and/or images to a page while
adding content to a content stream at the same time. You have to add them before
opening a stream, or you have to close the stream before adding them.
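
A toy model shows why interleaving is a problem when every stream is assumed to occupy one contiguous region of a shared file. The sketch below is not PDFBox code: NaiveSharedScratch and Region are hypothetical names, and a ByteArrayOutputStream stands in for the single scratch file. Each region only remembers where it started and how many bytes it wrote, so bytes appended by another region in between end up inside its range.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class NaiveSharedScratch {
    // Stands in for the one shared scratch file; every write is appended at the end.
    private final ByteArrayOutputStream shared = new ByteArrayOutputStream();

    Region newRegion() {
        return new Region();
    }

    /** A stream that wrongly assumes its bytes are stored back to back. */
    final class Region {
        private int start = -1;
        private int length;

        void write(String text) {
            byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
            if (start < 0) {
                start = shared.size();             // remember where the first write landed
            }
            shared.write(bytes, 0, bytes.length);
            length += bytes.length;
        }

        String read() {
            if (start < 0) {
                return "";
            }
            byte[] all = shared.toByteArray();
            int end = Math.min(all.length, start + length);
            return new String(Arrays.copyOfRange(all, start, end), StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) {
        NaiveSharedScratch scratch = new NaiveSharedScratch();
        Region content = scratch.newRegion();
        Region image = scratch.newRegion();
        content.write("BT ");                      // content stream opened
        image.write("IMG");                        // image added while it is still open
        content.write("ET");
        System.out.println(content.read());        // prints "BT IM" instead of "BT ET"
    }
}
```

Writing the image first and the content afterwards keeps both regions contiguous, which matches the ordering rule described above.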

> For the single-thread use case, I have solved this in my patch. 
> Actually, even multiple threads should be easy to support with 
> synchronization. I'll work on some docs and submit it, and you can see if 
> you like it.
At least it sounds interesting and I'm happy to look at it.


> Thanks,
> Jesse
Thanks for the offer

BR
Andreas



Re: Scratch files - too many files open

Posted by Jesse Long <je...@gmail.com>.
On 02/06/2015 17:48, Andreas Lehmkuehler wrote:
> Hi,
>
> On 02.06.2015 at 16:15, Jesse Long wrote:
>> Hi All,
>>
>> Regarding PDFBOX-2301, and the use of scratch files: right now, each 
>> COSStream
>> uses one or two scratch files.
>>
>> I recently ran into the problem on Linux where the max number of open 
>> files
>> allowed to the JVM by the OS was reached because of this.
>>
>> Is there a plan around this?
>>
>> Is it maybe that my use case is not expected?
> I'm aware of that. The refactoring is still in progress. I expect to 
> reduce the number of open files.
>
>> My use case is:
>> Open PDDocument 1
>> Open PDDocument 2
>> for a few hundred times
>>          import page 1 of PDDocument 1 into PDDocument 2 and overlay 
>> some stuff
>> on top.
>> save PDDocument 2.
>>
>> I have written a patch to use one single java.io.RandomAccessFile as 
>> a scratch
>> file per COSDocument, using pages in a doubly linked list to separate 
>> streams in
>> the same file. Would you be interested in adding this to PDFBox?
> Using only one file led to problems when creating PDFs from scratch: 
> it is possible to write to two COSStreams at the same time, which 
> corrupts the PDF.

Hi Andreas,

Do you mean at the same time, as in multiple threads, or single thread 
writing a bit to this stream and then a bit to another stream back and 
forth?

For the single-thread use case, I have solved this in my patch. 
Actually, even multiple threads should be easy to support with 
synchronization. I'll work on some docs and submit it, and you can see if 
you like it.
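
As a rough illustration of the synchronization point (a sketch under assumptions, not the patch itself): a shared java.io.RandomAccessFile has a single file pointer, so the seek and the following read or write must happen under one lock. SynchronizedScratch and its method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class SynchronizedScratch implements AutoCloseable {
    private final File file;
    private final RandomAccessFile raf;

    SynchronizedScratch() throws IOException {
        file = File.createTempFile("scratch-sync", ".bin");
        raf = new RandomAccessFile(file, "rw");
    }

    /** Seek and write share one lock, so no other thread can move the file pointer in between. */
    synchronized void writeAt(long position, byte[] data) throws IOException {
        raf.seek(position);
        raf.write(data);
    }

    /** Seek and read share the same lock as writeAt. */
    synchronized byte[] readAt(long position, int length) throws IOException {
        byte[] buffer = new byte[length];
        raf.seek(position);
        raf.readFully(buffer);
        return buffer;
    }

    @Override
    public synchronized void close() throws IOException {
        raf.close();
        file.delete();
    }
}
```

A single lock per scratch file is the simplest choice; it serializes all I/O on that file, which may or may not matter depending on how many threads write at once.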

Thanks,
Jesse





Re: Scratch files - too many files open

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 02.06.2015 at 16:15, Jesse Long wrote:
> Hi All,
>
> Regarding PDFBOX-2301, and the use of scratch files: right now, each COSStream
> uses one or two scratch files.
>
> I recently ran into the problem on Linux where the max number of open files
> allowed to the JVM by the OS was reached because of this.
>
> Is there a plan around this?
>
> Is it maybe that my use case is not expected?
I'm aware of that. The refactoring is still in progress. I expect to reduce the 
number of open files.

> My use case is:
> Open PDDocument 1
> Open PDDocument 2
> for a few hundred times
>          import page 1 of PDDocument 1 into PDDocument 2 and overlay some stuff
> on top.
> save PDDocument 2.
>
> I have written a patch to use one single java.io.RandomAccessFile as a scratch
> file per COSDocument, using pages in a doubly linked list to separate streams in
> the same file. Would you be interested in adding this to PDFBox?
Using only one file led to problems when creating PDFs from scratch: it is 
possible to write to two COSStreams at the same time, which corrupts the PDF.

> Thanks,
> Jesse

BR
Andreas Lehmkühler




Re: Scratch files - too many files open

Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.06.2015 at 16:15, Jesse Long wrote:
> Hi All,
>
> Regarding PDFBOX-2301, and the use of scratch files: right now, each 
> COSStream uses one or two scratch files.
>
> I recently ran into the problem on Linux where the max number of open 
> files allowed to the JVM by the OS was reached because of this.
>
> Is there a plan around this?
>
> Is it maybe that my use case is not expected?
>
> My use case is:
> Open PDDocument 1
> Open PDDocument 2
> for a few hundred times
>         import page 1 of PDDocument 1 into PDDocument 2 and overlay 
> some stuff on top.
> save PDDocument 2.

Did you close the documents when done?

Tilman

>
> I have written a patch to use one single java.io.RandomAccessFile as a 
> scratch file per COSDocument, using pages in a doubly linked list to 
> separate streams in the same file. Would you be interested in adding 
> this to PDFBox?
>
> Thanks,
> Jesse

