Posted to users@pdfbox.apache.org by Jesse Long <je...@gmail.com> on 2015/06/02 16:15:11 UTC
Scratch files - too many files open
Hi All,
Regarding PDFBOX-2301, and the use of scratch files: right now, each
COSStream uses one or two scratch files.
I recently ran into the problem on Linux where the max number of open
files allowed to the JVM by the OS was reached because of this.
Is there a plan around this?
Is it maybe that my use case is not expected?
My use case is:
    Open PDDocument 1
    Open PDDocument 2
    for a few hundred times
        import page 1 of PDDocument 1 into PDDocument 2 and overlay
        some stuff on top.
    save PDDocument 2.
I have written a patch to use one single java.io.RandomAccessFile as a
scratch file per COSDocument, using pages in a doubly linked list to
separate streams in the same file. Would you be interested in adding
this to PDFBox?
Thanks,
Jesse
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
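
The single-scratch-file idea described in the message above can be illustrated with a small, self-contained sketch. This is not the patch attached later in the thread, whose details are not shown here. The class name PagedScratchFile, the inner PageStream type, the 16-byte page size, and keeping the page links in memory are all assumptions made for illustration. The sketch shows how several logical streams can share one java.io.RandomAccessFile when each stream owns a chain of fixed-size pages, so that interleaved writes from a single thread do not overwrite each other.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: many logical streams sharing one scratch file. */
final class PagedScratchFile implements AutoCloseable {
    static final int PAGE_SIZE = 16; // deliberately tiny so the demo spans several pages

    private final File file;
    private final RandomAccessFile raf;
    // Doubly linked page list, kept in memory (an assumption): for page i,
    // prev.get(i) / next.get(i) hold the neighbouring page index, or -1.
    private final List<Integer> prev = new ArrayList<>();
    private final List<Integer> next = new ArrayList<>();

    PagedScratchFile() throws IOException {
        file = File.createTempFile("scratch", ".tmp");
        raf = new RandomAccessFile(file, "rw");
    }

    /** One logical stream: a chain of pages inside the shared file. Not thread-safe. */
    final class PageStream {
        private int head = -1;  // first page of this stream
        private int tail = -1;  // last page of this stream
        private int length = 0; // bytes written so far

        void write(byte[] data) throws IOException {
            int off = 0;
            while (off < data.length) {
                int used = length % PAGE_SIZE;
                if (used == 0) {
                    appendPage(); // stream is empty or its tail page is full
                }
                int n = Math.min(PAGE_SIZE - used, data.length - off);
                raf.seek((long) tail * PAGE_SIZE + used);
                raf.write(data, off, n);
                off += n;
                length += n;
            }
        }

        private void appendPage() {
            int page = prev.size(); // next unused page index in the file
            prev.add(tail);
            next.add(-1);
            if (tail == -1) {
                head = page;
            } else {
                next.set(tail, page);
            }
            tail = page;
        }

        byte[] readAll() throws IOException {
            byte[] out = new byte[length];
            int off = 0;
            // Follow the forward links; the last page may be only partly filled.
            for (int page = head; page != -1; page = next.get(page)) {
                int n = Math.min(PAGE_SIZE, out.length - off);
                raf.seek((long) page * PAGE_SIZE);
                raf.readFully(out, off, n);
                off += n;
            }
            return out;
        }
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    public static void main(String[] args) throws IOException {
        try (PagedScratchFile scratch = new PagedScratchFile()) {
            PageStream a = scratch.new PageStream();
            PageStream b = scratch.new PageStream();
            for (int i = 0; i < 5; i++) { // alternate writes between the two streams
                a.write("content-stream;".getBytes(StandardCharsets.US_ASCII));
                b.write("font-data;".getBytes(StandardCharsets.US_ASCII));
            }
            System.out.println(new String(a.readAll(), StandardCharsets.US_ASCII));
            System.out.println(new String(b.readAll(), StandardCharsets.US_ASCII));
        }
    }
}
```

Only one file descriptor is held no matter how many PageStream instances exist. The backward links are recorded but only forward traversal is exercised here. Freeing pages and thread synchronization are left out of this sketch.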
Re: Scratch files - too many files open
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 03.06.2015 at 13:20, Jesse Long wrote:
> Please see patch attached.
Looks promising, I'll have a deeper look later.
> Thanks,
> Jesse
Thanks,
Andreas
Re: Scratch files - too many files open
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
> Jesse Long <je...@gmail.com> wrote on 3 June 2015 at 13:20:
> Please see patch attached.
I've attached your patch to PDFBOX-2301 so that it can't get lost.
>
> Thanks,
> Jesse
BR
Andreas
Re: Scratch files - too many files open
Posted by Jesse Long <je...@gmail.com>.
On 06/06/2015 18:44, Andreas Lehmkuehler wrote:
> I've added the patch in r1683929 to the trunk with some slight
> modifications.
Thank you.
Re: Scratch files - too many files open
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 03.06.2015 at 13:20, Jesse Long wrote:
> Please see patch attached.
I've added the patch in r1683929 to the trunk with some slight modifications.
> Thanks,
> Jesse
Thanks for the contribution!!
BR
Andreas
Re: Scratch files - too many files open
Posted by Jesse Long <je...@gmail.com>.
On 03/06/2015 12:46, Andreas Lehmkühler wrote:
>> For the single thread use case, I have solved this in my patch.
>> Actually, even multiple threads should be easy to support with
>> synchronization. I'll work on some docs and submit and you can see if
>> you like it.
> At least it sounds interesting and I'm happy to look at it.
>
Please see patch attached.
Thanks,
Jesse
Re: Scratch files - too many files open
Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,
> Jesse Long <je...@gmail.com> wrote on 3 June 2015 at 08:45:
> Hi Andreas,
>
> Do you mean at the same time, as in multiple threads, or single thread
> writing a bit to this stream and then a bit to another stream back and
> forth?
It's about the second case. You can't add fonts and/or images to a page while
adding content to a content stream at the same time. You have to add those before
opening a stream, or you have to close the stream first.
> For the single thread use case, I have solved this in my patch.
> Actually, even multiple threads should be easy to support with
> synchronization. I'll work on some docs and submit and you can see if
> you like it.
At least it sounds interesting and I'm happy to look at it.
> Thanks,
> Jesse
Thanks for the offer
BR
Andreas
Re: Scratch files - too many files open
Posted by Jesse Long <je...@gmail.com>.
On 02/06/2015 17:48, Andreas Lehmkuehler wrote:
> Using only one file led to problems when creating PDFs from scratch.
> It is possible to write to two COSStreams at the same time, which
> corrupts the PDF.
Hi Andreas,
Do you mean at the same time, as in multiple threads, or single thread
writing a bit to this stream and then a bit to another stream back and
forth?
For the single thread use case, I have solved this in my patch.
Actually, even multiple threads should be easy to support with
synchronization. I'll work on some docs and submit and you can see if
you like it.
Thanks,
Jesse
Re: Scratch files - too many files open
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
On 02.06.2015 at 16:15, Jesse Long wrote:
> Hi All,
>
> Regarding PDFBOX-2301, and the use of scratch files: right now, each COSStream
> uses one or two scratch files.
>
> I recently ran into the problem on Linux where the max number of open files
> allowed to the JVM by the OS was reached because of this.
>
> Is there a plan around this?
>
> Is it maybe that my use case is not expected?
I'm aware of that. The refactoring is still in progress. I expect to reduce the
number of open files.
> My use case is:
> Open PDDocument 1
> Open PDDocument 2
> for a few hundred times
> import page 1 of PDDocument 1 into PDDocument 2 and overlay some stuff
> on top.
> save PDDocument 2.
>
> I have written a patch to use one single java.io.RandomAccessFile as a scratch
> file per COSDocument, using pages in a doubly linked list to separate streams in
> the same file. Would you be interested in adding this to PDFBox?
Using only one file led to problems when creating PDFs from scratch. It is
possible to write to two COSStreams at the same time, which corrupts the PDF.
> Thanks,
> Jesse
BR
Andreas Lehmkühler
Re: Scratch files - too many files open
Posted by Tilman Hausherr <TH...@t-online.de>.
On 02.06.2015 at 16:15, Jesse Long wrote:
> Hi All,
>
> Regarding PDFBOX-2301, and the use of scratch files: right now, each
> COSStream uses one or two scratch files.
>
> I recently ran into the problem on Linux where the max number of open
> files allowed to the JVM by the OS was reached because of this.
>
> Is there a plan around this?
>
> Is it maybe that my use case is not expected?
>
> My use case is:
> Open PDDocument 1
> Open PDDocument 2
> for a few hundred times
> import page 1 of PDDocument 1 into PDDocument 2 and overlay
> some stuff on top.
> save PDDocument 2.
Did you close the documents when done?
Tilman