Posted to users@pdfbox.apache.org by Gilad Denneboom <gi...@gmail.com> on 2023/03/14 15:05:07 UTC

"Too Many Open Files" IOException in ScratchFile

Hi all,

I created an application that opens many files (I'm talking thousands),
searches them for specific pages and then merges those pages into new PDF
files. The way I do it is by using the importPage command to import pages
from the original files into the split ones.
However, I'm getting an IOException ("Too many open files") from
ScratchFile after several thousand files have been processed. I had a look at
the source code for that class and I think it might have to do with a
RandomAccessFile variable ("raf") not being properly closed.
All of the documents are opened using MemoryUsageSetting set to
setupTempFileOnly, by the way.
Could someone confirm this is the issue, and maybe help solve it? I'm using
PDFBox 2.0.26, by the way, and the app runs on a Mac.

The stack trace:
Exception in thread "main" java.io.IOException: Too many open files
    at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
    at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
    at java.base/java.io.File.createTempFile(File.java:2179)
    at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
    at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
    at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
    at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
    at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
    at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
    at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
    at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
    at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
    at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
    at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)

Thanks in advance!

Gilad

Re: "Too Many Open Files" IOException in ScratchFile

Posted by Gilad Denneboom <gi...@gmail.com>.

Yeah, I thought of doing that, too... OK, thanks for help, anyway!

On Fri, Mar 17, 2023 at 7:51 AM Andreas Lehmkuehler <an...@lehmi.de>
wrote:

> How about extracting those pages using the splitter? This will produce the
> file per person you are looking for. Use the merger to get the summary file.
> If there are too many files, use several steps to do the merge.
>
> Andreas

Re: "Too Many Open Files" IOException in ScratchFile

Posted by Andreas Lehmkuehler <an...@lehmi.de>.

On 15.03.23 at 17:51, Gilad Denneboom wrote:
> It's a bit more complicated than that. I have a small set of very large
> files with different pages matching different people. I need to match those
> pages based on some identifying code, and then extract them into either
> individual files (one per person) or a single merged file with those pages
> sorted by person. But yes, I do close the input files after scanning them,
> and then open them later on to extract the relevant pages from them, if
> needed. This is actually the reason I opted not to use PDFMergerUtility, as
> it would require me to extract all the individual pages as separate files,
> so I could merge them later on (as it's not possible to use it to only
> merge parts of files).
How about extracting those pages using the splitter? This will produce the file
per person you are looking for. Use the merger to get the summary file. If there
are too many files, use several steps to do the merge.

Andreas
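
The "several steps" idea can be sketched independently of PDFBox. The following is only an illustration under assumptions: the class name BatchedMergePlan, the Merger hook and the batch size are hypothetical and not part of PDFBox. In a real program the hook is where a merge tool such as PDFMergerUtility would run on one small group of files, so that only a bounded number of inputs is open at any time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: merge many inputs in several passes so only a few are open at once. */
final class BatchedMergePlan {

    /** Hypothetical hook: merges one group of inputs and returns the name of the result. */
    interface Merger {
        String merge(List<String> inputs, int pass, int index);
    }

    /** Splits the inputs into consecutive groups of at most batchSize elements. */
    static List<List<String>> partition(List<String> inputs, int batchSize) {
        if (batchSize < 2) {
            throw new IllegalArgumentException("batchSize must be at least 2");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < inputs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, inputs.size());
            batches.add(new ArrayList<>(inputs.subList(start, end)));
        }
        return batches;
    }

    /** Repeats merge passes until a single result remains and returns its name. */
    static String mergeInSteps(List<String> inputs, int batchSize, Merger merger) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("no inputs");
        }
        List<String> current = new ArrayList<>(inputs);
        int pass = 0;
        while (current.size() > 1) {
            List<String> next = new ArrayList<>();
            List<List<String>> batches = partition(current, batchSize);
            for (int i = 0; i < batches.size(); i++) {
                List<String> batch = batches.get(i);
                // A leftover group of one needs no merge; carry it into the next pass.
                next.add(batch.size() == 1 ? batch.get(0) : merger.merge(batch, pass, i));
            }
            current = next;
            pass++;
        }
        return current.get(0);
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("p1.pdf", "p2.pdf", "p3.pdf", "p4.pdf", "p5.pdf");
        // Stand-in merger: only builds an intermediate file name, no PDF work is done.
        String result = mergeInSteps(inputs, 2,
                (group, pass, index) -> "merged-" + pass + "-" + index + ".pdf");
        System.out.println(result);
    }
}
```

The order of the inputs is preserved across passes, which matters if the summary file has to stay sorted by person.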



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: "Too Many Open Files" IOException in ScratchFile

Posted by Gilad Denneboom <gi...@gmail.com>.

It's a bit more complicated than that. I have a small set of very large
files with different pages matching different people. I need to match those
pages based on some identifying code, and then extract them into either
individual files (one per person) or a single merged file with those pages
sorted by person. But yes, I do close the input files after scanning them,
and then open them later on to extract the relevant pages from them, if
needed. This is actually the reason I opted not to use PDFMergerUtility, as
it would require me to extract all the individual pages as separate files,
so I could merge them later on (as it's not possible to use it to only
merge parts of files).

On Wed, Mar 15, 2023 at 5:28 PM Tilman Hausherr <TH...@t-online.de>
wrote:

> Your text sounded like you're not picking stuff from all documents. Are
> you closing the documents where nothing is found at the earliest possible
> time?
> Tilman

Re: "Too Many Open Files" IOException in ScratchFile

Posted by Tilman Hausherr <TH...@t-online.de>.

Your text sounded like you're not picking stuff from all documents. Are
you closing the documents where nothing is found at the earliest possible
time?
Tilman

On 15.03.2023 17:21, Gilad Denneboom wrote:
>> The question is, do you close the input files properly?
> Yes, I do, but only at the very end of the operation, as I was merging all
> these individual files into one large one, so I had to keep the originals
> open until I save this merged file for the last time, or it would throw an
> exception about the PDDocument being closed.
> I know this is not the best way of merging documents, by the way. I might
> try to switch to using PDFMergerUtility, instead.

Re: "Too Many Open Files" IOException in ScratchFile

Posted by Gilad Denneboom <gi...@gmail.com>.

> The question is, do you close the input files properly?
Yes, I do, but only at the very end of the operation, as I was merging all
these individual files into one large one, so I had to keep the originals
open until I save this merged file for the last time, or it would throw an
exception about the PDDocument being closed.
I know this is not the best way of merging documents, by the way. I might
try to switch to using PDFMergerUtility, instead.
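
One way to check whether descriptors really pile up while the source documents are held open is to log the JVM's own descriptor count every few hundred files. The sketch below is an assumption-laden illustration, not something from PDFBox: the class name OpenFileProbe is hypothetical, and it relies on com.sun.management.UnixOperatingSystemMXBean, which OpenJDK-based JVMs provide on Unix-like systems (including macOS) but which is not guaranteed on every JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Sketch: report how many file descriptors this JVM currently has open. */
final class OpenFileProbe {

    /** Returns {open, max} descriptor counts, or {-1, -1} if the JVM does not expose them. */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        System.out.println("open descriptors: " + counts[0] + " of " + counts[1]);
    }
}
```

If the first number keeps climbing towards the second while the loop over the input files runs, documents (and their scratch files) are being kept open longer than necessary.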

On Wed, Mar 15, 2023 at 8:30 AM Andreas Lehmkuehler <an...@lehmi.de>
wrote:

> Hi Gilad,
>
> PDFBox is using a scratch file per document as long as you are using
> setupTempFileOnly. Handling thousands of documents ends up in thousands of
> scratch files. Those scratch files should be closed once the corresponding
> documents are closed.
>
> The question is, do you close the input files properly?
>
> Andreas
>
> On 14.03.23 at 19:16, Gilad Denneboom wrote:
> > Hi Maruan,
> >
> > Yes, I saw that, but it would be nice if this issue can be solved within
> > PDFBox, too.
> >
> > Gilad
> >
> > On Tue, Mar 14, 2023 at 4:52 PM Maruan Sahyoun <sa...@fileaffairs.de>
> > wrote:
> >
> >> You can set the ulimit on Linux - the default is 1024 open files.
> >>
> >> BR
> >> Maruan

Re: "Too Many Open Files" IOException in ScratchFile

Posted by Tilman Hausherr <TH...@t-online.de>.

Let's say your software runs without hitting the problem. Are there any
pdfbox*.tmp files left in your temp directory? If so, it would mean you're
not closing the input files, as Andreas suspects. (Or that there is a bug
in our software that doesn't occur in the build tests.)

Tilman
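
The check for leftover temp files can also be scripted. This is a minimal sketch under assumptions: the class name LeftoverScratchFiles is hypothetical, the name pattern (starts with "pdfbox", ends with ".tmp") is taken from the message above and matched case-insensitively because the exact capitalisation is not confirmed here, and the directory is the JVM default temp directory, which only applies if no other scratch directory was configured.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: list temp files that look like leftover PDFBox scratch files. */
final class LeftoverScratchFiles {

    /** Returns regular files in dir whose names match pdfbox*.tmp, ignoring case. */
    static List<Path> find(Path dir) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString().toLowerCase(Locale.ROOT);
                if (name.startsWith("pdfbox") && name.endsWith(".tmp")
                        && Files.isRegularFile(entry)) {
                    hits.add(entry);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: scratch files were written to the default temp directory.
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        for (Path leftover : find(tmp)) {
            System.out.println(leftover);
        }
    }
}
```

Run it once after the application has finished normally; an empty listing suggests the documents were closed, while a long listing points to documents that were never closed.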



--- Original message ---
From: Andreas Lehmkuehler
Subject: Re: "Too Many Open Files" IOException in ScratchFile
Date: 15 March 2023, 8:29
To: users@pdfbox.apache.org

Hi Gilad,

PDFBox uses one scratch file per document as long as you are using
setupTempFileOnly, so handling thousands of documents ends up creating
thousands of scratch files. Those scratch files should be closed once the
corresponding documents are closed.

The question is, do you close the input files properly?

Andreas

On 14.03.23 at 19:16, Gilad Denneboom wrote:
> Hi Maruan,
>
> Yes, I saw that, but it would be nice if this issue can be solved within
> PDFBox, too.
>
> Gilad
>
> On Tue, Mar 14, 2023 at 4:52 PM Maruan Sahyoun <sahyoun@fileaffairs.de>
> wrote:
>
>> You can set the ulimit on Linux - Standard is 1024 open files.
>>
>> BR
>> Maruan
>>
>>> On 14.03.2023 at 16:05, Gilad Denneboom <gilad.denneboom@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I created an application that opens many files (I'm talking thousands),
>>> searching them for specific pages and then merges those pages into new
>> PDF
>>> files. The way I do it is by using the importPage command from the
>> original
>>> files into the split ones.
>>> However, I'm getting an IOException ("Too many open files") from
>>> ScratchFile after several thousands files were processed. I had a look at
>>> the source code for that class and I think it might have to do with a
>>> RandomAccessFile variable ("raf") not being properly closed.
>>> All of the documents are opened using MemoryUsageSetting set to
>>> setupTempFileOnly, by the way.
>>> Could someone confirm this is the issue, and maybe help solve it? I'm
>> using
>>> PDFBox 2.0.26, by the way, and the app runs on a Mac.
>>>
>>> The stack-trace:
>>> Exception in thread "main" java.io.IOException: Too many open files
>>> at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
>>> at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
>>> at java.base/java.io.File.createTempFile(File.java:2179)
>>> at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
>>> at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
>>> at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
>>> at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
>>> at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
>>> at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
>>> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
>>> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
>>> at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)
>>>
>>> Thanks in advance!
>>>
>>> Gilad
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: "Too Many Open Files" IOException in ScratchFile

Posted by Andreas Lehmkuehler <an...@lehmi.de>.

Hi Gilad,

PDFBox uses one scratch file per document as long as you are using
setupTempFileOnly, so handling thousands of documents ends up creating
thousands of scratch files. Those scratch files should be closed once the
corresponding documents are closed.

The question is, do you close the input files properly?
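
One way to make sure nothing stays open is to collect every input opened for a given output and close them together once that output has been saved. The sketch below shows the idea with plain JDK types; SourceBatch and track are names invented for this example, not part of PDFBox. PDDocument implements Closeable, so documents could be registered the same way. Closing the inputs only after the merged file is saved is the cautious choice here, on the assumption that imported pages may still refer to data held by their source document.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: tracks resources opened for one output and closes them together. */
final class SourceBatch implements Closeable {
    private final Deque<Closeable> open = new ArrayDeque<>();

    /** Registers a freshly opened resource and returns it so the call can be chained. */
    <T extends Closeable> T track(T resource) {
        open.push(resource);
        return resource;
    }

    /** Number of resources that are still waiting to be closed. */
    int size() {
        return open.size();
    }

    /** Closes everything in reverse order of opening and keeps going if one close fails. */
    @Override
    public void close() throws IOException {
        IOException first = null;
        while (!open.isEmpty()) {
            try {
                open.pop().close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}

// Intended use, written as comments because it needs PDFBox on the classpath:
//
// try (SourceBatch batch = new SourceBatch()) {
//     PDDocument source = batch.track(PDDocument.load(file, memoryUsageSetting));
//     ... import the wanted pages into the target document ...
//     ... save the target document before this block ends ...
// }   // every tracked input is closed here, even if an exception was thrown
```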

Andreas

On 14.03.23 at 19:16, Gilad Denneboom wrote:
> Hi Maruan,
> 
> Yes, I saw that, but it would be nice if this issue can be solved within
> PDFBox, too.
> 
> Gilad
> 
> On Tue, Mar 14, 2023 at 4:52 PM Maruan Sahyoun <sa...@fileaffairs.de>
> wrote:
> 
>> You can set the ulimit on Linux - Standard is 1024 open files.
>>
>> BR
>> Maruan
>>
>>> On 14.03.2023 at 16:05, Gilad Denneboom <gilad.denneboom@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I created an application that opens many files (I'm talking thousands),
>>> searching them for specific pages and then merges those pages into new
>> PDF
>>> files. The way I do it is by using the importPage command from the
>> original
>>> files into the split ones.
>>> However, I'm getting an IOException ("Too many open files") from
>>> ScratchFile after several thousands files were processed. I had a look at
>>> the source code for that class and I think it might have to do with a
>>> RandomAccessFile variable ("raf") not being properly closed.
>>> All of the documents are opened using MemoryUsageSetting set to
>>> setupTempFileOnly, by the way.
>>> Could someone confirm this is the issue, and maybe help solve it? I'm
>> using
>>> PDFBox 2.0.26, by the way, and the app runs on a Mac.
>>>
>>> The stack-trace:
>>> Exception in thread "main" java.io.IOException: Too many open files
>>> at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
>>> at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
>>> at java.base/java.io.File.createTempFile(File.java:2179)
>>> at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
>>> at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
>>> at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
>>> at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
>>> at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
>>> at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
>>> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
>>> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
>>> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
>>> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
>>> at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)
>>>
>>> Thanks in advance!
>>>
>>> Gilad
>>
> 



Re: "Too Many Open Files" IOException in ScratchFile

Posted by Gilad Denneboom <gi...@gmail.com>.

Hi Maruan,

Yes, I saw that, but it would be nice if this issue can be solved within
PDFBox, too.

Gilad

On Tue, Mar 14, 2023 at 4:52 PM Maruan Sahyoun <sa...@fileaffairs.de>
wrote:

> You can set the ulimit on Linux - Standard is 1024 open files.
>
> BR
> Maruan
>
> > On 14.03.2023 at 16:05, Gilad Denneboom <gilad.denneboom@gmail.com> wrote:
> >
> > Hi all,
> >
> > I created an application that opens many files (I'm talking thousands),
> > searching them for specific pages and then merges those pages into new
> PDF
> > files. The way I do it is by using the importPage command from the
> original
> > files into the split ones.
> > However, I'm getting an IOException ("Too many open files") from
> > ScratchFile after several thousands files were processed. I had a look at
> > the source code for that class and I think it might have to do with a
> > RandomAccessFile variable ("raf") not being properly closed.
> > All of the documents are opened using MemoryUsageSetting set to
> > setupTempFileOnly, by the way.
> > Could someone confirm this is the issue, and maybe help solve it? I'm
> using
> > PDFBox 2.0.26, by the way, and the app runs on a Mac.
> >
> > The stack-trace:
> > Exception in thread "main" java.io.IOException: Too many open files
> > at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
> > at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
> > at java.base/java.io.File.createTempFile(File.java:2179)
> > at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
> > at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
> > at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
> > at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
> > at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
> > at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
> > at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
> > at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
> > at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
> > at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
> > at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
> > at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
> > at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
> > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
> > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
> > at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
> > at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)
> >
> > Thanks in advance!
> >
> > Gilad
>

Re: "Too Many Open Files" IOException in ScratchFile

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.

You can raise the limit with ulimit on Linux; the default is 1024 open files.
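
Before raising the limit it can help to see how close the JVM actually gets to it. The sketch below is only an illustration: it reads the open and maximum file descriptor counts through com.sun.management.UnixOperatingSystemMXBean, which is available on Unix-like JVMs, and falls back to -1 elsewhere. The class name FdUsage is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

/** Hypothetical helper: reports how many file descriptors the JVM currently holds. */
final class FdUsage {

    /** Returns {open, max}, or {-1, -1} when the platform bean does not expose them. */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] counts = openAndMax();
        System.out.println("open file descriptors: " + counts[0] + " of " + counts[1]);
    }
}
```

Logging these numbers every few hundred documents would show whether the count keeps climbing (something is not being closed) or stays flat (the limit is simply too low for the workload).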

BR
Maruan 

> On 14.03.2023 at 16:05, Gilad Denneboom <gi...@gmail.com> wrote:
> 
> Hi all,
> 
> I created an application that opens many files (I'm talking thousands),
> searching them for specific pages and then merges those pages into new PDF
> files. The way I do it is by using the importPage command from the original
> files into the split ones.
> However, I'm getting an IOException ("Too many open files") from
> ScratchFile after several thousands files were processed. I had a look at
> the source code for that class and I think it might have to do with a
> RandomAccessFile variable ("raf") not being properly closed.
> All of the documents are opened using MemoryUsageSetting set to
> setupTempFileOnly, by the way.
> Could someone confirm this is the issue, and maybe help solve it? I'm using
> PDFBox 2.0.26, by the way, and the app runs on a Mac.
> 
> The stack-trace:
> Exception in thread "main" java.io.IOException: Too many open files
> at java.base/java.io.UnixFileSystem.createFileExclusively0(Native Method)
> at java.base/java.io.UnixFileSystem.createFileExclusively(UnixFileSystem.java:356)
> at java.base/java.io.File.createTempFile(File.java:2179)
> at org.apache.pdfbox.io.ScratchFile.enlarge(ScratchFile.java:217)
> at org.apache.pdfbox.io.ScratchFile.getNewPage(ScratchFile.java:167)
> at org.apache.pdfbox.io.ScratchFileBuffer.addPage(ScratchFileBuffer.java:126)
> at org.apache.pdfbox.io.ScratchFileBuffer.<init>(ScratchFileBuffer.java:84)
> at org.apache.pdfbox.io.ScratchFile.createBuffer(ScratchFile.java:424)
> at org.apache.pdfbox.cos.COSStream.createRawOutputStream(COSStream.java:273)
> at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:1140)
> at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:929)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:888)
> at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:800)
> at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:760)
> at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:187)
> at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1107)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1090)
> at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1014)
> at MergeStudentRecords_2021.main(MergeStudentRecords_2021.java:324)
> 
> Thanks in advance!
> 
> Gilad


Re: "Too Many Open Files" IOException in ScratchFile

Posted by Andreas Lehmkuehler <an...@lehmi.de>.

On 15.03.23 at 08:37, Tilman Hausherr wrote:
> On 14.03.2023 16:05, Gilad Denneboom wrote:
>> However, I'm getting an IOException ("Too many open files") from
>> ScratchFile after several thousands files were processed. I had a look at
>> the source code for that class and I think it might have to do with a
>> RandomAccessFile variable ("raf") not being properly closed.
> 
> Can you build from source and run your application? If yes, change
> 
> pdfbox\src\main\java\org\apache\pdfbox\io\ScratchFileBuffer.java
> 
> and add
> 
> IOUtils.closeQuietly(pageHandler);
> 
> near the end. Does this make things better? (It changes nothing in build)

I wouldn't do that. Depending on the specific situation, the ScratchFile
instance may hold more than one buffer, and calling close would close all
of them, most likely too early.
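
To illustrate the concern, here is a toy model in plain Java. It is not PDFBox code: SharedScratch and Buffer are invented stand-ins for an object shared by several buffers, and the exception type is chosen only for the demonstration. It shows how closing the shared object from one buffer breaks a sibling that is still in use.

```java
import java.io.Closeable;

/** Toy stand-in for one scratch file shared by several buffers (not PDFBox code). */
final class SharedScratch implements Closeable {
    private boolean closed;

    /** Hands out a buffer that does its work through this shared object. */
    Buffer createBuffer() {
        checkOpen();
        return new Buffer(this);
    }

    void checkOpen() {
        if (closed) {
            throw new IllegalStateException("scratch already closed");
        }
    }

    @Override
    public void close() {
        closed = true;
    }

    static final class Buffer {
        private final SharedScratch owner;
        private final StringBuilder data = new StringBuilder();

        Buffer(SharedScratch owner) {
            this.owner = owner;
        }

        /** Every write goes through the shared owner, so it fails once the owner is closed. */
        void write(String text) {
            owner.checkOpen();
            data.append(text);
        }

        String contents() {
            return data.toString();
        }

        /** The risky variant: closing one buffer also closes the shared owner. */
        void closeIncludingOwner() {
            owner.close();
        }
    }

    public static void main(String[] args) {
        SharedScratch scratch = new SharedScratch();
        Buffer first = scratch.createBuffer();
        Buffer second = scratch.createBuffer();
        first.write("a");
        first.closeIncludingOwner();
        try {
            second.write("b");   // fails: the owner was closed behind its back
        } catch (IllegalStateException e) {
            System.out.println("second buffer broke: " + e.getMessage());
        }
    }
}
```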

Andreas

> 
> Tilman
> 
> 
> 



Re: "Too Many Open Files" IOException in ScratchFile

Posted by Tilman Hausherr <TH...@t-online.de>.

On 14.03.2023 16:05, Gilad Denneboom wrote:
> However, I'm getting an IOException ("Too many open files") from
> ScratchFile after several thousands files were processed. I had a look at
> the source code for that class and I think it might have to do with a
> RandomAccessFile variable ("raf") not being properly closed.

Can you build from source and run your application? If yes, change

pdfbox\src\main\java\org\apache\pdfbox\io\ScratchFileBuffer.java

and add

IOUtils.closeQuietly(pageHandler);

near the end. Does this make things better? (It changes nothing in the build.)

Tilman


