Posted to users@pdfbox.apache.org by Jérôme Wacongne <ch...@c4-soft.com> on 2011/02/01 18:02:41 UTC
undeleted pdf box temp files (+~JF*.tmp)
Hi,
I have an issue with pdfbox leaving tmp files (+~JF*.tmp) in the JVM temp
folder.
After enough runs, that temp folder just gets full...
Here are the Groovy source and unit test to reproduce it (ok.pdf is parsed
without error; ko.pdf has one corrupted object, but parsing can continue):
package com.c4soft.mvntest

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.util.PDFTextStripper

/**
 * Example Groovy class: extracts the text of a PDF into a file.
 */
class MvnTest {

    static main(args) {
        if(!args || args.size() != 2) {
            println 'Exactly two arguments are required:\n- path to the pdf to read\n- path to the file where to put the pdf content'
            return
        }

        File pdfFile = new File(args[0])
        if(!pdfFile.exists()) {
            println "${args[0]} does not exist"
            return
        }

        InputStream pdfStream = new FileInputStream(pdfFile)
        OutputStream text = new FileOutputStream(args[1])
        PDDocument document
        try {
            // 'true' = force: keep parsing past corrupted objects (the ko.pdf case)
            document = PDDocument.load(pdfStream, true)
            PDFTextStripper stripper = new PDFTextStripper('UTF-8')
            text << stripper.getText(document)
        } finally {
            // document is still null if load() threw, so close it null-safely
            document?.close()
            text.close()
            pdfStream.close()
        }
    }
}
package com.c4soft.mvntest;

import static org.junit.Assert.*

import java.util.regex.Pattern
import groovy.io.FileType
import org.junit.Test

class MvnTestTest {

    @Test
    public void testMain() {
        File tmpDir = new File(System.getProperty('java.io.tmpdir'))
        Pattern p = ~/\+~JF.+\.tmp/

        // start from a temp folder with no matching files
        tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
            tmpFile.delete()
        }
        Integer cnt = 0
        tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
            cnt++
        }
        assertEquals(0, cnt)

        OutputStream out = new FileOutputStream('out.txt')

        // 20 runs on the valid pdf, counting matching temp files after each run
        Integer i = 1
        while(i < 21) {
            MvnTest.main('src/test/resources/ok.pdf', 'ok.txt')
            cnt = 0
            tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
                cnt++
            }
            out << "${i} ok: ${cnt}" << '\n'
            i++
        }
        out << '\n'

        // 20 runs on the pdf with one corrupted object
        i = 1
        while(i < 21) {
            MvnTest.main('src/test/resources/ko.pdf', 'ko.txt')
            cnt = 0
            tmpDir.eachFileMatch(FileType.FILES, p) { File tmpFile ->
                cnt++
            }
            out << "${i} ko: ${cnt}" << '\n'
            i++
        }
        out.close()
    }
}
And here is the corresponding output (count of pdfbox temp files in the JVM
temp folder after each run):
1 ok: 35
2 ok: 36
3 ok: 37
4 ok: 42
5 ok: 43
6 ok: 46
7 ok: 47
8 ok: 50
9 ok: 50
10 ok: 53
11 ok: 53
12 ok: 56
13 ok: 65
14 ok: 68
15 ok: 73
16 ok: 75
17 ok: 80
18 ok: 81
19 ok: 84
20 ok: 85
1 ko: 62
2 ko: 62
3 ko: 62
4 ko: 62
5 ko: 62
6 ko: 62
7 ko: 62
8 ko: 62
9 ko: 71
10 ko: 65
11 ko: 67
12 ko: 68
13 ko: 68
14 ko: 68
15 ko: 68
16 ko: 68
17 ko: 68
18 ko: 77
19 ko: 71
20 ko: 73
After JUnit returns, 3 tmp files are left in the temp folder.
Has anyone ever noticed something like that?
Am I misusing the API?
Regards,
Ch4mp
Re: undeleted pdf box temp files (+~JF*.tmp)
Posted by Jérôme Wacongne <ch...@c4-soft.com>.
Thank you Kevin :D
2011/2/12 Kevin Jackson <kj...@ebscohost.com>
> I submitted a patch with Jira PDFBOX-959 which fixed this problem if you
> are only doing text extraction.
>
> Kevin
>
> -----Original Message-----
> From: Jérôme Wacongne [mailto:ch4mp@c4-soft.com]
> Sent: Friday, February 11, 2011 5:40 PM
> To: users@pdfbox.apache.org
> Subject: Re: undeleted pdf box temp files (+~JF*.tmp)
>
> I'm glad others noticed because it is a production issue to me.
> We had to set up a batch that cleans old temp files in JVM tmp folder not
> to
> fill it up. Kind of really dirty.
>
> Best regards
>
> 2011/2/11 Ryan McKinley <ry...@gmail.com>
>
> > >
> > > Has anyone ever noticed something like that ?
> > > Do I miss use the API ?
> > >
> >
> > I get the same behavior.... lots and lots of +~JF*.tmp files...
> >
> > last time i looked into it, it is a tempfile that java makes when you
> > reference fonts.
> >
> > would be nice to somehow clean them up automatically though.
> >
> > ryan
> >
>
Re: undeleted pdf box temp files (+~JF*.tmp)
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
Am 12.02.2011 02:05, schrieb Kevin Jackson:
> I submitted a patch with Jira PDFBOX-959 which fixed this problem if you are only doing text extraction.
I added the patch in revision 1070073.
Thanks again for the contribution!
BR
Andreas Lehmkühler
RE: undeleted pdf box temp files (+~JF*.tmp)
Posted by Kevin Jackson <kj...@ebscohost.com>.
I submitted a patch with Jira PDFBOX-959 which fixed this problem if you are only doing text extraction.
Kevin
-----Original Message-----
From: Jérôme Wacongne [mailto:ch4mp@c4-soft.com]
Sent: Friday, February 11, 2011 5:40 PM
To: users@pdfbox.apache.org
Subject: Re: undeleted pdf box temp files (+~JF*.tmp)
I'm glad others noticed because it is a production issue to me.
We had to set up a batch that cleans old temp files in JVM tmp folder not to
fill it up. Kind of really dirty.
Best regards
2011/2/11 Ryan McKinley <ry...@gmail.com>
> >
> > Has anyone ever noticed something like that ?
> > Do I miss use the API ?
> >
>
> I get the same behavior.... lots and lots of +~JF*.tmp files...
>
> last time i looked into it, it is a tempfile that java makes when you
> reference fonts.
>
> would be nice to somehow clean them up automatically though.
>
> ryan
>
Re: undeleted pdf box temp files (+~JF*.tmp)
Posted by Jérôme Wacongne <ch...@c4-soft.com>.
I'm glad others noticed, because this is a production issue for us.
We had to set up a batch job that cleans old temp files out of the JVM tmp
folder so that it does not fill up. A really dirty workaround.
Best regards
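
For anyone who needs the same stopgap, here is a minimal sketch of such a
cleanup batch in plain Java. It is not the job we actually run and it is not
part of PDFBox: the class name FontTempFileCleaner, the 24-hour age limit and
the glob are assumptions made for illustration. It only touches files named
+~JF*.tmp in the given folder, and it skips files that cannot be deleted
(for example because a running JVM still holds them open).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for this thread; not a PDFBox or JDK class. */
class FontTempFileCleaner {

    // Glob for the temp files discussed in this thread ('+' and '~' are literal in globs).
    static final String GLOB = "+~JF*.tmp";

    /** Counts the regular files in dir whose name matches GLOB. */
    static int count(Path dir) throws IOException {
        int n = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, GLOB)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    n++;
                }
            }
        }
        return n;
    }

    /** Deletes matching files last modified more than maxAge ago; returns how many were removed. */
    static int deleteOlderThan(Path dir, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        int removed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, GLOB)) {
            for (Path f : files) {
                try {
                    boolean old = Files.isRegularFile(f)
                            && Files.getLastModifiedTime(f).toInstant().isBefore(cutoff);
                    if (old && Files.deleteIfExists(f)) {
                        removed++;
                    }
                } catch (IOException e) {
                    // File vanished or is still locked by another process: leave it for the next run.
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        // Assumed age limit: anything older than a day is treated as abandoned.
        int removed = deleteOlderThan(tmp, Duration.ofHours(24));
        System.out.println("removed " + removed + ", remaining " + count(tmp));
    }
}
```

Run it from cron or a scheduled task against the same java.io.tmpdir the
application uses. The age limit is there so that files belonging to a JVM
that is still running are left alone; pick a value longer than your longest
job.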
2011/2/11 Ryan McKinley <ry...@gmail.com>
> >
> > Has anyone ever noticed something like that ?
> > Do I miss use the API ?
> >
>
> I get the same behavior.... lots and lots of +~JF*.tmp files...
>
> last time i looked into it, it is a tempfile that java makes when you
> reference fonts.
>
> would be nice to somehow clean them up automatically though.
>
> ryan
>
Re: undeleted pdf box temp files (+~JF*.tmp)
Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,
Am 11.02.2011 22:54, schrieb Ryan McKinley:
>>
>> Has anyone ever noticed something like that ?
>> Do I miss use the API ?
>>
>
> I get the same behavior.... lots and lots of +~JF*.tmp files...
>
> last time i looked into it, it is a tempfile that java makes when you
> reference fonts.
>
> would be nice to somehow clean them up automatically though.
This may be related to PDFBOX-959 [1]
BR
Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX-959
Re: undeleted pdf box temp files (+~JF*.tmp)
Posted by Ryan McKinley <ry...@gmail.com>.
>
> Has anyone ever noticed something like that ?
> Do I miss use the API ?
>
I get the same behavior: lots and lots of +~JF*.tmp files.
Last time I looked into it, these were temp files that Java creates when you
reference fonts.
It would be nice to somehow clean them up automatically, though.
ryan