Posted to users@pdfbox.apache.org by Lars Juel Jensen <la...@gmail.com> on 2024/01/31 08:50:39 UTC
Loading a PDF using InputStream
In PDFBox 2 I could do:
PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly())
But there is no equivalent to this in PDFBox 3. How do I read a PDF from an
InputStream?
Re: Loading a PDF using InputStream
Posted by Tilman Hausherr <TH...@t-online.de>.
P.S.: thank you for having investigated and reported this!
Tilman
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org
Re: Loading a PDF using InputStream
Posted by Tilman Hausherr <TH...@t-online.de>.
Oh. I had looked at the trunk and not at 3.0. That was likely a mistake
in refactoring. Fixed in
https://issues.apache.org/jira/browse/PDFBOX-5757
and you can get a snapshot here
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.2-SNAPSHOT/
Tilman
Re: Loading a PDF using InputStream
Posted by Lars Juel Jensen <la...@gmail.com>.
That is weird. The source file I am looking at for version 3.0.1 does not
pass it:
https://github.com/apache/pdfbox/blob/3.0.1/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/PDFParser.java#L91
Re: Loading a PDF using InputStream
Posted by Tilman Hausherr <TH...@t-online.de>.
On 31.01.2024 16:19, Lars Juel Jensen wrote:
> Well that's my problem.. It works with PDFBox2 with reasonable sized files.
> When it comes to the big ones it crashes.. So reading the migration guide
> for PDFBox3.0 I thought I saw some light in the tunnel as it says I can
> create my own reader and stream cache. I see that I can provide my own
> RandomAccessReader when I call Loader.loadPDF, but the loadPDF method that
> takes a StreamCacheCreate function does not work as promised as the
> StreamCacheCreateFunction is not passed from PDFParser to COSParser in the
> PDFParser constructor. This works in v3.0.0, but not in v3.0.1. I guess
> this is a bug?
I don't know if there is a bug, but it is passed:
    public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore,
            String alias, StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
    {
        super(source, decryptionPassword, keyStore, alias, streamCacheCreateFunction);
    }

and here's COSParser:

    public COSParser(RandomAccessRead source, String password, InputStream keyStore,
            String keyAlias, StreamCacheCreateFunction streamCacheCreateFunction) throws IOException
    {
        super(source);
        this.password = password;
        this.keyAlias = keyAlias;
        fileLen = source.length();
        keyStoreInputStream = keyStore;
        init(streamCacheCreateFunction);
    }
If you think 3.0.1 has a bigger memory footprint than 3.0.0, can you
create a scenario to reproduce this? Preferably without using a container.
Tilman
Re: Loading a PDF using InputStream
Posted by Lars Juel Jensen <la...@gmail.com>.
Well, that's my problem. It works with PDFBox 2 for reasonably sized files,
but with the big ones it crashes. Reading the migration guide for PDFBox 3.0,
I thought I saw some light at the end of the tunnel, as it says I can create
my own reader and stream cache. I see that I can provide my own
RandomAccessRead when I call Loader.loadPDF, but the loadPDF method that
takes a StreamCacheCreateFunction does not work as promised: the
StreamCacheCreateFunction is not passed from PDFParser to COSParser in the
PDFParser constructor. This works in v3.0.0, but not in v3.0.1. I guess
this is a bug?
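The problem described here is a subclass constructor that accepts a parameter but does not forward it to the superclass constructor, so the superclass silently uses whatever its shorter constructor chooses. The plain-Java sketch below reproduces only that pattern. `BaseParser`, `BuggyParser`, `FixedParser` and the cache names are hypothetical stand-ins, not PDFBox classes, and the sketch makes no claim about what PDFBox 3.0.1 actually fell back to.

```java
import java.util.function.Supplier;

public final class ConstructorChain {

    // Hypothetical base class standing in for a parser that needs a cache factory.
    static class BaseParser {
        final String cacheKind;

        // Short constructor: no factory given, so fall back to a default.
        BaseParser(String source) {
            this(source, () -> "default");
        }

        BaseParser(String source, Supplier<String> cacheFactory) {
            this.cacheKind = cacheFactory.get();
        }
    }

    // Bug pattern: the factory is accepted but never forwarded to super(...).
    static class BuggyParser extends BaseParser {
        BuggyParser(String source, Supplier<String> cacheFactory) {
            super(source);
        }
    }

    // Fix: forward every argument the caller supplied.
    static class FixedParser extends BaseParser {
        FixedParser(String source, Supplier<String> cacheFactory) {
            super(source, cacheFactory);
        }
    }

    public static void main(String[] args) {
        Supplier<String> tempFileOnly = () -> "temp-file-only";
        System.out.println(new BuggyParser("doc.pdf", tempFileOnly).cacheKind); // default
        System.out.println(new FixedParser("doc.pdf", tempFileOnly).cacheKind); // temp-file-only
    }
}
```

Because the buggy constructor still compiles and runs, nothing fails loudly; the only symptom is the different (here: default) cache, which matches the higher memory use reported in this thread.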
Re: Loading a PDF using InputStream
Posted by Tilman Hausherr <TH...@t-online.de>.
On 31.01.2024 14:48, Lars Juel Jensen wrote:
> This creates another problem for me. I am running PDFBox in a kubernetes
> cluster on premises with limited resources. I can not setup persistent
> volume claims nor ephemeral volumes, and I can not change how my pods are
> started. I have limited resources and an emptyDir that is mounted on /tmp
> where the temporary files go. The emptyDir is mapped to a portion of the
> kubernetes node's memory, and this memory is shared with many other
> services. All in all - I need to keep a very low memory and tempFile
> footprint, hence the InputStream. Using RandomAccessReadBuffer with an
> InputStream loads the entire PDF into memory, and I can encounter PDF
> documents that can be over 1GB in size. So loading everything into memory
> is not an option.
You can try to create your own class implementing RandomAccessRead.
If your /tmp is mapped to main memory, then it doesn't make sense to use
a temp file at all; you're just wasting time.
Btw, PDFBox 2 also loaded the whole PDF file into memory (or into a
scratch file) and had an even bigger footprint because it also parsed
the complete PDF. So if your project worked with PDFBox 2, it should
work with PDFBox 3.
Tilman
Re: Loading a PDF using InputStream
Posted by Lars Juel Jensen <la...@gmail.com>.
This creates another problem for me. I am running PDFBox in an on-premises
Kubernetes cluster with limited resources. I cannot set up persistent
volume claims or ephemeral volumes, and I cannot change how my pods are
started. I have an emptyDir mounted on /tmp, where the temporary files go.
The emptyDir is mapped to a portion of the Kubernetes node's memory, and
this memory is shared with many other services. All in all, I need to keep
both the memory and the temp-file footprint very low, hence the InputStream.
Using RandomAccessReadBuffer with an InputStream loads the entire PDF into
memory, and I can encounter PDF documents over 1 GB in size, so loading
everything into memory is not an option.
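Why does an InputStream end up fully buffered? A PDF is parsed starting from its end: the last bytes of the file carry the `startxref` keyword followed by the byte offset of the cross-reference data, and a reader needs that offset before it can locate any object. A forward-only InputStream cannot jump there, so it has to be buffered or copied somewhere seekable first. The plain-Java sketch below (no PDFBox; `StartXref` and `readStartXref` are hypothetical names) shows only that first step on a seekable file; it ignores incremental updates, damaged files and everything else a real parser such as PDFBox handles.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public final class StartXref {

    // Reads the "startxref" offset from the tail of a PDF file by seeking
    // straight to the end, without reading the rest of the file.
    static long readStartXref(String path, int tailSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long len = raf.length();
            int n = (int) Math.min(len, tailSize);
            byte[] tail = new byte[n];
            raf.seek(len - n);     // random access: jump to the last n bytes
            raf.readFully(tail);
            String s = new String(tail, StandardCharsets.ISO_8859_1);
            int i = s.lastIndexOf("startxref");
            if (i < 0) {
                throw new IOException("startxref not found in last " + n + " bytes");
            }
            // The offset is the next whitespace-separated token after the keyword.
            String rest = s.substring(i + "startxref".length()).trim();
            return Long.parseLong(rest.split("\\s+")[0]);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("startxref offset: " + readStartXref(args[0], 1024));
    }
}
```

With a seekable source only a small tail is read at this stage; with a plain InputStream the same information is only reachable after consuming every preceding byte.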
Re: Loading a PDF using InputStream
Posted by Tilman Hausherr <TH...@t-online.de>.
On 31.01.2024 09:50, Lars Juel Jensen wrote:
> In PDFBox2 I could do:
>
> PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly())
>
> But there is no equivalent to this in PDFBox3. How do I read a PDF from an
> inputstream?
>
Loader.loadPDF(new RandomAccessReadBuffer(inputStream), IOUtils.createTempFileOnlyStreamCache());
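A minimal, self-contained version of that call, assuming PDFBox 3.0.x on the classpath; `LoadFromStream` and `countPages` are hypothetical names. Keep in mind, as Lars points out in this thread, that `RandomAccessReadBuffer` copies the whole InputStream into memory; `IOUtils.createTempFileOnlyStreamCache()` only decides where PDFBox keeps its stream cache.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.io.RandomAccessReadBuffer;
import org.apache.pdfbox.pdmodel.PDDocument;

public final class LoadFromStream {

    // PDFBox 3 counterpart of the PDFBox 2 call
    // PDDocument.load(inputStream, MemoryUsageSetting.setupTempFileOnly()).
    // RandomAccessReadBuffer copies the whole stream into memory first; the
    // temp-file-only setting only controls where PDFBox's stream cache lives.
    // The caller stays responsible for closing the InputStream itself.
    static int countPages(InputStream in) throws IOException {
        try (PDDocument doc = Loader.loadPDF(
                new RandomAccessReadBuffer(in),
                IOUtils.createTempFileOnlyStreamCache())) {
            return doc.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("pages: " + countPages(in));
        }
    }
}
```

If /tmp is itself backed by RAM, as in Lars's setup, the single-argument `Loader.loadPDF(RandomAccessRead)` overload avoids the temp-file-only setting and uses PDFBox's default stream cache instead.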