Posted to dev@pdfbox.apache.org by Maruan Sahyoun <sa...@fileaffairs.de> on 2013/04/02 17:22:33 UTC

Possible regression in pdfbox-1.8.1?

Hi,

there might be a regression in pdfbox-1.8.1 compared to pdfbox-1.7.1.

The samples posted to the users@pdfbox.apache.org mailing list under 'Error when opening multi-page PDF generated by PDFBox' work fine in 1.7.1 but come out corrupt, as does the file from PDFBOX-1556. Could someone else check these to verify it's not my environment?

Kind regards
 
Maruan Sahyoun




Re: Possible regression in pdfbox-1.8.1?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Sorry, obviously I meant pdfbox-1.8.0.

Maruan Sahyoun

On 02.04.2013 at 17:22, Maruan Sahyoun <sa...@fileaffairs.de> wrote:

> Hi,
> 
> there might be a regression in pdfbox-1.8.1 compared to pdfbox-1.7.1.
> 
> The samples under the users@pdfbox.apache.org email list for 'Error when opening multi-page PDF generated by PDFBox' work fine in 1.7.1 but are corrupt as is PDFBOX-1556. Could someone else check these to verify it's not my environment?
> 
> Kind regards
> 
> Maruan Sahyoun
> 
> 
> 

Re: Possible regression in pdfbox-1.8.1?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
But the test files show that the xref is written as text and not as a stream, so I think the presence of a stream in the document causes the trailer to be omitted altogether.

Maruan Sahyoun

On 04.04.2013 at 19:26, Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Hi,
> 
> 
> On 04.04.2013 18:58, Maruan Sahyoun wrote:
>> Hi Andreas,
>> 
>> I'm currently on a trip but could it be the test in line 595 of COSWriter? I have some more time tomorrow
> No, it seems to be in line 1142. In the case of an XRef stream the trailer isn't
> written right after the xref table, which is correct according to the spec. But
> in that case the trailer object isn't written at all. I'm still trying to
> understand the circumstances ...
> 
> BR
> Andreas Lehmkühler
> 
>> Kind regards
>> 
>> Maruan Sahyoun
>> 
>> On 04.04.2013 at 18:45, Andreas Lehmkuehler <an...@lehmi.de> wrote:
>> 
>>> Hi,
>>> 
>>> On 04.04.2013 16:41, Maruan Sahyoun wrote:
>>>> Hi Andreas,
>>>> 
>>>> it's related to PDFBOX-1551 as you already guessed. In addition PDFBOX-1556 is related to PDFBOX-1551. I linked them in Jira.
>>> Thanks.
>>> 
>>> I ran a quick test. As suggested I simply load and save a pdf and got 2
>>> different results:
>>> 
>>> - cweb.pdf (from our testsuite) works fine
>>> - one of the pdfs attached to PDFBOX-1551 fails (the trailer is missing)
>>> 
>>> The reason is the trailer. The first pdf uses an xref table, the second an xref
>>> stream. After saving the second, the trailer is missing. The regression was
>>> introduced with PDFBOX-1513.
>>> 
>>> I'm still investigating the details ... any help is appreciated.
>>> 
>>> BR
>>> Andreas Lehmkühler
> 

Re: Possible regression in pdfbox-1.8.1?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,


On 04.04.2013 18:58, Maruan Sahyoun wrote:
> Hi Andreas,
>
> I'm currently on a trip but could it be the test in line 595 of COSWriter? I have some more time tomorrow
No, it seems to be in line 1142. In the case of an XRef stream the trailer isn't
written right after the xref table, which is correct according to the spec. But
in that case the trailer object isn't written at all. I'm still trying to
understand the circumstances ...
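
To tell the two cases apart for a given file, one can look at what the last
startxref offset points to. The following is only a rough sketch in plain Java
without any PDFBox classes. The class name XrefKind and its methods are made up
for illustration, and it assumes a non-linearized file in which the last
startxref entry is the relevant one. An object header at that offset is taken
to be an xref stream without checking its /Type entry.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of PDFBox.
final class XrefKind {
    enum Kind { TABLE, STREAM, UNKNOWN }

    // Offset that follows the startxref keyword (at most 10 digits).
    private static final Pattern OFFSET = Pattern.compile("\\s*(\\d{1,10})");
    // Header of an indirect object, e.g. "12 0 obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    static Kind classify(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indexes equal byte offsets.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        int marker = s.lastIndexOf("startxref");
        if (marker < 0) {
            return Kind.UNKNOWN;
        }
        Matcher m = OFFSET.matcher(s);
        m.region(marker + "startxref".length(), s.length());
        if (!m.lookingAt()) {
            return Kind.UNKNOWN;
        }
        long offset = Long.parseLong(m.group(1));
        if (offset >= s.length()) {
            return Kind.UNKNOWN;
        }
        String target = s.substring((int) offset);
        if (target.startsWith("xref")) {
            return Kind.TABLE;   // classic table, a "trailer" keyword should follow it
        }
        if (OBJ_HEADER.matcher(target).lookingAt()) {
            return Kind.STREAM;  // an object at this offset, presumably an xref stream
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + ": " + classify(pdf));
    }
}
```

Running this on a file before and after a load/save round trip shows whether
the kind of cross-reference section changed during saving.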

BR
Andreas Lehmkühler

> Kind regards
>
> Maruan Sahyoun
>
> On 04.04.2013 at 18:45, Andreas Lehmkuehler <an...@lehmi.de> wrote:
>
>> Hi,
>>
>> On 04.04.2013 16:41, Maruan Sahyoun wrote:
>>> Hi Andreas,
>>>
>>> it's related to PDFBOX-1551 as you already guessed. In addition PDFBOX-1556 is related to PDFBOX-1551. I linked them in Jira.
>> Thanks.
>>
>> I ran a quick test. As suggested I simply load and save a pdf and got 2
>> different results:
>>
>> - cweb.pdf (from our testsuite) works fine
>> - one of the pdfs attached to PDFBOX-1551 fails (the trailer is missing)
>>
>> The reason is the trailer. The first pdf uses an xref table, the second an xref
>> stream. After saving the second, the trailer is missing. The regression was
>> introduced with PDFBOX-1513.
>>
>> I'm still investigating the details ... any help is appreciated.
>>
>> BR
>> Andreas Lehmkühler
>>
>>
>
>


Re: Possible regression in pdfbox-1.8.1?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Andreas,

I'm currently travelling, but could it be the test in line 595 of COSWriter? I will have more time tomorrow.

Kind regards 

Maruan Sahyoun

On 04.04.2013 at 18:45, Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Hi,
> 
> On 04.04.2013 16:41, Maruan Sahyoun wrote:
>> Hi Andreas,
>> 
>> it's related to PDFBOX-1551 as you already guessed. In addition PDFBOX-1556 is related to PDFBOX-1551. I linked them in Jira.
> Thanks.
> 
> I ran a quick test. As suggested I simply load and save a pdf and got 2
> different results:
> 
> - cweb.pdf (from our testsuite) works fine
> - one of the pdfs attached to PDFBOX-1551 fails (the trailer is missing)
> 
> The reason is the trailer. The first pdf uses an xref table, the second an xref
> stream. After saving the second, the trailer is missing. The regression was
> introduced with PDFBOX-1513.
> 
> I'm still investigating the details ... any help is appreciated.
> 
> BR
> Andreas Lehmkühler
> 
> 


Re: Possible regression in pdfbox-1.8.1?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Thomas,

I've tested with the files provided during recent complaints on the users mailing list as well as PDFBOX-1556.

The files open fine now using the jar you provided, so the fix seems to solve the issue. As I'm travelling, I can't run a full test though.

Maruan Sahyoun

On 04.04.2013 at 20:53, Thomas Chojecki <in...@rayman2200.de> wrote:

> Hi all,
> I have patched the COSWriter and I would ask you to test it please. I attached the patch.
> 
> Due to temp. lacking access to my apache webspace, I uploaded the patched jar on my private server.
> 
> http://media-nation.de/~rayman2200/pdfbox-1.9.0-SNAPSHOT.jar
> 
> Best regards
> Thomas
> 
> If no one complain I would commit it.
> 
> Quoting Thomas Chojecki <in...@rayman2200.de>:
> 
>> Quoting Andreas Lehmkuehler <an...@lehmi.de>:
>> 
>>> Hi,
>> 
>> Hi Andreas,
>> 
>>> On 04.04.2013 16:41, Maruan Sahyoun wrote:
>>>> Hi Andreas,
>>>> 
>>>> it's related to PDFBOX-1551 as you already guessed. In addition PDFBOX-1556 is related to PDFBOX-1551. I linked them in Jira.
>>> Thanks.
>>> 
>>> I ran a quick test. As suggested I simply load and save a pdf and got 2
>>> different results:
>>> 
>>> - cweb.pdf (from our testsuite) works fine
>>> - one of the pdfs attached to PDFBOX-1551 fails (the trailer is missing)
>>> 
>>> The reason is the trailer. The first pdf uses an xref table, the second an xref
>>> stream. After saving the second, the trailer is missing. The regression was
>>> introduced with PDFBOX-1513.
>> 
>> I don't look at the file but as you already suggested, this changes come with PDFBOX-1513. I will take a look and try to fix it.
>> 
>>> I'm still investigating the details ... any help is appreciated.
>>> 
>>> BR
>>> Andreas Lehmkühler
>> 
>> Best regards
>> Thomas
> 
> 
> <pdfbox-1551.patch>


Re: Possible regression in pdfbox-1.8.1?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

On 05.04.2013 at 11:10, Andreas Lehmkühler <an...@lehmi.de> wrote:

> Hi,
> 
> On 5 April 2013 at 10:31, Thomas Chojecki <in...@rayman2200.de> wrote:
>> On 05.04.2013 07:45, Andreas Lehmkuehler wrote:
>>> Hi,
>> Hi,
>> 
>>> On 04.04.2013 20:53, Thomas Chojecki wrote:
>>>> Hi all,
>>>> I have patched the COSWriter and I would ask you to test it please. I
>>>> attached
>>>> the patch.
>>> Thanks for jumping in. Looks good to me, it restores the old
>>> behaviour. I'll run
>>> some additional tests at the weekend. BTW the written trailer isn't
>>> correct in
>>> any case. If the origin trailer contains a xrefstream, some of the
>>> stream
>>> information (the decodeparms, the type) is still there. But that's not
>>> new,
>>> 1.7.1 already did the same.
>> doesn't notice that the trailer keep this informations. I've already
>> started to port the xref stream writer for new created or altered
>> documents. This should fix this trailer issue too. But at the moment it
>> doesn't work as expected.
> IMHO this isn't urgent, as it works for years now ;-)
> 

I agree that this doesn't need an immediate fix. A general fix should perhaps be part of a PDFBox 2 discussion: instead of patching here and there, maybe some of PDFBox's current architecture and approach needs revisiting.

>>> I'll cut a new release soon, if everythings works well and nobody
>>> objects.
>> Thanks and sorry for the extra work.
> No need to worry.
> 
>>>> Due to temp. lacking access to my apache webspace, I uploaded the
>>>> patched jar on
>>>> my private server.
>>>> 
>>>> http://media-nation.de/~rayman2200/pdfbox-1.9.0-SNAPSHOT.jar
>>>> 
>>>> Best regards
>>>> Thomas
>>>> 
>>>> If no one complain I would commit it.
>>> +1, please use PDFBOX-1551 as reference.
>> Ok, but I will not be able to edit or close issues. Maybe you know who
>> is the right contact for fixing jira issue privileges?
> My bad, I simply forgot to add you to the committer role after becoming a
> committer. I should work now.
> 
>> Best regards
>> Thomas
> 
> BR
> Andreas Lehmkühler

Thanks, Thomas, for the quick fix.

BR
Maruan


Re: Possible regression in pdfbox-1.8.1?

Posted by Andreas Lehmkühler <an...@lehmi.de>.
Hi,

On 5 April 2013 at 10:31, Thomas Chojecki <in...@rayman2200.de> wrote:
> On 05.04.2013 07:45, Andreas Lehmkuehler wrote:
> > Hi,
> Hi,
>
> > On 04.04.2013 20:53, Thomas Chojecki wrote:
> >> Hi all,
> >> I have patched the COSWriter and I would ask you to test it please. I
> >> attached
> >> the patch.
> > Thanks for jumping in. Looks good to me, it restores the old
> > behaviour. I'll run
> > some additional tests at the weekend. BTW the written trailer isn't
> > correct in
> > any case. If the origin trailer contains a xrefstream, some of the
> > stream
> > information (the decodeparms, the type) is still there. But that's not
> > new,
> > 1.7.1 already did the same.
> doesn't notice that the trailer keep this informations. I've already
> started to port the xref stream writer for new created or altered
> documents. This should fix this trailer issue too. But at the moment it
> doesn't work as expected.
IMHO this isn't urgent, as it has worked this way for years now ;-)

> > I'll cut a new release soon, if everythings works well and nobody
> > objects.
> Thanks and sorry for the extra work.
No need to worry.

> >> Due to temp. lacking access to my apache webspace, I uploaded the
> >> patched jar on
> >> my private server.
> >>
> >> http://media-nation.de/~rayman2200/pdfbox-1.9.0-SNAPSHOT.jar
> >>
> >> Best regards
> >> Thomas
> >>
> >> If no one complain I would commit it.
> > +1, please use PDFBOX-1551 as reference.
> Ok, but I will not be able to edit or close issues. Maybe you know who
> is the right contact for fixing jira issue privileges?
My bad, I simply forgot to add you to the committer role after you became a
committer. It should work now.

> Best regards
> Thomas

BR
Andreas Lehmkühler

Re: Possible regression in pdfbox-1.8.1?

Posted by Thomas Chojecki <in...@rayman2200.de>.
On 05.04.2013 07:45, Andreas Lehmkuehler wrote:
> Hi,
Hi,

> On 04.04.2013 20:53, Thomas Chojecki wrote:
>> Hi all,
>> I have patched the COSWriter and I would ask you to test it please. I 
>> attached
>> the patch.
> Thanks for jumping in. Looks good to me, it restores the old 
> behaviour. I'll run
> some additional tests at the weekend. BTW the written trailer isn't 
> correct in
> any case. If the origin trailer contains a xrefstream, some of the 
> stream
> information (the decodeparms, the type) is still there. But that's not 
> new,
> 1.7.1 already did the same.
I didn't notice that the trailer keeps this information. I've already
started to port the xref stream writer for newly created or altered
documents. This should fix this trailer issue too, but at the moment it
doesn't work as expected.

> I'll cut a new release soon, if everythings works well and nobody 
> objects.
Thanks and sorry for the extra work.

>> Due to temp. lacking access to my apache webspace, I uploaded the 
>> patched jar on
>> my private server.
>> 
>> http://media-nation.de/~rayman2200/pdfbox-1.9.0-SNAPSHOT.jar
>> 
>> Best regards
>> Thomas
>> 
>> If no one complain I would commit it.
> +1, please use PDFBOX-1551 as reference.
OK, but I'm not able to edit or close issues. Do you know who the right
contact is for fixing Jira issue privileges?

Best regards
Thomas

Re: Possible regression in pdfbox-1.8.1?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 04.04.2013 20:53, Thomas Chojecki wrote:
> Hi all,
> I have patched the COSWriter and I would ask you to test it please. I attached
> the patch.
Thanks for jumping in. Looks good to me, it restores the old behaviour. I'll run
some additional tests at the weekend. BTW the written trailer isn't correct in
any case. If the original trailer contains an xref stream, some of the stream
information (the decodeparms, the type) is still there. But that's not new,
1.7.1 already did the same.

I'll cut a new release soon, if everything works well and nobody objects.

> Due to temp. lacking access to my apache webspace, I uploaded the patched jar on
> my private server.
>
> http://media-nation.de/~rayman2200/pdfbox-1.9.0-SNAPSHOT.jar
>
> Best regards
> Thomas
>
> If no one complain I would commit it.
+1, please use PDFBOX-1551 as reference.

> Quoting Thomas Chojecki <in...@rayman2200.de>:
>
>> Quoting Andreas Lehmkuehler <an...@lehmi.de>:
>>
>>> Hi,
>>
>> Hi Andreas,
>>
>>> On 04.04.2013 16:41, Maruan Sahyoun wrote:
>>>> Hi Andreas,
>>>>
>>>> it's related to PDFBOX-1551 as you already guessed. In addition PDFBOX-1556 is
>>>> related to PDFBOX-1551. I linked them in Jira.
>>> Thanks.
>>>
>>> I ran a quick test. As suggested I simply load and save a pdf and got 2
>>> different results:
>>>
>>> - cweb.pdf (from our testsuite) works fine
>>> - one of the pdfs attached to PDFBOX-1551 fails (the trailer is missing)
>>>
>>> The reason is the trailer. The first pdf uses an xref table, the second an xref
>>> stream. After saving the second, the trailer is missing. The regression was
>>> introduced with PDFBOX-1513.
>>
>> I don't look at the file but as you already suggested, this changes come with
>> PDFBOX-1513. I will take a look and try to fix it.
>>
>>> I'm still investigating the details ... any help is appreciated.
>>>
>>> BR
>>> Andreas Lehmkühler
>>
>> Best regards
>> Thomas
>


BR
Andreas Lehmkühler


Re: Possible regression in pdfbox-1.8.1?

Posted by Thomas Chojecki <in...@rayman2200.de>.
Hi all,
I have patched the COSWriter and would ask you to test it, please. I
attached the patch.

As I temporarily lack access to my Apache webspace, I uploaded the
patched jar to my private server.

http://media-nation.de/~rayman2200/pdfbox-1.9.0-SNAPSHOT.jar

Best regards
Thomas

If no one complains, I will commit it.

Quoting Thomas Chojecki <in...@rayman2200.de>:

> Quoting Andreas Lehmkuehler <an...@lehmi.de>:
>
>> Hi,
>
> Hi Andreas,
>
>> On 04.04.2013 16:41, Maruan Sahyoun wrote:
>>> Hi Andreas,
>>>
>>> it's related to PDFBOX-1551 as you already guessed. In addition
>>> PDFBOX-1556 is related to PDFBOX-1551. I linked them in Jira.
>> Thanks.
>>
>> I ran a quick test. As suggested I simply load and save a pdf and got 2
>> different results:
>>
>> - cweb.pdf (from our testsuite) works fine
>> - one of the pdfs attached to PDFBOX-1551 fails (the trailer is missing)
>>
>> The reason is the trailer. The first pdf uses an xref table, the second an xref
>> stream. After saving the second, the trailer is missing. The regression was
>> introduced with PDFBOX-1513.
>
> I don't look at the file but as you already suggested, this changes  
> come with PDFBOX-1513. I will take a look and try to fix it.
>
>> I'm still investigating the details ... any help is appreciated.
>>
>> BR
>> Andreas Lehmkühler
>
> Best regards
> Thomas



Re: Possible regression in pdfbox-1.8.1?

Posted by Thomas Chojecki <in...@rayman2200.de>.
Quoting Andreas Lehmkuehler <an...@lehmi.de>:

> Hi,

Hi Andreas,

> On 04.04.2013 16:41, Maruan Sahyoun wrote:
>> Hi Andreas,
>>
>> it's related to PDFBOX-1551 as you already guessed. In addition
>> PDFBOX-1556 is related to PDFBOX-1551. I linked them in Jira.
> Thanks.
>
> I ran a quick test. As suggested I simply load and save a pdf and got 2
> different results:
>
> - cweb.pdf (from our testsuite) works fine
> - one of the pdfs attached to PDFBOX-1551 fails (the trailer is missing)
>
> The reason is the trailer. The first pdf uses an xref table, the second an xref
> stream. After saving the second, the trailer is missing. The regression was
> introduced with PDFBOX-1513.

I haven't looked at the file, but as you already suggested, these changes
came with PDFBOX-1513. I will take a look and try to fix it.

> I'm still investigating the details ... any help is appreciated.
>
> BR
> Andreas Lehmkühler

Best regards
Thomas


Re: Possible regression in pdfbox-1.8.1?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 04.04.2013 16:41, Maruan Sahyoun wrote:
> Hi Andreas,
>
> it's related to PDFBOX-1551 as you already guessed. In addition PDFBOX-1556 is related to PDFBOX-1551. I linked them in Jira.
Thanks.

I ran a quick test. As suggested, I simply loaded and saved a pdf and got 2
different results:

- cweb.pdf (from our testsuite) works fine
- one of the pdfs attached to PDFBOX-1551 fails (the trailer is missing)

The reason is the trailer. The first pdf uses an xref table, the second an xref
stream. After saving the second, the trailer is missing. The regression was
introduced with PDFBOX-1513.
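
The symptom can be checked without opening the saved file in a viewer. The
following is a minimal sketch in plain Java without any PDFBox classes. The
class name TrailerCheck and its methods are invented for illustration. It is a
crude keyword scan: binary stream data or page text that happens to contain
the same keywords can fool it, so treat the result as a hint only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of PDFBox.
final class TrailerCheck {

    // True if an "xref" keyword starts a line, i.e. a plain-text xref table is present.
    // "startxref" does not match because the keyword must follow a line break.
    static boolean hasXrefTable(String s) {
        return s.startsWith("xref") || s.contains("\nxref") || s.contains("\rxref");
    }

    // True if the file has a plain-text xref table but no "trailer" keyword anywhere.
    static boolean missingTrailer(byte[] pdf) {
        // ISO-8859-1 keeps every byte as one char, so binary data cannot break decoding.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        return hasXrefTable(s) && !s.contains("trailer");
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] pdf = Files.readAllBytes(Paths.get(name));
            String result = missingTrailer(pdf) ? "trailer missing" : "no problem detected";
            System.out.println(name + ": " + result);
        }
    }
}
```

Passing both the original and the re-saved file as arguments makes it easy to
compare them side by side.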

I'm still investigating the details ... any help is appreciated.

BR
Andreas Lehmkühler



Re: Possible regression in pdfbox-1.8.1?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi Andreas,

It's related to PDFBOX-1551, as you already guessed. In addition, PDFBOX-1556 is related to PDFBOX-1551. I linked them in Jira.

BR

Maruan Sahyoun

On 02.04.2013 at 18:28, Maruan Sahyoun <sa...@fileaffairs.de> wrote:

> Hi,
> 
> I didn't have the time to analyze further as I have to catch a plane. But simply loading and saving these will leave them in a state that Adobe Reader and Mac Preview can't open the newly saved docs. I did try both PDDocument.load and .loadNonSeq. Doing the same test with 1.7.1 produces viewable docs.
> 
> Unfortunately  I can't look into that before thursday - that's why I wanted to make you aware of it.
> 
> BR
> 
> Maruan Sahyoun
> 
> On 02.04.2013 at 18:16, Andreas Lehmkuehler <an...@lehmi.de> wrote:
> 
>> Hi,
>> 
>> On 02.04.2013 17:22, Maruan Sahyoun wrote:
>>> Hi,
>>> 
>>> there might be a regression in pdfbox-1.8.1 compared to pdfbox-1.7.1.
>>> 
>>> The samples under the users@pdfbox.apache.org email list for 'Error when
>>> opening multi-page PDF generated by PDFBox' work fine in 1.7.1 but are
>>> corrupt as is PDFBOX-1556. Could someone else check these to verify it's not my environment?
>> What exactly is wrong with those pdfs? Is it probably related to PDFBOX-1551?
>> 
>>> 
>>> Kind regards
>>> 
>>> Maruan Sahyoun
>> 
>> BR
>> Andreas Lehmkühler
>> 


Re: Possible regression in pdfbox-1.8.1?

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

I don't have the time to analyze this further as I have to catch a plane. But simply loading and saving these files leaves them in a state in which Adobe Reader and Mac Preview can't open the newly saved docs. I tried both PDDocument.load and PDDocument.loadNonSeq. The same test with 1.7.1 produces viewable docs.

Unfortunately I can't look into that before Thursday; that's why I wanted to make you aware of it.

BR

Maruan Sahyoun

On 02.04.2013 at 18:16, Andreas Lehmkuehler <an...@lehmi.de> wrote:

> Hi,
> 
> On 02.04.2013 17:22, Maruan Sahyoun wrote:
>> Hi,
>> 
>> there might be a regression in pdfbox-1.8.1 compared to pdfbox-1.7.1.
>> 
>> The samples under the users@pdfbox.apache.org email list for 'Error when
>> opening multi-page PDF generated by PDFBox' work fine in 1.7.1 but are
>> corrupt as is PDFBOX-1556. Could someone else check these to verify it's not my environment?
> What exactly is wrong with those pdfs? Is it probably related to PDFBOX-1551?
> 
>> 
>> Kind regards
>> 
>> Maruan Sahyoun
> 
> BR
> Andreas Lehmkühler
> 

Re: Possible regression in pdfbox-1.8.1?

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,

On 02.04.2013 17:22, Maruan Sahyoun wrote:
> Hi,
>
> there might be a regression in pdfbox-1.8.1 compared to pdfbox-1.7.1.
>
> The samples under the users@pdfbox.apache.org email list for 'Error when
> opening multi-page PDF generated by PDFBox' work fine in 1.7.1 but are
> corrupt as is PDFBOX-1556. Could someone else check these to verify it's not my
> environment?
What exactly is wrong with those pdfs? Is it possibly related to PDFBOX-1551?

>
> Kind regards
>
> Maruan Sahyoun

BR
Andreas Lehmkühler