Posted to users@pdfbox.apache.org by Johanneke Lamberink <jo...@onior.com> on 2015/05/04 13:26:58 UTC

PDPageDestination off by one

Hi,

In my project I am exporting bookmarks from a PDF to an XML file, and at another stage importing the bookmarks from the XML file into the PDF. In doing this I've noticed an issue with the index of the page returned by the PDPageDestination methods.

My situation is as follows:
I have a PDF with a bookmark on the second page. That bookmark is represented by a PDOutlineItem with a PDActionGoTo and a PDPageDestination.
Calling PDPageDestination.findPageNumber() on the bookmark's destination results in the integer 2. So far so good. I write to my XML file that the bookmark is on page 2.
When reading the XML, I create a new PDActionGoTo action with a PDPageDestination. This PDPageDestination receives an array with the integer 2 as the first element.
When I ask this new PDPageDestination for its page number, using PDPageDestination.findPageNumber(), I receive 2. Since this is the same code I used to determine the bookmarked page in the first place, I would expect the bookmark to be made for the correct page at this point.

However, when I open the PDF in Adobe Acrobat Pro and inspect the bookmark, it says the page destination is 3! When going to the bookmark destination, I end up on the third page of the PDF.

Is this a bug in PDFBox? And does anyone have a solution on how to fix this?

Kind regards,


Johanneke Lamberink

Re: PDPageDestination off by one

Posted by Johanneke Lamberink <jo...@onior.com>.
Hi,

I’ve created an issue: https://issues.apache.org/jira/browse/PDFBOX-2786
Attached to the issue is a test case.

Kind regards,

Johanneke Lamberink




On 4 May 2015, at 14:24, Johanneke Lamberink <jo...@onior.com> wrote:

>Hi,
>
>Thank you for the quick reply. As I stated in my original email, I ask
>PDFBox for the pagenumber of a bookmark, and I use that exact number to
>set the pagenumber of a bookmark. No switching between user/development
>environment.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: PDPageDestination off by one

Posted by Johanneke Lamberink <jo...@onior.com>.
Hi,

Thank you for the quick reply. As I stated in my original email, I ask
PDFBox for the page number of a bookmark, and I use that exact number to
set the page number of the new bookmark. There is no switching between a
user and a development environment.

The bookmark was added in Adobe Acrobat Pro, on the second page. When
inspecting the bookmark, it states that the page destination is page 2.
When programmatically setting the bookmark to the same page as returned by
PDFBox for the original PDF, the bookmark in the resulting PDF is on the
third page.

The code I use for reading the bookmark and then creating the bookmark is
pasted below.

Reading a PDF to extract the bookmarks to XML:
// item instanceof PDOutlineItem
if ( item != null ) {
   PDDestination destination = item.getDestination();
   PDAction action = item.getAction();
   if ( action instanceof PDActionGoTo && destination == null ) {
      PDActionGoTo gotoAction = (PDActionGoTo) action;
      destination = gotoAction.getDestination();
   }
   if ( destination != null ) {
      Map<String, String> properties = new HashMap<String, String>();
      properties.put( Bookmark.ACTION, "GoTo" );
      if ( destination instanceof PDNamedDestination ) {
         PDNamedDestination namedDest = (PDNamedDestination) destination;
         properties.put( Bookmark.NAMED, namedDest.getNamedDestination() );
      } else if ( destination instanceof PDPageDestination ) {
         PDPageDestination pageDest = (PDPageDestination) destination;
         // this results in '2' for the original PDF, with a bookmark
         // created on the second page in Adobe Acrobat Pro
         int pagenumber = pageDest.findPageNumber();
         LOGGER.debug( "Page in PDF = " + pagenumber );
         properties.put( Bookmark.PAGE, String.valueOf( pagenumber ) );
      }
   }
}
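The number written to the XML here is whatever findPageNumber() returns, and the thread shows that this depends on how the destination is stored: for the bookmark Acrobat created on the second page it returned 2, and for a destination holding the bare integer 2 it also returned 2, yet Acrobat then opened the third page. Below is a minimal JDK-only model of reading both forms into one consistent, 1-based number. `DestinationTarget`, `PageReference`, `PageIndex` and `DestinationReader` are hypothetical stand-ins for the first element of a destination array, not PDFBox classes. The sketch assumes the Acrobat bookmark refers to the page object itself, and (as observed in this thread) that a bare integer is read as a 0-based index.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the first element of a destination array. */
abstract class DestinationTarget {}

/** Destination points at a page object (assumed for the Acrobat bookmark). */
final class PageReference extends DestinationTarget {
    final String pageId;
    PageReference(String pageId) { this.pageId = pageId; }
}

/** Destination holds a bare integer, which Acrobat treated as 0-based here. */
final class PageIndex extends DestinationTarget {
    final int index;
    PageIndex(int index) { this.index = index; }
}

final class DestinationReader {
    private DestinationReader() {}

    /** Returns the 1-based page number a user sees, whichever form is stored. */
    static int displayPageNumber(DestinationTarget target, List<String> pageIds) {
        if (target instanceof PageReference) {
            int position = pageIds.indexOf(((PageReference) target).pageId);
            if (position < 0) {
                throw new IllegalArgumentException("unknown page object");
            }
            return position + 1;                    // list position is 0-based
        }
        if (target instanceof PageIndex) {
            return ((PageIndex) target).index + 1;  // stored index is 0-based
        }
        throw new IllegalArgumentException("unsupported destination target");
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("p1", "p2", "p3");
        // Bookmark on the second page, stored as a page reference
        System.out.println(displayPageNumber(new PageReference("p2"), pages)); // prints 2
        // Bookmark stored as the integer 2: the viewer opens the third page
        System.out.println(displayPageNumber(new PageIndex(2), pages));        // prints 3
    }
}
```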



Creating bookmarks programmatically in the PDF:
int pagenumber = Integer.parseInt( xml );
COSArray destinationArray = new COSArray();
// A bare integer as the first element: Acrobat opened the third page for a
// stored 2, i.e. it read this value as a 0-based page index
destinationArray.add( 0, COSNumber.get( "" + pagenumber ) );
destinationArray.add( 1, COSName.getPDFName( "Fit" ) );
actionDict.setItem( "D", destinationArray );
pdAction = new PDActionGoTo( actionDict );
// I added this to check the correct setting of the destination
PDActionGoTo gotoAction = (PDActionGoTo) pdAction;
PDPageDestination dest = (PDPageDestination) gotoAction.getDestination();
LOGGER.debug( "New page in PDF = " + dest.findPageNumber() );
// results in page '2', same as the original PDF.




I've tried correcting the page number myself by adding/subtracting 1, but
this only works once. Setting the page number to 1 (for example) results
in a correct PDF with a working bookmark. When extracting the bookmarks a
second time, however (as my application requires), I get pagenumber = 1.
This means that I never know when to correct for the 0-based index.
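One way around the "only works once" problem, sketched in plain Java: on import, map the 1-based number from the XML back to the page itself, the form the Acrobat-made bookmark appears to use, instead of writing a bare integer. The number read on the next export is then computed the same way on every pass. `PageRefRoundTrip`, `exportPageNumber` and `importAsReference` are hypothetical, and page ids stand in for page objects; in PDFBox this would presumably mean pointing the destination at the `PDPage` via `PDPageDestination.setPage(PDPage)` rather than at an integer.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical round trip in which a bookmark always points at a page
 *  (modelled as a page id) instead of storing a bare integer. */
final class PageRefRoundTrip {
    private PageRefRoundTrip() {}

    /** Export: position of the referenced page, as the 1-based number a user sees. */
    static int exportPageNumber(String pageRef, List<String> pageIds) {
        int position = pageIds.indexOf(pageRef);
        if (position < 0) {
            throw new IllegalArgumentException("unknown page: " + pageRef);
        }
        return position + 1;
    }

    /** Import: map the 1-based number from the XML back to the page it names. */
    static String importAsReference(int displayNumber, List<String> pageIds) {
        if (displayNumber < 1 || displayNumber > pageIds.size()) {
            throw new IllegalArgumentException("page out of range: " + displayNumber);
        }
        return pageIds.get(displayNumber - 1);
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("p1", "p2", "p3");
        String target = "p2";  // bookmark on the second page
        for (int pass = 1; pass <= 3; pass++) {
            int exported = exportPageNumber(target, pages);
            target = importAsReference(exported, pages);
            System.out.println("pass " + pass + ": page " + exported + " -> " + target);
        }
        // Every pass prints "page 2 -> p2": no correction has to be remembered
    }
}
```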





On 4 May 2015, at 14:09, Gilad Denneboom <gi...@gmail.com> wrote:

>PDF pages (like almost anything else in arrays or lists) are 0-based. That
>means that 0 refers to the first item, 1 to the second one, etc. So PDF
>page #2 refers to the third physical page in the document.
>Acrobat is a user-facing application so it uses the more conventional
>1-based way of referring to these items, so the first page in a
>document is #1, the second page is #2, etc.
>This is something you always have to keep in mind when switching between a
>development environment and a user environment.


Re: PDPageDestination off by one

Posted by Gilad Denneboom <gi...@gmail.com>.
PDF pages (like almost anything else in arrays or lists) are 0-based. That
means that 0 refers to the first item, 1 to the second one, etc. So PDF
page #2 refers to the third physical page in the document.
Acrobat is a user-facing application so it uses the more conventional
1-based way of referring to these items, so the first page in a
document is #1, the second page is #2, etc.
This is something you always have to keep in mind when switching between a
development environment and a user environment.
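The 0-based/1-based distinction described above can be written down as two small helpers. This is a minimal JDK-only sketch: `PageNumbers`, `toZeroBasedIndex` and `toDisplayNumber` are hypothetical names, not PDFBox API, and the page count is assumed to be known from the loaded document.

```java
/** Hypothetical helpers converting between the 1-based page number a user
 *  sees in a viewer and a 0-based page index. Not part of PDFBox. */
final class PageNumbers {
    private PageNumbers() {}

    /** Display number 1..pageCount (as shown by Acrobat) to index 0..pageCount-1. */
    static int toZeroBasedIndex(int displayNumber, int pageCount) {
        if (displayNumber < 1 || displayNumber > pageCount) {
            throw new IllegalArgumentException(
                "page " + displayNumber + " is outside 1.." + pageCount);
        }
        return displayNumber - 1;
    }

    /** Index 0..pageCount-1 back to the 1-based number a user expects. */
    static int toDisplayNumber(int zeroBasedIndex, int pageCount) {
        if (zeroBasedIndex < 0 || zeroBasedIndex >= pageCount) {
            throw new IllegalArgumentException(
                "index " + zeroBasedIndex + " is outside 0.." + (pageCount - 1));
        }
        return zeroBasedIndex + 1;
    }

    public static void main(String[] args) {
        int pageCount = 5;
        // "Page 2" in Acrobat is index 1
        System.out.println(toZeroBasedIndex(2, pageCount)); // prints 1
        // Index 2 is the third physical page, "page 3" in Acrobat
        System.out.println(toDisplayNumber(2, pageCount));  // prints 3
    }
}
```

Validating against the page count on both sides means a value that was already converted once (the situation described earlier in the thread) is more likely to surface as an out-of-range error at the first or last page instead of silently pointing at a neighbouring page.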
