Posted to users@pdfbox.apache.org by Olivier Cailloux <ol...@gmail.com> on 2017/04/24 12:53:33 UTC

Get list of PDPageLabelRange?

Dear list,
How can I obtain, from a given PDDocument, a list of the page label 
ranges that it contains?

Here is an example where I obtain the first PDPageLabelRange. How to 
retrieve the other ones? I realize I can iterate over all pages of the 
document and query for the possible existence of a pageLabelRange at 
each page, but I suspect there must be a more efficient (and simpler) way.

try (PDDocument document = PDDocument.load(…)) {
     assert !document.isEncrypted();
     PDDocumentCatalog catalog = document.getDocumentCatalog();
     PDPageLabels labels = catalog.getPageLabels();
     PDPageLabelRange pageLabelRange = labels.getPageLabelRange(0);
}

Thanks.
Olivier


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Get list of PDPageLabelRange?

Posted by Olivier Cailloux <ol...@gmail.com>.
On 28/04/2017 at 21:32, Tilman Hausherr wrote:
>     /**
>      * Get an ordered set of page indices having a page label range.
>      *
>      * @return set of page indices.
>      */
>     public SortedSet<Integer> getPageIndices()
>
>
> Please give feedback whether it works for you.
Perfect. Thanks. (Unrelatedly, it’s very handy that the snapshots are
available through Maven.)
(Nitpicking: have you considered NavigableSet instead of SortedSet? But 
for my purpose SortedSet is enough.)
Olivier
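
A minimal sketch of what the NavigableSet suggestion would add, using only java.util stand-ins rather than PDFBox itself. The start indices (0, 4, 10) are made-up sample data, and viaNavigableSet / viaSortedSet are hypothetical helpers, not part of any PDFBox API. With a NavigableSet, the start index of the range covering an arbitrary page is a single floor() call; with a plain SortedSet the same lookup goes through headSet().

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Stand-in demo built on java.util only; no PDFBox types are involved.
class RangeStartLookup {

    // Hypothetical helper: greatest start index <= page, or null if none.
    static Integer viaNavigableSet(NavigableSet<Integer> starts, int page) {
        return starts.floor(page);
    }

    // The same lookup when only the SortedSet interface is available.
    static Integer viaSortedSet(SortedSet<Integer> starts, int page) {
        SortedSet<Integer> upToPage = starts.headSet(page + 1);
        return upToPage.isEmpty() ? null : upToPage.last();
    }

    public static void main(String[] args) {
        // Made-up sample data: ranges start at page indices 0, 4 and 10.
        TreeSet<Integer> starts = new TreeSet<>(Arrays.asList(0, 4, 10));
        System.out.println(viaNavigableSet(starts, 7)); // prints 4
        System.out.println(viaSortedSet(starts, 7));    // prints 4
    }
}
```

Both variants give the same answer; the difference is only in how directly the interface expresses the lookup.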

>
> Tilman




Re: Get list of PDPageLabelRange?

Posted by Tilman Hausherr <TH...@t-online.de>.
On 28.04.2017 at 13:42, Olivier Cailloux wrote:
> On 27/04/2017 at 18:15, Tilman Hausherr wrote:
>> So, would this help you?
>>
>>
>>     public Set<Integer> getPageIndices()
>>     {
>>         return labels.keySet();
>>     }
>>
>
> Perfect! (Possibly also with a (javadoc) guarantee about the ascending
> iteration order over the set contents, if that’s easy to provide.)
> Olivier


Done.
https://issues.apache.org/jira/browse/PDFBOX-3770

You can get a snapshot here:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.6-SNAPSHOT/

     /**
      * Get an ordered set of page indices having a page label range.
      *
      * @return set of page indices.
      */
     public SortedSet<Integer> getPageIndices()


Please give feedback whether it works for you.

Tilman
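
A rough sketch of how the new accessor could be used to answer the original question, compared with the scan-every-page workaround. It deliberately avoids PDFBox: FakeLabels is a hypothetical stand-in for PDPageLabels, a String stands in for PDPageLabelRange, and returning null when no range starts at an index is an assumption of the stand-in. Only the method names getPageIndices and getPageLabelRange are taken from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class PageIndicesUsage {

    // Stand-in for PDPageLabels; a String stands in for PDPageLabelRange.
    static class FakeLabels {
        private final Map<Integer, String> ranges = new TreeMap<>();

        void put(int startPage, String range) {
            ranges.put(startPage, range);
        }

        // Assumption of this stand-in: null when no range starts at the index.
        String getPageLabelRange(int startPage) {
            return ranges.get(startPage);
        }

        // Ascending start indices, returned as an independent copy.
        SortedSet<Integer> getPageIndices() {
            return new TreeSet<>(ranges.keySet());
        }
    }

    // Workaround from the original question: probe every page index.
    static Map<Integer, String> scanAllPages(FakeLabels labels, int pageCount) {
        Map<Integer, String> found = new LinkedHashMap<>();
        for (int page = 0; page < pageCount; page++) {
            String range = labels.getPageLabelRange(page);
            if (range != null) {
                found.put(page, range);
            }
        }
        return found;
    }

    // With getPageIndices(): visit only the indices where a range starts.
    static Map<Integer, String> viaPageIndices(FakeLabels labels) {
        Map<Integer, String> found = new LinkedHashMap<>();
        for (int start : labels.getPageIndices()) {
            found.put(start, labels.getPageLabelRange(start));
        }
        return found;
    }

    public static void main(String[] args) {
        FakeLabels labels = new FakeLabels();
        labels.put(0, "roman");    // made-up sample ranges
        labels.put(4, "decimal");
        System.out.println(viaPageIndices(labels));
        System.out.println(scanAllPages(labels, 20));
    }
}
```

Both methods collect the same start-index-to-range pairs; the second does so in as many steps as there are ranges rather than pages.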





Re: Get list of PDPageLabelRange?

Posted by Olivier Cailloux <ol...@gmail.com>.
On 27/04/2017 at 18:15, Tilman Hausherr wrote:
> So, would this help you?
>
>
>     public Set<Integer> getPageIndices()
>     {
>         return labels.keySet();
>     }
>

Perfect! (Possibly also with a (javadoc) guarantee about the ascending
iteration order over the set contents, if that’s easy to provide.)
Olivier




Re: Get list of PDPageLabelRange?

Posted by Tilman Hausherr <TH...@t-online.de>.
On 26.04.2017 at 13:44, Olivier Cailloux wrote:
> On 25/04/2017 at 18:43, Tilman Hausherr wrote:
>> On 25.04.2017 at 14:15, Olivier Cailloux wrote:
>>> On 24/04/2017 at 20:10, Tilman Hausherr wrote:
>>>> On 24.04.2017 at 14:53, Olivier Cailloux wrote:
>>>>> Dear list,
>>>>> How can I obtain, from a given PDDocument, a list of the page 
>>>>> label ranges that it contains?
>>>>>
>>>>> Here is an example where I obtain the first PDPageLabelRange. How 
>>>>> to retrieve the other ones? I realize I can iterate over all pages 
>>>>> of the document and query for the possible existence of a 
>>>>> pageLabelRange at each page, but I suspect there must be a more 
>>>>> efficient (and simpler) way.
>>>>>
>>>>> try (PDDocument document = PDDocument.load(…)) {
>>>>>     assert !document.isEncrypted();
>>>>>     PDDocumentCatalog catalog = document.getDocumentCatalog();
>>>>>     PDPageLabels labels = catalog.getPageLabels();
>>>>>     PDPageLabelRange pageLabelRange = labels.getPageLabelRange(0);
>>>>> } 
>>>>
>>>> [accidentally mailed; repost for the list]
>>>>
>>>> Do you need the range (which is a naming scheme) or do you need the 
>>>> label?
>>> Thanks for your reply. I need the ranges.
>>>
>>>>
>>>> First one isn't available for some reason. There's a Map<Integer, 
>>>> PDPageLabelRange> labels, but it is not available to the public.
>>> Should I file a request for improving this situation somehow?
>>> Olivier
>>
>> I'm wondering whether it is really THAT important?
> Certainly not extremely important. But still more elegant and nice to
> have, methinks. I suspect other users will wonder how to iterate over
> the label ranges, as it seems a natural information to provide. (My 
> two cents.)
>
>> I'm not sure if I should expose the map; maybe return a list of pages 
>> as a set? You could then iterate on that one to get the ranges.
> A set of PDPageLabelRange in itself has low usefulness, as 
> PDPageLabelRange objects do not contain the corresponding page 
> indexes. I’d rather suggest providing a set of page indexes on which
> PDPageLabelRange start. (Assuming you do not plan to change the rest 
> of the API.) This would fit well with the existing 
> labels.getPageLabelRange method. Or even better (IMHO), add the page 
> index info to the PDPageLabelRange object. I feel it belongs there.

The page label object mirrors the page label dictionary (see in the PDF 
specification "Entries in a page label dictionary") so we won't add the 
page number.

"I’d rather suggest providing a set of page indexes on which
PDPageLabelRange start."

That is what I meant, although my words "maybe return a list of pages as
a set?" missed the word "indexes".

So, would this help you?


     public Set<Integer> getPageIndices()
     {
         return labels.keySet();
     }


Tilman
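
A small sketch of two properties of returning keySet() directly, independent of PDFBox. KeySetExposure is a hypothetical class; the HashMap is chosen only for illustration, and nothing here is a statement about the map type PDFBox uses internally. It shows that Map.keySet() is a live view (removing a key through it removes the mapping) and that a Set return type by itself promises no iteration order, whereas a sorted, unmodifiable copy declared as SortedSet does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical holder class; not PDFBox code.
class KeySetExposure {
    private final Map<Integer, String> labels = new HashMap<>();

    void put(int startPage, String range) {
        labels.put(startPage, range);
    }

    int size() {
        return labels.size();
    }

    // Direct exposure: callers get a live view backed by the map.
    Set<Integer> liveKeys() {
        return labels.keySet();
    }

    // Defensive alternative: ascending order by contract, read-only copy.
    SortedSet<Integer> sortedKeys() {
        return Collections.unmodifiableSortedSet(new TreeSet<>(labels.keySet()));
    }

    public static void main(String[] args) {
        KeySetExposure store = new KeySetExposure();
        store.put(10, "c");
        store.put(0, "a");
        store.put(4, "b");

        SortedSet<Integer> snapshot = store.sortedKeys();
        store.liveKeys().remove(4);          // also removes the mapping for 4

        System.out.println(store.size());    // prints 2
        System.out.println(snapshot);        // prints [0, 4, 10]
    }
}
```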


> Olivier
>
>>
>> Tilman
>>
>>


Re: Get list of PDPageLabelRange?

Posted by Olivier Cailloux <ol...@gmail.com>.
On 25/04/2017 at 18:43, Tilman Hausherr wrote:
> On 25.04.2017 at 14:15, Olivier Cailloux wrote:
>> On 24/04/2017 at 20:10, Tilman Hausherr wrote:
>>> On 24.04.2017 at 14:53, Olivier Cailloux wrote:
>>>> Dear list,
>>>> How can I obtain, from a given PDDocument, a list of the page label 
>>>> ranges that it contains?
>>>>
>>>> Here is an example where I obtain the first PDPageLabelRange. How 
>>>> to retrieve the other ones? I realize I can iterate over all pages 
>>>> of the document and query for the possible existence of a 
>>>> pageLabelRange at each page, but I suspect there must be a more 
>>>> efficient (and simpler) way.
>>>>
>>>> try (PDDocument document = PDDocument.load(…)) {
>>>>     assert !document.isEncrypted();
>>>>     PDDocumentCatalog catalog = document.getDocumentCatalog();
>>>>     PDPageLabels labels = catalog.getPageLabels();
>>>>     PDPageLabelRange pageLabelRange = labels.getPageLabelRange(0);
>>>> } 
>>>
>>> [accidentally mailed; repost for the list]
>>>
>>> Do you need the range (which is a naming scheme) or do you need the 
>>> label?
>> Thanks for your reply. I need the ranges.
>>
>>>
>>> First one isn't available for some reason. There's a Map<Integer, 
>>> PDPageLabelRange> labels, but it is not available to the public.
>> Should I file a request for improving this situation somehow?
>> Olivier
>
> I'm wondering whether it is really THAT important?
Certainly not extremely important. But still more elegant and nice to
have, methinks. I suspect other users will wonder how to iterate over
the label ranges, as it seems natural information to provide. (My two
cents.)

> I'm not sure if I should expose the map; maybe return a list of pages 
> as a set? You could then iterate on that one to get the ranges.
A set of PDPageLabelRange in itself has low usefulness, as 
PDPageLabelRange objects do not contain the corresponding page indexes. 
I’d rather suggest providing a set of page indexes on which
PDPageLabelRange start. (Assuming you do not plan to change the rest of 
the API.) This would fit well with the existing labels.getPageLabelRange 
method. Or even better (IMHO), add the page index info to the 
PDPageLabelRange object. I feel it belongs there.
Olivier

>
> Tilman
>
>


Re: Get list of PDPageLabelRange?

Posted by Tilman Hausherr <TH...@t-online.de>.
On 25.04.2017 at 14:15, Olivier Cailloux wrote:
> On 24/04/2017 at 20:10, Tilman Hausherr wrote:
>> On 24.04.2017 at 14:53, Olivier Cailloux wrote:
>>> Dear list,
>>> How can I obtain, from a given PDDocument, a list of the page label 
>>> ranges that it contains?
>>>
>>> Here is an example where I obtain the first PDPageLabelRange. How to 
>>> retrieve the other ones? I realize I can iterate over all pages of 
>>> the document and query for the possible existence of a 
>>> pageLabelRange at each page, but I suspect there must be a more 
>>> efficient (and simpler) way.
>>>
>>> try (PDDocument document = PDDocument.load(…)) {
>>>     assert !document.isEncrypted();
>>>     PDDocumentCatalog catalog = document.getDocumentCatalog();
>>>     PDPageLabels labels = catalog.getPageLabels();
>>>     PDPageLabelRange pageLabelRange = labels.getPageLabelRange(0);
>>> } 
>>
>> [accidentally mailed; repost for the list]
>>
>> Do you need the range (which is a naming scheme) or do you need the 
>> label?
> Thanks for your reply. I need the ranges.
>
>>
>> First one isn't available for some reason. There's a Map<Integer, 
>> PDPageLabelRange> labels, but it is not available to the public.
> Should I file a request for improving this situation somehow?
> Olivier

I'm wondering whether it is really THAT important? I'm not sure if I 
should expose the map; maybe return a list of pages as a set? You could 
then iterate on that one to get the ranges.

Tilman




Re: Get list of PDPageLabelRange?

Posted by Olivier Cailloux <ol...@gmail.com>.
On 24/04/2017 at 20:10, Tilman Hausherr wrote:
> On 24.04.2017 at 14:53, Olivier Cailloux wrote:
>> Dear list,
>> How can I obtain, from a given PDDocument, a list of the page label 
>> ranges that it contains?
>>
>> Here is an example where I obtain the first PDPageLabelRange. How to 
>> retrieve the other ones? I realize I can iterate over all pages of 
>> the document and query for the possible existence of a pageLabelRange 
>> at each page, but I suspect there must be a more efficient (and 
>> simpler) way.
>>
>> try (PDDocument document = PDDocument.load(…)) {
>>     assert !document.isEncrypted();
>>     PDDocumentCatalog catalog = document.getDocumentCatalog();
>>     PDPageLabels labels = catalog.getPageLabels();
>>     PDPageLabelRange pageLabelRange = labels.getPageLabelRange(0);
>> } 
>
> [accidentally mailed; repost for the list]
>
> Do you need the range (which is a naming scheme) or do you need the label?
Thanks for your reply. I need the ranges.

>
> First one isn't available for some reason. There's a Map<Integer, 
> PDPageLabelRange> labels, but it is not available to the public.
Should I file a request for improving this situation somehow?
Olivier

>
> Tilman




Re: Get list of PDPageLabelRange?

Posted by Tilman Hausherr <TH...@t-online.de>.
On 24.04.2017 at 14:53, Olivier Cailloux wrote:
> Dear list,
> How can I obtain, from a given PDDocument, a list of the page label 
> ranges that it contains?
>
> Here is an example where I obtain the first PDPageLabelRange. How to 
> retrieve the other ones? I realize I can iterate over all pages of the 
> document and query for the possible existence of a pageLabelRange at 
> each page, but I suspect there must be a more efficient (and simpler) 
> way.
>
> try (PDDocument document = PDDocument.load(…)) {
>     assert !document.isEncrypted();
>     PDDocumentCatalog catalog = document.getDocumentCatalog();
>     PDPageLabels labels = catalog.getPageLabels();
>     PDPageLabelRange pageLabelRange = labels.getPageLabelRange(0);
> } 

[accidentally mailed; repost for the list]

Do you need the range (which is a naming scheme) or do you need the
label? The latter would be easy:

       labels.getLabelsByPageIndices();

The former isn't available for some reason. There's a Map<Integer,
PDPageLabelRange> labels, but it is not available to the public.

Tilman
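
To make the range-versus-label distinction concrete, here is a simplified sketch that does not use PDFBox. A range is a naming scheme that applies from a start page index onward; a label is the resulting string for one page. The Range class and labelsByPageIndex helper are hypothetical, the sample ranges are made up, and only decimal numbering with a prefix and a start value is handled (other numbering styles are left out).

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class RangeVersusLabel {

    // Hypothetical stand-in for a range: a naming scheme, not a page position.
    static final class Range {
        final String prefix;
        final int start;

        Range(String prefix, int start) {
            this.prefix = prefix;
            this.start = start;
        }
    }

    // Expands ranges (keyed by start page index) into one label per page.
    static String[] labelsByPageIndex(TreeMap<Integer, Range> ranges, int pageCount) {
        String[] labels = new String[pageCount];
        for (int page = 0; page < pageCount; page++) {
            // The governing range is the one with the greatest start index <= page.
            Map.Entry<Integer, Range> entry = ranges.floorEntry(page);
            if (entry == null) {
                continue; // no range covers this page; its label stays null
            }
            Range range = entry.getValue();
            int number = range.start + (page - entry.getKey());
            labels[page] = range.prefix + number;
        }
        return labels;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Range> ranges = new TreeMap<>();
        ranges.put(0, new Range("A-", 1)); // pages 0..2
        ranges.put(3, new Range("", 1));   // pages 3 onward
        System.out.println(Arrays.toString(labelsByPageIndex(ranges, 5)));
        // prints [A-1, A-2, A-3, 1, 2]
    }
}
```

Two ranges produce five labels here, which is why a list of labels cannot simply be turned back into the list of ranges without knowing where each range starts.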
