Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/02/01 12:33:00 UTC
[jira] [Resolved] (PDFBOX-4738) getDocument().getObjects() returns nothing for split result documents
[ https://issues.apache.org/jira/browse/PDFBOX-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr resolved PDFBOX-4738.
-------------------------------------
Assignee: Tilman Hausherr
Resolution: Fixed
> getDocument().getObjects() returns nothing for split result documents
> ---------------------------------------------------------------------
>
> Key: PDFBOX-4738
> URL: https://issues.apache.org/jira/browse/PDFBOX-4738
> Project: PDFBox
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 2.0.18
> Reporter: Yuguang Huang
> Assignee: Tilman Hausherr
> Priority: Minor
> Fix For: 2.0.19, 3.0.0 PDFBox
>
>
>
> Hi PDFBox community, we want to get the object count per page rather than for the whole document.
> Our approach is to split the whole document into single-page documents. However, the resulting documents appear to contain no objects: getDocument().getObjects() returns an empty list.
> If we instead save each page to a byte array and load it back into a PDDocument, we do get the object counts.
>
> Is there a way to get the per-page object count without so much I/O? Thanks!
>
> Output of the code below for a three-page PDF document:
>
> Page objects count from splitted pages:
> page [1] num of objs [0]
> page [2] num of objs [0]
> page [3] num of objs [0]
> Page objects count from pages generated from bytes:
> page [1] num of objs [20]
> page [2] num of objs [51]
> page [3] num of objs [20]
>
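> {code:java}
> // Imports: java.io.ByteArrayOutputStream, java.io.File, java.io.IOException,
> // java.nio.file.Files, java.util.List, java.util.stream.Collectors,
> // java.util.stream.IntStream, org.apache.pdfbox.multipdf.Splitter,
> // org.apache.pdfbox.pdmodel.PDDocument; LOG is a logger field of the enclosing class.
> private static void printNumObjects(String pdfFilename) throws IOException {
>     byte[] fileContent = Files.readAllBytes(new File(pdfFilename).toPath());
>     try (PDDocument document = PDDocument.load(fileContent)) {
>         // Split into one in-memory PDDocument per page
>         List<PDDocument> pages = new Splitter().split(document);
>
>         // Count objects on the split documents while they are still open
>         System.out.println("Page objects count from splitted pages:");
>         IntStream.range(0, pages.size()).forEach(i -> System.out.println(String.format(
>                 "page [%d] num of objs [%d]", i + 1, pages.get(i).getDocument().getObjects().size())));
>
>         // Serialize each split page to bytes; try-with-resources also closes the page
>         List<byte[]> pageBytes = pages.stream().map(page -> {
>             try (PDDocument p = page; ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
>                 p.save(baos);
>                 return baos.toByteArray();
>             } catch (IOException e) {
>                 LOG.error("Failed to get bytes from page.", e);
>                 return new byte[0];
>             }
>         }).collect(Collectors.toList());
>
>         // Reload each page from its bytes and count the parsed objects
>         System.out.println("Page objects count from pages generated from bytes:");
>         IntStream.range(0, pageBytes.size()).forEach(i -> {
>             try (PDDocument reloaded = PDDocument.load(pageBytes.get(i))) {
>                 System.out.println(String.format("page [%d] num of objs [%d]",
>                         i + 1, reloaded.getDocument().getObjects().size()));
>             } catch (IOException e) {
>                 LOG.error("Failed to load page.", e);
>             }
>         });
>     }
> }{code}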
>
>
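One possible way to avoid the save/reload round trip, offered only as a rough sketch and not as the PDFBox resolution of this issue: walk the COS object graph of each page in memory and count the distinct objects reachable from it. PageObjectCounter and countReachable are hypothetical names, not PDFBox API, and the sketch assumes the PDFBox 2.0.x COS classes (COSDictionary, COSArray, COSObject, COSStream). It counts the page dictionary, every distinct target of an indirect reference, and every stream, and it skips /Parent and /P entries so the walk does not climb back into the page tree. The result will generally differ from the save/reload numbers (a saved file also holds the catalog, page tree and other document-level objects), and links or form fields that point at other pages can still pull in objects from those pages.

{code:java}
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSNull;
import org.apache.pdfbox.cos.COSObject;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDPage;

// Hypothetical helper (not part of PDFBox): estimates how many objects belong to
// one page by walking its COS graph in memory, without saving and reloading.
public final class PageObjectCounter {

    private PageObjectCounter() {
    }

    public static int countReachable(PDPage page) {
        // COS objects are compared by identity, not by value
        Set<COSBase> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<COSBase> counted = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<COSBase> todo = new ArrayDeque<>();

        COSDictionary root = page.getCOSObject();
        counted.add(root); // the page dictionary itself
        todo.push(root);

        while (!todo.isEmpty()) {
            COSBase current = todo.pop();
            if (current instanceof COSObject) {
                // Indirect reference: count the referenced object and walk into it
                current = ((COSObject) current).getObject();
                if (current == null || current instanceof COSNull) {
                    continue;
                }
                counted.add(current);
            }
            if (!visited.add(current)) {
                continue; // already walked (shared resources, cycles)
            }
            if (current instanceof COSStream) {
                counted.add(current); // streams must be indirect objects in a saved file
            }
            if (current instanceof COSDictionary) {
                for (Map.Entry<COSName, COSBase> entry : ((COSDictionary) current).entrySet()) {
                    COSName key = entry.getKey();
                    // Heuristic: skip back-references to the page tree or owning page
                    if (COSName.PARENT.equals(key) || COSName.P.equals(key)) {
                        continue;
                    }
                    if (entry.getValue() != null) {
                        todo.push(entry.getValue());
                    }
                }
            } else if (current instanceof COSArray) {
                for (COSBase item : (COSArray) current) {
                    if (item != null) {
                        todo.push(item);
                    }
                }
            }
        }
        return counted.size();
    }
}
{code}

Because the walk starts from the page dictionary, the split step should not be needed: calling PageObjectCounter.countReachable(page) for each page in document.getPages() of the loaded document gives one estimate per page.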
--
This message was sent by Atlassian Jira
(v8.3.4#803005)