Posted to users@pdfbox.apache.org by Matthew Sheppard <ms...@funnelback.com> on 2012/09/21 09:14:23 UTC

Accessing "alternate text" for an image via PDFBox?

Is there some way to extract "alternate text" for a specific image using
PDFBox?

I have a PDF file in which, as described at
http://www.w3.org/WAI/GL/2011/WD-WCAG20-TECHS-20110621/pdf.html#PDF1,
alternate text has been added to an image. Using PDFBox I can find my way
through the object model to the image itself (a PDXObjectImage) by
iterating over PDDocument.getDocumentCatalog().getAllPages() and calling
getResources().getImages() on each page, but I cannot see any way to get
from the image itself to its alternate text.

A small sample PDF (with a single image which has some alternate text
specified) can be found at
http://dl.dropbox.com/u/12253279/image_test_pass.pdf

Many thanks in advance to anyone who is able to point me in the right
direction,
Matt Sheppard