Posted to dev@pdfbox.apache.org by "Hasan Karaoğlu (JIRA)" <ji...@apache.org> on 2017/09/12 11:15:00 UTC
[jira] [Comment Edited] (PDFBOX-3926) ExtractImages
[ https://issues.apache.org/jira/browse/PDFBOX-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162721#comment-16162721 ]
Hasan Karaoğlu edited comment on PDFBOX-3926 at 9/12/17 11:14 AM:
------------------------------------------------------------------
The two calls below seem to return wrong values. Why? I think they give the distance from the bottom edge.
How can I get the y position of an image correctly (i.e., its distance from the top edge)?
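{code:java}
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.util.Matrix;
// Inside a PDFStreamEngine subclass, while handling the "Do" operator
PDImageXObject image = (PDImageXObject) xobject;
Matrix ctmNew = getGraphicsState().getCurrentTransformationMatrix();
float imageX = ctmNew.getTranslateX(); // x of the image origin in PDF user space
float imageY = ctmNew.getTranslateY(); // y of the image origin, measured up from the bottom edge
{code}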
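The values are not wrong as such: PDF user space puts its origin at the bottom-left corner of the page, so getTranslateY() is measured up from the bottom edge. A minimal sketch of converting this to a distance from the top edge, assuming the six CTM values are passed in as plain numbers (in PDFBox 2.0 they should correspond to Matrix getScaleX(), getShearY(), getShearX(), getScaleY(), getTranslateX(), getTranslateY()) and that the page top is taken from the page box, e.g. page.getMediaBox().getUpperRightY(). The class and method names are hypothetical:

{code:java}
/**
 * Hypothetical helper: distance from the TOP edge of the page to the top of
 * an image, computed from the CTM in effect when the image is painted.
 *
 * An image XObject is painted into the unit square, which the CTM
 * [a b c d e f] maps into PDF user space (origin at bottom-left) via
 *   x' = a*x + c*y + e,   y' = b*x + d*y + f.
 * Only y' matters here, so c and e are accepted but not used.
 */
final class ImageTopOffset {

    /** Highest user-space y reached by any corner of the image's unit square. */
    static double imageTopY(double a, double b, double c, double d, double e, double f) {
        // y' of the corners (0,0), (1,0), (0,1), (1,1)
        double y00 = f;
        double y10 = b + f;
        double y01 = d + f;
        double y11 = b + d + f;
        return Math.max(Math.max(y00, y10), Math.max(y01, y11));
    }

    /**
     * Distance from the top edge of the page to the top of the image.
     * pageTopY is the upper-right y of the page box (assumed: media box or crop box).
     */
    static double distanceFromTop(double a, double b, double c, double d,
                                  double e, double f, double pageTopY) {
        return pageTopY - imageTopY(a, b, c, d, e, f);
    }
}
{code}

For an unrotated, unflipped image this reduces to pageTopY - (getTranslateY() + getScaleY()); taking the maximum over all four corners also covers rotated or mirrored placements.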
> ExtractImages
> --------------
>
> Key: PDFBOX-3926
> URL: https://issues.apache.org/jira/browse/PDFBOX-3926
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Hasan Karaoğlu
>
> Hi, I extract text from a PDF with the command below, but it doesn't extract images, so I also run the ExtractImages command. How can I merge these two outputs in their original order?
> Extract text (first command):
> {code:bash}
> java -jar pdfbox-app.jar ExtractText -html {{inputFileName}} -startPage {{startPage}} -endPage {{endPage}} -encoding UTF-8 {{outputFileName}}
> {code}
> Extract images (second command):
> {code:bash}
> java -jar pdfbox-app.jar ExtractImages [OPTIONS] <inputfile>
> {code}
> For example, running the first command gives me an output.html file, but it contains only the text of each page, with no images. Running the second command gives me each image as a separate file. How can I merge these two separate outputs? The order of elements on the page is important.
>
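As far as their command-line output goes, ExtractText and ExtractImages run independently and do not record where text and images sit relative to each other, so merging their files afterwards cannot recover the layout. One possible approach is to collect positions in a single pass per page: text positions from a PDFTextStripper subclass (TextPosition.getYDirAdj() is already measured from the top), and image positions from a PDFStreamEngine subclass such as the PrintImageLocations example in pdfbox-examples, converted to top-based values. The sketch below covers only the last step, sorting the collected items and emitting HTML; PageMerger and PageElement are hypothetical names:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper: merges text runs and images from ONE page into a single
 * HTML fragment, ordered top-to-bottom, then left-to-right. Positions are
 * assumed to use a top-left origin (e.g. TextPosition.getYDirAdj() for text).
 */
final class PageMerger {

    /** One positioned item on the page: either a text run or an image file. */
    static final class PageElement {
        final boolean isImage;
        final String content; // text for a text run, file name for an image
        final float top;      // distance from the top edge of the page
        final float left;     // distance from the left edge of the page

        PageElement(boolean isImage, String content, float top, float left) {
            this.isImage = isImage;
            this.content = content;
            this.top = top;
            this.left = left;
        }
    }

    /** Minimal HTML escaping for element text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Sorts the elements by position (top, then left) and renders them as HTML. */
    static String toHtml(List<PageElement> elements) {
        List<PageElement> sorted = new ArrayList<>(elements);
        sorted.sort(Comparator.comparingDouble((PageElement el) -> el.top)
                              .thenComparingDouble(el -> el.left));
        StringBuilder html = new StringBuilder();
        for (PageElement el : sorted) {
            if (el.isImage) {
                html.append("<img src=\"").append(escape(el.content)).append("\">\n");
            } else {
                html.append("<p>").append(escape(el.content)).append("</p>\n");
            }
        }
        return html.toString();
    }
}
{code}

Sorting by top and then left is a simplification; pages with multiple columns would need a column-aware ordering instead.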
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)