Posted to dev@pdfbox.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2008/08/04 20:34:44 UTC

[jira] Created: (PDFBOX-363) Fixed Page rotation

Fixed Page rotation
-------------------

                 Key: PDFBOX-363
                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
             Project: PDFBox
          Issue Type: Improvement
            Reporter: Jukka Zitting


[Issue from SourceForge]
http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834

Hi all,

Daniel asked me for my patch for the rotation-issue described in
https://sourceforge.net/forum/message.php?msg_id=4992032

Note that I didn't apply the newest patches to the classes PDFStreamEngine
and PageDrawer.

Four more classes call the page.findRotation method and are probably
affected as well. I didn't change them, because I haven't had to use them
(until now):

org.pdfbox.util.operator.pagedrawer.Invoke
org.pdfbox.util.TextPositionComparator
org.pdfbox.examples.pdmodel.PrintURLs
org.pdfbox.examples.util.PrintImageLocations


I've attached a PDF in DIN A4 landscape. The text is misplaced whenever I
try to print or display it with pdfbox (using the pdfbox PDFReader and
convertToImage within my application). Acrobat Reader has no problems
with my documents.
After my patch everything works fine. Perhaps it is a point of discussion
whether the convertToImage method should rotate the image or whether the
user has to do it. The PDFPagePanel doesn't do it (yet).

Andreas

http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429
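
On the question of who rotates the rendered image: if the caller has to do
it, the step could look roughly like the sketch below. This is only an
illustration using java.awt, not code from the patch. The class and method
names (PageImageRotation, rotate) are made up for this example, and it
assumes the rotation value is a multiple of 90 degrees meaning "clockwise",
as the PDF /Rotate entry does.

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of PDFBox.
final class PageImageRotation {

    /** Returns src rotated clockwise by 0, 90, 180 or 270 degrees. */
    static BufferedImage rotate(BufferedImage src, int rotation) {
        int r = ((rotation % 360) + 360) % 360;   // normalise, e.g. -90 -> 270
        if (r % 90 != 0) {
            throw new IllegalArgumentException("rotation must be a multiple of 90: " + rotation);
        }
        if (r == 0) {
            return src;                           // nothing to do
        }
        int w = src.getWidth();
        int h = src.getHeight();
        boolean swap = (r == 90 || r == 270);     // width and height trade places
        BufferedImage dst = new BufferedImage(swap ? h : w, swap ? w : h,
                BufferedImage.TYPE_INT_ARGB);

        // Shift the origin first so that the rotated image lands inside dst.
        AffineTransform t = new AffineTransform();
        switch (r) {
            case 90:  t.translate(h, 0); break;
            case 180: t.translate(w, h); break;
            default:  t.translate(0, w); break;   // 270
        }
        // In Java2D's y-down space a positive quadrant rotation appears clockwise.
        t.quadrantRotate(r / 90);

        Graphics2D g = dst.createGraphics();
        g.drawImage(src, t, null);
        g.dispose();
        return dst;
    }
}
```

A caller would pass the image obtained from convertToImage together with the
value returned by page.findRotation. Whether that belongs inside
convertToImage or in the calling code is exactly the open question above.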

[Comment from SourceForge]
Date: 2008-05-29 12:42
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

I've just tried your sample PDF w/ the latest code -- prior to application
of your patch.  It doesn't work.

I'll work on incorporating your change for a full regression test in the
next hour or so.

[Comment from SourceForge]
Date: 2008-05-29 15:16
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

Hi Daniel,

I've just added my patch to the newest sources you sent me earlier
today. I guess it works. During testing I found another problem concerning
graphics within landscape documents. I solved it by patching the class
org.pdfbox.util.operator.pagedrawer.Invoke in the same way I patched the
others. And, to be consistent, I've also patched the new methods in
org.pdfbox.pdfviewer.PageDrawer.

For me everything works fine, incl. the 4PP PDF.

I've attached the patched files and another test PDF with an embedded
graphic.

Andreas
File Added: pdfbox_rotation_patch_2.zip
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429

[Comment from SourceForge]
Date: 2008-05-29 18:12
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

Your code works w/ the 4PP test ... and with the other rendering stuff
I've tried so far.

However ... the text extraction test fails with it.  I can't figure that
one out ... ideas?

[Comment from SourceForge]
Date: 2008-05-29 18:19
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

Can you give me some more details? I never do any text extraction with
pdfbox. Perhaps you could provide the code for the test program, or is it
part of pdfbox, so that I can find it in CVS?

However, it has to wait until tomorrow.

[Comment from SourceForge]
Date: 2008-05-29 18:39
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

If you've got the whole project set up, try
ant testextract

I'll see if I can narrow it down some.

[Comment from SourceForge]
Date: 2008-05-29 21:00
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

The extraction problem seems to have to do w/ the changes to
PDFStreamEngine.

If I revert that file, extraction succeeds.  Unfortunately ... with that
reverted but your other changes in place, image rendering hangs.

Will work on it more ... probably tomorrow.

[Comment from SourceForge]
Date: 2008-05-29 21:12
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

Correction ... it doesn't hang ... it's just slow on the first PDF to
render ... maybe just because it's the first one I'm sending it.

Will look more tomorrow.

[Comment from SourceForge]
Date: 2008-05-30 07:11
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

I've found one bug. While deleting the if-clauses for the rotation, I also
deleted line 394, which is still needed.

I've attached the corrected file.


File Added: PDFStreamEngine.java
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429

[Comment from SourceForge]
Date: 2008-05-30 07:43
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

I forgot to mention that I can't run the test suite. When I tried to get the
whole project, I realized that I'm behind a firewall here in my office.
Consequently my CVS client doesn't work. I'll have to do it from home. :-(

I've only tested one file: 601501018.pdf

There were additional blanks, and they disappear after adding the missing
line. But starting at page 21, where the document orientation changes from
portrait to landscape, there are additional CR or LF characters. Hmmmm ??

[Comment from SourceForge]
Date: 2008-05-30 08:25
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

I've continued testing and I guess the problem starts somewhere in
org.pdfbox.util.PDFTextStripper.showCharacter(..). Apparently it handles the
coordinates for rotated pages differently from the implementation of
showCharacter() in org.pdfbox.pdfviewer.PageDrawer.
For the moment I don't understand what's happening in the
TextStripper; perhaps I'll find out later.
I hope this hint helps ...
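
To make "handling the coordinates for rotated pages" concrete, here is a
minimal sketch of the underlying mapping. It is not taken from
PDFTextStripper or PageDrawer. The class and method names are hypothetical,
and it assumes unrotated page space with the origin at the bottom-left, y
pointing up, and a clockwise page rotation in multiples of 90 degrees.

```java
// Hypothetical helper, not part of PDFBox.
final class RotatedPageCoordinates {

    /**
     * Maps (x, y) from unrotated page space to the page as displayed after a
     * clockwise rotation. width and height are those of the unrotated page.
     * The result again has its origin at the bottom-left of the displayed page.
     */
    static double[] toDisplayed(int rotation, double x, double y,
                                double width, double height) {
        int r = ((rotation % 360) + 360) % 360;   // normalise, e.g. -90 -> 270
        switch (r) {
            case 0:
                return new double[] {x, y};
            case 90:
                // bottom-left corner moves to the top-left
                return new double[] {y, width - x};
            case 180:
                return new double[] {width - x, height - y};
            case 270:
                // bottom-left corner moves to the bottom-right
                return new double[] {height - y, x};
            default:
                throw new IllegalArgumentException(
                        "rotation must be a multiple of 90: " + rotation);
        }
    }
}
```

If two code paths apply a mapping like this at different stages, or one of
them not at all, the same glyph ends up at different coordinates, which
would match the symptoms described above.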

[Comment from SourceForge]
Date: 2008-05-30 16:20
Sender: danielwilson
Logged In: YES 
user_id=1737686
Originator: NO

I've put a couple more hours into this, and I don't know the answer.

I do know the text extraction is the more mature side of this library.

For the moment, I'll be skipping over your changes to PDFStreamEngine.

Thanks for the other changes!

[Comment from SourceForge]
Date: 2008-06-02 09:21
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

Hi Daniel,

I guess I've solved the problem. The text position handling has to be
adjusted within the method PDFTextStripper.flushText(). Of course my earlier
changes to the class PDFStreamEngine are still needed. During debugging I
found a bug in the class TextPositionComparator (line 82). I solved it by
removing the rotation if-clauses. Whenever you compare two TextPositions,
there is no need to look at the rotation: they are on the same page, so
the comparison is independent of the rotation.

Furthermore my PDFTextStripper-patch seems to correct some minor problems,
which are described in
https://sourceforge.net/forum/message.php?msg_id=4976730.

I've tested the following cases:
Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100%
test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in
lines 251-257, 278/279, 502/503, 574/575, and the other differences are some
kind of special-character issues. I guess you have to correct the input
first.

I've attached my changes based on the newest versions of both classes.
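
The comparator argument can be illustrated with a small sketch. It is not
the patched TextPositionComparator. Glyph and ReadingOrder are hypothetical
names, and the sketch assumes every position has already been brought into
one displayed coordinate frame (x from the left, y measured from the top)
before sorting. Under that assumption the comparison never has to consult
the page rotation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a text position; not the PDFBox TextPosition class.
final class Glyph {
    final String text;
    final double x;          // distance from the left edge of the displayed page
    final double yFromTop;   // distance from the top edge of the displayed page

    Glyph(String text, double x, double yFromTop) {
        this.text = text;
        this.x = x;
        this.yFromTop = yFromTop;
    }
}

final class ReadingOrder {

    // Top to bottom first, then left to right. No rotation checks are needed,
    // because both glyphs live in the same coordinate frame.
    static final Comparator<Glyph> TOP_TO_BOTTOM_LEFT_TO_RIGHT =
            Comparator.comparingDouble((Glyph g) -> g.yFromTop)
                      .thenComparingDouble(g -> g.x);

    /** Sorts a copy of the glyphs into reading order and concatenates their text. */
    static String join(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(TOP_TO_BOTTOM_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (Glyph g : sorted) {
            sb.append(g.text);
        }
        return sb.toString();
    }
}
```

A real comparator would also need some tolerance for glyphs that sit on the
same line with slightly different baselines. The sketch leaves that out on
purpose to keep the ordering strict.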

[Comment from SourceForge]
Date: 2008-06-02 09:22
Sender: lehmialk
Logged In: YES 
user_id=2069622
Originator: YES

File Added: pdfbox_rotation_patch_3.zip
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (PDFBOX-363) Fixed Page rotation

Posted by Brian Carrier <ca...@digital-evidence.org>.
On Nov 3, 2008, at 8:30 AM, Daniel Wilson wrote:

> I would say that the approach that fixes the most items w/o breaking
> those that work is the better approach.
>
> What I have done is run the regression tests before committing.  If,
> though I've fixed something, I've broken a regression test, I haven't
> committed that change.


The regression tests are failing for me (without any changes) because  
of the change that I previously submitted that replaced "null" with  
"?" for characters that could not be mapped. The files that are  
generating the failures are 10101-AR.pdf and  
Garcia2003b__Correlative...pdf.  Can someone update these?

thanks,
brian


Re: [jira] Commented: (PDFBOX-363) Fixed Page rotation

Posted by Daniel Wilson <wi...@gmail.com>.
I would say that the approach that fixes the most items w/o breaking those
that work is the better approach.

What I have done is run the regression tests before committing.  If, though
I've fixed something, I've broken a regression test, I haven't committed
that change.

We do have a regression test for the image rendering now (have since May,
IIRC) and 1 or 2 of those tests involve rotation.

Daniel Wilson

On 10/31/08, Brian Carrier (JIRA) <ji...@apache.org> wrote:
>
>
>     [
> https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644414#action_12644414]
>
>
> Brian Carrier commented on PDFBOX-363:
> --------------------------------------
>
>
> What additional information is needed for a committer to make a decision on
> which approach will be used?  There are some additional text rotation
> problems that are not addressed by either this patch or the one associated
> with PDFBOX-374 and the fix requires taking the text matrix into account.
> The current fix that I am working on will be impacted by which approach is
> used.
>
> FWIW, the approach of not rotating the text in PDFTextStripper and taking
> the rotation into account in TextPositionComparator will be easiest for the
> next fix as well (which will allow page rotation as well as having text on a
> page going in a different direction - such as on the y-axis of a graph).
>

[jira] Resolved: (PDFBOX-363) Fixed Page rotation

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved PDFBOX-363.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.0-incubator
         Assignee: Jukka Zitting

Thanks! Patch applied in revision 720169.


> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8.0-incubator
>
>         Attachments: landscape_rot90.pdf, PageRotation-Patch.diff, test-landscape2.pdf
>


[jira] Resolved: (PDFBOX-363) Fixed Page rotation

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved PDFBOX-363.
----------------------------------

    Resolution: Fixed

Excellent, thanks! Patch applied in revision 720692.

> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8.0-incubator
>
>         Attachments: landscape_rot90.pdf, PageRotation-Patch.diff, PageRotation-Patch2.diff, test-landscape2.pdf
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834
> Hi all,
> Daniel asked me for my patch for the rotation-issue described in
> https://sourceforge.net/forum/message.php?msg_id=4992032
> Attention, I didn't apply the newest patches to the classes PDFStreamEngine
> and PageDrawer.
> There are 4 more probably affected classes calling the page.findRotation
> method which I didn't change, because I'm didn't have to use them (until
> now).
> org.pdfbox.util.operator.pagedrawer.Invoke
> org.pdfbox.util.TextPositionComparator
> org.pdfbox.examples.pdmodel.PrintURLs
> org.pdfbox.examples.util.PrintImageLocations
> I've attached a pdf in DINA4-landscape. The text is missplaced whenever I
> try to print or display (using the pdfbox-PDFReader and convertToImage
> within my application) it with pdfbox. The acrobat reader has no problems
> with my documents.
> After my patch everything works fine. Perhaps it is a point of discussion,
> if the convertToImage method has to rotate the image or if the user has to
> do it. The PDFPagePanel didn't do it (yet).
> Andreas
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-29 12:42
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> I've just tried your sample PDF w/ the latest code -- prior to application
> of your patch.  It doesn't work.
> I'll work on incorporating your change for a full regression test in the
> next hour or so.
> [Comment from SourceForge]
> Date: 2008-05-29 15:16
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Hi Daniel,
> I've just added my patch to the newest sources you send me earlier this
> day. I guess it works. During testing I've found another problem concernign
> graphics within landscape-docs. I found the solution in patching the class
> org.pdfbox.util.operator.pagedrawer.Invoke in the same way I've patched the
> others. And consequently to be strict I've also patched the new methods in
> org.pdfbox.pdfviewer.PageDrawer
> For my everthings works fine inlc. the 4PP-pdf.
> I've attached the patched files and another testpdf with a embedded
> graphic.
> Andreas
> File Added: pdfbox_rotation_patch_2.zip
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-29 18:12
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Your code works w/ the 4PP test ... and with the other rendering stuff
> I've tried so far.
> However ... the text extraction test fails with it.  I can't figure that
> one out ... ideas?
> [Comment from SourceForge]
> Date: 2008-05-29 18:19
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Can you give me some more details? I never do any textextractions with
> pdfbox. Perhaps you'll provide with the code for test program, or is it
> part of pdfbox, so that I can find it in the cvs?
> However, it has to wait until tomorrow
> [Comment from SourceForge]
> Date: 2008-05-29 18:39
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> If you've got the whole project set up, try
> ant testextract
> I'll see if I can narrow it down some.
> [Comment from SourceForge]
> Date: 2008-05-29 21:00
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> The extraction problem seems to have to do w/ the changes to
> PDFStreamEngine.
> If I revert that file, extraction succeeds.  Unfortunately ... with that
> reverted but your other changes in place, image rendering hangs.
> Will work on it more ... probably tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-29 21:12
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Correction ... it doesn't hang ... it's just slow on the first PDF to
> render ... maybe just due to the first one I'm sending it.
> Will look more tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-30 07:11
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I've found one bug. While deleting the if rules for the rotation, I've
> deleted line 394 which is still needed.
> I've attached the corrected file
> File Added: PDFStreamEngine.java
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-30 07:43
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I forgot to mention that I can't run the test suite. When I tried to get the
> whole project, I realized that I'm behind a firewall here in my office.
> Consequently my cvs client doesn't work. I have to do it from home. :-(
> I've only tested one file: 601501018.pdf
> There are additional blanks, and they disappear after adding the missing
> line. But starting at page 21, where the document orientation changes from
> portrait to landscape, there are additional CR or LF characters. Hmmmm ??
> [Comment from SourceForge]
> Date: 2008-05-30 08:25
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I've continued testing and I guess the problem starts somewhere in
> org.pdfbox.util.PDFTextStripper.showCharacter(..). Apparently it handles the
> coordinates for rotated pages differently from the
> implementation of showCharacter() in org.pdfbox.pdfviewer.PageDrawer.
> For the moment I don't understand what's happening in the
> TextStripper; perhaps I'll find out later.
> I hope this hint helps ...
> [Comment from SourceForge]
> Date: 2008-05-30 16:20
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> I've put a couple more hours into this, and I don't know the answer.
> I do know the text extraction is the more mature side of this library.
> For the moment, I'll be skipping over your changes to PDFStreamEngine.
> Thanks for the other changes!
> [Comment from SourceForge]
> Date: 2008-06-02 09:21
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Hi Daniel,
> I guess I've solved the problem. The text position handling has to be
> adjusted within the method PDFTextStripper.flushText(). Of course my former
> changes to the class PDFStreamEngine are needed. During debugging I found a
> bug in the class TextPositionComparator (line 82). I solved it by removing
> the rotation if-clauses. Whenever you compare two text positions, there is
> no need to look at the rotation: they are on the same page, so
> the comparison is independent of the rotation.
> Furthermore my PDFTextStripper-patch seems to correct some minor problems,
> which are described in
> https://sourceforge.net/forum/message.php?msg_id=4976730.
> I've tested the following cases:
> Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100%
> test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in
> lines 251-257, 278/279, 502/503, 574/575, and the other differences are some
> kind of special-character issues. I guess you have to correct the input
> first.
> I've attached my changes based on the newest versions of both classes.
> [Comment from SourceForge]
> Date: 2008-06-02 09:22
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> File Added: pdfbox_rotation_patch_3.zip
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PDFBOX-363) Fixed Page rotation

Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644414#action_12644414 ] 

Brian Carrier commented on PDFBOX-363:
--------------------------------------

What additional information is needed for a committer to decide which approach will be used?  Some additional text rotation problems are not addressed by either this patch or the one associated with PDFBOX-374, and fixing them requires taking the text matrix into account. The fix that I am currently working on will be affected by which approach is chosen.

FWIW, the approach of not rotating the text in PDFTextStripper and taking the rotation into account in TextPositionComparator will also be the easiest basis for the next fix (which will support page rotation as well as text on a page running in a different direction, such as along the y-axis of a graph).
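
To illustrate the second approach, here is a minimal sketch of a comparator that leaves the stored coordinates untouched and applies the page rotation only when ordering. It is not the PDFBox code: Pos and RotationAwareComparator are hypothetical names, the rotation is assumed to be a clockwise multiple of 90 degrees, and coordinates are assumed to be in unrotated PDF user space (origin at the bottom left, y pointing up).

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for a text position; NOT the PDFBox TextPosition class.
class Pos {
    final String text;
    final double x, y; // unrotated user space: origin bottom-left, y pointing up

    Pos(String text, double x, double y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

// Orders positions top-to-bottom, then left-to-right, as seen on the rotated page.
class RotationAwareComparator implements Comparator<Pos> {
    private final int rotation;         // normalized to 0, 90, 180 or 270 (clockwise)
    private final double width, height; // size of the unrotated page

    RotationAwareComparator(int rotation, double width, double height) {
        this.rotation = ((rotation % 360) + 360) % 360;
        this.width = width;
        this.height = height;
    }

    // Distance from the top edge of the page as displayed.
    private double down(Pos p) {
        switch (rotation) {
            case 90:  return p.x;
            case 180: return p.y;
            case 270: return width - p.x;
            default:  return height - p.y;
        }
    }

    // Distance from the left edge of the page as displayed.
    private double right(Pos p) {
        switch (rotation) {
            case 90:  return p.y;
            case 180: return width - p.x;
            case 270: return height - p.y;
            default:  return p.x;
        }
    }

    public int compare(Pos a, Pos b) {
        int byLine = Double.compare(down(a), down(b));
        return byLine != 0 ? byLine : Double.compare(right(a), right(b));
    }
}

class ComparatorDemo {
    public static void main(String[] args) {
        Pos[] items = {
            new Pos("second line", 200, 10),
            new Pos("first line, second word", 100, 50),
            new Pos("first line, first word", 100, 10)
        };
        // Page with a rotation of 90 degrees; 600 x 800 is an arbitrary page size.
        Arrays.sort(items, new RotationAwareComparator(90, 600, 800));
        for (Pos p : items) {
            System.out.println(p.text);
        }
    }
}
```

The sketch compares lines by exact equality, so a real comparator would also need a tolerance for glyphs whose baselines differ slightly, and text running in a different direction than the page would still need the text matrix.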


[jira] Commented: (PDFBOX-363) Fixed Page rotation

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632153#action_12632153 ] 

Andreas Lehmkühler commented on PDFBOX-363:
-------------------------------------------

To make it easier to work on this issue, I'll try to summarize what has already been done.

- Most parts of my patches were already committed to the source before it went to Apache

- There is still a disagreement between Daniel and me concerning the patches for the following classes:

PDFStreamEngine
PDFTextStripper (the changes to this class depend on the changes to the class PDFStreamEngine)

I saw that PDFBOX-374 touches the same code. I'll have a look at those changes; perhaps they make my changes obsolete, in which case this issue could be closed.



[jira] Reopened: (PDFBOX-363) Fixed Page rotation

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting reopened PDFBOX-363:
----------------------------------


Reopening as the change introduced the Java 6 method AffineTransform.quadrantRotate() on line 168 of PageDrawer.java.

Can we replace that with something from Java 1.4?
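
One possible replacement, sketched here under assumptions rather than taken from the committed fix: build the exact matrix for each multiple of 90 degrees with the six-argument AffineTransform constructor and concatenate it, both of which predate Java 6. The class name QuadrantRotation and its helper method are hypothetical.

```java
import java.awt.geom.AffineTransform;

class QuadrantRotation {

    /**
     * Concatenates a rotation by numQuadrants * 90 degrees onto the given
     * transform, using only API that existed before Java 6.
     * Positive counts rotate the positive x axis toward the positive y axis.
     */
    static void quadrantRotate(AffineTransform at, int numQuadrants) {
        // Constructor argument order: m00, m10, m01, m11, m02, m12.
        switch (numQuadrants & 3) { // "& 3" also normalizes negative counts
            case 1:
                at.concatenate(new AffineTransform(0, 1, -1, 0, 0, 0));
                break;
            case 2:
                at.concatenate(new AffineTransform(-1, 0, 0, -1, 0, 0));
                break;
            case 3:
                at.concatenate(new AffineTransform(0, -1, 1, 0, 0, 0));
                break;
            default:
                break; // multiple of 360 degrees: nothing to do
        }
    }

    public static void main(String[] args) {
        AffineTransform at = new AffineTransform();
        quadrantRotate(at, 1);
        System.out.println(at);
    }
}
```

Calling at.rotate(numQuadrants * Math.PI / 2) is the shorter alternative, but the explicit matrices avoid any dependence on how a particular runtime rounds the sine and cosine of those angles.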

> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8.0-incubator
>
>         Attachments: landscape_rot90.pdf, PageRotation-Patch.diff, test-landscape2.pdf
>
>


[jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-363:
--------------------------------------

    Attachment: test-landscape2.pdf

This document contains 2 boxes with text (1 in the upper left corner and 1 in the lower right corner). It has a landscape orientation. It's useful for testing the whole rotation handling, especially for displaying or printing.

> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>         Attachments: landscape_rot90.pdf, test-landscape2.pdf
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834
> Hi all,
> Daniel asked me for my patch for the rotation-issue described in
> https://sourceforge.net/forum/message.php?msg_id=4992032
> Attention, I didn't apply the newest patches to the classes PDFStreamEngine
> and PageDrawer.
> There are 4 more probably affected classes calling the page.findRotation
> method which I didn't change, because I'm didn't have to use them (until
> now).
> org.pdfbox.util.operator.pagedrawer.Invoke
> org.pdfbox.util.TextPositionComparator
> org.pdfbox.examples.pdmodel.PrintURLs
> org.pdfbox.examples.util.PrintImageLocations
> I've attached a pdf in DINA4-landscape. The text is missplaced whenever I
> try to print or display (using the pdfbox-PDFReader and convertToImage
> within my application) it with pdfbox. The acrobat reader has no problems
> with my documents.
> After my patch everything works fine. Perhaps it is a point of discussion,
> if the convertToImage method has to rotate the image or if the user has to
> do it. The PDFPagePanel didn't do it (yet).
> Andreas
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-29 12:42
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> I've just tried your sample PDF w/ the latest code -- prior to application
> of your patch.  It doesn't work.
> I'll work on incorporating your change for a full regression test in the
> next hour or so.
> [Comment from SourceForge]
> Date: 2008-05-29 15:16
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Hi Daniel,
> I've just added my patch to the newest sources you sent me earlier
> today. I guess it works. During testing I found another problem concerning
> graphics within landscape docs. I solved it by patching the class
> org.pdfbox.util.operator.pagedrawer.Invoke in the same way I patched the
> others. And to be consistent I've also patched the new methods in
> org.pdfbox.pdfviewer.PageDrawer
> For me everything works fine, incl. the 4PP PDF.
> I've attached the patched files and another test PDF with an embedded
> graphic.
> Andreas
> File Added: pdfbox_rotation_patch_2.zip
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-29 18:12
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Your code works w/ the 4PP test ... and with the other rendering stuff
> I've tried so far.
> However ... the text extraction test fails with it.  I can't figure that
> one out ... ideas?
> [Comment from SourceForge]
> Date: 2008-05-29 18:19
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Can you give me some more details? I never do any text extraction with
> pdfbox. Perhaps you can provide the code for the test program, or is it
> part of pdfbox, so that I can find it in the cvs?
> However, it has to wait until tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-29 18:39
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> If you've got the whole project set up, try
> ant testextract
> I'll see if I can narrow it down some.
> [Comment from SourceForge]
> Date: 2008-05-29 21:00
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> The extraction problem seems to have to do w/ the changes to
> PDFStreamEngine.
> If I revert that file, extraction succeeds.  Unfortunately ... with that
> reverted but your other changes in place, image rendering hangs.
> Will work on it more ... probably tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-29 21:12
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Correction ... it doesn't hang ... it's just slow on the first PDF to
> render ... maybe just due to the first one I'm sending it.
> Will look more tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-30 07:11
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I've found one bug. While deleting the if-clauses for the rotation, I
> deleted line 394, which is still needed.
> I've attached the corrected file.
> File Added: PDFStreamEngine.java
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-30 07:43
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I forgot to mention that I can't run the test suite. When I tried to get the
> whole project, I realized that I'm behind a firewall here in my office.
> Consequently my cvs client doesn't work. I have to do it from home. :-(
> I've only tested one file: 601501018.pdf
> There were additional blanks, and they disappear after adding the missing
> line. But starting at page 21, where the document orientation changes from
> portrait to landscape, there are additional CR or LF characters. Hmmmm ??
> [Comment from SourceForge]
> Date: 2008-05-30 08:25
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I've continued testing and I guess the problem starts somewhere in
> org.pdfbox.util.PDFTextStripper.showCharacter(..). Apparently it handles the
> coordinates for rotated pages differently from the
> implementation of showCharacter() in org.pdfbox.pdfviewer.PageDrawer.
> But for the moment I don't understand what's happening in the
> TextStripper; perhaps I'll find out later.
> I hope this hint helps ...
> [Comment from SourceForge]
> Date: 2008-05-30 16:20
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> I've put a couple more hours into this, and I don't know the answer.
> I do know the text extraction is the more mature side of this library.
> For the moment, I'll be skipping over your changes to PDFStreamEngine.
> Thanks for the other changes!
> [Comment from SourceForge]
> Date: 2008-06-02 09:21
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Hi Daniel,
> I guess I've solved the problem. The text position handling has to be
> adjusted within the method PDFTextStripper.flushText(). Of course my former
> changes to the class PDFStreamEngine are still needed. During debugging I
> found a bug in the class TextPositionComparator (line 82). I solved it by
> removing the rotation if-clauses. Whenever you compare two TextPositions,
> there is no need to look at the rotation: they are on the same page, so
> the comparison is independent of the rotation.
> Furthermore my PDFTextStripper-patch seems to correct some minor problems,
> which are described in
> https://sourceforge.net/forum/message.php?msg_id=4976730.
> I've tested the following cases:
> Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100%
> test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in
> lines 251-257, 278/279, 502/503, 574/575, and the other differences are some
> kind of special-character issues. I guess you have to correct the input
> first.
> I've attached my changes based on the newest versions of both classes.
> [Comment from SourceForge]
> Date: 2008-06-02 09:22
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> File Added: pdfbox_rotation_patch_3.zip
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


AW: AW: [jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by An...@rwe.com.
>> I had a similar idea. My first attempt looked like this:
>>
>> - I've added a method to the class TextPosition to get the
>> AffineTransform
>> - I've used the getX/getY-methods to get the starting-coords
>>
>> But then I realized, that I can't get the unmodified starting-
>> coords from TextPosition.
>> They are always altered in some way concerning the rotation. So
>> finally I had to
>> decide if I add 3 new methods (1 for the AffineTransform and 2 for
>> X and Y) to the class
>> TextPosition or if I just get the textPos and put the logic into
>> PageDrawer.showCharacter().
>> I've chosen alternative 2.
>
>OK.  I'm just not sure how many other callers in the future may need
>to duplicate the logic.   At a minimum, I think the comment for the
>getTextPos() function should document that the X and Y coordinates
>have not been adjusted to change the location of 0,0 because all
>other functions are returning coordinates that have been adjusted.
>Returning an adjusted matrix seems more consistent, but a comment
>would help to notify callers that it is different.
Ok, I agree. I will add a comment to the method to explain what the caller
will get. If there are other callers in the future, it'll be no problem
to move the logic into the class TextPosition.

>There is a slight difference between getX() and getXDirAdj().  getX()
>is adjusted based on page rotation and getXDirAdj() is adjusted based
>on text direction.  To be honest, I don't know when the page rotation
>values (i.e. getX()) would be needed, but that is how the code
>originally worked and there were other functions in PDFBox that were
>calling it and I kept it there for backwards compatibility.  If no
>one needs the page rotation adjusted values (i.e. getX()) and
>functions do need non-adjusted values, then we can change getX() to
>return the non-modified values.
I've studied the code again and, thanks to your explanation, I've got the point.
Of course there is a difference, and I guess that is the missing piece I was looking for
to answer my question about rotating the font by 180 degrees if the page rotation != 0:

Patched source from PageDrawer.showCharacter:

Matrix textPos = text.getTextPos().copy();
float x = textPos.getXPosition();
// the 0,0-reference has to be moved from the lower left (PDF) to the upper left (AWT-graphics)
float y = pageSize.height - textPos.getYPosition();
// Set translation to 0,0. We only need the scaling and shearing
textPos.setValue(2, 0, 0);
textPos.setValue(2, 1, 0);
AffineTransform at = textPos.createAffineTransform();
// If there is a rotation, we have to add an additional rotation by 180°.
// Don't know why yet. I guess there is a problem with clockwise and counterclockwise rotation.
if (at.getShearX() != 0 || at.getShearY() != 0)
        at.quadrantRotate(2);

I'll try some other examples with different page-rotations and text-orientations
to check if the rotation by 180 degrees is needed like implemented or not.
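
The two coordinate steps in the snippet above (moving the 0,0 reference from the lower left to the upper left, and dropping the translation so that only scaling and shearing remain) can be tried in isolation. The following is only a minimal sketch under stated assumptions: it uses java.awt.geom.AffineTransform in place of the PDFBox Matrix class, and the class name TextPlacementSketch, the helpers toAwtY and withoutTranslation, and the page height of 595 points are made up for illustration. None of them are part of PDFBox.

```java
import java.awt.geom.AffineTransform;

// Hypothetical standalone sketch; not PDFBox code.
public class TextPlacementSketch {

    // PDF measures y from the bottom edge of the page, AWT from the top edge.
    static double toAwtY(double pdfY, double pageHeight) {
        return pageHeight - pdfY;
    }

    // Copy of the matrix with the translation set to 0,0, so that only the
    // scaling and shearing components remain.
    static AffineTransform withoutTranslation(AffineTransform textMatrix) {
        return new AffineTransform(
                textMatrix.getScaleX(), textMatrix.getShearY(),
                textMatrix.getShearX(), textMatrix.getScaleY(),
                0, 0);
    }

    public static void main(String[] args) {
        double pageHeight = 595; // assumed value, roughly A4 landscape in points
        // Assumed text matrix: 12pt scaling, text origin at (100, 500) in PDF space.
        AffineTransform textMatrix = new AffineTransform(12, 0, 0, 12, 100, 500);

        double x = textMatrix.getTranslateX();
        double y = toAwtY(textMatrix.getTranslateY(), pageHeight);
        AffineTransform glyphTransform = withoutTranslation(textMatrix);

        System.out.println("draw at " + x + ", " + y + " using " + glyphTransform);
    }
}
```

Keeping the position and the glyph transform separate mirrors the patch: the position is converted once, and the remaining matrix only describes how the glyphs are scaled and sheared.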

Andreas

----------------------------------------------------------------
- Management: Chittur Ramakrishnan (Chairman), 
Stefan Niehusmann - 
- Registered office: Dortmund - 
- Registered at the Dortmund District Court - 
- Commercial register no. HR B 21222 - 
- VAT ID no. DE 2588 96 719 - 

Re: AW: [jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by Brian Carrier <ca...@digital-evidence.org>.
Hi Andreas,

On Nov 24, 2008, at 4:17 AM, <An...@rwe.com>  
<An...@rwe.com> wrote:

> Hi Brian
>
>> I took a look at the diffs and had an idea.  Since TextPosition now
>> returns coordinates and such that are adjusted for various rotations
>> and changes in the 0,0 location, perhaps the new getTextPos() method
>> should also adjust the matrix for the 0,0 location. Then none of the
>> callers need to deal with converting the values (since they no longer
>> need to deal with the conversions when getting the X and Y
>> coordinates either).  Or, the TextPosition could create and return an
>> AffineTransform object.
> I had a similar idea. My first attempt looked like this:
>
> - I've added a method to the class TextPosition to get the  
> AffineTransform
> - I've used the getX/getY-methods to get the starting-coords
>
> But then I realized, that I can't get the unmodified starting- 
> coords from TextPosition.
> They are always altered in some way concerning the rotation. So  
> finally I had to
> decide if I add 3 new methods (1 for the AffineTransform and 2 for  
> X and Y) to the class
> TextPosition or if I just get the textPos and put the logic into  
> PageDrawer.showCharacter().
> I've chosen alternative 2.

OK.  I'm just not sure how many other callers in the future may need  
to duplicate the logic.   At a minimum, I think the comment for the  
getTextPos() function should document that the X and Y coordinates  
have not been adjusted to change the location of 0,0 because all  
other functions are returning coordinates that have been adjusted.    
Returning an adjusted matrix seems more consistent, but a comment  
would help to notify callers that it is different.

> At this point I have a question concerning your changes to the class
> TextPosition:
>
> There is a pair of methods getX and getXDirAdj (there are similar
> pairs for Y and Width).
> Both return the same value. Was that your intention? If not,
> I would suggest changing
> them. For example, getX/Y/Width could return the unmodified values and
> getX/Y/WidthDirAdj the
> modified values. I would then alter my patch to move some of
> the logic from PageDrawer
> to TextPosition.

There is a slight difference between getX() and getXDirAdj().  getX()  
is adjusted based on page rotation and getXDirAdj() is adjusted based  
on text direction.  To be honest, I don't know when the page rotation  
values (i.e. getX()) would be needed, but that is how the code  
originally worked and there were other functions in PDFBox that were  
calling it and I kept it there for backwards compatibility.  If no  
one needs the page rotation adjusted values (i.e. getX()) and  
functions do need non-adjusted values, then we can change getX() to  
return the non-modified values.

thanks,
brian


AW: [jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by An...@rwe.com.
Hi Brian

>I took a look at the diffs and had an idea.  Since TextPosition now  
>returns coordinates and such that are adjusted for various rotations  
>and changes in the 0,0 location, perhaps the new getTextPos() method  
>should also adjust the matrix for the 0,0 location. Then none of the  
>callers need to deal with converting the values (since they no longer  
>need to deal with the conversions when getting the X and Y  
>coordinates either).  Or, the TextPosition could create and return an  
>AffineTransform object.
I had a similar idea. My first attempt looked like this:

- I've added a method to the class TextPosition to get the AffineTransform
- I've used the getX/getY-methods to get the starting-coords

But then I realized that I can't get the unmodified starting coords from TextPosition.
They are always altered in some way by the rotation. So finally I had to
decide whether to add 3 new methods (1 for the AffineTransform and 2 for X and Y) to the class
TextPosition or just get the textPos and put the logic into PageDrawer.showCharacter().
I've chosen alternative 2.

At this point I have a question concerning your changes to the class TextPosition:

There is a pair of methods getX and getXDirAdj (there are similar pairs for Y and Width).
Both return the same value. Was that your intention? If not, I would suggest changing
them. For example, getX/Y/Width could return the unmodified values and getX/Y/WidthDirAdj the
modified values. I would then alter my patch to move some of the logic from PageDrawer
to TextPosition.

Andreas




Re: [jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by Brian Carrier <ca...@digital-evidence.org>.
Hi Andreas,

I took a look at the diffs and had an idea.  Since TextPosition now  
returns coordinates and such that are adjusted for various rotations  
and changes in the 0,0 location, perhaps the new getTextPos() method  
should also adjust the matrix for the 0,0 location. Then none of the  
callers need to deal with converting the values (since they no longer  
need to deal with the conversions when getting the X and Y  
coordinates either).  Or, the TextPosition could create and return an  
AffineTransform object.

thanks,
brian



On Nov 23, 2008, at 7:25 AM, Andreas Lehmkühler (JIRA) wrote:

>
>      [ https://issues.apache.org/jira/browse/PDFBOX-363? 
> page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Andreas Lehmkühler updated PDFBOX-363:
> --------------------------------------
>
>     Attachment: PageRotation-Patch.diff
>
> As already discussed on the dev-list, all supported parts of a pdf- 
> document but the text will be rotated if necessary. Based on the  
> changes coming with PDFBOX-374 the only thing I have to do is to  
> rotate the font if needed. I made a patch with the following changes
>
> org.apache.pdfbox.pdmodel.font.PDFont (and all classes):
> - the signature of drawString() changes.
>
> org.apache.pdfbox.util.TextPosition:
> - additional getter for the textPos-Matrix
>
> org.apache.pdfbox.pdfviewer.PageDrawer:
> - added the rotation-stuff to the showCharacter-method


[jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-363:
--------------------------------------

    Attachment: PageRotation-Patch.diff

As already discussed on the dev list, all supported parts of a PDF document except the text are rotated if necessary. Based on the changes coming with PDFBOX-374, the only thing left for me to do is to rotate the font if needed. I made a patch with the following changes:

org.apache.pdfbox.pdmodel.font.PDFont (and all subclasses):
- the signature of drawString() changes.

org.apache.pdfbox.util.TextPosition:
- additional getter for the textPos-Matrix

org.apache.pdfbox.pdfviewer.PageDrawer:
- added the rotation-stuff to the showCharacter-method



> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>         Attachments: landscape_rot90.pdf, PageRotation-Patch.diff, test-landscape2.pdf
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834
> Hi all,
> Daniel asked me for my patch for the rotation-issue described in
> https://sourceforge.net/forum/message.php?msg_id=4992032
> Attention, I didn't apply the newest patches to the classes PDFStreamEngine
> and PageDrawer.
> There are 4 more probably affected classes calling the page.findRotation
> method which I didn't change, because I'm didn't have to use them (until
> now).
> org.pdfbox.util.operator.pagedrawer.Invoke
> org.pdfbox.util.TextPositionComparator
> org.pdfbox.examples.pdmodel.PrintURLs
> org.pdfbox.examples.util.PrintImageLocations
> I've attached a pdf in DINA4-landscape. The text is missplaced whenever I
> try to print or display (using the pdfbox-PDFReader and convertToImage
> within my application) it with pdfbox. The acrobat reader has no problems
> with my documents.
> After my patch everything works fine. Perhaps it is a point of discussion,
> if the convertToImage method has to rotate the image or if the user has to
> do it. The PDFPagePanel didn't do it (yet).
> Andreas
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-29 12:42
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> I've just tried your sample PDF w/ the latest code -- prior to application
> of your patch.  It doesn't work.
> I'll work on incorporating your change for a full regression test in the
> next hour or so.
> [Comment from SourceForge]
> Date: 2008-05-29 15:16
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Hi Daniel,
> I've just added my patch to the newest sources you send me earlier this
> day. I guess it works. During testing I've found another problem concernign
> graphics within landscape-docs. I found the solution in patching the class
> org.pdfbox.util.operator.pagedrawer.Invoke in the same way I've patched the
> others. And consequently to be strict I've also patched the new methods in
> org.pdfbox.pdfviewer.PageDrawer
> For my everthings works fine inlc. the 4PP-pdf.
> I've attached the patched files and another testpdf with a embedded
> graphic.
> Andreas
> File Added: pdfbox_rotation_patch_2.zip
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-29 18:12
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Your code works w/ the 4PP test ... and with the other rendering stuff
> I've tried so far.
> However ... the text extraction test fails with it.  I can't figure that
> one out ... ideas?
> [Comment from SourceForge]
> Date: 2008-05-29 18:19
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Can you give me some more details? I never do any textextractions with
> pdfbox. Perhaps you'll provide with the code for test program, or is it
> part of pdfbox, so that I can find it in the cvs?
> However, it has to wait until tomorrow
> [Comment from SourceForge]
> Date: 2008-05-29 18:39
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> If you've got the whole project set up, try
> ant testextract
> I'll see if I can narrow it down some.
> [Comment from SourceForge]
> Date: 2008-05-29 21:00
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> The extraction problem seems to have to do w/ the changes to
> PDFStreamEngine.
> If I revert that file, extraction succeeds.  Unfortunately ... with that
> reverted but your other changes in place, image rendering hangs.
> Will work on it more ... probably tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-29 21:12
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Correction ... it doesn't hang ... it's just slow on the first PDF to
> render ... maybe just due to the first one I'm sending it.
> Will look more tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-30 07:11
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I've found one bug. While deleting the if rules for the rotation, I've
> deleted line 394 which is still needed.
> I've attached the corrected file
> File Added: PDFStreamEngine.java
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-30 07:43
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I forgot to mention that I can't run the test suite. When I try to get the
> whole project, I realized that I'm behind a firewall here in my office.
> Consequently my cvs-client doesn't work. I've to do it from home. :-(
> I've only tested one file: 601501018.pdf
> There are additional blanks and they disapper after adding the missing
> line. But starting at page 21, when the document orientation changes from
> portrait to landscape, there are additional cr or lf. Hmmmm ??
> [Comment from SourceForge]
> Date: 2008-05-30 08:25
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I've continued testing and I guess the problem is somewhere starting in
> org.pdfbox.util.PDFTextStripper.showCharacter(..). Obviously it handles the
> coordinates for rotated pages somehow in an other way than the
> implementation of the showCharacter() in org.pdfbox.pdfviewer.PageDrawer.
> But for the moment I don't understand what's happening in the
> TextStripper, perhaps I'll find out later. 
> I hope this hint helps ...
> [Comment from SourceForge]
> Date: 2008-05-30 16:20
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> I've put a couple more hours into this, and I don't know the answer.
> I do know the text extraction is the more mature side of this library.
> For the moment, I'll be skipping over your changes to PDFStreamEngine.
> Thanks for the other changes!
> [Comment from SourceForge]
> Date: 2008-06-02 09:21
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Hi Daniel,
> I guess I've solved the problem. The textposition-handling has to be
> adjusted within the method PDFTextStripper.flushText(). Of course my former
> changes to the class PDFStreamEngine are needed. During debugging I found a
> bug in the class TextPositionComparator (line 82). I solved it by removing
> the rotation if-clauses. Whenever you compare two Textpositions, it is
> needless to look at the rotation because they are on the same page so that
> the comparison is independent of the rotation.
> Furthermore my PDFTextStripper-patch seems to correct some minor problems,
> which are described in
> https://sourceforge.net/forum/message.php?msg_id=4976730.
> I've tested the following cases:
> Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100%
> test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in
> lines 251-257, 278/279, 502/503, 574/575 and the other differences are some
> kind of special-character-issues. I guess you have to correct the input
> first.
> I've attached my changes based on the newest versions of both classes.
> [Comment from SourceForge]
> Date: 2008-06-02 09:22
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> File Added: pdfbox_rotation_patch_3.zip
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-363:
--------------------------------------

    Attachment: PageRotation-Patch3.diff

There was still one unanswered question. At first I thought that the text has to be rotated by 180 degrees if the text direction is not 0. But this works only for rotation angles which are multiples of 90 degrees; any other angle doesn't work.

This problem can be demonstrated by using the test-document Flyer2.pdf attached to PDFBOX-358. The document has a landscape-orientation and most of the text has a direction of 90. But there are some pieces of text with a rotation of approx. 40 degrees.

After some testing I realized that the text doesn't have to be rotated by 180 degrees; instead, the shearing has to be corrected by multiplying the shear values by -1. This operation compensates for moving the 0,0 reference from the lower left to the upper left before drawing.

I've attached my patch.
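
The shear correction described above can be sketched with java.awt.geom.AffineTransform. This is only an illustration, not the attached patch: the class name ShearFlipSketch, the method toUpperLeftOrigin and the pageHeight parameter are hypothetical. It assumes a text matrix given in PDF coordinates (origin lower left, y up) and glyphs that are drawn in Java2D coordinates (origin upper left, y down). Under these assumptions, moving the origin keeps the scale terms, negates the two shear terms and mirrors the y translation.

```java
import java.awt.geom.AffineTransform;

public class ShearFlipSketch {

    /**
     * Hypothetical helper: converts a text matrix from a lower-left origin
     * (y axis pointing up) to an upper-left origin (y axis pointing down).
     * Assumption: the glyphs themselves are already drawn in y-down space.
     */
    static AffineTransform toUpperLeftOrigin(AffineTransform textMatrix, double pageHeight) {
        // Constructor argument order is m00, m10, m01, m11, m02, m12.
        return new AffineTransform(
                textMatrix.getScaleX(),                    // m00 unchanged
                -textMatrix.getShearY(),                   // m10 multiplied by -1
                -textMatrix.getShearX(),                   // m01 multiplied by -1
                textMatrix.getScaleY(),                    // m11 unchanged
                textMatrix.getTranslateX(),                // x position unchanged
                pageHeight - textMatrix.getTranslateY());  // y measured from the top
    }

    public static void main(String[] args) {
        // Text rotated by 40 degrees, similar to the case mentioned for Flyer2.pdf.
        AffineTransform rotated = AffineTransform.getRotateInstance(Math.toRadians(40));
        AffineTransform corrected = toUpperLeftOrigin(rotated, 595.0);
        System.out.println("before: " + rotated);
        System.out.println("after:  " + corrected);
    }
}
```

For a rotation of 90 degrees the scale terms are 0, so negating the shear terms gives the same matrix as an additional rotation by 180 degrees. For an angle such as 40 degrees the two results differ, which would be consistent with the observation above.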

> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8.0-incubator
>
>         Attachments: landscape_rot90.pdf, PageRotation-Patch.diff, PageRotation-Patch2.diff, PageRotation-Patch3.diff, test-landscape2.pdf
>
>


[jira] Commented: (PDFBOX-363) Fixed Page rotation

Posted by "Brian Carrier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632731#action_12632731 ] 

Brian Carrier commented on PDFBOX-363:
--------------------------------------

After reviewing these changes along with the ones I submitted in PDFBOX-374, here are my thoughts.

The previous code in PDFTextStripper adjusted for the page rotation and changed the location of the 0,0 reference from the lower left to the upper left.  The code in TextPositionComparator also adjusted for rotation, expected the change in the Y-axis, and calculated the top of the character by subtracting the height from the bottom.

The changes by Andreas removed the rotation adjustment from both PDFTextStripper and TextPositionComparator and thereby also removed the code that changed the location of the 0,0 reference. 

My changes kept the adjustments in PDFTextStripper and only removed the rotation adjustment in TextPositionComparator because it was being applied twice. I don't disagree with the change by Andreas to remove rotation in PDFTextStripper, but I didn't feel that I fully understood all of the code to know what impact it would have in other classes that depended on it being rotated.

When I tried the fixes by Andreas on the test file I submitted with PDFBOX-374, I still got errors because the sorting is not being done correctly. The rotation needs to be taken into account somewhere in the sorting process (because sometimes the rotation causes left to right to go from larger to smaller numbers).

The change of not moving the 0,0 reference and not rotating in PDFTextStripper makes sense, but we need to make sure that the appropriate adjustments are made everywhere.  Currently, we will need to change TextPositionComparator to apply the rotation and to change the calculation of the character height based on the 0,0 reference change.  I cannot speak to where else the changes will need to occur outside of the text extraction code.  If this is the desired fix, I can provide a patch.
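
To illustrate what applying the rotation in the sorting step could look like, here is a minimal sketch. It is not the TextPositionComparator code and not a proposed patch: the class RotationAwareSortSketch, the Pos type and the helper names are hypothetical, and it assumes Java 8 or later, untransformed coordinates with the 0,0 reference at the lower left, and a page rotation that is a multiple of 90 degrees applied clockwise for display. Each position is mapped to display coordinates (origin upper left) first, and the comparison is then done top to bottom and left to right.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RotationAwareSortSketch {

    /** Hypothetical stand-in for a text position; not the PDFBox TextPosition class. */
    static final class Pos {
        final String text;
        final double x;
        final double y;

        Pos(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Returns {dx, dy}: display coordinates with the origin at the upper left, y down. */
    static double[] toDisplay(double x, double y, double pageWidth, double pageHeight, int rotation) {
        switch (((rotation % 360) + 360) % 360) {
            case 90:
                return new double[] {y, x};
            case 180:
                return new double[] {pageWidth - x, y};
            case 270:
                return new double[] {pageHeight - y, pageWidth - x};
            default: // 0 degrees; other angles are not handled in this sketch
                return new double[] {x, pageHeight - y};
        }
    }

    /** Orders positions top to bottom, then left to right, as displayed. */
    static Comparator<Pos> readingOrder(double pageWidth, double pageHeight, int rotation) {
        return Comparator
                .comparingDouble((Pos p) -> toDisplay(p.x, p.y, pageWidth, pageHeight, rotation)[1])
                .thenComparingDouble(p -> toDisplay(p.x, p.y, pageWidth, pageHeight, rotation)[0]);
    }

    /** Sorts a copy of the list and concatenates the text in reading order. */
    static String sortedText(List<Pos> positions, double pageWidth, double pageHeight, int rotation) {
        List<Pos> copy = new ArrayList<>(positions);
        copy.sort(readingOrder(pageWidth, pageHeight, rotation));
        StringBuilder sb = new StringBuilder();
        for (Pos p : copy) {
            sb.append(p.text);
        }
        return sb.toString();
    }
}
```

With a rotation of 270, the display x coordinate is pageHeight - y, so the position with the larger raw y value comes first on a line. This is the case where left to right goes from larger to smaller numbers. A real comparator would probably also need a tolerance for characters on the same line with slightly different baselines, which this sketch leaves out.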


> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>


[jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-363:
--------------------------------------

    Attachment: landscape_rot90.pdf

A4 landscape example with page rotation = 90

> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>         Attachments: landscape_rot90.pdf
>
>


[jira] Updated: (PDFBOX-363) Fixed Page rotation

Posted by "Andreas Lehmkühler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-363:
--------------------------------------

    Attachment: PageRotation-Patch2.diff

Sorry, I didn't realize that the method I used doesn't exist in earlier Java versions.

I've attached a Java 1.4 compatible version.



> Fixed Page rotation
> -------------------
>
>                 Key: PDFBOX-363
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-363
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8.0-incubator
>
>         Attachments: landscape_rot90.pdf, PageRotation-Patch.diff, PageRotation-Patch2.diff, test-landscape2.pdf
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=1977429&group_id=78314&atid=552834
> Hi all,
> Daniel asked me for my patch for the rotation-issue described in
> https://sourceforge.net/forum/message.php?msg_id=4992032
> Attention, I didn't apply the newest patches to the classes PDFStreamEngine
> and PageDrawer.
> There are 4 more probably affected classes calling the page.findRotation
> method which I didn't change, because I didn't have to use them (until
> now).
> org.pdfbox.util.operator.pagedrawer.Invoke
> org.pdfbox.util.TextPositionComparator
> org.pdfbox.examples.pdmodel.PrintURLs
> org.pdfbox.examples.util.PrintImageLocations
> I've attached a pdf in DINA4-landscape. The text is misplaced whenever I
> try to print or display (using the pdfbox-PDFReader and convertToImage
> within my application) it with pdfbox. The acrobat reader has no problems
> with my documents.
> After my patch everything works fine. Perhaps it is a point of discussion,
> if the convertToImage method has to rotate the image or if the user has to
> do it. The PDFPagePanel didn't do it (yet).
> Andreas
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279404&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-29 12:42
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> I've just tried your sample PDF w/ the latest code -- prior to application
> of your patch.  It doesn't work.
> I'll work on incorporating your change for a full regression test in the
> next hour or so.
> [Comment from SourceForge]
> Date: 2008-05-29 15:16
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Hi Daniel,
> I've just added my patch to the newest sources you sent me earlier
> today. I guess it works. During testing I've found another problem concerning
> graphics within landscape-docs. I found the solution in patching the class
> org.pdfbox.util.operator.pagedrawer.Invoke in the same way I've patched the
> others. And consequently to be strict I've also patched the new methods in
> org.pdfbox.pdfviewer.PageDrawer
> For me everything works fine, incl. the 4PP-pdf.
> I've attached the patched files and another test pdf with an embedded
> graphic.
> Andreas
> File Added: pdfbox_rotation_patch_2.zip
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279471&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-29 18:12
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Your code works w/ the 4PP test ... and with the other rendering stuff
> I've tried so far.
> However ... the text extraction test fails with it.  I can't figure that
> one out ... ideas?
> [Comment from SourceForge]
> Date: 2008-05-29 18:19
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Can you give me some more details? I never do any text extraction with
> pdfbox. Perhaps you could provide me with the code for the test program, or
> is it part of pdfbox, so that I can find it in the cvs?
> However, it has to wait until tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-29 18:39
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> If you've got the whole project set up, try
> ant testextract
> I'll see if I can narrow it down some.
> [Comment from SourceForge]
> Date: 2008-05-29 21:00
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> The extraction problem seems to have to do w/ the changes to
> PDFStreamEngine.
> If I revert that file, extraction succeeds.  Unfortunately ... with that
> reverted but your other changes in place, image rendering hangs.
> Will work on it more ... probably tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-29 21:12
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> Correction ... it doesn't hang ... it's just slow on the first PDF to
> render ... maybe just due to the first one I'm sending it.
> Will look more tomorrow.
> [Comment from SourceForge]
> Date: 2008-05-30 07:11
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I've found one bug. While deleting the if-clauses for the rotation, I
> deleted line 394, which is still needed.
> I've attached the corrected file.
> File Added: PDFStreamEngine.java
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279559&aid=1977429
> [Comment from SourceForge]
> Date: 2008-05-30 07:43
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I forgot to mention that I can't run the test suite. When I tried to get the
> whole project, I realized that I'm behind a firewall here in my office.
> Consequently my cvs client doesn't work. I have to do it from home. :-(
> I've only tested one file: 601501018.pdf
> There are additional blanks, and they disappear after adding the missing
> line. But starting at page 21, where the document orientation changes from
> portrait to landscape, there are additional CR or LF characters. Hmmmm ??
> [Comment from SourceForge]
> Date: 2008-05-30 08:25
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> I've continued testing and I guess the problem starts somewhere in
> org.pdfbox.util.PDFTextStripper.showCharacter(..). Obviously it handles the
> coordinates for rotated pages differently from the implementation of
> showCharacter() in org.pdfbox.pdfviewer.PageDrawer.
> But for the moment I don't understand what's happening in the
> TextStripper; perhaps I'll find out later.
> I hope this hint helps ...
> [Comment from SourceForge]
> Date: 2008-05-30 16:20
> Sender: danielwilson
> Logged In: YES 
> user_id=1737686
> Originator: NO
> I've put a couple more hours into this, and I don't know the answer.
> I do know the text extraction is the more mature side of this library.
> For the moment, I'll be skipping over your changes to PDFStreamEngine.
> Thanks for the other changes!
> [Comment from SourceForge]
> Date: 2008-06-02 09:21
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> Hi Daniel,
> I guess I've solved the problem. The text-position handling has to be
> adjusted within the method PDFTextStripper.flushText(). Of course my former
> changes to the class PDFStreamEngine are still needed. During debugging I
> found a bug in the class TextPositionComparator (line 82). I solved it by
> removing the rotation if-clauses. When you compare two TextPositions, there
> is no need to look at the rotation: they are on the same page, so the
> comparison is independent of the rotation.
> Furthermore my PDFTextStripper-patch seems to correct some minor problems,
> which are described in
> https://sourceforge.net/forum/message.php?msg_id=4976730.
> I've tested the following cases:
> Garcia2003b__Correlative_exploration_of_EEG_Signals.pdf works 100%
> test_rotate_270.txt doesn't work 100%, but my patch corrected a bug in
> lines 251-257, 278/279, 502/503, 574/575, and the other differences are some
> kind of special-character issue. I guess you have to correct the input
> first.
> I've attached my changes based on the newest versions of both classes.
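
The comparator argument above (two positions on the same page can be ordered without consulting the page rotation) can be illustrated with a minimal sketch. This is not the patched TextPositionComparator; the class ReadingOrderSketch, the type Pos and the method joinSorted are hypothetical names, and the sketch assumes that the coordinates have already been brought into one common display frame before sorting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in pdfbox.
final class ReadingOrderSketch {

    /** Stand-in for a text position (assumption: coordinates share one display frame). */
    static final class Pos {
        final String text;
        final float x; // distance from the left edge
        final float y; // distance from the top edge

        Pos(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Orders top to bottom, then left to right; page rotation is never consulted. */
    static final Comparator<Pos> READING_ORDER = (a, b) -> {
        int byLine = Float.compare(a.y, b.y);
        return byLine != 0 ? byLine : Float.compare(a.x, b.x);
    };

    /** Sorts a copy of the positions and concatenates their text. */
    static String joinSorted(List<Pos> positions) {
        List<Pos> copy = new ArrayList<>(positions);
        copy.sort(READING_ORDER);
        StringBuilder sb = new StringBuilder();
        for (Pos p : copy) {
            sb.append(p.text);
        }
        return sb.toString();
    }
}
```

Since both arguments of a comparison come from the same page, any rotation would apply to both alike, which is why the sketch needs no rotation branch. Whether the real patch normalizes coordinates at the same stage is not stated in this thread.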
> [Comment from SourceForge]
> Date: 2008-06-02 09:22
> Sender: lehmialk
> Logged In: YES 
> user_id=2069622
> Originator: YES
> File Added: pdfbox_rotation_patch_3.zip
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552834&file_id=279842&aid=1977429

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.