Posted to users@pdfbox.apache.org by Frank van der Hulst <dr...@gmail.com> on 2014/10/01 11:37:01 UTC

Re: PageDrawer bug?

OK, I've figured out where I went wrong. I now have a LinedTableStripper()
class which works for the files I was having trouble with. :)

In case anyone cares, the problem seemed to be something to do with setting
the page and pageSize fields in PageDrawer.
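
For anyone hitting the same symptom, here is a minimal sketch of why a wrong
page height could push lines down the page. It uses only java.awt.geom, not
PDFBox itself. The class CtmSketch and the method toDevice are hypothetical
names, and the flip "pageHeight - y" is an assumption about how a renderer
maps PDF user space (y grows upward) onto Java2D device space (y grows
downward). Stream coordinates go through the current transformation matrix
first, then get flipped against the page height.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical helper for illustration only; not part of PDFBox.
final class CtmSketch {

    // Map a point from the content stream into top-left-origin device space.
    static Point2D toDevice(AffineTransform ctm, double x, double y, double pageHeight) {
        // Step 1: stream coordinates are relative to the current transformation matrix.
        Point2D user = ctm.transform(new Point2D.Double(x, y), null);
        // Step 2 (assumed): flip the y axis against the page height.
        return new Point2D.Double(user.getX(), pageHeight - user.getY());
    }

    public static void main(String[] args) {
        // A CTM that only translates by (10, 20).
        AffineTransform ctm = AffineTransform.getTranslateInstance(10, 20);
        // Same stream point, two different page heights.
        Point2D withHeight800 = toDevice(ctm, 5, 5, 800);
        Point2D withHeight1000 = toDevice(ctm, 5, 5, 1000);
        // The second point lands 200 units further down in device space.
        System.out.println(withHeight800);
        System.out.println(withHeight1000);
    }
}
```

In this sketch the x coordinate is unaffected by the page height, so a bad
height only produces a vertical offset, which matches what I was seeing.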

Many thanks to those who gave help.

Frank


On Tue, Sep 30, 2014 at 9:56 PM, Frank van der Hulst <
drifter.frank@gmail.com> wrote:

> Hi Tilman,
>
> I'm pretty sure now that there's nothing wrong with PageDrawer, but there
> is something wrong with my understanding of it.
>
> So I'm still poking and prodding it to try to figure it out myself. Will
> have another look tomorrow and then get back to you.
>
> Frank
>
> On Tue, Sep 30, 2014 at 7:00 PM, Tilman Hausherr <TH...@t-online.de>
> wrote:
>
>> Hi,
>>
>> It's best to download the source code from the official site and not from
>> secondary websites.
>>
>> https://pdfbox.apache.org/download.cgi#recent
>>
>> Still can't tell why it doesn't work for you because you didn't post your
>> code :-(
>>
>> Tilman
>>
>>
>>
>> On 30.09.2014 at 05:56, Frank van der Hulst wrote:
>>
>>> Thanks for the replies... I'm working with 1.8.7, but the same applied to
>>> 1.8.6 and I think 1.8.5.
>>>
>>> convertToImage() works properly, which was a bit surprising when I looked
>>> into it and found that it created a PageDrawer object. So I tried copying
>>> the source code for convertToImage into my code. That worked fine too.
>>>
>>> Then I tried copying the source from
>>> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pdfbox/pdfbox/1.8.6/org/apache/pdfbox/pdfviewer/PageDrawer.java?av=f
>>> (couldn't find 1.8.7) into my own PageDrawer class. That *doesn't* work
>>> properly...  lines aren't drawn at all (probably off the page?). I don't
>>> understand this at all... surely identical code will do the same thing???
>>> Or is something else in the pdfbox library directly accessing
>>> org.apache.pdfbox.pdfviewer.PageDrawer via one of its public methods?
>>>
>>> This may be the case because when I changed my PageDrawer to extend
>>> org.apache.pdfbox.pdfviewer.PageDrawer instead of PDFStreamEngine, it
>>> worked perfectly. Which is all the more confusing because my original
>>> class extended PageDrawer and didn't work.
>>>
>>> Frank
>>>
>>>
>>> On Tue, Sep 30, 2014 at 5:04 AM, Tilman Hausherr <TH...@t-online.de>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> It's best to upload the code and the PDFs to a public location.
>>>>
>>>> PDF is not easy... coordinates that you see in the stream are always
>>>> relative to the current transformation matrix.
>>>>
>>>> Tilman
>>>>
>>>> On 29.09.2014 at 10:56, Frank van der Hulst wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm new to the list... I beg your indulgence if I'm out of line here,
>>>>> but here goes...
>>>>>
>>>>> I'm working on a PDF table extractor. This is my second attempt at it,
>>>>> and this one is based on extending PageDrawer.
>>>>>
>>>>> In particular, I'm looking for table cells delineated by vertical &
>>>>> horizontal lines, and then grabbing whatever text is inside the
>>>>> rectangle.
>>>>>
>>>>> This works well for most PDFs I've tried (admittedly all from the same
>>>>> source), but there's a large subset that it doesn't work on. I've
>>>>> debugged my way through one, and it appears that when
>>>>> processStream(page, page.findResources(),
>>>>> page.getContents().getStream()); calls fillPath() or strokePath() to
>>>>> draw the lines, they aren't drawn in the correct place. They seem to be
>>>>> offset some distance down the page.
>>>>>
>>>>> I've looked at a couple of my troublesome PDFs, and one thing they
>>>>> have in common is that they are v1.4, whereas the ones that work are
>>>>> v1.7.
>>>>>
>>>>> Sooo... Has anyone encountered this before? Is there a known bug with
>>>>> PageDrawer.processStream() or perhaps with the PDFStreamEngine and
>>>>> drawing of v1.4 PDFs?
>>>>>
>>>>> I'm happy to share my source code and example PDFs with anyone if it
>>>>> would help.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Frank
>>>>>
>>>>>
>>>>>
>>
>