Posted to users@pdfbox.apache.org by Constantine Dokolas <cd...@gmail.com> on 2018/05/15 13:00:01 UTC

Dubious implementation approach with content stream OperatorProcessors and PDFGraphicsStreamEngine

Hi all!

I'm currently implementing a library that intercepts content stream
operators from a document's pages in order to produce a filtered version
(as a different document). To this end, I'm writing an extension to
PDFGraphicsStreamEngine (as suggested by the documentation) and using
processOperator() to do "my stuff".

Certain operators (MoveTextSetLeading, NextLine, ShowTextLine and
ShowTextLineAndSpace), in order to trigger the (multiple) corresponding
events, take the shortcut of calling processOperator() for "bogus"
operators that are in fact *not present* in the content stream.
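To make the problem concrete, here is a deliberately simplified, self-contained Java model. It is not PDFBox code: `Engine`, `Processor` and the operator table are hypothetical stand-ins. It mimics the shortcut described above for ShowTextLine (the `'` operator, which the PDF specification defines as equivalent to `T*` followed by `Tj`). Because the processor re-enters the public `processOperator()` hook, a subclass that intercepts that hook sees three operators where the content stream contains only one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SyntheticOperatorDemo {

    /** Hypothetical stand-in for an operator processor (not PDFBox's OperatorProcessor). */
    interface Processor {
        void process(Engine engine, List<String> args);
    }

    /** Hypothetical stand-in for a stream engine with an overridable processOperator hook. */
    static class Engine {
        final Map<String, Processor> processors = new HashMap<>();
        final List<String> seen = new ArrayList<>();

        Engine() {
            processors.put("T*", (e, args) -> { /* would move to the next line */ });
            processors.put("Tj", (e, args) -> { /* would show the text string */ });
            // Modeled on the shortcut: ' re-enters the public hook for T* and Tj,
            // even though neither operator is present in the content stream.
            processors.put("'", (e, args) -> {
                e.processOperator("T*", List.of());
                e.processOperator("Tj", args);
            });
        }

        /** The hook a subclass would override to intercept operators. */
        void processOperator(String name, List<String> args) {
            seen.add(name);
            Processor p = processors.get(name);
            if (p != null) {
                p.process(this, args);
            }
        }
    }

    public static void main(String[] args) {
        Engine engine = new Engine();
        // The content stream contains exactly one operator: (Hello) '
        engine.processOperator("'", List.of("(Hello)"));
        System.out.println(engine.seen); // [', T*, Tj]
    }
}
```

An interceptor that copies every operator it sees into a new stream would therefore write `T*` and `Tj` in addition to the original `'`.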

I suggest the devs replace such calls with temporary OperatorProcessor
instantiations and process(...) calls on them. I'd be happy to contribute
the related changes, but I'm not sure how.
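A sketch of the suggested change, again using a hypothetical model rather than the real PDFBox classes: the `'` processor creates temporary `NextLine` and `ShowText` processors and calls their `process()` methods directly, so the `processOperator()` hook only records operators that are actually in the stream. In PDFBox itself, such temporary processors would presumably also have to be bound to the engine before `process()` is called; that wiring is left out here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DirectProcessDemo {

    /** Hypothetical stand-in for an operator processor. */
    interface Processor {
        void process(Engine engine, List<String> args);
    }

    /** T*: stand-in for moving the text matrix to the next line. */
    static class NextLine implements Processor {
        public void process(Engine engine, List<String> args) {
            engine.lineCount++;
        }
    }

    /** Tj: stand-in for showing a text string. */
    static class ShowText implements Processor {
        public void process(Engine engine, List<String> args) {
            engine.shown.addAll(args);
        }
    }

    /** ': T* followed by Tj, without re-entering the engine's hook. */
    static class ShowTextLine implements Processor {
        public void process(Engine engine, List<String> args) {
            // Temporary processor instances, invoked directly.
            new NextLine().process(engine, List.of());
            new ShowText().process(engine, args);
        }
    }

    static class Engine {
        final Map<String, Processor> processors = new HashMap<>();
        final List<String> seen = new ArrayList<>();
        final List<String> shown = new ArrayList<>();
        int lineCount = 0;

        Engine() {
            processors.put("T*", new NextLine());
            processors.put("Tj", new ShowText());
            processors.put("'", new ShowTextLine());
        }

        /** The hook a subclass would override; now called only for real stream operators. */
        void processOperator(String name, List<String> args) {
            seen.add(name);
            Processor p = processors.get(name);
            if (p != null) {
                p.process(this, args);
            }
        }
    }

    public static void main(String[] a) {
        Engine engine = new Engine();
        engine.processOperator("'", List.of("(Hello)"));
        System.out.println(engine.seen);      // [']
        System.out.println(engine.shown);     // [(Hello)]
        System.out.println(engine.lineCount); // 1
    }
}
```

The effect on the text state is the same, but the compatibility concern raised later in the thread applies: subclasses that currently rely on seeing the synthetic `T*`/`Tj` calls in `processOperator()` would stop seeing them.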

I'd like to voice my appreciation for this project. It has solved many
problems for me and I greatly appreciate the devs' effort.

Regards,

Constantine

-- 
There is a computer disease that anybody who works with computers knows
about. It's a very serious disease and it interferes completely with the
work. The trouble with computers is that you 'play' with them!
- Richard P. Feynman

Re: Dubious implementation approach with content stream OperatorProcessors and PDFGraphicsStreamEngine

Posted by Constantine Dokolas <cd...@gmail.com>.
I'm back with a patch for you. See if it makes sense.

Regards, and keep up the good work,
Constantine

On Thu, May 24, 2018 at 7:27 PM Tilman Hausherr <TH...@t-online.de>
wrote:

> On 24.05.2018 at 11:08, Constantine Dokolas wrote:
> > Can't say about design rules, but I'm under the impression few are using
> > the framework as I have lately (i.e. using it to process content on the
> > operator level), so the impact on existing code should be minimal. Making
> > these fixes will be best in the long run, and I don't see why there would
> > be double code. Perhaps some other major fixes can be gathered to justify
> > a 2.1.0 release, bringing more attention to the changes in behavior.
> >
> > I'd like to contribute, but all I can do right now is push a revision on
> > GitHub (on my fork of the repository).
> >
> > If you can post a short guide on how to set up an IDE (Eclipse, VS Code or
> > NetBeans) to work with the project, I'd also appreciate it.
>
> In NetBeans:
>
> Team, Subversion, Checkout, and then the project URL:
>
> http://svn.apache.org/repos/asf/pdfbox/trunk/
>
> Next, change the directory if needed, check "skip trunk", and press
> Finish. After some time you'll have to press OK on the main project. To
> build, right-click the project. Again, please be aware that I'm very
> skeptical of your suggested change, but I'm interested to see it because
> you have an argument in your initial post. Btw, there's a different way
> to filter: just go through the tokens in the content stream. See the
> RemoveAllTexts example in the source code download. Re "I'm under the
> impression few are using the framework as I have lately" - how would you
> know?
>
>
> Tilman
>

Re: Dubious implementation approach with content stream OperatorProcessors and PDFGraphicsStreamEngine

Posted by Tilman Hausherr <TH...@t-online.de>.
On 24.05.2018 at 11:08, Constantine Dokolas wrote:
> Can't say about design rules, but I'm under the impression few are using
> the framework as I have lately (i.e. using it to process content on the
> operator level), so the impact on existing code should be minimal. Making
> these fixes will be best in the long run, and I don't see why there would be
> double code. Perhaps some other major fixes can be gathered to justify a
> 2.1.0 release, bringing more attention to the changes in behavior.
>
> I'd like to contribute, but all I can do right now is push a revision on
> GitHub (on my fork of the repository).
>
> If you can post a short guide on how to set up an IDE (Eclipse, VS Code or
> NetBeans) to work with the project, I'd also appreciate it.

In NetBeans:

Team, Subversion, Checkout, and then the project URL:

http://svn.apache.org/repos/asf/pdfbox/trunk/

Next, change the directory if needed, check "skip trunk", and press
Finish. After some time you'll have to press OK on the main project. To
build, right-click the project. Again, please be aware that I'm very
skeptical of your suggested change, but I'm interested to see it because
you have an argument in your initial post. Btw, there's a different way
to filter: just go through the tokens in the content stream. See the
RemoveAllTexts example in the source code download. Re "I'm under the
impression few are using the framework as I have lately" - how would you
know?
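The token-based alternative can be sketched as follows. This is a hypothetical, dependency-free model of the idea, not the RemoveAllTexts code: tokens are plain Java objects, and the made-up `Op` class marks operators. To our understanding, the actual PDFBox 2.0 example obtains the token list with `PDFStreamParser` and writes the filtered list back with `ContentStreamWriter`. Operands are buffered until their operator arrives, and text-showing operators (`Tj`, `TJ`, `'`, `"`) are dropped together with their operands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TokenFilterDemo {

    /** Hypothetical marker for an operator token; any other object is an operand. */
    static final class Op {
        final String name;
        Op(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    // Text-showing operators: Tj, TJ, ' and "
    static final Set<String> TEXT_SHOWING = Set.of("Tj", "TJ", "'", "\"");

    /** Copies tokens, dropping each text-showing operator together with its operands. */
    static List<Object> removeText(List<Object> tokens) {
        List<Object> out = new ArrayList<>();
        List<Object> pendingOperands = new ArrayList<>();
        for (Object token : tokens) {
            if (token instanceof Op) {
                Op op = (Op) token;
                if (!TEXT_SHOWING.contains(op.name)) {
                    out.addAll(pendingOperands); // keep operands of kept operators
                    out.add(op);
                }
                pendingOperands.clear();
            } else {
                pendingOperands.add(token);
            }
        }
        out.addAll(pendingOperands); // trailing operands without an operator, if any
        return out;
    }

    public static void main(String[] args) {
        // BT /F1 12 Tf 72 700 Td (Hello) Tj ET
        List<Object> tokens = List.of(
            new Op("BT"), "/F1", 12, new Op("Tf"),
            72, 700, new Op("Td"), "(Hello)", new Op("Tj"), new Op("ET"));
        System.out.println(removeText(tokens)); // [BT, /F1, 12, Tf, 72, 700, Td, ET]
    }
}
```

Since this works on the raw tokens, no operator is ever synthesized, which sidesteps the processOperator() issue entirely for pure filtering tasks.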


Tilman



Re: Dubious implementation approach with content stream OperatorProcessors and PDFGraphicsStreamEngine

Posted by Constantine Dokolas <cd...@gmail.com>.
Can't say about design rules, but I'm under the impression few are using
the framework as I have lately (i.e. using it to process content on the
operator level), so the impact on existing code should be minimal. Making
these fixes will be best in the long run, and I don't see why there would be
double code. Perhaps some other major fixes can be gathered to justify a
2.1.0 release, bringing more attention to the changes in behavior.

I'd like to contribute, but all I can do right now is push a revision on
GitHub (on my fork of the repository).

If you can post a short guide on how to set up an IDE (Eclipse, VS Code or
NetBeans) to work with the project, I'd also appreciate it.

Regards,
Constantine

On Wed, May 23, 2018 at 7:58 PM Tilman Hausherr <TH...@t-online.de>
wrote:

> You're right... makes me wonder if we violated some design rule. The
> alternative would be some double code, which isn't good either.
>
> Let's say we change this: then not only would we have some double code,
> but some existing code by our users might no longer work...
>
> You can open an issue in JIRA and attach a patch / diff file there. But
> it might be a difficult decision.
>
> Tilman

Re: Dubious implementation approach with content stream OperatorProcessors and PDFGraphicsStreamEngine

Posted by Tilman Hausherr <TH...@t-online.de>.
You're right... makes me wonder if we violated some design rule. The
alternative would be some double code, which isn't good either.

Let's say we change this: then not only would we have some double code,
but some existing code by our users might no longer work...

You can open an issue in JIRA and attach a patch / diff file there. But 
it might be a difficult decision.

Tilman


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org