Posted to dev@pdfbox.apache.org by "Ken Weinert (JIRA)" <ji...@apache.org> on 2008/08/16 18:35:45 UTC

[jira] Issue Comment Edited: (PDFBOX-41) Tool to create hyperlinks

    [ https://issues.apache.org/jira/browse/PDFBOX-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623140#action_12623140 ] 

kweinert edited comment on PDFBOX-41 at 8/16/08 9:34 AM:
------------------------------------------------------------

I have started work on this feature.

My plan is to make it a little more extensible: a command line tool called PDFFilter that uses the PDFFilterUtility class.

As I'm thinking of it now, the command line would look something like:

...PDFFilter -filter 'link("regex", "url", "mode")' -dest outputFileName inputFileName+
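
A minimal sketch of how such a -filter argument could be split into a filter name and its quoted arguments, assuming the shell hands the whole expression over as one string. The FilterSpec class and its parse method are hypothetical names, not part of PDFBox, and the sketch does not support double quotes embedded inside an argument:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for one parsed -filter argument, e.g. link("a", "b", "c"). */
final class FilterSpec {
    // name( ... ) where the group in parentheses is the raw argument list
    private static final Pattern CALL =
            Pattern.compile("\\s*(\\w+)\\s*\\((.*)\\)\\s*", Pattern.DOTALL);
    // one double-quoted argument; its content is kept verbatim so that
    // regex backslashes survive untouched
    private static final Pattern ARG = Pattern.compile("\"([^\"]*)\"");

    final String name;
    final List<String> args;

    private FilterSpec(String name, List<String> args) {
        this.name = name;
        this.args = args;
    }

    static FilterSpec parse(String spec) {
        Matcher call = CALL.matcher(spec);
        if (!call.matches()) {
            throw new IllegalArgumentException("Not a filter spec: " + spec);
        }
        List<String> args = new ArrayList<>();
        Matcher arg = ARG.matcher(call.group(2));
        while (arg.find()) {
            args.add(arg.group(1));
        }
        return new FilterSpec(call.group(1), args);
    }
}
```

Under these assumptions, parsing link("regex", "url", "mode") yields the name link and three string arguments in order.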

I suspect that in the long run most usage will be to filter a single file, but I'm building off the PDFMerge concept and I'll insert the filtering into the page copies.

For each filter I'll load a class named PDF<FilterName>Filter (so the URL replacement would be PDFLinkFilter) and execute a method in the class named Filter that takes an object array as the argument. In the proposed scenario the object array would consist of the regex, the url, and the mode, where mode is how the link is shown in the text. Right now I'm thinking of the following modes: boxed, underlined, none.
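
A rough sketch of that lookup using standard Java reflection. The FilterLoader class is a hypothetical name, the method is assumed to be spelled filter in lower case (following Java naming conventions rather than the capitalised form above), filter classes are assumed to live in the default package with a no-argument constructor, and the PDFLinkFilter shown is only a stand-in that does no PDF work:

```java
import java.lang.reflect.Method;
import java.util.Locale;

/** Hypothetical loader that maps a filter name to a class and invokes it. */
final class FilterLoader {

    /** "link" -> "PDFLinkFilter", "replace" -> "PDFReplaceFilter". */
    static String classNameFor(String filterName) {
        String capitalised = filterName.substring(0, 1).toUpperCase(Locale.ROOT)
                + filterName.substring(1);
        return "PDF" + capitalised + "Filter";
    }

    /** Loads the filter class by name and calls filter(Object[]) on a new instance. */
    static Object invoke(String filterName, Object[] args)
            throws ReflectiveOperationException {
        Class<?> type = Class.forName(classNameFor(filterName));
        Object instance = type.getDeclaredConstructor().newInstance();
        Method method = type.getMethod("filter", Object[].class);
        // The cast makes reflection pass the array as ONE argument
        // instead of spreading its elements as varargs.
        return method.invoke(instance, (Object) args);
    }
}

/** Stand-in filter used only to show the calling convention. */
class PDFLinkFilter {
    public Object filter(Object[] args) {
        return "link filter received " + args.length + " arguments";
    }
}
```

With these assumptions, FilterLoader.invoke("link", new Object[] {"regex", "url", "boxed"}) would locate PDFLinkFilter and pass it the three values as one object array.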

I haven't worked out the details yet of how the filter will get the text it's supposed to operate on. At the moment I'm considering, but have not decided, whether to pass the page or individual streams. For the link filter the page object would be better, because I have to add additional objects to implement the link. However, if I do manage to make this a generic filter implementation, passing the page puts the burden of parsing pages on every filter that can be plugged in. That's overkill for some, but it might be worth the price.

An example of another filter would be a simple text substitution:  replace("oldText", "newText") and I'm sure there are others that could use this scheme.
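
One way the page-versus-stream trade-off could be softened, sketched with stand-in types only: every filter receives the page, but a shared base class does the page handling for filters that merely rewrite text. PageSketch, PageFilter, TextFilter and ReplaceFilter are all hypothetical names, and PageSketch is a plain placeholder, not a PDFBox class:

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a parsed page; NOT a PDFBox class. */
final class PageSketch {
    final List<String> textRuns = new ArrayList<>();     // text taken from the page's streams
    final List<String> annotations = new ArrayList<>();  // extra objects a filter may add, e.g. links
}

/** Hypothetical contract: every filter is handed the whole page. */
interface PageFilter {
    void apply(PageSketch page);
}

/** Base class that hides page handling from filters that only rewrite text. */
abstract class TextFilter implements PageFilter {
    protected abstract String rewrite(String text);

    @Override
    public final void apply(PageSketch page) {
        for (int i = 0; i < page.textRuns.size(); i++) {
            page.textRuns.set(i, rewrite(page.textRuns.get(i)));
        }
    }
}

/** Sketch of the replace("oldText", "newText") filter as a literal substitution. */
final class ReplaceFilter extends TextFilter {
    private final String oldText;
    private final String newText;

    ReplaceFilter(String oldText, String newText) {
        this.oldText = oldText;
        this.newText = newText;
    }

    @Override
    protected String rewrite(String text) {
        return text.replace(oldText, newText);
    }
}
```

In this arrangement a link filter would implement PageFilter directly so it can add objects to the page, while simple filters extend TextFilter and never touch page structure.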

Feedback *more* than welcome.


      was (Author: kweinert):
    I have started work on this feature.

My plan is to make it just a little more extensible, and I'll have a command line tool called PDFilter and it will use the PDFilterUtility class.

As I'm thinking now the command line would look something like:

...PDFFilter -filter 'link("regex", "url", "mode")' -dest outputFileName inputFileName+

I suspect that in the long run most usage will be to filter a single file, but I'm building off the PDFMerge concept and I'll insert the filtering into the page copies.

For each filter I'll load a class named PDFfilternameFilter (so the URL replacement would be PDFLinkFilter) and execute a method in the class named Filter that takes an object array as the argument.  In the proposed scenario the object array would consist of the regex, the url, and the mode, where mode is how the link is shown in the text. Right now I'm thinking of the following modes:  boxed, underlined, none.

I haven't thought of the detail yet on how the filter will get the text it's supposed to operate on. At the moment I've thought of, but not decided on, whether to pass the page or individual streams. In this case the page object would be better, because I have to add additional objects to implement the link. Doing so, however, will mean that if I am able to make this generic filter implementation then it puts the burden of parsing pages on all the filters that can be plugged in. That's overkill for some, but it might be worth the price.

An example of another filter would be a simple text substitution:  replace("oldText", "newText") and I'm sure there are others that could use this scheme.

Feedback *more* than welcome.

  
> Tool to create hyperlinks
> -------------------------
>
>                 Key: PDFBOX-41
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-41
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: PDModel
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1144507
> Originally submitted by benlitchfield on 2005-02-19 12:09.
> There are many PDF documents that contain URLs but 
> are not hyperlinks.  Should create a tool which replaces 
> them with clickable links.
> Ben
> [comment on SourceForge]
> Originally sent by parvins.
> Logged In: YES 
> user_id=2029336
> Originator: NO
> Will this feature be supported in future releases?
> [comment on SourceForge]
> Originally sent by dukat.
> Logged In: YES 
> user_id=950950
> Such a tool will be really useful.
> Another feature of this tool could be something like this.
> Create a link for each text that 
> matches a specific regular expression.
> with a method signature like this
> createLinks(String regex,String url)
> Where regex is the matching regular expression
> and url is the url that should be set.
> So you can combine regex groups with the link.
> For example:
> A text contains an order number like
> 123456.01.002
> and you want a link like 
> http://www.myshop.com/order.jsp?myproductid=123456&myid=01&page=002
> The regex is something like
> ([0-9]{6})\\.([0-9]{2})\\.([0-9]{3})
> and the url something like
> http://www.myshop.com/order.jsp?myproductid=$1&myid=$2&page=$3
> best regards Juergen
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> I agree!
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO 
> Please please add such a tool! Hyperlinks are really needed!
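
The order-number example in the quoted SourceForge comment can be tried with plain java.util.regex, which already expands $1, $2 and $3 in a replacement string. This is only a sketch of the text-to-URL step: LinkTemplate and urlFor are hypothetical names, and nothing here creates an actual PDF link:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: builds the target URL for the first match in a text. */
final class LinkTemplate {

    /**
     * Returns the URL template with $1, $2, ... filled in from the first
     * match of regex in text, or null if the regex does not match.
     */
    static String urlFor(String regex, String urlTemplate, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        StringBuffer out = new StringBuffer();
        // Appends the text before the match, then the expanded template.
        m.appendReplacement(out, urlTemplate);
        // Drop the leading text so only the expanded URL remains.
        return out.substring(m.start());
    }
}
```

Called with the regex and URL template from the quoted comment and a text containing 123456.01.002, urlFor returns the order.jsp URL with myproductid=123456, myid=01 and page=002.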
