
Posted to users@pdfbox.apache.org by Alin Ghitulan <al...@gmail.com> on 2016/01/18 06:43:23 UTC

PDFBox for JavaScript analysis

Hello,

Can anyone help me with this? I need some direction on how to obtain
a list of the objects in a PDF that contain JavaScript code, so that I can
process the JS code further.

Thanks,
Alin

Re: PDFBox for JavaScript analysis

Posted by Tilman Hausherr <TH...@t-online.de>.

On 20.01.2016 at 05:58, Tilman Hausherr wrote:
> These indirect objects can have several dictionary levels nested.
And arrays, too.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: PDFBox for JavaScript analysis

Posted by Alin Ghitulan <al...@gmail.com>.

Thanks a lot, I will try it out and come back once I have a component that
can read all the JavaScript. Maybe the code can be put somewhere in the
API and be of use to someone else :)

On Wed, Jan 20, 2016, 15:35 Tilman Hausherr <TH...@t-online.de> wrote:

> On 20.01.2016 at 05:58, Tilman Hausherr wrote:
> > It doesn't work that way because there are direct and indirect objects
> > in PDF. One can list the indirect objects only (you did that
> > somewhere). The indirect objects are the ones that start like this:
> > "42 0 obj". These indirect objects can have several dictionary levels
> > nested. And then, these indirect objects can either be seen in the
> > PDF, or be "hidden" in a compressed object stream.
>
> It suddenly occurred to me that it is of course possible to get all
> COSStrings with the API. I've put some code at
>
> https://stackoverflow.com/questions/34840299/finding-javascript-code-in-pdf-using-apache-pdfbox/34899156#34899156

Re: PDFBox for JavaScript analysis

Posted by Tilman Hausherr <TH...@t-online.de>.

On 20.01.2016 at 05:58, Tilman Hausherr wrote:
> It doesn't work that way because there are direct and indirect objects
> in PDF. One can list the indirect objects only (you did that
> somewhere). The indirect objects are the ones that start like this:
> "42 0 obj". These indirect objects can have several dictionary levels
> nested. And then, these indirect objects can either be seen in the
> PDF, or be "hidden" in a compressed object stream.

It suddenly occurred to me that it is of course possible to get all
COSStrings with the API. I've put some code at
https://stackoverflow.com/questions/34840299/finding-javascript-code-in-pdf-using-apache-pdfbox/34899156#34899156
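The code behind that link is not reproduced here. As a rough, dependency-free illustration of the general idea (walk the whole object graph and collect every string), here is a minimal sketch. It does not use PDFBox: the hypothetical `Ref` record and plain `Map`/`List`/`String` values stand in for an indirect reference, a dictionary, an array and a string (in PDFBox terms roughly `COSObject`, `COSDictionary`, `COSArray` and `COSString`). The walk has to descend into nested dictionaries and arrays, and it has to remember which indirect objects it already followed, because PDF object graphs can contain cycles (for example a field's /Parent pointing back to the field that lists it in /Kids).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StringCollector
{
    // Hypothetical stand-in for an indirect reference such as "42 0 R".
    record Ref(int number, int generation) {}

    private final Map<Ref, Object> objectTable; // resolves a reference to its object
    private final Set<Ref> visited = new HashSet<>();
    private final List<String> strings = new ArrayList<>();

    StringCollector(Map<Ref, Object> objectTable)
    {
        this.objectTable = objectTable;
    }

    // Recursively visits references, dictionaries (Map) and arrays (List),
    // collecting every String leaf that is reachable from node.
    void walk(Object node)
    {
        if (node instanceof Ref ref)
        {
            // follow each indirect reference only once; this also breaks cycles
            if (visited.add(ref))
            {
                walk(objectTable.get(ref));
            }
        }
        else if (node instanceof Map<?, ?> dict)
        {
            for (Object value : dict.values())
            {
                walk(value);
            }
        }
        else if (node instanceof List<?> array)
        {
            for (Object item : array)
            {
                walk(item);
            }
        }
        else if (node instanceof String s)
        {
            strings.add(s);
        }
        // anything else (numbers, booleans, null) is ignored
    }

    List<String> strings()
    {
        return strings;
    }
}
```

Starting the walk from the trailer dictionary would reach every object the document actually uses; unreferenced leftovers in the file would not be visited.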



Re: PDFBox for JavaScript analysis

Posted by Tilman Hausherr <TH...@t-online.de>.

On 20.01.2016 at 02:01, Alin Ghitulan wrote:
> Tilman Hausherr, Roberto Nibali
>
> Thanks for this excellent piece of code. It seems to deal very well
> with AcroForm. But here are some stupid questions I have:
>
> I may have some problem understanding the API, but it seems to me that those
> PDActionJavaScript objects are nothing more than COSString objects with
> COSName("JS") in a dictionary. Why can't I just list all objects by this
> name and get them? I assume it's because they are inside some dictionary,
> but then isn't there a utility method that can exhaustively list all
> "primitive" objects (string, long)? Can you share an example of how one would
> achieve this? I don't mind writing some regexes afterwards to select only the
> ones that contain JavaScript.

It doesn't work that way because there are direct and indirect objects
in PDF. One can list the indirect objects only (you did that somewhere).
The indirect objects are the ones that start like this: "42 0 obj". These
indirect objects can have several dictionary levels nested. And then,
these indirect objects can either be seen in the PDF, or be "hidden" in
a compressed object stream.
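To make the "42 0 obj" remark concrete, here is a minimal sketch (the class and method names are hypothetical) that lists the indirect object headers visible in the raw bytes of a PDF using a regular expression. It is not a PDF parser and is only meant to show the limitation described above: objects packed into a compressed object stream (a stream with /Type /ObjStm, PDF 1.5 and later) have no "N G obj" header of their own in the file, so a raw scan like this cannot see them, which is why a real parser such as PDFBox is needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndirectObjectScanner
{
    // "N G obj": object number, generation number, then the obj keyword.
    private static final Pattern HEADER = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    // Returns the headers (e.g. "42 0") that appear literally in the file bytes.
    static List<String> scan(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        List<String> headers = new ArrayList<>();
        Matcher m = HEADER.matcher(text);
        while (m.find())
        {
            headers.add(m.group(1) + " " + m.group(2));
        }
        return headers;
    }
}
```

In the test below, the object stream 12 0 is found, but whatever objects it contains are not, because they only exist in its compressed data.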


> I expect most of the JavaScript code to be placed inside AcroForms; there
> can obviously also be some code in the OpenAction... do you have ideas where
> else it could be? I am new to the PDF format, but I wouldn't think that it
> allows putting executable JavaScript code anywhere in the file.

Read the PDF specification... just search for the word "javascript" and 
you'll be amazed.
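Two details from the specification matter when searching by hand. A JavaScript action is a dictionary whose /S entry is the name /JavaScript, and its /JS entry may be either a text string or a text stream, so looking only for COSString values under /JS can miss scripts. Also, any action can chain further actions through /Next, whose value is either a single action dictionary or an array of action dictionaries. The sketch below models this with plain Java maps and lists rather than PDFBox classes; `PdfName` and `PdfStream` (holding already-decoded text) are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class JavaScriptActionCollector
{
    // Hypothetical stand-ins for a PDF name (/JavaScript) and a decoded stream.
    record PdfName(String value) {}
    record PdfStream(String decodedText) {}

    // Collects the script of every JavaScript action in the /Next chain starting at action.
    static List<String> collect(Object action)
    {
        List<String> scripts = new ArrayList<>();
        visit(action, scripts);
        return scripts;
    }

    private static void visit(Object node, List<String> scripts)
    {
        if (node instanceof List<?> array)
        {
            // /Next may hold an array of action dictionaries
            for (Object item : array)
            {
                visit(item, scripts);
            }
        }
        else if (node instanceof Map<?, ?> dict)
        {
            if (new PdfName("JavaScript").equals(dict.get("S")))
            {
                Object js = dict.get("JS"); // text string or text stream
                if (js instanceof String s)
                {
                    scripts.add(s);
                }
                else if (js instanceof PdfStream stream)
                {
                    scripts.add(stream.decodedText());
                }
            }
            visit(dict.get("Next"), scripts); // follow the chain, if any
        }
    }
}
```

This sketch has no cycle detection; on real files, where /Next can be an indirect reference, a visited set like the one used for indirect objects would be needed to survive malformed chains.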

Tilman




> Thanks for your answers,
> Alin
>
> On Tue, Jan 19, 2016 at 9:45 AM Alin Ghitulan <al...@gmail.com> wrote:
>
>> Thanks a lot! I will try it out tonight and see how it goes :).
>>
>> On Tue, Jan 19, 2016, 08:18 Tilman Hausherr <TH...@t-online.de> wrote:
>>
>>> On 18.01.2016 at 23:19, Roberto Nibali wrote:
>>>> This then calls dumpJavaScriptEntries() for all non-PDNonTerminalFields,
>>>> which finally dumps the JavaScript portions of your PDF (courtesy of
>>>> Tilman Hausherr):
>>>
>>> Ah, I forgot that I had written something at that time. Here's the
>>> original code I wrote, although it was for that file only, and other
>>> files can have JavaScript elsewhere too.

Re: PDFBox for JavaScript analysis

Posted by Alin Ghitulan <al...@gmail.com>.

Tilman Hausherr, Roberto Nibali

Thanks for this excellent piece of code. It seems to deal very well
with AcroForm. But here are some stupid questions I have:

I may have some problem understanding the API, but it seems to me that those
PDActionJavaScript objects are nothing more than COSString objects with
COSName("JS") in a dictionary. Why can't I just list all objects by this
name and get them? I assume it's because they are inside some dictionary,
but then isn't there a utility method that can exhaustively list all
"primitive" objects (string, long)? Can you share an example of how one would
achieve this? I don't mind writing some regexes afterwards to select only the
ones that contain JavaScript.
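Selecting candidates with a regular expression, as suggested here, could look roughly like the sketch below. It is only a heuristic assumed for illustration: the class name is hypothetical, and the token list (`app.`, `this.`, `event.`, `util.`, `console.`, `function`, `eval(`, `var x =`) is an assumption, not something taken from PDFBox or the PDF specification. Expect both false positives and false negatives.

```java
import java.util.List;
import java.util.regex.Pattern;

class JavaScriptHeuristic
{
    // Assumed, non-exhaustive tokens that often appear in PDF JavaScript.
    private static final Pattern JS_LIKE = Pattern.compile(
            "\\b(?:app|this|event|util|console)\\.\\w+"
            + "|\\bfunction\\b"
            + "|\\beval\\s*\\("
            + "|\\bvar\\s+\\w+\\s*=");

    static boolean looksLikeJavaScript(String s)
    {
        return JS_LIKE.matcher(s).find();
    }

    // Keeps only the strings that match at least one of the tokens above.
    static List<String> filter(List<String> candidates)
    {
        return candidates.stream().filter(JavaScriptHeuristic::looksLikeJavaScript).toList();
    }
}
```

Running such a filter over every collected string is cheap, but it can never be as reliable as looking at where a string sits (a /JS entry of a JavaScript action).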

I expect most of the JavaScript code to be placed inside AcroForms; there
can obviously also be some code in the OpenAction... do you have ideas where
else it could be? I am new to the PDF format, but I wouldn't think that it
allows putting executable JavaScript code anywhere in the file.

Thanks for your answers,
Alin

On Tue, Jan 19, 2016 at 9:45 AM Alin Ghitulan <al...@gmail.com> wrote:

> Thanks a lot! I will try it out tonight and see how it goes :).
>
> On Tue, Jan 19, 2016, 08:18 Tilman Hausherr <TH...@t-online.de> wrote:
>
>> On 18.01.2016 at 23:19, Roberto Nibali wrote:
>> > This then calls dumpJavaScriptEntries() for all non-PDNonTerminalFields,
>> > which finally dumps the JavaScript portions of your PDF (courtesy of
>> > Tilman Hausherr):
>>
>> Ah, I forgot that I had written something at that time. Here's the
>> original code I wrote, although it was for that file only, and other
>> files can have JavaScript elsewhere too.

Re: PDFBox for JavaScript analysis

Posted by Alin Ghitulan <al...@gmail.com>.

Thanks a lot! I will try it out tonight and see how it goes :).

On Tue, Jan 19, 2016, 08:18 Tilman Hausherr <TH...@t-online.de> wrote:

> On 18.01.2016 at 23:19, Roberto Nibali wrote:
> > This then calls dumpJavaScriptEntries() for all non-PDNonTerminalFields,
> > which finally dumps the JavaScript portions of your PDF (courtesy of
> > Tilman Hausherr):
>
> Ah, I forgot that I had written something at that time. Here's the
> original code I wrote, although it was for that file only, and other
> files can have JavaScript elsewhere too.

Re: PDFBox for JavaScript analysis

Posted by Tilman Hausherr <TH...@t-online.de>.

On 18.01.2016 at 23:19, Roberto Nibali wrote:
> This then calls dumpJavaScriptEntries() for all non-PDNonTerminalFields,
> which finally dumps the JavaScript portions of your PDF (courtesy of Tilman
> Hausherr):

Ah, I forgot that I had written something at that time. Here's the
original code I wrote, although it was for that file only, and other
files can have JavaScript elsewhere too.



/*
  * Licensed to the Apache Software Foundation (ASF) under one or more
  * contributor license agreements.  See the NOTICE file distributed with
  * this work for additional information regarding copyright ownership.
  * The ASF licenses this file to You under the Apache License, Version 2.0
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at
  *
  *      http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
package pdfboxpageimageextraction;

import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionJavaScript;
import org.apache.pdfbox.pdmodel.interactive.action.PDFormFieldAdditionalActions;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

/**
  * This example takes a PDF document and prints the JavaScript actions
  * attached to its AcroForm fields and their widget annotations.
  *
  * @author Ben Litchfield
  *
  */
public class PrintJavaScriptFields
{

     /**
      * This will print all the fields from the document.
      *
      * @param pdfDocument The PDF to get the fields from.
      *
      * @throws IOException If there is an error getting the fields.
      */
     public void printFields(PDDocument pdfDocument) throws IOException
     {
         PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
         PDAcroForm acroForm = docCatalog.getAcroForm();
         if (acroForm == null)
         {
             // the document has no interactive form, so there are no field actions to print
             return;
         }
         List<PDField> fields = acroForm.getFields();

         //System.out.println(fields.size() + " top-level fields were found on the form");
         for (PDField field : fields)
         {
             processField(field, "|--", field.getPartialName());
         }
     }

     private void processField(PDField field, String sLevel, String sParent) throws IOException
     {
         String partialName = field.getPartialName();

         if (field instanceof PDTerminalField)
         {
             PDTerminalField termField = (PDTerminalField) field;
             PDFormFieldAdditionalActions fieldActions = field.getActions();
             if (fieldActions != null)
             {
                 System.out.println(field.getFullyQualifiedName() + ": "
                         + fieldActions.getClass().getSimpleName() + " js field actions:\n"
                         + fieldActions.getCOSObject());
                 printPossibleJS(fieldActions.getK());
                 printPossibleJS(fieldActions.getC());
                 printPossibleJS(fieldActions.getF());
                 printPossibleJS(fieldActions.getV());
             }
             for (PDAnnotationWidget widgetAction : termField.getWidgets())
             {
                 PDAction action = widgetAction.getAction();
                 if (action instanceof PDActionJavaScript)
                 {
                     System.out.println(field.getFullyQualifiedName() + ": "
                             + action.getClass().getSimpleName() + " js widget action:\n"
                             + action.getCOSObject());
                     printPossibleJS(action);
                 }
             }
         }

         if (field instanceof PDNonTerminalField)
         {
             if (!sParent.equals(field.getPartialName()))
             {
                 if (partialName != null)
                 {
                     sParent = sParent + "." + partialName;
                 }
             }
             //System.out.println(sLevel + sParent);

             for (PDField child : ((PDNonTerminalField) field).getChildren())
             {
                 processField(child, "|  " + sLevel, sParent);
             }
         }
         else
         {
             String fieldValue = field.getValueAsString();
             StringBuilder outputString = new StringBuilder(sLevel);
             outputString.append(sParent);
             if (partialName != null)
             {
                 outputString.append(".").append(partialName);
             }
             outputString.append(" = ").append(fieldValue);
             outputString.append(", type=").append(field.getClass().getName());
             //System.out.println(outputString);
         }
     }

     private void printPossibleJS(PDAction kAction)
     {
         if (kAction instanceof PDActionJavaScript)
         {
             PDActionJavaScript jsAction = (PDActionJavaScript) kAction;
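             String jsString = jsAction.getAction();
             if (jsString == null)
             {
                 // No script text could be read from this action.
                 return;
             }
             if (!jsString.contains("\n"))
             {
                 // CR-only line endings: otherwise nothing shows up in NetBeans?!
                 jsString = jsString.replaceAll("\r", "\n").replaceAll("\n\n", "\n");
             }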
             System.out.println(jsString);
             System.out.println();
         }
     }

     /**
      * This will read a PDF file and print out the JavaScript found in its form fields.
      * The input file is hard-coded below; replace "XXXX" and "YYYYY.pdf" with a real path.
      *
      * @param args command line arguments (not used)
      *
      * @throws IOException If there is an error loading the PDF document.
      */
     public static void main(String[] args) throws IOException
     {
         PDDocument pdf = null;
         try
         {
             pdf = PDDocument.load(new File("XXXX", "YYYYY.pdf"));
             PrintJavaScriptFields exporter = new PrintJavaScriptFields();
             exporter.printFields(pdf);
         }
         finally
         {
             if (pdf != null)
             {
                 pdf.close();
             }
         }
     }

}
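processField() above builds the dotted field name (sParent) while it recurses from non-terminal fields into their children. The same walk can be sketched without PDFBox on a hypothetical in-memory model: FieldNode, collectScripts() and the javaScript member are illustrative names, not PDFBox API. A node with kids stands in for PDNonTerminalField, a node without kids for PDTerminalField, and javaScript is an assumed placeholder for whatever script the field's actions carry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, PDFBox-free model of an AcroForm field tree (illustration only).
final class FieldTreeSketch {

    static final class FieldNode {
        final String partialName;                 // may be null, like PDField.getPartialName()
        final String javaScript;                  // hypothetical placeholder: script text or null
        final List<FieldNode> kids = new ArrayList<>();

        FieldNode(String partialName, String javaScript) {
            this.partialName = partialName;
            this.javaScript = javaScript;
        }

        FieldNode add(FieldNode kid) {
            kids.add(kid);
            return this;
        }
    }

    // Collects "fullyQualifiedName: script" for every terminal node that carries a script.
    static List<String> collectScripts(List<FieldNode> roots) {
        List<String> out = new ArrayList<>();
        for (FieldNode root : roots) {
            walk(root, null, out);
        }
        return out;
    }

    private static void walk(FieldNode node, String parentName, List<String> out) {
        // Append this node's partial name only if it has one, similar to processField().
        String name = parentName;
        if (node.partialName != null) {
            name = (parentName == null) ? node.partialName : parentName + "." + node.partialName;
        }
        if (!node.kids.isEmpty()) {
            // Non-terminal node: descend into the children.
            for (FieldNode kid : node.kids) {
                walk(kid, name, out);
            }
        } else if (node.javaScript != null) {
            // Terminal node with a script attached.
            out.add(name + ": " + node.javaScript);
        }
    }

    public static void main(String[] args) {
        FieldNode address = new FieldNode("address", null)
                .add(new FieldNode("zip", "event.rc = event.change.length < 2;"))
                .add(new FieldNode("city", null));
        FieldNode total = new FieldNode("total", "event.value = 42;");
        for (String line : collectScripts(Arrays.asList(address, total))) {
            System.out.println(line);
        }
    }
}
```

Running main() prints one line for address.zip and one for total; the city field is skipped because it carries no script.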



Re: PDFBox for JavaScript analysis

Posted by Roberto Nibali <rn...@gmail.com>.

Hi

One of my PDF library classes contains the following method (stripped of
references to other internal library calls that are not relevant to your
question):

private void executeDumpJS(String srcDocName) throws IOException {
    PDDocument srcDoc = null;
    try {
        srcDoc = PDDocument.load(new File(srcDocName));
        // getAcroForm() returns null for documents without an interactive form
        PDAcroForm acroForm = srcDoc.getDocumentCatalog().getAcroForm();
        if (acroForm != null) {
            acroForm.getFields().forEach(this::dumpJSEntry);
        }
    } catch (Exception e) {
        // do something
    } finally {
        // close exactly once, here
        if (srcDoc != null) {
            srcDoc.close();
        }
    }
}
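Closing the document by hand in a finally block is easy to get subtly wrong, for example by closing it twice. Since PDDocument implements java.io.Closeable in PDFBox 2.x, try-with-resources is an alternative. A minimal JDK-only sketch of that pattern, with a hypothetical CountingResource standing in for PDDocument, shows the resource being closed exactly once whether or not the body throws:

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: try-with-resources closes a Closeable exactly once, even if the body throws.
// CountingResource is a hypothetical stand-in for PDDocument.
final class TryWithResourcesSketch {

    static final class CountingResource implements Closeable {
        int closeCount = 0;

        @Override
        public void close() {
            closeCount++;
        }
    }

    // Runs a body that may fail; returns how often the resource was closed.
    static int run(boolean fail) {
        CountingResource res = new CountingResource();
        try (CountingResource r = res) {
            if (fail) {
                throw new IOException("simulated load/parse failure");
            }
            // ... work with r here, e.g. walk the AcroForm fields ...
        } catch (IOException e) {
            // Report or rethrow; the resource has already been closed at this point.
        }
        return res.closeCount;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // 1
        System.out.println(run(true));  // 1
    }
}
```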

The input string is the path to the PDF file. The dumpJSEntry() method looks as follows:

private void dumpJSEntry(PDField srcField) {
    if (srcField instanceof PDNonTerminalField) {
        ((PDNonTerminalField) srcField).getChildren().forEach(this::dumpJSEntry);
    } else if (!(srcField instanceof PDSignatureField)) {
        dumpJavaScriptEntries(srcField);
    }
}

This then calls dumpJavaScriptEntries() for every field that is neither a
PDNonTerminalField nor a PDSignatureField. That method finally dumps the
JavaScript portions of your PDF (courtesy of Tilman Hausherr):

private void dumpJavaScriptEntries(PDField field) {
    final String fqName = field.getFullyQualifiedName();

    final PDFormFieldAdditionalActions fieldActions = field.getActions();
    if (fieldActions != null) {
        System.out.printf(Locale.ENGLISH, "// %s [%s]:%n", fqName,
                fieldActions.getClass().getSimpleName());

        /**
         * This will dump a JavaScript action to be performed when the user
         * types a keystroke into a text field or combo box or modifies the
         * selection in a scrollable list box. This allows the keystroke to
         * be checked for validity and rejected or modified.
         */
        printPossibleJS(fieldActions.getK());
        /**
         * This will dump a JavaScript action to be performed in order
         * to recalculate the value of this field when that of another
         * field changes.
         */
        printPossibleJS(fieldActions.getC());
        /**
         * This will dump a JavaScript action to be performed before
         * the field is formatted to display its current value. This
         * allows the field's value to be modified before formatting.
         */
        printPossibleJS(fieldActions.getF());
        /**
         * This will dump a JavaScript action to be performed
         * when the field's value is changed. This allows the
         * new value to be checked for validity.
         */
        printPossibleJS(fieldActions.getV());
    }

    final PDTerminalField termField = (PDTerminalField) field;
    for (PDAnnotationWidget widgetAction : termField.getWidgets()) {
        final PDAction action = widgetAction.getAction();
        if (action instanceof PDActionJavaScript) {
            System.out.printf(Locale.ENGLISH, "// %s [%s]:%n", fqName,
                    action.getClass().getSimpleName());
            printPossibleJS(action);
        }
    }
}

Only one piece is still missing: printPossibleJS(), which also originates
from code written by Tilman Hausherr:

private void printPossibleJS(PDAction kAction) {
    if (kAction instanceof PDActionJavaScript) {
        final PDActionJavaScript jsAction = (PDActionJavaScript) kAction;
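        String jsString = jsAction.getAction();
        if (jsString == null) {
            return; // no script text could be read from this action
        }
        if (!jsString.contains("\n")) {
            // CR-only line endings: convert them so the script prints line by line
            jsString = jsString.replaceAll("\r", "\n").replaceAll("\n\n", "\n");
        }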
        System.out.println(jsString);
        System.out.println();
    }
}
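printPossibleJS() only converts line endings when the script contains no \n at all, and its second replaceAll() also collapses blank lines. A different normalization (a swapped-in technique, not the code from this thread) maps CR LF and lone CR to LF and keeps blank lines; normalizeLineEndings() is a hypothetical helper name:

```java
// Sketch of an alternative to the replaceAll() chain in printPossibleJS():
// convert CR LF and lone CR line endings to LF so the script prints line by line.
final class LineEndingSketch {

    static String normalizeLineEndings(String js) {
        if (js == null) {
            return null; // no script text: nothing to normalize
        }
        // Order matters: handle CR LF first so it does not turn into two LFs.
        return js.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String crOnly = "var a = 1;\rvar b = 2;\r\rapp.alert(a + b);";
        System.out.println(normalizeLineEndings(crOnly));
    }
}
```

String.replace() works on literal text rather than regular expressions, so no escaping is needed, and the blank line between the two statements in the example survives.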

I couldn't find a simpler way to do this, since a PDF is basically a directed
graph of objects. Pick out the pieces you need.
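Because the objects form a graph (dictionaries and arrays nested in each other, with back-references such as /Parent creating cycles), a generic search has to remember what it has already visited. A JDK-only sketch on a hypothetical in-memory model (Map for dictionaries, List for arrays, String for strings; this is not the PDFBox COS API) that collects every string stored under a JS key. In a real file a /JS entry can also be a stream, and with PDFBox one would walk COSDictionary and COSArray objects instead; the sketch leaves both out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: Map = dictionary, List = array, String = string (not the COS API).
final class JsGraphWalkSketch {

    // Returns every String stored under a "JS" key anywhere reachable from root.
    static List<String> findJavaScript(Object root) {
        List<String> found = new ArrayList<>();
        if (root == null) {
            return found;
        }
        // Identity-based visited set: a dictionary may be reachable on several paths,
        // and back-references (like /Parent) create cycles.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Object node = stack.pop();
            if (!visited.add(node)) {
                continue; // already processed
            }
            if (node instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                    Object value = e.getValue();
                    if ("JS".equals(e.getKey()) && value instanceof String) {
                        found.add((String) value);
                    } else if (value != null) {
                        stack.push(value);
                    }
                }
            } else if (node instanceof List) {
                for (Object item : (List<?>) node) {
                    if (item != null) {
                        stack.push(item);
                    }
                }
            }
            // Strings and other leaves need no further work.
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, Object> action = new LinkedHashMap<>();
        action.put("S", "JavaScript");
        action.put("JS", "app.alert('hi');");

        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put("T", "address");
        Map<String, Object> child = new LinkedHashMap<>();
        child.put("T", "zip");
        child.put("AA", Collections.singletonMap("K", action));
        child.put("Parent", parent);                     // back-reference: creates a cycle
        parent.put("Kids", Collections.singletonList(child));

        System.out.println(findJavaScript(parent));      // prints [app.alert('hi');]
    }
}
```

An explicit stack is used instead of recursion so that deeply nested structures cannot overflow the call stack; the order of the results is therefore not guaranteed.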

Hope it helps.

Cheers

Roberto



On Mon, Jan 18, 2016 at 6:43 AM, Alin Ghitulan <al...@gmail.com>
wrote:

> Hello,
>
> Can anyone help me accomplish this? I need some direction on how to obtain
> a list of objects in PDF that contains JavaScript code so I can further
> process the JS code.
>
> Thanks,
> Alin
>