Posted to users@pdfbox.apache.org by Davide Zoni <Da...@Cedacri.it> on 2016/08/22 13:14:19 UTC

Check for scripts in a PDF

Hello everybody,

I'm using PDFBox to check whether a PDF file contains malicious scripts. I'm using the PDF/A-1a validation to check the file. Since I'm only looking for potentially damaging code and not for true PDF/A-1a compliance, is it enough to treat 1.x.x, 6.x.x and 7.x.x errors as "true" errors? The category descriptions are below:

Category        Description
1[.y[.z]]       Syntax Error
2[.y[.z]]       Graphic Error
3[.y[.z]]       Font Error
4[.y[.z]]       Transparency Error
5[.y[.z]]       Annotation Error
6[.y[.z]]       Action Error
7[.y[.z]]       Metadata Error

Thanks.
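
A minimal sketch of the filtering described above, assuming the validation error codes are available as dotted strings such as "6.2.5". The class ErrorCategoryFilter, its methods and the sample codes are hypothetical and not part of PDFBox. The sketch only shows how to keep categories 1, 6 and 7; it says nothing about whether those categories are sufficient to catch scripts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ErrorCategoryFilter
{
    // Categories from the table above that the question proposes to treat
    // as "true" errors: 1 = Syntax, 6 = Action, 7 = Metadata.
    private static final Set<String> RELEVANT = new HashSet<>(Arrays.asList("1", "6", "7"));

    /** Returns the leading category of a dotted code, e.g. "6" for "6.2.5". */
    static String categoryOf(String errorCode)
    {
        int dot = errorCode.indexOf('.');
        return dot < 0 ? errorCode : errorCode.substring(0, dot);
    }

    /** Keeps only the codes whose category is in RELEVANT, preserving order. */
    static List<String> relevantErrors(List<String> errorCodes)
    {
        List<String> kept = new ArrayList<>();
        for (String code : errorCodes)
        {
            if (RELEVANT.contains(categoryOf(code)))
            {
                kept.add(code);
            }
        }
        return kept;
    }

    public static void main(String[] args)
    {
        // Illustrative codes only; real codes would come from the validation result.
        List<String> codes = Arrays.asList("1.4.6", "2.4.3", "3.1.1", "6.2.5", "7.1");
        System.out.println(relevantErrors(codes)); // prints [1.4.6, 6.2.5, 7.1]
    }
}
```

Comparing the whole first component, rather than testing the first character, keeps the filter correct if a two-digit category ever appears.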






Il contenuto e le informazioni di questo messaggio di posta elettronica sono riservate, confidenziali e non vincolanti nè impegnative per Cedacri s.p.a., ne è vietata pertanto la diffusione o divulgazione in qualunque modo eseguita. Qualora Lei non fosse la persona a cui il presente messaggio è destinato La invitiamo ad eliminarlo e a non leggerlo, dandocene gentilmente comunicazione. The content, informations and any attachments of this e-mail are classified, confidential and not binding neither impegnative for Cedacri S.P.A., the spread or spreading in any executed way is prohibited therefore. If you are not named recipient, please notify the sender immediately and do not disclose the contents to another person, use it for any purpose, or store or copy the information in any medium.

Re: Check for scripts in a PDF

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Dear Davide,

You had a specific question: "... access the acroForm (even in a pdf file with scripts and forms) it's null. ..."

but in the sample you are linking to

> On 29.08.2016 at 13:16, Davide Zoni <Da...@Cedacri.it> wrote:
> 
> Hello,
> 
> you can find a multimedia example here (I'm aware that the code suggested by Tilman might not work here):
> 
> http://media.washingtonpost.com/wp-adv/advertisers/Adobe/Obama/090808/ObamaPort.pdf

there is no form at all.

> 
> or here (the first one):
> 
> http://www.pdfscripting.com/public/Free-Sample-PDF-Files-with-scripts.cfm
> 

There are several files. Which one shows the behavior you are mentioning?

BR

Maruan
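
A walk over form fields has nothing to inspect when the document catalog has no AcroForm, as in the first sample. As a coarse pre-filter, and not a technique proposed in this thread, one could scan the raw bytes of the file for the name tokens /JS and /JavaScript. This is only a sketch under strong assumptions: it misses tokens inside compressed object streams or encrypted files and names written with #xx escapes, and it can report false positives, so a miss does not prove that a file is free of scripts. RawJavaScriptTokenScan and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RawJavaScriptTokenScan
{
    /** Returns true if the ASCII bytes of token occur anywhere in data. */
    static boolean containsToken(byte[] data, String token)
    {
        byte[] needle = token.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= data.length; i++)
        {
            for (int j = 0; j < needle.length; j++)
            {
                if (data[i + j] != needle[j])
                {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    /** Coarse check: does the uncompressed part of the file mention /JS or /JavaScript? */
    static boolean mentionsJavaScript(byte[] pdfBytes)
    {
        return containsToken(pdfBytes, "/JS") || containsToken(pdfBytes, "/JavaScript");
    }

    public static void main(String[] args) throws IOException
    {
        // args[0] is assumed to be the path of the PDF file to scan.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(mentionsJavaScript(bytes)
                ? "possible JavaScript"
                : "no uncompressed /JS or /JavaScript token found");
    }
}
```

A hit is only a reason to look closer with a real parser; the absence of a hit is not a verdict.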


> where PDAcroForm is not null but the code fails to check for JavaScript fields.
> 
> Thanks.
> 
>        Davide Zoni
> 
>        Cedacri S.p.A.
> 
>        Tel.: 0521807433
> 
>        e-mail: davide.zoni@cedacri.it
> 
>        www.cedacri.it
> 
> 
> ________________________________________
> From: Maruan Sahyoun [sahyoun@fileaffairs.de]
> Sent: Monday, 29 August 2016 11:28
> To: users@pdfbox.apache.org
> Subject: Re: Check for scripts in a PDF
> 
> Hi,
> 
>> On 29.08.2016 at 11:09, Davide Zoni <Da...@Cedacri.it> wrote:
>> 
>> Hi everybody again,
>> 
>> I'm trying to figure out whether your method suits my needs, but every time I try to access the acroForm (even in a pdf file with scripts and forms) it's null.
> 
> Could you upload a sample PDF to a public site so that we can take a look at it? An interactive PDF form should have an AcroForm entry.
> 
> BR
> Maruan
> 
> 
>> Am I loading the file in the wrong way? Am I missing something?
>> 
>> Best regards.
>> 
>> ________________________________________
>> From: Tilman Hausherr [THausherr@t-online.de]
>> Sent: Wednesday, 24 August 2016 18:24
>> To: users@pdfbox.apache.org
>> Subject: Re: Check for scripts in a PDF
>> 
>> On 24.08.2016 at 15:41, Davide Zoni wrote:
>>> Thank you. This might be helpful, but I'm afraid that I would not be able to check every possibility. Is there a way to check whether a PDF is static (or dynamic)? For our purpose that should be enough.
>> 
>> No, there is no such method.
>> 
>> Tilman
>> 
>> 
>>> Best regards.
>>> 
>>>        Davide Zoni
>>> 
>>>        Cedacri S.p.A.
>>> 
>>>        Tel.: 0521807433
>>> 
>>>        e-mail: davide.zoni@cedacri.it
>>> 
>>>        www.cedacri.it
>>> 
>>> 
>>> ________________________________________
>>> From: Tilman Hausherr [THausherr@t-online.de]
>>> Sent: Tuesday, 23 August 2016 18:23
>>> To: users@pdfbox.apache.org
>>> Subject: Re: Check for scripts in a PDF
>>> 
>>> On 23.08.2016 at 09:35, Davide Zoni wrote:
>>>> Yes, I'm seeking to detect files with scripts, not static ones. I don't understand what you mean by "Maybe compare
>>>> with the preflight source code to check that you didn't miss something". Can you elaborate on that?
>>> I meant to search for "Javascript" in the source code, and then see
>>> where it is used. This is just so that you can be more sure that you
>>> have found everything when you read the PDF specification.
>>> 
>>> Btw I once wrote some code to show (some) JavaScript fields, see below
>>> or search for "Roberto Nibali Javascript". He also improved that code
>>> and posted the improved version. It may not find all JavaScript stuff,
>>> but it could help show you how to write code.
>>> 
>>> Tilman
>>> 
>>> 
>>> import java.io.File;
>>> import java.io.IOException;
>>> import java.util.List;
>>> 
>>> import org.apache.pdfbox.cos.COSDictionary;
>>> import org.apache.pdfbox.pdmodel.PDDocument;
>>> import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
>>> import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
>>> import org.apache.pdfbox.pdmodel.interactive.action.PDActionJavaScript;
>>> import org.apache.pdfbox.pdmodel.interactive.action.PDAnnotationAdditionalActions;
>>> import org.apache.pdfbox.pdmodel.interactive.action.PDFormFieldAdditionalActions;
>>> import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
>>> import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
>>> import org.apache.pdfbox.pdmodel.interactive.form.PDField;
>>> import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
>>> import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;
>>> 
>>> public class PrintJavaScriptFields
>>> {
>>> 
>>>     /**
>>>      * This will print all the fields from the document.
>>>      *
>>>      * @param pdfDocument The PDF to get the fields from.
>>>      *
>>>      * @throws IOException If there is an error getting the fields.
>>>      */
>>>     public void printFields(PDDocument pdfDocument) throws IOException
>>>     {
>>>         PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
>>>         PDAcroForm acroForm = docCatalog.getAcroForm();
>>>         if (acroForm == null)
>>>         {
>>>             // The catalog has no AcroForm entry, so there are no fields to walk.
>>>             System.out.println("No AcroForm found in this document");
>>>             return;
>>>         }
>>>         List<PDField> fields = acroForm.getFields();
>>> 
>>>         //System.out.println(fields.size() + " top-level fields were found on the form");
>>> 
>>>         for (PDField field : fields)
>>>         {
>>>             processField(field, "|--", field.getPartialName());
>>>         }
>>>     }
>>> 
>>>     private void processField(PDField field, String sLevel, String sParent) throws IOException
>>>     {
>>>         String partialName = field.getPartialName();
>>> 
>>>         if (field instanceof PDTerminalField)
>>>         {
>>>             PDTerminalField termField = (PDTerminalField) field;
>>>             for (PDAnnotationWidget widget : termField.getWidgets())
>>>             {
>>>                 // The widget's own action (the A entry)
>>>                 PDAction action = widget.getAction();
>>>                 if (action instanceof PDActionJavaScript)
>>>                 {
>>>                     System.out.println(field.getFullyQualifiedName() + ": "
>>>                             + action.getClass().getSimpleName() + " js widget action:\n"
>>>                             + action.getCOSObject());
>>>                     printPossibleJS(action);
>>>                 }
>>>                 // The widget's additional actions (the AA entry)
>>>                 PDAnnotationAdditionalActions actions = widget.getActions();
>>>                 if (actions != null)
>>>                 {
>>>                     System.out.println(field.getFullyQualifiedName() + ": "
>>>                             + actions.getClass().getSimpleName() + " js widget actionS:\n"
>>>                             + actions.getCOSObject());
>>> 
>>>                     // Strange: why do I get a PDAnnotationAdditionalActions (which
>>>                     // contains a K entry but has no getK()) instead of a
>>>                     // PDFormFieldAdditionalActions?
>>>                     PDFormFieldAdditionalActions ffActions =
>>>                             new PDFormFieldAdditionalActions((COSDictionary) actions.getCOSObject());
>>>                     printPossibleJS(ffActions.getK());
>>>                     printPossibleJS(ffActions.getC());
>>>                     printPossibleJS(ffActions.getF());
>>>                     printPossibleJS(ffActions.getV());
>>>                 }
>>>             }
>>>         }
>>> 
>>>         if (field instanceof PDNonTerminalField)
>>>         {
>>>             if (!sParent.equals(field.getPartialName()))
>>>             {
>>>                 if (partialName != null)
>>>                 {
>>>                     sParent = sParent + "." + partialName;
>>>                 }
>>>             }
>>>             //System.out.println(sLevel + sParent);
>>> 
>>>             // Recurse into the children of a non-terminal field
>>>             for (PDField child : ((PDNonTerminalField) field).getChildren())
>>>             {
>>>                 processField(child, "|  " + sLevel, sParent);
>>>             }
>>>         }
>>>         else
>>>         {
>>>             String fieldValue = field.getValueAsString();
>>>             StringBuilder outputString = new StringBuilder(sLevel);
>>>             outputString.append(sParent);
>>>             if (partialName != null)
>>>             {
>>>                 outputString.append(".").append(partialName);
>>>             }
>>>             outputString.append(" = ").append(fieldValue);
>>>             outputString.append(", type=").append(field.getClass().getName());
>>>             //System.out.println(outputString);
>>>         }
>>>     }
>>> 
>>>     private void printPossibleJS(PDAction kAction)
>>>     {
>>>         if (kAction instanceof PDActionJavaScript)
>>>         {
>>>             PDActionJavaScript jsAction = (PDActionJavaScript) kAction;
>>>             String jsString = jsAction.getAction();
>>>             if (jsString == null)
>>>             {
>>>                 // A JavaScript action without a script: nothing to print
>>>                 return;
>>>             }
>>>             if (!jsString.contains("\n"))
>>>             {
>>>                 // Otherwise nothing shows up in NetBeans?!
>>>                 jsString = jsString.replaceAll("\r", "\n").replaceAll("\n\n", "\n");
>>>             }
>>>             System.out.println(jsString);
>>>             System.out.println();
>>>         }
>>>     }
>>> 
>>>     /**
>>>      * This will read a PDF file and print out the JavaScript of its form fields.
>>>      *
>>>      * @param args command line arguments
>>>      *
>>>      * @throws IOException If there is an error reading the PDF document.
>>>      */
>>>     public static void main(String[] args) throws IOException
>>>     {
>>>         PDDocument pdf = null;
>>>         try
>>>         {
>>>             // XXXXXX is a placeholder for the path of the PDF file
>>>             pdf = PDDocument.load(new File(XXXXXX));
>>>             PrintJavaScriptFields exporter = new PrintJavaScriptFields();
>>>             exporter.printFields(pdf);
>>>         }
>>>         finally
>>>         {
>>>             if (pdf != null)
>>>             {
>>>                 pdf.close();
>>>             }
>>>         }
>>>     }
>>> 
>>> }
>>> 
>>> 
>>> 
>>>> Thank you.
>>>> 
>>>>         Davide
>>>> 
>>>> ________________________________________
>>>> From: Tilman Hausherr [THausherr@t-online.de]
>>>> Sent: Tuesday, 23 August 2016 8:34
>>>> To: users@pdfbox.apache.org
>>>> Subject: Re: Check for scripts in a PDF
>>>> 
>>>> On 22.08.2016 at 15:14, Davide Zoni wrote:
>>>>> Hello everybody,
>>>>> 
>>>>> I'm using PDFBox to check whether a PDF file contains malicious scripts. I'm using the PDF/A-1a validation to check the file. Since I'm only looking for potentially damaging code and not for true PDF/A-1a compliance, is it enough to treat 1.x.x, 6.x.x and 7.x.x errors as "true" errors? The category descriptions are below:
>>>>> 
>>>>> Category        Description
>>>>> 1[.y[.z]]       Syntax Error
>>>>> 2[.y[.z]]       Graphic Error
>>>>> 3[.y[.z]]       Font Error
>>>>> 4[.y[.z]]       Transparency Error
>>>>> 5[.y[.z]]       Annotation Error
>>>>> 6[.y[.z]]       Action Error
>>>>> 7[.y[.z]]       Metadata Error
>>>> Unclear what you're asking. Are you seeking to detect files with
>>>> JavaScript? If so, I'd rather build something from scratch,
>>>> i.e. read the PDF specification and see where JS is used. Maybe compare
>>>> with the preflight source code to check that you didn't miss something.
>>>> 
>>>> Tilman
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
>>>> For additional commands, e-mail: users-help@pdfbox.apache.org


RE: Check for scripts in a PDF

Posted by Davide Zoni <Da...@Cedacri.it>.
Hello,

you can find a multimedia example here (I'm aware that the code suggested by Tilman might not work here):

http://media.washingtonpost.com/wp-adv/advertisers/Adobe/Obama/090808/ObamaPort.pdf

or here (the first one):

http://www.pdfscripting.com/public/Free-Sample-PDF-Files-with-scripts.cfm

where PDAcroForm is not null but the code fails to check for JavaScript fields.

Thanks.

        Davide Zoni

        Cedacri S.p.A.

        Tel.: 0521807433

        e-mail: davide.zoni@cedacri.it

        www.cedacri.it




Re: Check for scripts in a PDF

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

> On 29.08.2016 at 11:09, Davide Zoni <Da...@Cedacri.it> wrote:
> 
> Hi everybody again,
> 
> I'm trying to figure out whether your method suits my needs, but every time I try to access the acroForm (even in a pdf file with scripts and forms) it's null.

Could you upload a sample PDF to a public site so that we can take a look at it? An interactive PDF form should have an AcroForm entry.

BR
Maruan


> Am I loading the file in the wrong way? Am I missing something?
> 
> Best regards.
> 
> ________________________________________
> Da: Tilman Hausherr [THausherr@t-online.de]
> Inviato: mercoledì 24 agosto 2016 18.24
> A: users@pdfbox.apache.org
> Oggetto: Re: Check for scripts in a PDF
> 
> Am 24.08.2016 um 15:41 schrieb Davide Zoni:
>> Thank you. This might be helpful but i'm afraid that i would not be able to check every possibility. There's a way to check if a PDF is static (or dynamic)? For our pourpose that shuold be enough.
> 
> No there is no such method.
> 
> Tilman
> 
> 
>> Best regards.
>> 
>>         Davide Zoni
>> 
>>         Cedacri S.p.A.
>> 
>>         Tel.: 0521807433
>> 
>>         e-mail: davide.zoni@cedacri.it
>> 
>>         www.cedacri.it
>> 
>> 
>> ________________________________________
>> Da: Tilman Hausherr [THausherr@t-online.de]
>> Inviato: martedì 23 agosto 2016 18.23
>> A: users@pdfbox.apache.org
>> Oggetto: Re: Check for scripts in a PDF
>> 
>> Am 23.08.2016 um 09:35 schrieb Davide Zoni:
>>> Yes, i'm seeking to detect files with scripts. Not static. I don't undestand what do you mean with "Maybe compare
>>> with the preflight source code to check that you didn't miss something", can you elaborate on that?
>> I meant to search for "Javascript" in the source code, and then see
>> where it is used. This is just so that you can be more sure what you got
>> all when you read the PDF specification.
>> 
>> Btw I once wrote some code to show (some) javascript fields, see below
>> or search for "Roberto Nibali Javascript". He also improved that code
>> and posted the improved version. It may not find all javascript stuff,
>> but it could help show you how to write code.
>> 
>> Tilman
>> 
>> 
>> public class PrintJavaScriptFields
>> {
>> 
>>      /**
>>       * This will print all the fields from the document.
>>       *
>>       * @param pdfDocument The PDF to get the fields from.
>>       *
>>       * @throws IOException If there is an error getting the fields.
>>       */
>>      public void printFields(PDDocument pdfDocument) throws IOException
>>      {
>>          PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
>>          PDAcroForm acroForm = docCatalog.getAcroForm();
>>          List<PDField> fields = acroForm.getFields();
>> 
>>          //System.out.println(fields.size() + " top-level fields were
>> found on the form");
>> 
>>          for (PDField field : fields)
>>          {
>>              processField(field, "|--", field.getPartialName());
>>          }
>>      }
>> 
>>      private void processField(PDField field, String sLevel, String
>> sParent) throws IOException
>>      {
>>          String partialName = field.getPartialName();
>> 
>>          if (field instanceof PDTerminalField)
>>          {
>>              PDTerminalField termField = (PDTerminalField) field;
>>              for (PDAnnotationWidget widget : termField.getWidgets())
>>              {
>>                  PDAction action = widget.getAction();
>>                  if (action instanceof PDActionJavaScript)
>>                  {
>>                      System.out.println(field.getFullyQualifiedName() +
>> ": " + action.getClass().getSimpleName() + " js widget action:\n" +
>> action.getCOSObject());
>>                      printPossibleJS(action);
>>                  }
>>                  PDAnnotationAdditionalActions actions =
>> widget.getActions();
>>                  if (actions != null)
>>                  {
>>                      System.out.println(field.getFullyQualifiedName() +
>> ": " + actions.getClass().getSimpleName() + " js widget actionS:\n" +
>> actions.getCOSObject());
>> 
>>                      // Merkwürdig, wieso bekomme ich nicht
>> PDFormFieldAdditionalActions sondern ein PDAnnotationAdditionalActions
>> in dem ein K ist aber kein getK() ?
>>                      PDFormFieldAdditionalActions ffActions = new
>> PDFormFieldAdditionalActions((COSDictionary) actions.getCOSObject());
>>                      printPossibleJS(ffActions.getK());
>>                      printPossibleJS(ffActions.getC());
>>                      printPossibleJS(ffActions.getF());
>>                      printPossibleJS(ffActions.getV());
>>                  }
>>              }
>>          }
>> 
>>          if (field instanceof PDNonTerminalField)
>>          {
>>              if (!sParent.equals(field.getPartialName()))
>>              {
>>                  if (partialName != null)
>>                  {
>>                      sParent = sParent + "." + partialName;
>>                  }
>>              }
>>              //System.out.println(sLevel + sParent);
>> 
>>              for (PDField child : ((PDNonTerminalField)
>> field).getChildren())
>>              {
>>                  processField(child, "|  " + sLevel, sParent);
>>              }
>>          }
>>          else
>>          {
>>              String fieldValue = field.getValueAsString();
>>              StringBuilder outputString = new StringBuilder(sLevel);
>>              outputString.append(sParent);
>>              if (partialName != null)
>>              {
>>                  outputString.append(".").append(partialName);
>>              }
>>              outputString.append(" = ").append(fieldValue);
>>              outputString.append(",
>> type=").append(field.getClass().getName());
>>              //System.out.println(outputString);
>>          }
>>      }
>> 
>>      private void printPossibleJS(PDAction kAction)
>>      {
>>          if (kAction instanceof PDActionJavaScript)
>>          {
>>              PDActionJavaScript jsAction = (PDActionJavaScript) kAction;
>>              String jsString = jsAction.getAction();
>>              if (!jsString.contains("\n"))
>>              {
>>                  // Otherwise nothing shows up in NetBeans?!
>>                  jsString = jsString.replaceAll("\r", "\n").replaceAll("\n\n", "\n");
>>              }
>>              System.out.println(jsString);
>>              System.out.println();
>>          }
>>      }
>> 
>>      /**
>>       * This will read a PDF file and print out the form elements. <br />
>>       * see usage() for commandline
>>       *
>>       * @param args command line arguments
>>       *
>>       * @throws IOException If there is an error importing the FDF document.
>>       */
>>      public static void main(String[] args) throws IOException
>>      {
>>          PDDocument pdf = null;
>>          try
>>          {
>>              pdf = PDDocument.load(new File(XXXXXX));
>>              PrintJavaScriptFields exporter = new PrintJavaScriptFields();
>>              exporter.printFields(pdf);
>>          }
>>          finally
>>          {
>>              if (pdf != null)
>>              {
>>                  pdf.close();
>>              }
>>          }
>>      }
>> 
>> }


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


RE: Check for scripts in a PDF

Posted by Davide Zoni <Da...@Cedacri.it>.
Hi everybody again,

I'm trying to figure out whether your method suits my needs, but every time I try to access the AcroForm (even in a PDF file with scripts and forms) it is null.
Am I loading the file the wrong way? Am I missing something?

Best regards.
        


Re: Check for scripts in a PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
Am 24.08.2016 um 15:41 schrieb Davide Zoni:
> Thank you. This might be helpful, but I'm afraid I would not be able to check every possibility. Is there a way to check whether a PDF is static (or dynamic)? For our purposes that should be enough.

No, there is no such method.

Tilman
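
Since PDFBox has no single "static or dynamic" switch, one coarse first pass that is sometimes used is to scan the raw bytes of the file for name tokens that usually accompany scripts or automatic actions. The following is only a minimal sketch under loud assumptions: the class name RawPdfNameScan, the method scan() and the token list are hypothetical and not part of PDFBox. It will miss anything stored inside compressed object streams or written with #-escaped names, and a hit only means the file deserves a closer look with a real parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RawPdfNameScan
{
    // Illustrative token list (an assumption, not an exhaustive set from the PDF specification).
    private static final String[] SUSPICIOUS =
        { "/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch" };

    /** Returns the suspicious name tokens found in the raw bytes, in list order. */
    public static List<String> scan(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost or shifted.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        List<String> hits = new ArrayList<>();
        for (String name : SUSPICIOUS)
        {
            if (containsName(raw, name))
            {
                hits.add(name);
            }
        }
        return hits;
    }

    // True if 'name' occurs as a complete PDF name, i.e. followed by a delimiter or end of data.
    private static boolean containsName(String raw, String name)
    {
        int pos = raw.indexOf(name);
        while (pos >= 0)
        {
            int end = pos + name.length();
            if (end == raw.length() || isDelimiter(raw.charAt(end)))
            {
                return true;
            }
            pos = raw.indexOf(name, pos + 1);
        }
        return false;
    }

    // PDF whitespace and delimiter characters terminate a name token.
    private static boolean isDelimiter(char c)
    {
        return c == 0 || Character.isWhitespace(c) || "()<>[]{}/%".indexOf(c) >= 0;
    }
}
```

A caller could read the file with java.nio.file.Files.readAllBytes(path) and pass the result to RawPdfNameScan.scan(); an empty list does not prove the file is free of scripts.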




RE: Check for scripts in a PDF

Posted by Davide Zoni <Da...@Cedacri.it>.
Thank you. This might be helpful, but I'm afraid I would not be able to check every possibility. Is there a way to check whether a PDF is static (or dynamic)? For our purposes that should be enough.

Best regards.

        Davide Zoni

        Cedacri S.p.A.

        Tel.: 0521807433

        e-mail: davide.zoni@cedacri.it

        www.cedacri.it




Re: Check for scripts in a PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
On 23.08.2016 at 09:35, Davide Zoni wrote:
> Yes, I'm seeking to detect files with scripts. Not static. I don't understand what you mean by "Maybe compare
> with the preflight source code to check that you didn't miss something". Can you elaborate on that?

I meant that you could search the preflight source code for "Javascript"
and see where it is used. That way you can be more confident that you
caught everything when you read the PDF specification.

By the way, I once wrote some code that shows (some) JavaScript attached
to form fields; see below, or search for "Roberto Nibali Javascript". He
also improved that code and posted the improved version. It may not find
every use of JavaScript, but it should show you how to write such code.

Tilman


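// Imports assume PDFBox 2.0.x.
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionJavaScript;
import org.apache.pdfbox.pdmodel.interactive.action.PDAnnotationAdditionalActions;
import org.apache.pdfbox.pdmodel.interactive.action.PDFormFieldAdditionalActions;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

public class PrintJavaScriptFields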
{

     /**
      * Prints the JavaScript actions attached to the form fields of the document.
      *
      * @param pdfDocument The PDF to get the fields from.
      *
      * @throws IOException If there is an error getting the fields.
      */
     public void printFields(PDDocument pdfDocument) throws IOException
     {
         PDDocumentCatalog docCatalog = pdfDocument.getDocumentCatalog();
         PDAcroForm acroForm = docCatalog.getAcroForm();
         if (acroForm == null)
         {
             // No AcroForm dictionary: there are no form fields to inspect.
             return;
         }
         List<PDField> fields = acroForm.getFields();

         //System.out.println(fields.size() + " top-level fields were found on the form");

         for (PDField field : fields)
         {
             processField(field, "|--", field.getPartialName());
         }
     }

     private void processField(PDField field, String sLevel, String sParent) throws IOException
     {
         String partialName = field.getPartialName();

         if (field instanceof PDTerminalField)
         {
             PDTerminalField termField = (PDTerminalField) field;
             for (PDAnnotationWidget widget : termField.getWidgets())
             {
                 // The widget's own action (/A entry).
                 PDAction action = widget.getAction();
                 if (action instanceof PDActionJavaScript)
                 {
                     System.out.println(field.getFullyQualifiedName() + ": "
                             + action.getClass().getSimpleName() + " js widget action:\n"
                             + action.getCOSObject());
                     printPossibleJS(action);
                 }
                 // The widget's additional actions (/AA entry).
                 PDAnnotationAdditionalActions actions = widget.getActions();
                 if (actions != null)
                 {
                     System.out.println(field.getFullyQualifiedName() + ": "
                             + actions.getClass().getSimpleName() + " js widget actionS:\n"
                             + actions.getCOSObject());

                     // Strange: why do I get a PDAnnotationAdditionalActions here rather
                     // than a PDFormFieldAdditionalActions? The dictionary contains a K
                     // entry, but the class has no getK().
                     PDFormFieldAdditionalActions ffActions =
                             new PDFormFieldAdditionalActions((COSDictionary) actions.getCOSObject());
                     printPossibleJS(ffActions.getK());
                     printPossibleJS(ffActions.getC());
                     printPossibleJS(ffActions.getF());
                     printPossibleJS(ffActions.getV());
                 }
             }
         }

         if (field instanceof PDNonTerminalField)
         {
             if (!sParent.equals(field.getPartialName()))
             {
                 if (partialName != null)
                 {
                     sParent = sParent + "." + partialName;
                 }
             }
             //System.out.println(sLevel + sParent);

             for (PDField child : ((PDNonTerminalField) field).getChildren())
             {
                 processField(child, "|  " + sLevel, sParent);
             }
         }
         else
         {
             String fieldValue = field.getValueAsString();
             StringBuilder outputString = new StringBuilder(sLevel);
             outputString.append(sParent);
             if (partialName != null)
             {
                 outputString.append(".").append(partialName);
             }
             outputString.append(" = ").append(fieldValue);
             outputString.append(", type=").append(field.getClass().getName());
             //System.out.println(outputString);
         }
     }

     private void printPossibleJS(PDAction kAction)
     {
         if (kAction instanceof PDActionJavaScript)
         {
             PDActionJavaScript jsAction = (PDActionJavaScript) kAction;
             String jsString = jsAction.getAction();
             if (jsString != null && !jsString.contains("\n"))
             {
                 // Otherwise nothing shows up in NetBeans?!
                 jsString = jsString.replaceAll("\r", "\n").replaceAll("\n\n", "\n");
             }
             System.out.println(jsString);
             System.out.println();
         }
     }

     /**
      * Reads a PDF file and prints the JavaScript found in its form fields.
      *
      * @param args command line arguments; args[0] is the path of the PDF file
      *
      * @throws IOException If there is an error loading the PDF document.
      */
     public static void main(String[] args) throws IOException
     {
         // The original post elided the file path (XXXXXX); here it is taken
         // from the command line instead.
         try (PDDocument pdf = PDDocument.load(new File(args[0])))
         {
             PrintJavaScriptFields exporter = new PrintJavaScriptFields();
             exporter.printFields(pdf);
         }
     }

}
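
The class above only looks at widget actions on AcroForm fields. As a
rough complement, the raw bytes of a file can be scanned for the name
tokens that the PDF specification uses for scripts and automatically
triggered actions: /JavaScript, /JS, /OpenAction and /AA. This is only a
sketch under stated assumptions: the class PdfScriptMarkerScan and its
helpers are made up for this illustration and are not part of PDFBox, and
/OpenAction or /AA may trigger actions that are not scripts at all. A miss
proves nothing, because names inside compressed object streams, in
encrypted files, or written with #xx escapes are not seen.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of PDFBox.
final class PdfScriptMarkerScan
{
    // Name tokens associated with scripts and automatically triggered actions.
    private static final String[] MARKERS = { "/JavaScript", "/JS", "/OpenAction", "/AA" };

    /** Returns the markers that occur as complete name tokens in the raw bytes. */
    static List<String> findMarkers(byte[] pdfBytes)
    {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        List<String> found = new ArrayList<>();
        for (String marker : MARKERS)
        {
            if (containsName(raw, marker))
            {
                found.add(marker);
            }
        }
        return found;
    }

    private static boolean containsName(String raw, String name)
    {
        int pos = raw.indexOf(name);
        while (pos >= 0)
        {
            int end = pos + name.length();
            // A name ends at whitespace or a delimiter, so "/JS" does not match "/JSON".
            if (end == raw.length() || isDelimiter(raw.charAt(end)))
            {
                return true;
            }
            pos = raw.indexOf(name, pos + 1);
        }
        return false;
    }

    private static boolean isDelimiter(char c)
    {
        return c == 0 || Character.isWhitespace(c) || "()<>[]{}/%".indexOf(c) >= 0;
    }

    public static void main(String[] args) throws IOException
    {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Script-related markers: " + findMarkers(bytes));
    }
}
```

A hit is a reason to inspect the file with a real parser such as PDFBox;
an empty result should not be read as "no scripts".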



>
> Thank you.
>
>          Davide
>

RE: Check for scripts in a PDF

Posted by Davide Zoni <Da...@Cedacri.it>.
Yes, I'm seeking to detect files with scripts. Not static. I don't understand what you mean by "Maybe compare
with the preflight source code to check that you didn't miss something". Can you elaborate on that?

Thank you.

        Davide


Re: Check for scripts in a PDF

Posted by Tilman Hausherr <TH...@t-online.de>.
On 22.08.2016 at 15:14, Davide Zoni wrote:
> Hello everybody,
>
> I'm using PDFBox to check whether a PDF file contains malicious scripts. I'm using the PDF/A-1a validation to check the file. Since I'm only looking for potentially damaging code and not for true PDF/A-1a compliance, is it enough to treat 1.x.x, 6.x.x and 7.x.x errors as "true" errors? The categories are described below:
>
> Category        Description
> 1[.y[.z]]       Syntax Error
> 2[.y[.z]]       Graphic Error
> 3[.y[.z]]       Font Error
> 4[.y[.z]]       Transparency Error
> 5[.y[.z]]       Annotation Error
> 6[.y[.z]]       Action Error
> 7[.y[.z]]       Metadata Error

Unclear what you're asking. Are you seeking to detect files with
JavaScript? If so, I'd rather build something from scratch,
i.e. read the PDF specification and see where JS is used. Maybe compare 
with the preflight source code to check that you didn't miss something.

Tilman
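
For reference, the category filter described in the question comes down
to taking the part of the error code before the first dot. A minimal
sketch, assuming the codes arrive as plain strings in the
category[.y[.z]] form from the table; the class PreflightCategoryFilter
and the sample codes in main are hypothetical. It does not settle whether
categories 1, 6 and 7 are sufficient for detecting scripts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of PDFBox preflight.
final class PreflightCategoryFilter
{
    // Top-level categories from the table: 1 = Syntax, 6 = Action, 7 = Metadata.
    private static final Set<String> WATCHED = new HashSet<>(Arrays.asList("1", "6", "7"));

    /** Returns the top-level category, i.e. the text before the first dot. */
    static String category(String errorCode)
    {
        int dot = errorCode.indexOf('.');
        return dot < 0 ? errorCode : errorCode.substring(0, dot);
    }

    /** True if the error code belongs to one of the watched categories. */
    static boolean isWatched(String errorCode)
    {
        return WATCHED.contains(category(errorCode));
    }

    public static void main(String[] args)
    {
        // Sample codes are made up to match the category[.y[.z]] pattern.
        for (String code : new String[] { "1.2.1", "3.1.1", "6.2.5", "7" })
        {
            System.out.println(code + " watched: " + isWatched(code));
        }
    }
}
```

Comparing the whole category string, rather than only the first
character, keeps a code such as 16.1 from being mistaken for category 1.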

