Posted to dev@pdfbox.apache.org by "Maruan Sahyoun (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/09/30 09:20:45 UTC

[jira] [Issue Comment Edited] (PDFBOX-1123) Not able to read field values from a PDF File if the field contains special characters.

    [ https://issues.apache.org/jira/browse/PDFBOX-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117915#comment-13117915 ] 

Maruan Sahyoun edited comment on PDFBOX-1123 at 9/30/11 7:20 AM:
-----------------------------------------------------------------

@Rubesh
getAcroForm().getFields() only returns the first level of field nodes, which is topmostSubform in your case. You need to call .getKids() on each field node to get its children, check whether each child is a final field or another intermediate node, and keep descending until you reach the final field. There you can use either .getPartialName() to retrieve only the field's own name or .getFullyQualifiedName() to get the name including its parents.
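
To illustrate the traversal described above, here is a minimal, self-contained sketch. It does not use the PDFBox classes: FieldNodeSketch is a hypothetical stand-in for a field node, its kids list plays the role of .getKids(), and fullyQualifiedName() only mimics what .getFullyQualifiedName() is described as returning. Treating a node without kids as the final field is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a form field node; NOT a PDFBox class.
class FieldNodeSketch {
    final String partialName;                      // the node's own name only
    final FieldNodeSketch parent;                  // null for a top-level node
    final List<FieldNodeSketch> kids = new ArrayList<>();

    FieldNodeSketch(String partialName, FieldNodeSketch parent) {
        this.partialName = partialName;
        this.parent = parent;
        if (parent != null) {
            parent.kids.add(this);                 // register with the parent
        }
    }

    // Name including all parents, joined with '.'.
    String fullyQualifiedName() {
        return parent == null
                ? partialName
                : parent.fullyQualifiedName() + "." + partialName;
    }

    // Depth-first walk: descend through intermediate nodes and record
    // the fully qualified name of every node that has no kids.
    static void collectTerminalNames(FieldNodeSketch node, List<String> out) {
        if (node.kids.isEmpty()) {
            out.add(node.fullyQualifiedName());
            return;
        }
        for (FieldNodeSketch kid : node.kids) {
            collectTerminalNames(kid, out);
        }
    }
}
```

Calling collectTerminalNames() once per top-level node yields one entry per final field instead of the single topmostSubform[0] entry.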

As an alternative, you might want to use the findKid() method on a top-level field node. It drills down based on an array of names, which you can create e.g. by doing a split("\\.") on the full names of the fields you are looking for.
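
The name-based drill-down can be sketched the same way. Again this is a self-contained model, not the PDFBox API: NamedNodeSketch and its findKid(String[], int) are hypothetical, and the assumption that the first path element names the node the search starts from is mine. It also assumes that no partial name itself contains a '.'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a form field node; NOT a PDFBox class.
class NamedNodeSketch {
    final String partialName;
    final Map<String, NamedNodeSketch> kids = new LinkedHashMap<>();

    NamedNodeSketch(String partialName) {
        this.partialName = partialName;
    }

    // Adds a child with the given partial name and returns it.
    NamedNodeSketch addKid(String name) {
        NamedNodeSketch kid = new NamedNodeSketch(name);
        kids.put(name, kid);
        return kid;
    }

    // Follows path[index..] downwards; path[index] must name this node.
    // Returns the node named by the last path element, or null if the
    // path does not exist in the tree.
    NamedNodeSketch findKid(String[] path, int index) {
        if (index >= path.length || !partialName.equals(path[index])) {
            return null;
        }
        if (index == path.length - 1) {
            return this;                           // reached the last name
        }
        NamedNodeSketch kid = kids.get(path[index + 1]);
        return kid == null ? null : kid.findKid(path, index + 1);
    }
}
```

A lookup would then read root.findKid(fullName.split("\\."), 0), where root is the top-level node and fullName is the dotted name being searched for.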

@Andreas
If I'm not mistaken, there is currently no easier way to get a field by supplying its fully qualified name, right? Should we add a convenience method to do so?
                
> Not able to read field values from a PDF File if the field contains special characters.
> ---------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1123
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1123
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: Rubesh MX
>            Priority: Minor
>              Labels: acroform
>         Attachments: fspl.pdf
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Hi, I am trying to read the field names in a PDF file. It works with most files, but in some files we are not able to read the field ID/name. The reason is that some field names look like this:
> topmostSubform[0].Page1[0].c1_04_0_[0]
> topmostSubform[0].Page1[0].c1_09_0_
> topmostSubform[0].Page2[0].Table_Line4a[0].#subform[1].p2-t69[0]
> Here all the field names start with topmostSubform[0], so when we read a field name with PDField.getPartialName(), the name is truncated at the '.' and we get only topmostSubform[0]. Since all the field names start with the same name, the total field count comes out as 1. The special characters '.', '_' and '#' appear to be causing the issue. Could you please suggest how to handle this? This is very critical.
