Posted to dev@pdfbox.apache.org by "Vipul Pujari (JIRA)" <ji...@apache.org> on 2010/06/10 18:07:24 UTC

[jira] Created: (PDFBOX-748) Unable to extract special characters from pdf

Unable to extract special characters from pdf
---------------------------------------------

                 Key: PDFBOX-748
                 URL: https://issues.apache.org/jira/browse/PDFBOX-748
             Project: PDFBox
          Issue Type: Bug
         Environment: Windows XP, .Net 2.0
            Reporter: Vipul Pujari


Using the code below:

' Load the PDF and extract its text with PDFTextStripper
Dim ObjBytesRead As String
Dim doc As Org.pdfbox.pdmodel.PDDocument = Org.pdfbox.pdmodel.PDDocument.load(FileName)
Dim stripper As New Org.pdfbox.util.PDFTextStripper
ObjBytesRead = stripper.getText(doc)
doc.close() ' release the document once the text has been read

I am unable to extract special characters ("_") from the PDF.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PDFBOX-748) Unable to extract special characters from pdf

Posted by "Vipul Pujari (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881682#action_12881682 ] 

Vipul Pujari commented on PDFBOX-748:
-------------------------------------

In the attached PDF file, I want to extract all of the text, including the table border.



[jira] Updated: (PDFBOX-748) Unable to extract special characters from pdf

Posted by "Vipul Pujari (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PDFBOX-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vipul Pujari updated PDFBOX-748:
--------------------------------

    Attachment: msnet-formatting-strings.pdf

Attached the PDF file.



[jira] Commented: (PDFBOX-748) Unable to extract special characters from pdf

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PDFBOX-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880868#action_12880868 ] 

Jukka Zitting commented on PDFBOX-748:
--------------------------------------

Do you have an example PDF document that you could share with us?
