
Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/10/08 19:20:45 UTC

[jira] [Commented] (PDFBOX-929) Extraction of the Content from CJK pdf's using PDFBox and indexing the same with LUCENE search in Solaris fails.

    [ https://issues.apache.org/jira/browse/PDFBOX-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789409#comment-13789409 ] 

Andreas Lehmkühler commented on PDFBOX-929:
-------------------------------------------

I guess this is a text extraction issue. Can you attach a sample PDF, or is it just too late ...

> Extraction of the Content from CJK pdf's using PDFBox and indexing the same with LUCENE search in Solaris fails.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-929
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-929
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDFReader, Text extraction
>         Environment: Solaris
>            Reporter: gomathy s
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> In the Solaris environment, we use PDFBox to extract the content, set a few lines from the PDF as a description,
> and index the content. Searching with CJK characters returns no results, but searching with English words
> does retrieve results. I am using the correct analyzer during both indexing and searching. This happens only on Solaris;
> on Windows it works fine. Any suggestions would be appreciated, as this is a major issue for me.
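One common cause of "works on Windows, fails on Solaris" with non-Latin text is the JVM's platform default charset. If extracted text is written to or read from an intermediate file or stream without an explicit encoding, characters the default charset cannot represent are silently replaced (typically with '?'). This is an assumption about this report, not a confirmed diagnosis. The stdlib-only sketch below (class and method names are hypothetical, and it does not call PDFBox or Lucene) prints the default charset and checks whether a CJK sample survives an encode/decode round trip through a given charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Hypothetical helper: true if the text is unchanged after encoding to bytes
    // and decoding back with the same charset. Unmappable characters are replaced
    // by getBytes(Charset), so a non-Unicode charset makes this return false for CJK.
    static boolean roundTrips(String text, Charset cs) {
        byte[] bytes = text.getBytes(cs);
        return new String(bytes, cs).equals(text);
    }

    public static void main(String[] args) {
        // Sample CJK text: Chinese, Japanese and Korean characters.
        String cjk = "\u4E2D\u6587\u65E5\u672C\u8A9E\uD55C\uAD6D\uC5B4";
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 round trip: " + roundTrips(cjk, StandardCharsets.UTF_8));
        System.out.println("US-ASCII round trip: " + roundTrips(cjk, StandardCharsets.US_ASCII));
        System.out.println("default round trip: " + roundTrips(cjk, Charset.defaultCharset()));
    }
}
```

Running it on both machines and comparing the "default charset" line would show whether the platform encoding differs. If it does, passing StandardCharsets.UTF_8 explicitly wherever the extracted text is written or read (for example to an OutputStreamWriter or InputStreamReader) removes that dependency. Note that on JDK 18 and later the default charset is UTF-8 regardless of platform.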



--
This message was sent by Atlassian JIRA
(v6.1#6144)