
Posted to dev@lucene.apache.org by Haipeng Du <fl...@hotmail.com> on 2004/09/14 17:18:28 UTC

I am new to lucene

Hi, everyone:
I am new to Lucene and have a couple of questions.
(1) When I use Field.Text("content", Reader) to index the file content, I
cannot retrieve it when I search. Here is part of my code:

    Analyzer analyzer = new StopAnalyzer();
    Searcher searcher = new IndexSearcher(indexPath);
    Query query = QueryParser.parse(queryString, key2, analyzer);
    Hits hits = searcher.search(query);

When I call hits.doc(i).get("content"), the result is null, even though I
can read the values of all my other fields the same way. How can I get the
content back?
(2) Does Lucene have a way to index PDF content? Which API is the easiest to
use for converting PDF to text?
Please reply. Thanks a lot.
Haipeng



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

RE: I am new to lucene

Posted by Aviran <am...@infosciences.com>.

1. Field.Text(String, Reader) constructs a Reader-valued Field that is
tokenized and indexed but is not stored in the index verbatim, so you cannot
retrieve the text from a hit. Use Field.Text("content", String) instead: that
field is tokenized, indexed and stored, so hits.doc(i).get("content") will
return the content.
2. You can use an open source project called PDFBox, which can extract text
from a PDF document. The extracted text can then be indexed like any other
text field.
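To make point 1 concrete, here is a rough, self-contained Java sketch. It is a toy model, not Lucene's API: ToyIndex, addText, search, get and readAll are hypothetical names, and the tokenizer is a crude stand-in for an Analyzer. The Reader overload only indexes the text (mirroring Field.Text(String, Reader)); the String overload also stores it (mirroring Field.Text(String, String)). A document added through the Reader is still found by search, but get() returns null for it, which matches the behaviour in the question. Draining the Reader into a String first, as readAll does, is one way to get a stored copy.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, NOT Lucene's API: every name below is hypothetical and only
// mirrors the Field.Text(String, Reader) vs Field.Text(String, String) split.
class ToyIndex {
    // Inverted index: field name -> term -> ids of documents containing it.
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // Stored values: document id -> field name -> original, untokenized text.
    private final List<Map<String, String>> stored = new ArrayList<>();

    // Starts a new, empty document and returns its id.
    int newDocument() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    // Reader-valued field: tokenized and indexed, but the text is NOT stored.
    void addText(int doc, String field, Reader value) {
        index(doc, field, readAll(value));
    }

    // String-valued field: tokenized, indexed AND stored verbatim.
    void addText(int doc, String field, String value) {
        index(doc, field, value);
        stored.get(doc).put(field, value);
    }

    // Ids of documents whose field contains the term (case-insensitive).
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field, Collections.emptyMap())
                .getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    // Stored text of a field, or null if the field was only indexed.
    String get(int doc, String field) {
        return stored.get(doc).get(field);
    }

    // Crude lower-casing tokenizer; a stand-in for a real Analyzer.
    private void index(int doc, String field, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(token, t -> new HashSet<>())
                    .add(doc);
        }
    }

    // Drains a Reader into a String, e.g. so its text can be stored.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}

class StoredFieldDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();

        // Indexed from a Reader: searchable, but get() has nothing to return.
        int a = index.newDocument();
        index.addText(a, "content", new StringReader("Lucene in action"));

        // Reader drained into a String first: searchable AND retrievable.
        int b = index.newDocument();
        index.addText(b, "content", ToyIndex.readAll(new StringReader("Lucene for beginners")));

        System.out.println(index.search("content", "lucene").size()); // 2
        System.out.println(index.get(a, "content"));                  // null
        System.out.println(index.get(b, "content"));                  // Lucene for beginners
    }
}
```

Storing keeps a verbatim copy of the text in the index, so for large files the index grows accordingly; that is the cost of being able to read the content back from a hit.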

Aviran

-----Original Message-----
From: Haipeng Du [mailto:flyabovesun@hotmail.com] 
Sent: Tuesday, September 14, 2004 11:18 AM
To: lucene-dev@jakarta.apache.org
Subject: I am new to lucene