You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Matthias Bräuer <ne...@matrix-web.de> on 2005/09/16 21:51:28 UTC
Text is not indexed when passed as a StringReader
Hello,
this question seems to have occured in the mailing list before but I
wasn't able to find a satisfying answer. So please excuse if I'm asking
something that has already been discussed.
My problem is as follows:
If I use the Field.Text(String,Reader) method to create an indexed, but
unstored field and the passed in Reader happens to be a StringReader
(e.g. when extracting Word documents using the Textmining library) the
field is not indexed at all. That means Luke shows no terms for this
field and, consequently, searches do not yield any result. For
FileReaders, however, everything seems to work fine.
Of course, I could just convert the reader back into a string (e.g. with
Jakarta Commons IO - IOTools.toString()) and use the
Unstored(String,String) method but then again it wouldn't make sense to
use a StringReader in the first place.
Thanks for your help,
Matthias
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Text is not indexed when passed as a StringReader
Posted by Andrzej Bialecki <ab...@getopt.org>.
Daniel Naber wrote:
> On Friday 16 September 2005 21:51, Matthias Bräuer wrote:
>
>
>>but
>>unstored field and the passed in Reader happens to be a StringReader
>>(e.g. when extracting Word documents using the Textmining library) the
>>field is not indexed at all. That means Luke shows no terms for this
>>field and, consequently, searches do not yield any result.
>
>
> Luke only shows terms if the field is *stored* (which it isn't for a
> reader). You need to click the "Reconstruct & Edit" button to see if the
> text really isn't *indexed*.
Caveat emptor - the "Restore" function just collects existing terms from
the index. If the input text passed through an aggresive analyzer (like
the StandardAnalyzer), many tokens will be missing and the reconstructed
text will be incomplete.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Text is not indexed when passed as a StringReader
Posted by Daniel Naber <lu...@danielnaber.de>.
On Friday 16 September 2005 21:51, Matthias Bräuer wrote:
> but
> unstored field and the passed in Reader happens to be a StringReader
> (e.g. when extracting Word documents using the Textmining library) the
> field is not indexed at all. That means Luke shows no terms for this
> field and, consequently, searches do not yield any result.
Luke only shows terms if the field is *stored* (which it isn't for a
reader). You need to click the "Reconstruct & Edit" button to see if the
text really isn't *indexed*.
Regards
Daniel
--
http://www.danielnaber.de
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Text is not indexed when passed as a StringReader
Posted by Chris Hostetter <ho...@fucit.org>.
I think you may be having another problem somewhere, usinga StringReader
works just fine for me (in fact: when you create a field with
a plain String, it is wrapped in a StringReader to pass to
your analyzer.
Note the following demo works just fine...
public static void main(String[] args) throws Exception {
RAMDirectory index = new RAMDirectory();
IndexWriter writer = new IndexWriter(index,
new WhitespaceAnalyzer(),
true);
Document doc = new Document();
doc.add(Field.Text("foo", new StringReader("a b c d")));
writer.addDocument(doc);
writer.close();
IndexSearcher s = new IndexSearcher(IndexReader.open(index));
Hits h = s.search(new TermQuery(new Term("foo","a")));
System.out.println(h.length() == 1 ? "FOUND" : "ERROR");
}
: Date: Sat, 17 Sep 2005 03:51:28 +0800
: From: "[ISO-8859-15] Matthias Bräuer" <ne...@matrix-web.de>
: Reply-To: java-user@lucene.apache.org, matthew@matrix-web.de
: To: java-user@lucene.apache.org
: Subject: Text is not indexed when passed as a StringReader
:
: Hello,
:
: this question seems to have occured in the mailing list before but I
: wasn't able to find a satisfying answer. So please excuse if I'm asking
: something that has already been discussed.
:
: My problem is as follows:
: If I use the Field.Text(String,Reader) method to create an indexed, but
: unstored field and the passed in Reader happens to be a StringReader
: (e.g. when extracting Word documents using the Textmining library) the
: field is not indexed at all. That means Luke shows no terms for this
: field and, consequently, searches do not yield any result. For
: FileReaders, however, everything seems to work fine.
:
: Of course, I could just convert the reader back into a string (e.g. with
: Jakarta Commons IO - IOTools.toString()) and use the
: Unstored(String,String) method but then again it wouldn't make sense to
: use a StringReader in the first place.
:
: Thanks for your help,
: Matthias
:
:
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org