
Posted to java-user@lucene.apache.org by Matthias Bräuer <ne...@matrix-web.de> on 2005/09/16 21:51:28 UTC

Text is not indexed when passed as a StringReader

Hello,

this question seems to have come up on the mailing list before, but I 
wasn't able to find a satisfying answer. So please excuse me if I'm asking 
something that has already been discussed.

My problem is as follows:
If I use the Field.Text(String,Reader) method to create an indexed, but 
unstored field and the passed in Reader happens to be a StringReader 
(e.g. when extracting Word documents using the Textmining library) the 
field is not indexed at all. That means Luke shows no terms for this 
field and, consequently, searches do not yield any result. For 
FileReaders, however, everything seems to work fine.

Of course, I could just convert the reader back into a string (e.g. with 
Jakarta Commons IO - IOUtils.toString()) and use the 
Field.UnStored(String,String) method, but then again it wouldn't make 
sense to use a StringReader in the first place.
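
For reference, the workaround described above could look roughly like the following sketch. It uses only the JDK; the class name ReaderToString and the helper drain() are hypothetical names chosen for illustration, and the field name "contents" in the comment is likewise an assumption. Commons IO's IOUtils.toString(Reader) would serve the same purpose.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReaderToString {

    // Hypothetical helper: reads the Reader to its end and returns the text.
    public static String drain(Reader reader) {
        try {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = drain(new StringReader("a b c d"));
        System.out.println(text);
        // With Lucene on the classpath, the string could then be indexed
        // without being stored, e.g. (field name "contents" is an assumption):
        //   doc.add(Field.UnStored("contents", text));
    }
}
```

The drawback is the one already mentioned: the whole text has to be held in memory as a String before it is indexed.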

Thanks for your help,
Matthias



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Text is not indexed when passed as a StringReader

Posted by Andrzej Bialecki <ab...@getopt.org>.

Daniel Naber wrote:
> On Friday 16 September 2005 21:51, Matthias Bräuer wrote:
> 
> 
>>but
>>unstored field and the passed in Reader happens to be a StringReader
>>(e.g. when extracting Word documents using the Textmining library) the
>>field is not indexed at all. That means Luke shows no terms for this
>>field and, consequently, searches do not yield any result.
> 
> 
> Luke only shows terms if the field is *stored* (which it isn't for a 
> reader). You need to click the "Reconstruct & Edit" button to see if the 
> text really isn't *indexed*.

Caveat emptor - the "Restore" function just collects existing terms from 
the index. If the input text passed through an aggressive analyzer (like 
the StandardAnalyzer), many tokens will be missing and the reconstructed 
text will be incomplete.
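
To illustrate why a reconstruction from indexed terms can be incomplete, here is a toy sketch in plain Java. It is not Lucene's StandardAnalyzer and not Luke's code: the class ReconstructSketch, its methods and the tiny stop-word list are all assumptions made up for the example. It only shows the general effect of an analyzer that lowercases, strips punctuation and drops stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ReconstructSketch {

    // Assumed stop-word list for illustration only; not Lucene's actual list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "is", "a", "of"));

    // Toy stand-in for an analyzer: lowercase, split on non-alphanumerics,
    // and drop stop words.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // All a term-based reconstruction can recover: the surviving terms.
    public static String reconstruct(String text) {
        return String.join(" ", analyze(text));
    }

    public static void main(String[] args) {
        // prints "quick fox friend dog"
        System.out.println(reconstruct("The quick fox is a friend of the dog."));
    }
}
```

Case, punctuation and the dropped words are gone for good, so the reconstructed text is only an approximation of the input.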

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Text is not indexed when passed as a StringReader

Posted by Daniel Naber <lu...@danielnaber.de>.

On Friday 16 September 2005 21:51, Matthias Bräuer wrote:

> but
> unstored field and the passed in Reader happens to be a StringReader
> (e.g. when extracting Word documents using the Textmining library) the
> field is not indexed at all. That means Luke shows no terms for this
> field and, consequently, searches do not yield any result.

Luke only shows terms if the field is *stored* (which it isn't for a 
reader). You need to click the "Reconstruct & Edit" button to see if the 
text really isn't *indexed*.

Regards
 Daniel

-- 
http://www.danielnaber.de


Re: Text is not indexed when passed as a StringReader

Posted by Chris Hostetter <ho...@fucit.org>.

I think you may be having another problem somewhere; using a StringReader
works just fine for me (in fact, when you create a field with
a plain String, it is wrapped in a StringReader to pass to
your analyzer).

Note the following demo works just fine...

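    import java.io.StringReader;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.RAMDirectory;

    // The class name is arbitrary; the original post showed only main().
    public class StringReaderDemo {
        public static void main(String[] args) throws Exception {
            // Build a throwaway in-memory index.
            RAMDirectory index = new RAMDirectory();
            IndexWriter writer = new IndexWriter(index,
                                                 new WhitespaceAnalyzer(),
                                                 true);
            Document doc = new Document();
            // Field.Text with a Reader: indexed and tokenized, but not stored.
            doc.add(Field.Text("foo", new StringReader("a b c d")));
            writer.addDocument(doc);
            writer.close();
            IndexSearcher s = new IndexSearcher(IndexReader.open(index));
            Hits h = s.search(new TermQuery(new Term("foo","a")));
            System.out.println(h.length() == 1 ? "FOUND" : "ERROR");
        }
    }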
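
One candidate for the "other problem", offered purely as an assumption since the original post does not include the indexing code: a Reader can be read only once. If the extraction code (or a debugging statement) has already read the StringReader to its end before it is handed to Field.Text, the analyzer would see an empty stream and produce no terms. The sketch below shows that behaviour with the JDK alone; the class ExhaustedReaderSketch and the helper remainingChars() are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ExhaustedReaderSketch {

    // Counts the characters still available from the reader's current position.
    public static int remainingChars(Reader reader) {
        try {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("a b c d");
        // First pass sees all 7 characters.
        System.out.println(remainingChars(reader));
        // Second pass sees 0: the reader is already at the end of its stream.
        System.out.println(remainingChars(reader));
    }
}
```

If that turns out to be the cause, creating a fresh StringReader (or calling reset() on the existing one) right before building the field should be enough.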




-Hoss

