
Posted to java-user@lucene.apache.org by nachi <na...@gmail.com> on 2007/08/11 08:55:03 UTC

indexing going wrong

all,

Not sure if my earlier mail went through, so I'm resending.

I'm new to Lucene and I'm trying to develop a textual search module. I have
written the following code (this is research code) -


  File dir = new File("c:/test");
  IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
  Document doc = new Document();
  File[] file = dir.listFiles();
  for (File f : file) {
    if (f.isFile() && f.canRead()) {
      System.out.println(f.getName());
      doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.UN_TOKENIZED));
      doc.add(new Field("contents", new FileReader(f)));
      writer.addDocument(doc);
    }
  }

  System.out.println("count=" + writer.docCount());
  writer.optimize();
  writer.close();



I'm trying to index the contents of the test directory, which has only txt files.

When I search the index for a particular word, I get the same filename
every time.

Here is the code for searching -

  File dir = new File("D:\\test");
  FSDirectory fsdir = FSDirectory.getDirectory(dir);
  IndexSearcher d = new IndexSearcher(fsdir);
  QueryParser p = new QueryParser("contents", new StandardAnalyzer());
  Query q = p.parse("ERROR");
  Hits hits = d.search(q);

  for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    System.out.println(doc.get("filename"));
  }
  d.close();

Can somebody tell me what I'm doing wrong? I suspect that there is
something wrong in the way I index.




-- 
-nachi

Re: indexing going wrong

Posted by nachi <na...@gmail.com>.

Oops, that was a copy-and-paste error. I'm indexing the directory "c:\test"
and using the index in c:\test for searching. I found the problem: I was
reusing the same Document object in the for loop. I solved it by creating a
new Document each time the loop runs (actually, each time the if statement
is true).
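
For anyone who hits the same symptom, here is a minimal plain-Java sketch of
why the reused object gave the same filename for every hit. It does not use
Lucene: Doc, copy() and the two index... methods are hypothetical stand-ins.
The only Lucene behavior it assumes is that Document.get(name) returns the
first value added under that name, and that the index records the document's
fields as they stood when it was added.

```java
import java.util.ArrayList;
import java.util.List;

class ReusedDocumentDemo {

    // Hypothetical stand-in for a Lucene Document: an ordered list of name/value fields.
    static class Doc {
        private final List<String[]> fields = new ArrayList<>();

        void add(String name, String value) {
            fields.add(new String[] {name, value});
        }

        // Mirrors Document.get(name): the first value added under that name, or null.
        String get(String name) {
            for (String[] f : fields) {
                if (f[0].equals(name)) {
                    return f[1];
                }
            }
            return null;
        }

        // Snapshot of the fields at the moment the document is added to the index.
        Doc copy() {
            Doc d = new Doc();
            d.fields.addAll(fields);
            return d;
        }
    }

    // Buggy pattern: one Doc shared by all files, so "filename" fields pile up.
    static List<String> indexReusingOneDoc(String[] fileNames) {
        List<Doc> index = new ArrayList<>();
        Doc doc = new Doc();
        for (String name : fileNames) {
            doc.add("filename", name);
            index.add(doc.copy());
        }
        return storedFilenames(index);
    }

    // Fixed pattern: a fresh Doc for each file.
    static List<String> indexWithFreshDocs(String[] fileNames) {
        List<Doc> index = new ArrayList<>();
        for (String name : fileNames) {
            Doc doc = new Doc();
            doc.add("filename", name);
            index.add(doc.copy());
        }
        return storedFilenames(index);
    }

    // What a search loop printing doc.get("filename") would see for each indexed document.
    static List<String> storedFilenames(List<Doc> index) {
        List<String> names = new ArrayList<>();
        for (Doc d : index) {
            names.add(d.get("filename"));
        }
        return names;
    }

    public static void main(String[] args) {
        String[] files = {"a.txt", "b.txt", "c.txt"};
        System.out.println(indexReusingOneDoc(files)); // prints [a.txt, a.txt, a.txt]
        System.out.println(indexWithFreshDocs(files)); // prints [a.txt, b.txt, c.txt]
    }
}
```

In the real indexing loop the equivalent change is to move
"Document doc = new Document();" inside the if block, so every file gets its
own document.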

-nachi





Re: indexing going wrong

Posted by Erick Erickson <er...@gmail.com>.

A couple of things come to mind. But before I get to them, really, really,
really get a copy of Luke. It'll allow you to examine your index and
see if what's in there is really what you expect. It'll save you a world
of hurt <G>.... Google luke lucene....

Also, use query.toString() to see what the query actually looks like.

Your index and the files you're trying to put into that index are both
"c:/test". Are there really files to start with in that directory?
And Lucene creates a new index when you specify true in the
IndexWriter, and I'm not sure how many files it blows away in the process.

But none of that matters, because your searcher opens "d:/test" and
you're indexing into "c:/test" <G>.....
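
One way to avoid both problems is to keep the locations in one place. This
is only a sketch in plain Java, not Lucene code: the class name, the
constants and "c:/test-index" are hypothetical. It assumes the index gets
its own directory, and that the indexer and the searcher both read the same
constant instead of each hard-coding a path.

```java
import java.io.File;

class IndexLocations {

    // Directory holding the txt files to index (the path from the original post).
    static final File SOURCE_DIR = new File("c:/test");

    // Hypothetical separate directory for the index itself.
    static final File INDEX_DIR = new File("c:/test-index");

    // True when the index would be written into the directory being read from.
    static boolean indexOverlapsSource(File source, File index) {
        return source.getAbsoluteFile().equals(index.getAbsoluteFile());
    }

    public static void main(String[] args) {
        // Indexer and searcher both refer to INDEX_DIR, so they cannot drift apart.
        System.out.println("index into:  " + INDEX_DIR);
        System.out.println("search from: " + INDEX_DIR);
        System.out.println("overlaps source: " + indexOverlapsSource(SOURCE_DIR, INDEX_DIR));
    }
}
```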

Best
Erick
