Posted to java-user@lucene.apache.org by sirakov <si...@gmail.com> on 2006/11/20 09:20:31 UTC

Newbie Search Question

Hello,
indexing works, and so does searching.

E:\Temp>java org.apache.lucene.demo.IndexFiles E:\Temp\linux-vani
Indexing to directory 'index'...
adding E:\Temp\linux-vani\mngenf.pdf
adding E:\Temp\linux-vani\VVZSS2006.pdf
adding E:\Temp\linux-vani\komm.Vlvz-SS06-Hp.pdf
adding E:\Temp\linux-vani\ostsuednf.pdf
adding E:\Temp\linux-vani\ects_2006s.pdf
adding E:\Temp\linux-vani\Programm_SS06.doc
adding E:\Temp\linux-vani\Programm_SS06.kwd
adding E:\Temp\linux-vani\bulso-nf.pdf
adding E:\Temp\linux-vani\Programm_SS06.rtf
Optimizing...
2624 total milliseconds

E:\Temp>java org.apache.lucene.demo.SearchFiles
Query: Kulturthema
Searching for: kulturthema
2 total matching documents
1. E:\Temp\linux-vani\Programm_SS06.rtf
2. E:\Temp\linux-vani\Programm_SS06.doc
Query:


But how can I insert some text into the search results? Do I have to use the
highlighter, or something else? Unfortunately, I found no tips about that at
http://lucene.apache.org/java/docs/gettingstarted.html.

Thanks in advance,
Sirakov
-- 
View this message in context: http://www.nabble.com/Newbie-Search-Question-tf2667479.html#a7438630
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Newbie Search Question

Posted by sirakov <si...@gmail.com>.

I've read the FAQ, and I found the following:

You need to store the documents' summary in the index (use Field.Store.YES
when creating that field) and then use the Highlighter from the contrib area
(distributed with Lucene since version 1.9 as
"lucene-highlighter-(version).jar").

If I understand correctly, I must create this "summary" field using
Field.Store.YES and Field.Index.YES, but what kind of parser must I use?

It should look like the example in HTMLDocument.java:

doc.add(new Field("summary", parser.getSummary(), Field.Store.YES,
Field.Index.YES));

The variable "parser" is declared as an HTMLParser, but I can't use that in
FileDocument.java. Can someone give me a hint or help me?

Thanks in advance,
Sirakov

Re: Newbie Search Question

Posted by sirakov <si...@gmail.com>.

Erick Erickson wrote:
> 
> And how are you storing your data? Field.Store.YES? NO? COMPRESSED?
> 

I think this is my problem... I found this in FileDocument.java:

doc.add(new Field("contents", new FileReader(f)));

Field.Store.YES is missing, but when I try to add this argument, I get an
error message. I'll try to find a solution for the problem, but if you have
any tips for me, please let me know :)
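
A note on the error described here, as a hedged sketch rather than a confirmed fix: as far as I know, the Field(String, Reader) constructor in Lucene 2.0 indexes the text without storing it, and it has no variant that accepts Field.Store. That would explain both the error message and a null from doc.get("contents"). One way around it is to read the file into a String first. The class name ReadAll and the helper readAll are hypothetical and not part of Lucene; the Lucene call appears only as a comment because it depends on the Lucene version in use.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadAll {
  /** Reads everything from the reader into a String (hypothetical helper, not part of Lucene). */
  public static String readAll(Reader reader) {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    try {
      int n;
      while ((n = reader.read(buf)) != -1) {
        sb.append(buf, 0, n);
      }
    } catch (IOException e) {
      // Rethrow unchecked so callers do not have to declare IOException.
      throw new UncheckedIOException(e);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // A StringReader stands in for new FileReader(f) so the sketch runs on its own.
    String text = readAll(new StringReader("Kulturthema im Sommersemester"));
    System.out.println(text);
    // Assumed Lucene 2.0 usage (not compiled here):
    // doc.add(new Field("contents", text, Field.Store.YES, Field.Index.TOKENIZED));
  }
}
```

Storing the full text makes the index larger, so storing only a shorter summary field, as the FAQ suggests, may be the better trade-off for big files.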

thanks in advance,
sirakov

Re: Newbie Search Question

Posted by Erick Erickson <er...@gmail.com>.

If we're still dealing with StringReader(text) throwing an error.... It
really shouldn't unless the document has no field named "contents". Here's
what I'd do...

Get a copy of Luke (google luke lucene) to examine your index.
Figure out what the document ID is that you're blowing up on and look at it
in Luke to be sure that there's text in the contents field.
Watch case etc.

You shouldn't be getting a null here unless 1> your doc ID is not in your
index or 2> your document doesn't have such a field.

And how are you storing your data? Field.Store.YES? NO? COMPRESSED?
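
To illustrate the null case with plain Java (a minimal sketch, no Lucene required): passing null to the StringReader constructor throws a NullPointerException, which matches the reported stack trace. The class NullTextDemo and its methods are hypothetical names. The null value stands in for what doc.get(field) returns when the document has no stored field of that name.

```java
import java.io.StringReader;

public class NullTextDemo {
  /** Returns true when a stored field value is usable as highlighter input. */
  public static boolean hasText(String text) {
    return text != null && text.length() > 0;
  }

  /** Returns true if constructing a StringReader from the text throws a NullPointerException. */
  public static boolean readerThrows(String text) {
    try {
      new StringReader(text);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    String missing = null; // stands in for doc.get(field) on an absent or unstored field
    System.out.println("constructor throws: " + readerThrows(missing));
    System.out.println("safe to highlight: " + hasText(missing));
  }
}
```

A check like hasText(text) before building the TokenStream lets the search loop skip the snippet for such documents instead of failing.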

Best
Erick

Re: Newbie Search Question

Posted by sirakov <si...@gmail.com>.


Erick Erickson wrote:
> 
> So why not assign a string to "text" and try it again? Or show us the code
> where you expect the text variable to get a value.....
> 
> Erick
> 
> 

I'm sorry, that was an oversight on my part.

I've tried putting the simple code into SearchFiles, between

Hits hits = searcher.search(query);

and

String path = doc.get("path");

Here is the code:

      Highlighter highlighter = new Highlighter(new QueryScorer(query));

      if (repeat > 0) {                           // repeat & time as benchmark
        Date start = new Date();
        for (int i = 0; i < repeat; i++) {
          hits = searcher.search(query);
        }
        Date end = new Date();
        System.out.println("Time: "+(end.getTime()-start.getTime())+"ms");
      }

      System.out.println(hits.length() + " total matching documents");

      final int HITS_PER_PAGE = 10;
      for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
        int end = Math.min(hits.length(), start + HITS_PER_PAGE);
        for (int i = start; i < end; i++) {

          if (raw) {                              // output raw format
            System.out.println("doc="+hits.id(i)+" score="+hits.score(i));
            continue;
          }

          Document doc = hits.doc(i);

          String text = hits.doc(i).get(field); // String field = "contents";
          TokenStream tokenStream = analyzer.tokenStream(field, new StringReader(text));
          // Get 3 best fragments and separate with a "..."
          String result = highlighter.getBestFragments(tokenStream, text, 3, "...");

          String path = doc.get("path");
          if (path != null) {
            System.out.println((i+1) + ". " + path);
            System.out.println("\t"+result);
            String title = doc.get("title");
            if (title != null) {
              System.out.println("   Title: " + doc.get("title"));
            }
          } else {
            System.out.println((i+1) + ". " + "No path for this document");
          }
        }

The bolded text was added by me. I hope I have made the changes in the
right file :)

Re: Newbie Search Question

Posted by Erick Erickson <er...@gmail.com>.

So why not assign a string to "text" and try it again? Or show us the code
where you expect the text variable to get a value.....

Erick

Re: Newbie Search Question

Posted by sirakov <si...@gmail.com>.

Thank you for the info :) I've looked at the example and tried to use it, but I
get the following error message:

Query: test
Searching for: test
Fields are:
java.lang.NullPointerException
	at java.io.StringReader.<init>(Unknown Source)
	at org.apache.lucene.demo.SearchFiles.main(SearchFiles.java:158)
15 total matching documents

Line 158 is:

TokenStream tokenStream = analyzer.tokenStream(field, new StringReader(text));

The variable "text" is null.

My question is: should I index the files differently, or should I use some
parameters?

Re: Newbie Search Question

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Nov 20, 2006, at 3:20 AM, sirakov wrote:
> E:\Temp>java org.apache.lucene.demo.SearchFiles
> Query: Kulturthema
> Searching for: kulturthema
> 2 total matching documents
> 1. E:\Temp\linux-vani\Programm_SS06.rtf
> 2. E:\Temp\linux-vani\Programm_SS06.doc
> Query:
>
>
> But how can I insert some text into the search results? Do I have to use the
> highlighter, or something else? Unfortunately, I found no tips about that at
> http://lucene.apache.org/java/docs/gettingstarted.html.

The demo code does not tap into highlighting. To integrate the
highlighter, you'll use its API and add it to the Java code that
outputs the results. Its API is here:

	<http://lucene.apache.org/java/docs/api/org/apache/lucene/search/highlight/package-summary.html>

You'll learn more about the usage of the API by perusing the test  
cases available along with the source code (either in the source  
distribution or Subversion directly).
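
For a rough idea of what "adding it to the code that outputs the results" amounts to, here is a plain-Java sketch. It is not the contrib Highlighter and uses no Lucene classes. It only cuts a fragment around the first occurrence of a term and wraps the match in <B> tags. The class NaiveSnippet and its snippet method are hypothetical names chosen for illustration.

```java
public class NaiveSnippet {
  /**
   * Returns up to `context` characters on each side of the first
   * case-insensitive occurrence of term, with the match wrapped in B tags.
   * Returns an empty string when text or term is null, term is empty,
   * or the term does not occur in the text.
   */
  public static String snippet(String text, String term, int context) {
    if (text == null || term == null || term.length() == 0) {
      return "";
    }
    // Find the first case-insensitive match without altering the text.
    int hit = -1;
    for (int i = 0; i + term.length() <= text.length(); i++) {
      if (text.regionMatches(true, i, term, 0, term.length())) {
        hit = i;
        break;
      }
    }
    if (hit < 0) {
      return "";
    }
    int matchEnd = hit + term.length();
    int from = Math.max(0, hit - context);
    int to = Math.min(text.length(), matchEnd + context);
    return (from > 0 ? "..." : "")
        + text.substring(from, hit)
        + "<B>" + text.substring(hit, matchEnd) + "</B>"
        + text.substring(matchEnd, to)
        + (to < text.length() ? "..." : "");
  }

  public static void main(String[] args) {
    String stored = "Das Kulturthema des Semesters";
    System.out.println("\t" + snippet(stored, "kulturthema", 4));
  }
}
```

The real Highlighter works from the analyzed token stream and scores fragments against the query, so it handles stemming and multi-term queries that this sketch ignores.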

	Erik
