Posted to java-user@lucene.apache.org by sirakov <si...@gmail.com> on 2006/11/20 09:20:31 UTC
Newbie Search Question
Hello,
indexing works, and so does searching:
E:\Temp>java org.apache.lucene.demo.IndexFiles E:\Temp\linux-vani
Indexing to directory 'index'...
adding E:\Temp\linux-vani\mngenf.pdf
adding E:\Temp\linux-vani\VVZSS2006.pdf
adding E:\Temp\linux-vani\komm.Vlvz-SS06-Hp.pdf
adding E:\Temp\linux-vani\ostsuednf.pdf
adding E:\Temp\linux-vani\ects_2006s.pdf
adding E:\Temp\linux-vani\Programm_SS06.doc
adding E:\Temp\linux-vani\Programm_SS06.kwd
adding E:\Temp\linux-vani\bulso-nf.pdf
adding E:\Temp\linux-vani\Programm_SS06.rtf
Optimizing...
2624 total milliseconds
E:\Temp>java org.apache.lucene.demo.SearchFiles
Query: Kulturthema
Searching for: kulturthema
2 total matching documents
1. E:\Temp\linux-vani\Programm_SS06.rtf
2. E:\Temp\linux-vani\Programm_SS06.doc
Query:
But how can I show some of the document text in the search results? Do I have to use the
Highlighter? Unfortunately, I found no tips about that at
http://lucene.apache.org/java/docs/gettingstarted.html.
Thanks in advance,
Sirakov
--
View this message in context: http://www.nabble.com/Newbie-Search-Question-tf2667479.html#a7438630
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Newbie Search Question
Posted by sirakov <si...@gmail.com>.
I've read the FAQ and found the following:
You need to store the documents' summary in the index (use Field.Store.YES
when creating that field) and then use the Highlighter from the contrib area
(distributed with Lucene since version 1.9 as
"lucene-highlighter-(version).jar").
If I understand correctly, I must create a field "summary" using
Field.Store.YES and Field.Index.YES, but what kind of parser must I use?
It should look like the example in HTMLDocument.java:
doc.add(new Field("summary", parser.getSummary(), Field.Store.YES, Field.Index.YES));
The variable "parser" is declared as an HTMLParser, but I can't use that in
FileDocument.java. Can someone give me a hint or help me?
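FileDocument has no HTMLParser, so nothing produces a summary for plain files. One option, sketched here under assumptions, is to build the summary yourself from the file text and store it. The helper below is plain Java; the names SummaryUtil and makeSummary and the 200-character limit are hypothetical and not part of the Lucene demo. In Lucene 1.9/2.0 the resulting String could then be stored (but not indexed) with something like doc.add(new Field("summary", summary, Field.Store.YES, Field.Index.NO)), which is untested here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: builds a short plain-text summary for a stored "summary" field. */
final class SummaryUtil {
    private SummaryUtil() {}

    /**
     * Collapses runs of whitespace, then cuts the text to at most maxChars characters,
     * preferring the last space at or before the limit so words stay whole, and
     * appends "..." when anything was cut off.
     */
    static String makeSummary(String text, int maxChars) {
        String flat = text.trim().replaceAll("\\s+", " ");
        if (flat.length() <= maxChars) {
            return flat;
        }
        int cut = flat.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars; // no space before the limit: cut inside the word
        }
        return flat.substring(0, cut) + "...";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SummaryUtil <file>; prints roughly the first 200 characters.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(makeSummary(text, 200));
    }
}
```

Note that the demo's FileDocument simply reads each file as characters, so for the .pdf and .doc files in this index such a "summary" would be raw binary content; those formats would need a real text extractor first.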
Thanks in advance,
Sirakov
--
View this message in context: http://www.nabble.com/Newbie-Search-Question-tf2667479.html#a7782174
Re: Newbie Search Question
Posted by sirakov <si...@gmail.com>.
Erick Erickson wrote:
>
> And how are you storing your data? Field.Store.YES? NO? COMPRESS?
>
I think this is my problem. I found this in FileDocument.java:
doc.add(new Field("contents", new FileReader(f)));
Field.Store.YES is missing, but when I try to add this argument, I get an
error message. I'll try to find a solution, but if you have any tips for
me, please let me know :)
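In Lucene 1.9/2.0, as far as I know, the Field(String, Reader) constructor creates a field that is tokenized and indexed but never stored, and there is no overload that takes a Reader together with Field.Store, which would explain the error. A common workaround is to read the file into a String first and pass it to the String-based constructor, new Field("contents", text, Field.Store.YES, Field.Index.TOKENIZED). A minimal plain-Java sketch of the reading step follows; ReaderUtil and readFully are hypothetical names, and the Lucene call appears only as a comment because it is not compiled here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: drains a Reader into a String so the text can be stored. */
final class ReaderUtil {
    private ReaderUtil() {}

    /** Reads all characters from the reader into a String and closes the reader. */
    static String readFully(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // In FileDocument the argument would be new FileReader(f) instead.
        String text = readFully(new StringReader("Programm SS06: Kulturthema"));
        System.out.println(text);
        // Assumed Lucene 1.9/2.0 call (not compiled here):
        // doc.add(new Field("contents", text, Field.Store.YES, Field.Index.TOKENIZED));
    }
}
```

Storing the full text of every file makes the index noticeably larger; storing only a shorter summary field is an alternative.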
thanks in advance,
sirakov
--
View this message in context: http://www.nabble.com/Newbie-Search-Question-tf2667479.html#a7555948
Re: Newbie Search Question
Posted by Erick Erickson <er...@gmail.com>.
If we're still dealing with new StringReader(text) throwing an error: it
really shouldn't, unless the document has no field named "contents". Here's
what I'd do:
Get a copy of Luke (search for "luke lucene") to examine your index.
Figure out which document ID you're blowing up on and look at it in Luke
to be sure that there's text in the contents field.
Watch the case of the field name, etc.
You shouldn't be getting a null here unless 1> your doc ID is not in your
index or 2> your document doesn't have such a field.
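To make the null case concrete: Document.get(name) returns null when the retrieved document has no stored value for that field (for example, a field that was added from a Reader), and new StringReader(null) then throws a NullPointerException. The sketch below uses a plain Map<String, String> as a hypothetical stand-in for a Document's stored fields so that it runs without Lucene; NullSafeText and readerFor are illustrative names, not Lucene API.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Sketch of a null check before highlighting. A Map stands in for the stored fields
 * of a Lucene Document: like Document.get(name), Map.get returns null when no value
 * was stored under that name.
 */
final class NullSafeText {
    private NullSafeText() {}

    /** Returns a reader over the stored field value, or empty if nothing was stored. */
    static Optional<StringReader> readerFor(Map<String, String> storedFields, String field) {
        String text = storedFields.get(field);
        // new StringReader(null) throws NullPointerException, so check first.
        return text == null ? Optional.empty() : Optional.of(new StringReader(text));
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("path", "E:\\Temp\\linux-vani\\Programm_SS06.rtf");
        // "contents" was indexed from a Reader, so no value was stored for it.
        if (!readerFor(doc, "contents").isPresent()) {
            System.out.println("No stored \"contents\" for " + doc.get("path") + "; skipping highlighting");
        }
    }
}
```

The check only avoids the crash; to actually get highlighted text, the field has to be stored at indexing time.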
And how are you storing your data? Field.Store.YES? NO? COMPRESS?
Best
Erick
Re: Newbie Search Question
Posted by sirakov <si...@gmail.com>.
Erick Erickson wrote:
>
> So why not assign a string to "text" and try it again? Or show us the code
> where you expect the text variable to get a value.....
>
> Erick
>
>
I'm sorry, that was a mistake on my part.
I put the example code into SearchFiles, between
Hits hits = searcher.search(query);
and
String path = doc.get("path");
Here is the code:
Highlighter highlighter = new Highlighter(new QueryScorer(query));

if (repeat > 0) {                           // repeat & time as benchmark
  Date start = new Date();
  for (int i = 0; i < repeat; i++) {
    hits = searcher.search(query);
  }
  Date end = new Date();
  System.out.println("Time: "+(end.getTime()-start.getTime())+"ms");
}

System.out.println(hits.length() + " total matching documents");

final int HITS_PER_PAGE = 10;
for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
  int end = Math.min(hits.length(), start + HITS_PER_PAGE);
  for (int i = start; i < end; i++) {

    if (raw) {                              // output raw format
      System.out.println("doc="+hits.id(i)+" score="+hits.score(i));
      continue;
    }

    Document doc = hits.doc(i);

    String text = doc.get(field);           // field is "contents"
    TokenStream tokenStream = analyzer.tokenStream(field, new StringReader(text));
    // Get the 3 best fragments and separate them with "..."
    String result = highlighter.getBestFragments(tokenStream, text, 3, "...");

    String path = doc.get("path");
    if (path != null) {
      System.out.println((i+1) + ". " + path);
      System.out.println("\t" + result);
      String title = doc.get("title");
      if (title != null) {
        System.out.println("   Title: " + doc.get("title"));
      }
    } else {
      System.out.println((i+1) + ". " + "No path for this document");
    }
  }
  // ... (rest of the demo's paging loop not shown)
The bolded text was added by me. I hope I made the changes in the
right file :)
--
View this message in context: http://www.nabble.com/Newbie-Search-Question-tf2667479.html#a7513261
Re: Newbie Search Question
Posted by Erick Erickson <er...@gmail.com>.
So why not assign a string to "text" and try it again? Or show us the code
where you expect the text variable to get a value.....
Erick
Re: Newbie Search Question
Posted by sirakov <si...@gmail.com>.
Thank you for the info :) I looked at the example and tried to use it, but I
get the following error message:
Query: test
Searching for: test
Fields are:
java.lang.NullPointerException
    at java.io.StringReader.<init>(Unknown Source)
    at org.apache.lucene.demo.SearchFiles.main(SearchFiles.java:158)
15 total matching documents
Line 158 is:
TokenStream tokenStream = analyzer.tokenStream(field, new StringReader(text));
The variable "text" is null.
My question is: should I index the files differently, or should I pass some
parameters?
--
View this message in context: http://www.nabble.com/Newbie-Search-Question-tf2667479.html#a7506220
Re: Newbie Search Question
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 20, 2006, at 3:20 AM, sirakov wrote:
> E:\Temp>java org.apache.lucene.demo.SearchFiles
> Query: Kulturthema
> Searching for: kulturthema
> 2 total matching documents
> 1. E:\Temp\linux-vani\Programm_SS06.rtf
> 2. E:\Temp\linux-vani\Programm_SS06.doc
> Query:
>
>
> But how can I show some of the document text in the search results? Do I have to use the
> Highlighter? Unfortunately, I found no tips about that at
> http://lucene.apache.org/java/docs/gettingstarted.html.
The demo code does not tap into highlighting. To integrate the
highlighter, you'll use its API and add it to the Java code that
outputs the results. Its API is here:
<http://lucene.apache.org/java/docs/api/org/apache/lucene/search/highlight/package-summary.html>
You'll learn more about the usage of the API by perusing the test
cases available along with the source code (either in the source
distribution or Subversion directly).
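As a rough illustration of what a call like highlighter.getBestFragments(tokenStream, text, 3, "...") produces, here is a simplified plain-Java stand-in: it splits the stored text into word-aligned fragments, counts query-term hits per fragment, wraps hits in <B>...</B>, and joins the best fragments with a separator. This is not Lucene's implementation (the real Highlighter works on the analyzer's token stream and a QueryScorer); the class name SnippetSketch, the whitespace tokenization, the lower-case matching and the markup are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Simplified stand-in for "best fragments" highlighting; not Lucene's algorithm. */
final class SnippetSketch {
    private SnippetSketch() {}

    private static final class Fragment {
        final int position;   // order of the fragment in the text
        final String marked;  // fragment text with hits wrapped in <B>...</B>
        final int score;      // number of query-term occurrences in the fragment

        Fragment(int position, String marked, int score) {
            this.position = position;
            this.marked = marked;
            this.score = score;
        }
    }

    /**
     * Splits text into word-aligned fragments of at least fragmentSize characters,
     * marks words whose letters/digits match a lower-case query term, and joins the
     * maxFragments highest-scoring fragments with the separator.
     */
    static String bestFragments(String text, Set<String> queryTerms,
                                int fragmentSize, int maxFragments, String separator) {
        List<Fragment> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int plainLength = 0; // fragment length without the <B> markup
        int score = 0;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for empty or all-whitespace text
            }
            String term = word.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
            boolean hit = queryTerms.contains(term);
            if (current.length() > 0) {
                current.append(' ');
                plainLength++;
            }
            current.append(hit ? "<B>" + word + "</B>" : word);
            plainLength += word.length();
            if (hit) {
                score++;
            }
            if (plainLength >= fragmentSize) {
                fragments.add(new Fragment(fragments.size(), current.toString(), score));
                current.setLength(0);
                plainLength = 0;
                score = 0;
            }
        }
        if (current.length() > 0) {
            fragments.add(new Fragment(fragments.size(), current.toString(), score));
        }
        // Best score first; earlier fragments win ties. Fragments without hits are dropped.
        return fragments.stream()
                .filter(f -> f.score > 0)
                .sorted(Comparator.comparingInt((Fragment f) -> -f.score)
                        .thenComparingInt(f -> f.position))
                .limit(maxFragments)
                .map(f -> f.marked)
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        String text = "Das Programm nennt ein Kulturthema. Weitere Termine folgen im Juni.";
        Set<String> terms = new HashSet<>(Arrays.asList("kulturthema"));
        System.out.println(bestFragments(text, terms, 40, 3, "..."));
    }
}
```

Ranking by raw hit count is the simplest possible scorer; the point of the sketch is that any highlighter needs the original text at search time, which is why the field has to be stored.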
Erik