Posted to java-user@lucene.apache.org by Hetan Shah <He...@Sun.COM> on 2004/12/22 23:01:16 UTC

Search Result Text

All,

This may have been asked before; if so, please point me to the earlier 
post. Any pointers would be appreciated.
I have a bunch of HTML pages which I index using IndexHTML. My dilemma is 
that when I search the pages and display the results, the text I use for 
the result snippet does not come from the body of the page; it only 
contains the top portion of the page. How do I control what is shown in 
the result text?

IndexSearcher searcher = null;
query = QueryParser.parse(queryString, "contents", analyzer);
hits = searcher.search(query);                       

I am currently using:
TokenStream tokenStream = new StandardAnalyzer().tokenStream("f",
    new StringReader(doc.get("summary")));
String result = highlighter.getBestFragments(tokenStream,
    doc.get("summary"), 3, "...");


e.g.
Search Results
*Product Name: *Computer systems PAMIR?? 
<javascript:processDetailWizard(4005512);>
*Company Name: *ASE Group / Advanced system engineering
sun.com How To Buy  |  My Sun  |  Worldwide Sites               [Sun 
Microsystems Logo] [Products and Services]   [Support

*Product Name: *ODC-SOL <javascript:processDetailWizard(5363);>
*Company Name: *INSTAR Corporation
sun.com How To Buy  |  My Sun  |  Worldwide Sites               [Sun 
Microsystems Logo] [Products and Services]   [Support


The text in red (the "sun.com How To Buy | My Sun | Worldwide Sites" 
lines) is the problem.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Search Result Text

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
The demo IndexHTML does not store the "contents" field - it is indexed 
using a Reader and thus not stored.  You will have to modify the code 
to get the complete contents available at search time.

	Erik
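
A minimal sketch of the kind of modification described above, not the demo's actual code. It only shows the plain-Java step of draining a Reader into a String so that the text can be stored as well as indexed. The class name ReaderContents and the method readAll are hypothetical. The Lucene calls appear only in comments and are written from memory of the 1.4 API (Field.Text with a Reader is indexed but not stored, Field.Text with a String is indexed and stored), so check them against your version.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderContents {

    /** Reads everything from the Reader and returns it as one String. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        // read() returns -1 once the Reader is exhausted
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the Reader that the HTML parser hands to the demo.
        Reader body = new StringReader("Body text of the page.");
        String contents = readAll(body);

        // Assumption (Lucene 1.4 demo, HTMLDocument): the line
        //   doc.add(Field.Text("contents", parser.getReader()));
        // would become
        //   doc.add(Field.Text("contents", contents));
        // so that doc.get("contents") returns the body at search time.
        System.out.println(contents);
    }
}
```

With the body stored, the highlighter call in the question could be given doc.get("contents") in place of doc.get("summary"). Storing the full text makes the index larger, which is the trade-off to weigh.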