You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Hetan Shah <He...@Sun.COM> on 2004/12/22 23:01:16 UTC
Search Result Text
All,
This might be asked earlier please point to the earlier post or any
pointers would be appreciated.
I have bunch of HTML pages which I index using IndexHTML. My dilemma is
when I want to search the pages and then display the results the text
that I use for the results snippet do not get the data from the body of
the page it just gets the top portion of the page. How do I control what
I want to show in the result text.
IndexSearcher searcher = null;
query = QueryParser.parse(queryString, "contents", analyzer);
hits = searcher.search(query);
I am currently using
TokenStream tokenStream = new StandardAnalyzer().tokenStream("f", new
StringReader(doc.get("summary")));
String result = highlighter.getBestFragments(tokenStream,
doc.get("summary"), 3, "...");
e.g.
Search Results
*Product Name: *Computer systems PAMIR??
<javascript:processDetailWizard(4005512);>
*Company Name: *ASE Group / Advanced system engineering
sun.com How To Buy | My Sun | Worldwide Sites [Sun
Microsystems Logo] [Products and Services] [Support
*Product Name: *ODC-SOL <javascript:processDetailWizard(5363);>
*Company Name: *INSTAR Corporation
sun.com How To Buy | My Sun | Worldwide Sites [Sun
Microsystems Logo] [Products and Services] [Support
The text in red color is the problem.
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Search Result Text
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
The demo IndexHTML does not store the "contents" field - it is indexed
using a Reader and thus not stored. You will have to modify the code
to get the complete contents available at search time.
Erik
On Dec 22, 2004, at 5:01 PM, Hetan Shah wrote:
> All,
>
> This might be asked earlier please point to the earlier post or any
> pointers would be appreciated.
> I have bunch of HTML pages which I index using IndexHTML. My dilemma
> is when I want to search the pages and then display the results the
> text that I use for the results snippet do not get the data from the
> body of the page it just gets the top portion of the page. How do I
> control what I want to show in the result text.
>
> IndexSearcher searcher = null;
> query = QueryParser.parse(queryString, "contents", analyzer);
> hits = searcher.search(query);
> I am currently using
> TokenStream tokenStream = new StandardAnalyzer().tokenStream("f", new
> StringReader(doc.get("summary")));
> String result = highlighter.getBestFragments(tokenStream,
> doc.get("summary"), 3, "...");
>
>
> e.g.
> Search Results
> *Product Name: *Computer systems PAMIR??
> <javascript:processDetailWizard(4005512);>
> *Company Name: *ASE Group / Advanced system engineering
> sun.com How To Buy | My Sun | Worldwide Sites [Sun
> Microsystems Logo] [Products and Services] [Support
>
> *Product Name: *ODC-SOL <javascript:processDetailWizard(5363);>
> *Company Name: *INSTAR Corporation
> sun.com How To Buy | My Sun | Worldwide Sites [Sun
> Microsystems Logo] [Products and Services] [Support
>
>
> The text in red color is the problem.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org