Posted to java-user@lucene.apache.org by Yagnesh Shah <ys...@hwwilson.com> on 2005/04/01 01:36:18 UTC
RE: HTML pages highlighter
Hi! Eric,
I have modified the try section of HTMLDocument.java to use doc.add(Field.Text("contents", l)); It compiles, with the following warning about a deprecated API, but I am still unable to see any value.
############ compile warning #########
compile-demo:
[javac] Compiling 1 source file to /opt/dynamo/trunk/build/classes/demo
[javac] Note: /opt/dynamo/trunk/src/demo/org/apache/lucene/demo/HTMLDocument.java uses or overrides a deprecated API.
[javac] Note: Recompile with -deprecation for details.
jar-demo:
[jar] Building jar: /opt/dynamo/trunk/build/lucene-demos-1.9-rc1-devYS.jar
############### code change #############
try {
    fis = new FileInputStream(f);
    HTMLParser parser = new HTMLParser(fis);

    // Add the tag-stripped contents as a Reader-valued Text field so it will
    // get tokenized and indexed.
    // doc.add(new Field("contents", parser.getReader()));
    LineNumberReader reader = new LineNumberReader(parser.getReader());
    for (String l = reader.readLine(); l != null; l = reader.readLine())
        // System.out.println(l);
        doc.add(Field.Text("contents", l));

    // Add the summary as a field that is stored and returned with
    // hit documents for display.
    doc.add(new Field("summary", parser.getSummary(), Field.Store.YES, Field.Index.NO));

    // Add the title as a field that it can be searched and that is stored.
    doc.add(new Field("title", parser.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
}
-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Wednesday, March 30, 2005 7:38 PM
To: java-user@lucene.apache.org
Subject: Re: HTML pages highlighter
On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:
> Hi! Eric,
Erik - with a 'k' - Sorry, I let it slide once though :)
> I tried to modify that with this, but I get a compile error. Do you have
> any code snippet of highlighting code to pull the contents from the
> original source?
I have a whole book full of code examples :)
http://www.lucenebook.com - Grab the source code and look in
src/lia/tools at Highlight*.java
> or Do you know how I can do field store?
>
> doc.add(new Field("contents", parser.getReader(),
> Field.Store.YES, Field.Index.NO));
You cannot store it with a Reader. You need to use Field.Text(String,
String), or one of the other variations.
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: HTML pages highlighter
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:
> try {
> fis = new FileInputStream(f);
> HTMLParser parser = new HTMLParser(fis);
>
> // Add the tag-stripped contents as a Reader-valued Text field
> so it will
> // get tokenized and indexed.
> // doc.add(new Field("contents", parser.getReader()));
> LineNumberReader reader = new
> LineNumberReader(parser.getReader());
> for (String l = reader.readLine(); l != null; l =
> reader.readLine())
> // System.out.println(l);
> doc.add(Field.Text("contents", l));
Notice that your loop here is adding a "contents" field for *every*
line read since that is where the first semi-colon is.
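One possible way around this, as a rough sketch only: read the whole Reader into a single String first, then add one "contents" field. The names ReaderToString and readAll are hypothetical and are not part of Lucene or the demo. Only java.io is used so the snippet runs on its own, and the Lucene call appears only as a comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene or the demo code.
class ReaderToString {

    // Reads every line from the reader and joins them into one String,
    // terminating each line with '\n'.
    static String readAll(Reader in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for parser.getReader() from the HTMLDocument code above.
        String text = readAll(new StringReader("first line\nsecond line"));

        // In HTMLDocument this single String would be added once, e.g.:
        // doc.add(Field.Text("contents", text));
        System.out.print(text);
    }
}
```

With the text collected this way, the document would get one "contents" field instead of one per line.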
Look at using Luke to explore your index. Try indexing just a dummy
String:
doc.add(Field.Text("contents", "some dummy text"));
to show that it works. Always always always simplify a complicated
situation by doing the most obvious thing that _should_ work.
Also, the demo Lucene code is not really designed to be used in a
production application (sadly), so you're better off borrowing code
from the many articles or our book to begin with.
Erik
>
> // Add the summary as a field that is stored and returned with
> // hit documents for display.
> doc.add(new Field("summary", parser.getSummary(),
> Field.Store.YES, Field.Index.NO));
>
> // Add the title as a field that it can be searched and that is
> stored.
> doc.add(new Field("title", parser.getTitle(), Field.Store.YES,
> Field.Index.TOKENIZED));
> }