Posted to java-user@lucene.apache.org by Yagnesh Shah <ys...@hwwilson.com> on 2005/04/01 01:36:18 UTC

RE: HTML pages highlighter

Hi! Eric,
	I have modified the try section of HTMLDocument.java to use doc.add(Field.Text("contents", l)); It compiles, with the following warning about a deprecated API, but I am still unable to see any value.

############ compile warning #########
compile-demo:
    [javac] Compiling 1 source file to /opt/dynamo/trunk/build/classes/demo
    [javac] Note: /opt/dynamo/trunk/src/demo/org/apache/lucene/demo/HTMLDocument.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -deprecation for details.

jar-demo:
      [jar] Building jar: /opt/dynamo/trunk/build/lucene-demos-1.9-rc1-devYS.jar


############### code change #############

    try {
      fis = new FileInputStream(f);
      HTMLParser parser = new HTMLParser(fis);

      // Add the tag-stripped contents as a Reader-valued Text field so it will
      // get tokenized and indexed.
//      doc.add(new Field("contents", parser.getReader()));
      LineNumberReader reader = new LineNumberReader(parser.getReader());
      for (String l = reader.readLine(); l != null; l = reader.readLine())
//        System.out.println(l);
      doc.add(Field.Text("contents", l));

      // Add the summary as a field that is stored and returned with
      // hit documents for display.
      doc.add(new Field("summary", parser.getSummary(), Field.Store.YES, Field.Index.NO));

      // Add the title as a field that it can be searched and that is stored.
      doc.add(new Field("title", parser.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
    }



-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Wednesday, March 30, 2005 7:38 PM
To: java-user@lucene.apache.org
Subject: Re: HTML pages highlighter



On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:

> Hi! Eric,

Erik - with a 'k' - Sorry, I let it slide once though :)

> 	I tried to modify that with this, but I get a compile error. Do you have
> any code snippet of highlighting code to pull the contents from the
> original source?

I have a whole book full of code examples :)   
http://www.lucenebook.com - Grab the source code and look in 
src/lia/tools at Highlight*.java

> Or do you know how I can store the field?
>
>       doc.add(new Field("contents", parser.getReader(), Field.Store.YES, Field.Index.NO));

You cannot store it with a Reader.  You need to use Field.Text(String, 
String), or one of the other variations.
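
A minimal sketch of that advice, using only the JDK: read the Reader fully into a String first, then hand the String to a stored field. The names ReaderToString and readAll are made up for illustration, and the StringReader merely stands in for parser.getReader(). The Lucene call appears only as a comment so the example runs without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene or its demo code.
final class ReaderToString {

    /** Drains the Reader into a single String so it can back a String-valued field. */
    static String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            // Rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Assumption: in HTMLDocument this would be parser.getReader().
        String contents = readAll(new StringReader("Hello\nLucene"));
        System.out.println(contents);
        // With Lucene available, the whole text could then be stored in one field:
        //   doc.add(Field.Text("contents", contents));
    }
}
```

Because the field value is now a String rather than a Reader, Lucene can store it as well as index it.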

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: HTML pages highlighter

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:
>     try {
>       fis = new FileInputStream(f);
>       HTMLParser parser = new HTMLParser(fis);
>
>       // Add the tag-stripped contents as a Reader-valued Text field so it will
>       // get tokenized and indexed.
> //      doc.add(new Field("contents", parser.getReader()));
>       LineNumberReader reader = new LineNumberReader(parser.getReader());
>       for (String l = reader.readLine(); l != null; l = reader.readLine())
> //        System.out.println(l);
>       doc.add(Field.Text("contents", l));

Notice that your loop here is adding a separate "contents" field for *every* 
line read: with no braces, the loop body is the single statement that ends at 
the first semi-colon, which is the doc.add(...) call.
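
To make the loop-scope point concrete, here is a small JDK-only sketch. It is not the demo code: LoopScope, onePerLine and oneForAll are hypothetical names, and a List<String> stands in for the Lucene Document so that each list entry plays the role of one "contents" field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; each list entry stands in for one "contents" field.
final class LoopScope {

    /** Mirrors the posted loop: no braces, so the next statement runs once per line. */
    static List<String> onePerLine(String text) {
        List<String> fields = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            for (String l = reader.readLine(); l != null; l = reader.readLine())
                fields.add(l); // stands in for doc.add(Field.Text("contents", l))
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    /** Alternative: accumulate every line first, then add a single field. */
    static List<String> oneForAll(String text) {
        StringBuilder all = new StringBuilder();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            for (String l = reader.readLine(); l != null; l = reader.readLine()) {
                all.append(l).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> fields = new ArrayList<>();
        fields.add(all.toString()); // one field holding the whole text
        return fields;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\nthird line";
        System.out.println(onePerLine(text).size() + " field(s) from the brace-less loop");
        System.out.println(oneForAll(text).size() + " field(s) after accumulating");
    }
}
```

Whether one field per line is a problem depends on what the search and highlighting code expects; the sketch only shows how many fields each form of the loop produces.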

Look at using Luke to explore your index.  Try indexing just a dummy 
String:

	doc.add(Field.Text("contents", "some dummy text"));

to show that it works.  Always always always simplify a complicated 
situation by doing the most obvious thing that _should_ work.

Also, the demo Lucene code is not really designed to be used in a 
production application (sadly), so you're better off borrowing code 
from the many articles or our book to begin with.

	Erik


>
>       // Add the summary as a field that is stored and returned with
>       // hit documents for display.
>       doc.add(new Field("summary", parser.getSummary(), Field.Store.YES, Field.Index.NO));
>
>       // Add the title as a field that it can be searched and that is stored.
>       doc.add(new Field("title", parser.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
>     }
