Posted to java-user@lucene.apache.org by karen bran <kg...@yahoo.com> on 2002/08/14 18:33:35 UTC

Help on Hacking HTMLParser.jj file to make the jsp tag <%%> as a comment tag

My problem is not how to index the JSP files. By adding one extra else branch in IndexHTML.java that tests file.getPath().endsWith(".jsp"), you can easily index the .jsp files as well.
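
For reference, the extension test described above can be sketched as a standalone method. This is a minimal sketch, not the actual IndexHTML.java code: the class name IndexableFiles, the method isIndexable, and the list of extensions other than .jsp are assumptions made for illustration.

```java
import java.io.File;

final class IndexableFiles {
    // Extensions accepted for indexing. ".jsp" is the addition discussed
    // above; the other entries are assumed, not taken from IndexHTML.java.
    private static final String[] EXTENSIONS = {".html", ".htm", ".txt", ".jsp"};

    /** Returns true if the file path ends with one of the accepted extensions. */
    static boolean isIndexable(File file) {
        String path = file.getPath();
        for (String ext : EXTENSIONS) {
            // Case-sensitive, like the endsWith(".jsp") test quoted in the post.
            if (path.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isIndexable(new File("pages/login.jsp")));
        System.out.println(isIndexable(new File("images/logo.gif")));
    }
}
```

The test is case-sensitive, so a file named LOGIN.JSP would be skipped unless the path is lowercased first.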
My real problem is that after the JSP files are indexed, the search results show the JSP tags <%@page import=............... as the search summary text. I tried to hack the HTMLParser.jj file to make the JSP tag <% %> a comment tag. After this hack, the search results still show <%@page import=...... as the summary text. So I believe my comment tag hack was not successful, because I don't understand the JavaCC grammar. I just copied the code of the existing comment2 tag definitions in HTMLParser.jj.
If someone has experience hacking HTMLParser.jj to make the JSP tag a new comment tag, please help me out!
I appreciate any help!
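
One possible workaround that avoids touching the JavaCC grammar at all is to strip the JSP tags from the page text before handing it to the HTML parser. This is a different technique from the comment-tag hack described above, and it is only a sketch: JspTagStripper and strip are hypothetical names, and the regular expression will cut a tag short if a scriptlet contains the characters %> inside a Java string literal.

```java
import java.util.regex.Pattern;

final class JspTagStripper {
    // Matches <% ... %>, which also covers <%@ directives, <%= expressions
    // and <%-- comments --%>. DOTALL lets a single tag span several lines,
    // and the reluctant .*? stops at the first closing %>.
    private static final Pattern JSP_TAG = Pattern.compile("<%.*?%>", Pattern.DOTALL);

    /** Replaces every JSP tag with one space so neighbouring words stay separate. */
    static String strip(String source) {
        return JSP_TAG.matcher(source).replaceAll(" ");
    }

    public static void main(String[] args) {
        String page = "<%@page import=\"java.util.*\"%>\n"
                + "<html><body>Hello <%= name %>!</body></html>";
        // The cleaned text is what would then be passed to the HTML parser.
        System.out.println(strip(page));
    }
}
```

Replacing each tag with a space rather than an empty string keeps the text on either side of a tag from running together into one term.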
 
 
 
 
don.hillmuth@ps.ge.com wrote:

Karen,

I need to index .jsp files too. I was wondering if you made any progress
using a web-crawler?

Don 

Don Hillmuth
GE Network Solutions
don.hillmuth@ps.ge.com
303-268-6164



Re: problems with HTML Parser

Posted by Ben Litchfield <be...@csh.rit.edu>.

Maurits,

You can get a PDF parser from http://www.pdfbox.org

-Ben


On Wed, 14 Aug 2002, Maurits van Wijland wrote:

> Keith,
>
> I haven't noticed the problem with the Parser...but you trigger me
> by saying that you have a PDFParser!!!
>
> Are you able to contribute this PDFParser??
>
> Maurits.


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: problems with HTML Parser

Posted by Keith Gunn <kg...@csd.abdn.ac.uk>.

If you're parsing HTML files, have a look in Lucene
at the terms that are indexed and see if you can
spot any joined terms.
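
A rough way to automate that check, assuming the index terms have already been dumped into a set of strings (how that is done is left open here), is to flag any term that splits into two other terms from the same set. JoinedTermFinder, findJoined and the minimum part length of 3 are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class JoinedTermFinder {
    // Shortest piece considered a word on its own (an arbitrary choice).
    private static final int MIN_PART = 3;

    /** Returns, in sorted order, the terms that are two other terms glued together. */
    static List<String> findJoined(Set<String> terms) {
        List<String> suspects = new ArrayList<>();
        for (String term : new TreeSet<>(terms)) {
            // Try every split point that leaves both halves long enough.
            for (int i = MIN_PART; i <= term.length() - MIN_PART; i++) {
                if (terms.contains(term.substring(0, i))
                        && terms.contains(term.substring(i))) {
                    suspects.add(term);
                    break;
                }
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        Set<String> terms = new TreeSet<>(
                Arrays.asList("search", "engine", "searchengine", "index"));
        System.out.println(findJoined(terms)); // prints [searchengine]
    }
}
```

Genuine compound words will show up as false positives, so the result is a list of suspects to inspect by hand, not a verdict.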

The PDF parser, as you can see from the other mail, is from
www.pdfbox.org and I highly recommend it (thanks again Ben!)




On Wed, 14 Aug 2002, Maurits van Wijland wrote:

> Keith,
>
> I haven't noticed the problem with the Parser...but you trigger me
> by saying that you have a PDFParser!!!
>
> Are you able to contribute this PDFParser??
>
> Maurits.
> ----- Original Message -----
> From: "Keith Gunn" <kg...@csd.abdn.ac.uk>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Wednesday, August 14, 2002 9:46 AM
> Subject: problems with HTML Parser
>
>
> > Has anyone noticed that the HTML Parser that comes with
> > Lucene joins terms together when parsing a file?
> > I used to think it was my PDFParser but after fixing that
> > I found out it was the HTMLParser.
> >
> > I managed to find a replacement parser that doesn't join terms.
> >
> > Just wondered if anyone had come across this problem??



Re: problems with HTML Parser

Posted by Maurits van Wijland <m....@quicknet.nl>.

Keith,

I haven't noticed the problem with the Parser...but you trigger me
by saying that you have a PDFParser!!!

Are you able to contribute this PDFParser??

Maurits.
----- Original Message -----
From: "Keith Gunn" <kg...@csd.abdn.ac.uk>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Wednesday, August 14, 2002 9:46 AM
Subject: problems with HTML Parser


> Has anyone noticed that the HTML Parser that comes with
> Lucene joins terms together when parsing a file?
> I used to think it was my PDFParser but after fixing that
> I found out it was the HTMLParser.
>
> I managed to find a replacement parser that doesn't join terms.
>
> Just wondered if anyone had come across this problem??



XML indexing in lucene

Posted by jamin rubio <jr...@jouve.fr>.

Hi all,

I'm a newbie to Lucene, and I have a question. Is it possible for
Lucene to index just a modified field in an index without re-indexing the
whole document? Can Lucene do partial indexing?

Cheers



problems with HTML Parser

Posted by Keith Gunn <kg...@csd.abdn.ac.uk>.

Has anyone noticed that the HTML Parser that comes with
Lucene joins terms together when parsing a file?
I used to think it was my PDFParser but after fixing that
I found out it was the HTMLParser.

I managed to find a replacement parser that doesn't join terms.

Just wondered if anyone had come across this problem??



