Posted to java-user@lucene.apache.org by Pinky Iyer <pi...@yahoo.com> on 2003/02/28 18:04:02 UTC

htmlParser problem...anybody knowledge with CC

 Hi!
   I am trying to parse some JSP files, and I am trying to change the HTMLParser.jj code to accommodate this. As mentioned in the FAQ, I created a third comment tag type in the void CommentTag() :, TOKEN :, and <WithinCommentN> TOKEN : sections of HTMLParser.jj.
Here it is:

void CommentTag() :
{}
{
  (<Comment1> ( <CommentText1> )* <CommentEnd1>)
|
  (<Comment2> ( <CommentText2> )* <CommentEnd2>)
|
  (<Comment3> ( <CommentText3> )* <CommentEnd3>)
}

The token part has the following:

< Comment3: "<%" > : WithinComment3

and WithinComment3 is as follows:

<WithinComment3> TOKEN :
{
  < CommentText3: (~[">"])+ >
| < CommentEnd3: "%>" > : DEFAULT
}

However, I get a lexical error when parsing the JSP file:

Parse Aborted: Lexical error at line 2, column 96.  Encountered: ">" (62), after : ""
Title:
Summary:

The title and summary are not picked up. Does anybody have any idea what mistake I am making? I do not know the parsing language.

Any help appreciated!

Thanks!
Pinky
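
One possible cause, offered as an untested sketch rather than a confirmed fix: CommentText3 is defined as (~[">"])+, so it also consumes the "%" of the closing "%>". The lexer is then left at a bare ">" that no token in WithinComment3 can match, which would be consistent with the 'Encountered: ">" (62)' error quoted in the message. Assuming JavaCC's usual longest-match rule, a variant that never lets CommentText3 run past a "%" might look like this:

```
<WithinComment3> TOKEN :
{
  // Text is either a run of non-"%" characters or a single "%".
  < CommentText3: ( ~["%"] )+ | "%" >
  // At "%>" this two-character match is longer than the lone "%" above,
  // so the lexer picks it and returns to the DEFAULT state.
| < CommentEnd3: "%>" > : DEFAULT
}
```

With this variant, a lone "%" inside the JSP block is matched as a one-character CommentText3, while at "%>" the longer CommentEnd3 wins. The parser would need to be regenerated from HTMLParser.jj with JavaCC after the change.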



---------------------------------
Do you Yahoo!?
Yahoo! Tax Center - forms, calculators, tips, and more

Re: Word doc parser

Posted by Ryan Ackley <sa...@cfl.rr.com>.
Go to http://www.textmining.org 

----- Original Message ----- 
From: "Pinky Iyer" <pi...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Friday, February 28, 2003 3:44 PM
Subject: Word doc parser


> 
> Does anybody know of a good Word document parser?
> Thanks!
> P Iyer

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Word doc parser

Posted by Clemens Marschner <cm...@lanlab.de>.
You may want to think about using POI from Jakarta

http://jakarta.apache.org/poi

Clemens

----- Original Message ----- 
From: "Pinky Iyer" <pi...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Friday, February 28, 2003 9:44 PM
Subject: Word doc parser


> 
> Does anybody know of a good Word document parser?
> Thanks!
> P Iyer



Word doc parser

Posted by Pinky Iyer <pi...@yahoo.com>.
Does anybody know of a good Word document parser?
Thanks!
P Iyer
