Posted to user@lucenenet.apache.org by Min Yin <yi...@AI.SRI.COM> on 2008/08/05 20:01:39 UTC

StandardTokenizer.cs and StandardTokenizer.jj

Hi there!

I'd like to make a small change to the grammar rules in
StandardTokenizer.jj, but I'm not sure how to carry that change over to
StandardTokenizer.cs.

JavaCC generates a Java file. Is there a C# equivalent, or do I have
to change StandardTokenizer.cs by hand?

Many thanks in advance!

Min

Re: StandardTokenizer.cs and StandardTokenizer.jj

Posted by Doug Sale <do...@gmail.com>.
I am not aware of a C# equivalent of JavaCC.

The approach used to generate StandardTokenizer.cs was to use Visual
Studio's JLCA (Java Language Conversion Assistant) on the Java code.  Manual
"tweaks" might be necessary post-conversion for anything the JLCA fails to
convert or converts incorrectly.

So, if you are comfortable with JavaCC parser grammar, your process would
be:
- edit the grammar file (StandardTokenizer.jj)
- run the grammar file through JavaCC to get the Java source (actually yields
2 files, iirc)
- run the Java source through JLCA to get the C# source
- fix any compilation errors and then test
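
For the final "test" step, one option is to write down the expected tokens for a few sample strings before touching the grammar, then re-run the same cases against the regenerated tokenizer. Below is a minimal Java sketch of that idea. The class name and the standInTokenize function are hypothetical: the stand-in simply splits on runs of ASCII letters and digits and is NOT Lucene's actual grammar. In practice you would swap in a call to the real tokenizer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerRegressionCheck {

    // Hypothetical stand-in for the generated tokenizer: emits runs of
    // ASCII letters/digits. Replace with a call into the real tokenizer.
    static List<String> standInTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z0-9]+").matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Returns one description per case whose actual tokens differ from
    // the expected tokens; an empty list means every case matched.
    static List<String> failures(Map<String, List<String>> expected,
                                 Function<String, List<String>> tokenizer) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : expected.entrySet()) {
            List<String> actual = tokenizer.apply(e.getKey());
            if (!actual.equals(e.getValue())) {
                out.add("input=\"" + e.getKey() + "\" expected="
                        + e.getValue() + " actual=" + actual);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample cases; the expected lists describe the stand-in's behavior.
        Map<String, List<String>> expected = new LinkedHashMap<>();
        expected.put("hello world", List.of("hello", "world"));
        expected.put("lucene 2.2", List.of("lucene", "2", "2"));

        List<String> failed =
                failures(expected, TokenizerRegressionCheck::standInTokenize);
        failed.forEach(System.out::println);
        System.out.println(failed.isEmpty()
                ? "all cases match"
                : failed.size() + " case(s) differ");
    }
}
```

The same list of inputs and expected tokens can then be checked against the converted C# code, so a conversion mistake shows up as a failing case rather than as a surprise at search time.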

It might be helpful to compare your .cs files to the release versions when
diagnosing any issues.  Also, Lucene 2.2.0 was the last version to use
JavaCC for the StandardTokenizer grammar.  Since 2.3.0, the grammar has been
implemented using JFlex.
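
As a rough illustration of comparing a freshly converted .cs file with the release version, here is a small Java sketch that reports the line numbers where two files differ, ignoring whitespace-only differences. The class and method names are hypothetical. It compares lines position by position, so an inserted line makes every later line look different; a dedicated diff tool will usually do a better job.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class CsFileCompare {

    // Trim the line and collapse whitespace runs to a single space, so
    // that pure formatting changes are not reported as differences.
    static String normalize(String line) {
        return line.trim().replaceAll("\\s+", " ");
    }

    // Returns the 1-based line numbers at which the two files differ
    // after normalization. Lines present in only one file also count.
    static List<Integer> differingLines(List<String> mine, List<String> release) {
        List<Integer> diffs = new ArrayList<>();
        int max = Math.max(mine.size(), release.size());
        for (int i = 0; i < max; i++) {
            String a = i < mine.size() ? normalize(mine.get(i)) : null;
            String b = i < release.size() ? normalize(release.get(i)) : null;
            if (a == null || !a.equals(b)) {
                diffs.add(i + 1);
            }
        }
        return diffs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: CsFileCompare <converted.cs> <release.cs>");
            return;
        }
        // readAllLines decodes as UTF-8 and throws on malformed input.
        List<String> mine = Files.readAllLines(Paths.get(args[0]));
        List<String> release = Files.readAllLines(Paths.get(args[1]));
        System.out.println("differing lines: " + differingLines(mine, release));
    }
}
```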

Doug
