Posted to solr-user@lucene.apache.org by Dmitry Kan <dm...@gmail.com> on 2011/07/06 09:23:04 UTC

solr.StandardTokenizerFactory: more info needed

Hi all!

solr.StandardTokenizerFactory -- is it possible to see a full description
of its behaviour for Solr 1.4 somewhere? The Wiki entry at
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
is very short.

-- 
Regards,

Dmitry Kan

Re: solr.StandardTokenizerFactory: more info needed

Posted by Dmitry Kan <dm...@gmail.com>.
Thanks, Erick.

On Wed, Jul 6, 2011 at 6:27 PM, Erick Erickson <er...@gmail.com> wrote:

> See ..src/test/org/apache/solr/analysis.
>
> But you'll be changing the grammar, so I don't know how much the
> existing tests would actually help you. In fact, I'd expect them
> to break. You'd have to write some new ones of your own to exercise
> your changes and ensure that they do what you want.
>
> Best
> Erick
>
> On Wed, Jul 6, 2011 at 9:31 AM, Dmitry Kan <dm...@gmail.com> wrote:
> > OK, thanks. Do you know if there are tokenizer-specific tests to run
> > after compilation?
> >
> > On Wed, Jul 6, 2011 at 4:25 PM, Steven A Rowe <sa...@syr.edu> wrote:
> >
> >> Yes, you can change the rules and recompile.
> >>
> >> Before you recompile, you have to run 'ant jflex' to generate the Java
> >> source.
> >>
> >> Steve
> >>
> >> -----Original Message-----
> >> From: Dmitry Kan [mailto:dmitry.kan@gmail.com]
> >> Sent: Wednesday, July 06, 2011 9:21 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: solr.StandardTokenizerFactory: more info needed
> >>
> >> Hi Steven,
> >>
> >> This looks very good. Thanks. Do I understand correctly that, if I were
> >> to change the tokenizer rules, I could change e.g. the token class
> >> definitions (like <NUM>) in this file and recompile the code?
> >>
> >> On Wed, Jul 6, 2011 at 3:45 PM, Steven A Rowe <sa...@syr.edu> wrote:
> >>
> >> > Hi Dmitry,
> >> >
> >> > The underlying Lucene implementation is here:
> >> >
> >> > http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_9_1/src/java/org/apache/lucene/analysis/standard/
> >> >
> >> > StandardTokenizerImpl.jflex is probably where you should start.
> >> >
> >> > Steve
> >> >
> >> > -----Original Message-----
> >> > From: Dmitry Kan [mailto:dmitry.kan@gmail.com]
> >> > Sent: Wednesday, July 06, 2011 3:23 AM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: solr.StandardTokenizerFactory: more info needed
> >> >
> >> > Hi all!
> >> >
> >> > solr.StandardTokenizerFactory -- is it possible to see a full
> >> > description of its behaviour for Solr 1.4 somewhere? The Wiki entry at
> >> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
> >> > is very short.
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > Dmitry Kan
> >> >
> >>
> >>
> >>
> >> --
> >> Regards,
> >>
> >> Dmitry Kan
> >>
> >
> >
> >
> > --
> > Regards,
> >
> > Dmitry Kan
> >
>



-- 
Regards,

Dmitry Kan

Re: solr.StandardTokenizerFactory: more info needed

Posted by Erick Erickson <er...@gmail.com>.
See ..src/test/org/apache/solr/analysis.

But you'll be changing the grammar, so I don't know how much the
existing tests would actually help you. In fact, I'd expect them
to break. You'd have to write some new ones of your own to exercise
your changes and ensure that they do what you want.

Best
Erick
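
As a rough illustration of the kind of test described above, here is a minimal, self-contained Java sketch. It does not use Lucene at all: the class TokenRuleSketch, its regular-expression rules, and the way it assigns the <NUM> and <ALPHANUM> labels are hypothetical stand-ins for a modified grammar, not the behaviour of StandardTokenizerImpl.jflex. The point is only the shape of the test: feed in a string, then check both the token text and the token type that your changed rule is supposed to produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Stand-in for a grammar-driven tokenizer. This is NOT Lucene's
 * StandardTokenizer; the rules below are assumptions made for illustration.
 */
public class TokenRuleSketch {

    /** A token's text plus the type label assigned by the rule that matched. */
    public static final class Token {
        public final String text;
        public final String type;

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }

        @Override
        public String toString() {
            return text + "/" + type;
        }
    }

    // Hypothetical rules: a number is digits with an optional decimal part;
    // anything else made of letters and digits is alphanumeric.
    // The NUM alternative is listed first, so it is tried first.
    private static final Pattern RULES = Pattern.compile(
            "(?<NUM>\\d+(?:\\.\\d+)?)|(?<ALPHANUM>[\\p{L}\\p{N}]+)");

    /** Splits the input into tokens and labels each with its rule name. */
    public static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        while (m.find()) {
            String type = m.group("NUM") != null ? "<NUM>" : "<ALPHANUM>";
            tokens.add(new Token(m.group(), type));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print the tokens so the effect of a rule change can be inspected.
        System.out.println(tokenize("version 1.4 of solr"));
    }
}
```

With a real grammar change you would write the same kind of assertion against the regenerated tokenizer instead of this stand-in: one input per rule you touched, checking the token type as well as the text.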


Re: solr.StandardTokenizerFactory: more info needed

Posted by Dmitry Kan <dm...@gmail.com>.
OK, thanks. Do you know if there are tokenizer-specific tests to run after
compilation?

-- 
Regards,

Dmitry Kan

RE: solr.StandardTokenizerFactory: more info needed

Posted by Steven A Rowe <sa...@syr.edu>.
Yes, you can change the rules and recompile.

Before you recompile, you have to run 'ant jflex' to generate the Java source.

Steve


Re: solr.StandardTokenizerFactory: more info needed

Posted by Dmitry Kan <dm...@gmail.com>.
Hi Steven,

This looks very good. Thanks. Do I understand correctly that, if I were to
change the tokenizer rules, I could change e.g. the token class
definitions (like <NUM>) in this file and recompile the code?

-- 
Regards,

Dmitry Kan

RE: solr.StandardTokenizerFactory: more info needed

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Dmitry,

The underlying Lucene implementation is here: http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_9_1/src/java/org/apache/lucene/analysis/standard/

StandardTokenizerImpl.jflex is probably where you should start.

Steve
