
Posted to java-user@lucene.apache.org by Phan The Dai <th...@gmail.com> on 2010/01/27 17:01:15 UTC

Analyze java camelcase words ?

Can anyone suggest a solution for tokenizing camelCase words in Java?
Examples of camelCase words are getXmlRule and setTokenizeAnalyzer.
They should be tokenized into: get, Xml, Rule, set, Tokenize, Analyzer.

Thank you very much!

Re: Analyze java camelcase words ?

Posted by Phan The Dai <th...@gmail.com>.

Thank you very much.
I am studying your comments; they are useful.
I have just started using Lucene 3.0. I hope it works well.

On Thu, Jan 28, 2010 at 1:21 AM, Robert Muir <rc...@gmail.com> wrote:

> No, but you can take the token filter itself and simply use it in your
> Lucene application.
>
> It uses the old TokenStream API, so if you want to use Lucene 3.0 or 3.1,
> you will need a version that works with the new TokenStream API.
> There is a patch available for that here:
> https://issues.apache.org/jira/browse/SOLR-1710
>

Re: Analyze java camelcase words ?

Posted by Robert Muir <rc...@gmail.com>.

No, but you can take the token filter itself and simply use it in your Lucene
application.

It uses the old TokenStream API, so if you want to use Lucene 3.0 or 3.1, you
will need a version that works with the new TokenStream API.
There is a patch available for that here:
https://issues.apache.org/jira/browse/SOLR-1710
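The main porting issue with the newer TokenStream API (introduced in Lucene 2.9) is that it is pull-based: the consumer calls incrementToken() repeatedly, so a filter that turns one input token into several output tokens has to buffer the extra parts between calls. The plain-Java sketch below illustrates only that buffering pattern. TokenSource, ListSource and CamelCaseFilter are hypothetical stand-ins, not Lucene classes; a real implementation would extend Lucene's TokenFilter, write each part into the term attribute, and also handle offsets and position increments, which this sketch omits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical pull-style token source, loosely modelled on incrementToken().
interface TokenSource {
    // Advances to the next token; returns false when the stream is exhausted.
    boolean incrementToken();
    // Text of the current token (valid only after incrementToken() returned true).
    String term();
}

// Hypothetical upstream source that simply walks a fixed list of words.
final class ListSource implements TokenSource {
    private final Iterator<String> it;
    private String current;

    ListSource(List<String> words) { this.it = words.iterator(); }

    public boolean incrementToken() {
        if (!it.hasNext()) return false;
        current = it.next();
        return true;
    }

    public String term() { return current; }
}

// Hypothetical filter: splits each upstream token at lowercase -> uppercase transitions.
final class CamelCaseFilter implements TokenSource {
    private final TokenSource input;
    private final Deque<String> pending = new ArrayDeque<>(); // parts not yet emitted
    private String current;

    CamelCaseFilter(TokenSource input) { this.input = input; }

    public boolean incrementToken() {
        // Pull from upstream only once the previous token's parts are used up;
        // tokens that yield no parts (e.g. empty strings) are skipped.
        while (pending.isEmpty()) {
            if (!input.incrementToken()) return false;
            pending.addAll(splitOnCaseChange(input.term()));
        }
        current = pending.removeFirst();
        return true;
    }

    public String term() { return current; }

    private static List<String> splitOnCaseChange(String word) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < word.length(); i++) {
            if (Character.isLowerCase(word.charAt(i - 1)) && Character.isUpperCase(word.charAt(i))) {
                parts.add(word.substring(start, i));
                start = i;
            }
        }
        if (start < word.length()) parts.add(word.substring(start));
        return parts;
    }

    // Drains a source into a list, the way a consumer loops over incrementToken().
    static List<String> drain(TokenSource source) {
        List<String> out = new ArrayList<>();
        while (source.incrementToken()) out.add(source.term());
        return out;
    }

    public static void main(String[] args) {
        TokenSource stream = new CamelCaseFilter(
                new ListSource(Arrays.asList("getXmlRule", "setTokenizeAnalyzer")));
        System.out.println(drain(stream));
    }
}
```

Running main prints [get, Xml, Rule, set, Tokenize, Analyzer]. The point of the Deque is that incrementToken() may return several times for a single upstream token before it needs to pull the next one.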

On Wed, Jan 27, 2010 at 11:17 AM, Erick Erickson <er...@gmail.com> wrote:

> Robert:
>
> Is this in Lucene yet? According to what I could find in JIRA, it's
> still open. And it's not in the Javadocs on a quick scan.....
>
> Erick



-- 
Robert Muir
rcmuir@gmail.com

Re: Analyze java camelcase words ?

Posted by Erick Erickson <er...@gmail.com>.

Robert:

Is this in Lucene yet? According to what I could find in JIRA, it's
still open. And it's not in the Javadocs on a quick scan.....

Erick

On Wed, Jan 27, 2010 at 11:08 AM, Robert Muir <rc...@gmail.com> wrote:

> WordDelimiterFilter has a splitOnCaseChange option that should be useful for
> this:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
>
> From the example: PowerShot -> Power, Shot

Re: Analyze java camelcase words ?

Posted by Robert Muir <rc...@gmail.com>.

WordDelimiterFilter has a splitOnCaseChange option that should be useful for
this:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

From the example: PowerShot -> Power, Shot
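For anyone who wants to see the rule itself outside Solr, here is a minimal plain-Java sketch of splitting on lowercase-to-uppercase transitions, the behaviour the PowerShot example describes. It is not WordDelimiterFilter's code and does none of its other work (delimiters, numbers, catenation); the class and method names (CamelCaseSplitter, split) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Lucene/Solr code: starts a new part at every
// lowercase -> uppercase transition, as in "PowerShot" -> "Power", "Shot".
final class CamelCaseSplitter {

    static List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        int start = 0; // index where the current part begins
        for (int i = 1; i < word.length(); i++) {
            char prev = word.charAt(i - 1);
            char cur = word.charAt(i);
            if (Character.isLowerCase(prev) && Character.isUpperCase(cur)) {
                parts.add(word.substring(start, i)); // close the part before the boundary
                start = i;
            }
        }
        if (start < word.length()) {
            parts.add(word.substring(start)); // trailing part (or whole word if no boundary)
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] words = {"getXmlRule", "setTokenizeAnalyzer", "PowerShot"};
        for (String w : words) {
            System.out.println(w + " -> " + split(w));
        }
    }
}
```

Running main prints lines such as getXmlRule -> [get, Xml, Rule]. With this rule alone, an all-caps run such as XMLRule stays a single part, because there is no lowercase-to-uppercase transition inside it.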

On Wed, Jan 27, 2010 at 11:01 AM, Phan The Dai <th...@gmail.com> wrote:

> Can anyone suggest a solution for tokenizing camelCase words in Java?
> Examples of camelCase words are getXmlRule and setTokenizeAnalyzer.
> They should be tokenized into: get, Xml, Rule, set, Tokenize, Analyzer.
>
> Thank you very much!
>


