Posted to java-user@lucene.apache.org by Xi Shen <da...@gmail.com> on 2012/12/22 15:26:00 UTC

how to implement a TokenFilter?

Hi,

I need a guide to implement my own TokenFilter. I checked the wiki, but I
could not find any useful guide :(


-- 
Regards,
David Shen

http://about.me/davidshen
https://twitter.com/#!/davidshen84

Re: how to implement a TokenFilter?

Posted by Lance Norskog <go...@gmail.com>.
Go to the top-level directory of the source tree and run:
cp dev-tools/eclipse/dot.project .project
cp dev-tools/eclipse/dot.classpath .classpath
cp -r dev-tools/eclipse/dot.settings .settings

The 'ant eclipse' target performs this setup for you.

On 12/24/2012 10:45 PM, Xi Shen wrote:
> Hi Lance,
>
> I got the Lucene 4 source from
> http://mirror.bjtu.edu.cn/apache/lucene/java/4.0.0/lucene-4.0.0-src.tgz. It
> is an Ant project, but I do not know which IDE can import it. I tried Eclipse,
> but it cannot import the build.xml file.
>
>
> Thanks,
> D.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: how to implement a TokenFilter?

Posted by Xi Shen <da...@gmail.com>.
Hi Lance,

I got the Lucene 4 source from
http://mirror.bjtu.edu.cn/apache/lucene/java/4.0.0/lucene-4.0.0-src.tgz. It
is an Ant project, but I do not know which IDE can import it. I tried Eclipse,
but it cannot import the build.xml file.


Thanks,
D.


On Mon, Dec 24, 2012 at 12:02 PM, Lance Norskog <go...@gmail.com> wrote:

> You need to use an IDE. Find the Attribute interface and show all of its
> subtypes. This turns up many rarely used ones and a few that are used a lot.
> Next, look at the source code of various TokenFilters and search for other
> uses of the Attributes you find. This is generally how I figured it out.
>
> Also, after the full Analyzer stack has run, the caller saves the output
> (I guess to codecs?). You can look at which Attributes it saves.
>
>



Re: how to implement a TokenFilter?

Posted by Lance Norskog <go...@gmail.com>.
You need to use an IDE. Find the Attribute interface and show all of its
subtypes. This turns up many rarely used ones and a few that are used a lot.
Next, look at the source code of various TokenFilters and search for other
uses of the Attributes you find. This is generally how I figured it out.

Also, after the full Analyzer stack has run, the caller saves the
output (I guess to codecs?). You can look at which Attributes it saves.
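
If browsing in an IDE is not an option, the attributes carried by a stream can also be listed at runtime. The following is only a minimal sketch, assuming the Lucene core jar is on the classpath. AttributeInspector and attributeNames are hypothetical names made up for this illustration. A bare AttributeSource stands in for a real TokenStream here (TokenStream extends AttributeSource), and the exact set of names reported may differ between Lucene versions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.Attribute;
import org.apache.lucene.util.AttributeSource;

public class AttributeInspector {

  /** Collects the simple names of all attribute interfaces registered on the source. */
  static List<String> attributeNames(AttributeSource source) {
    List<String> names = new ArrayList<String>();
    Iterator<Class<? extends Attribute>> it = source.getAttributeClassesIterator();
    while (it.hasNext()) {
      names.add(it.next().getSimpleName());
    }
    return names;
  }

  public static void main(String[] args) {
    // Stand-in for a TokenStream: register two attributes and fill them in.
    AttributeSource source = new AttributeSource();
    CharTermAttribute termAtt = source.addAttribute(CharTermAttribute.class);
    OffsetAttribute offsetAtt = source.addAttribute(OffsetAttribute.class);
    termAtt.setEmpty().append("lucene");
    offsetAtt.setOffset(0, 6);

    // Which attribute interfaces are present, and what do they currently hold?
    System.out.println(attributeNames(source));
    System.out.println(source.reflectAsString(false));
  }
}
```

Calling attributeNames on the TokenStream returned by an Analyzer should show which attributes that particular analysis chain actually uses.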

On 12/23/2012 06:30 PM, Xi Shen wrote:
> thanks a lot :)
>
>


Re: how to implement a TokenFilter?

Posted by Xi Shen <da...@gmail.com>.
thanks a lot :)


On Mon, Dec 24, 2012 at 10:22 AM, feng lu <am...@gmail.com> wrote:

> Hi Shen,
>
> Maybe you can look at some of the source code in the org.apache.lucene.analysis
> package, such as LowerCaseFilter.java, StopFilter.java, and so on.
>
> Some commonly used attributes include:
>
> offsetAtt = addAttribute(OffsetAttribute.class);
> termAtt = addAttribute(CharTermAttribute.class);
> typeAtt = addAttribute(TypeAttribute.class);
>
> Regards
>
>




Re: how to implement a TokenFilter?

Posted by feng lu <am...@gmail.com>.
Hi Shen,

Maybe you can look at some of the source code in the org.apache.lucene.analysis
package, such as LowerCaseFilter.java, StopFilter.java, and so on.

Some commonly used attributes include:

offsetAtt = addAttribute(OffsetAttribute.class);
termAtt = addAttribute(CharTermAttribute.class);
typeAtt = addAttribute(TypeAttribute.class);
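
To show how such attributes are used inside incrementToken(), here is a minimal sketch of a filter, assuming the Lucene core jar is on the classpath. TokenFilterSketch, ListTokenStream, DigitTypeFilter and analyze are hypothetical names made up for this illustration, not classes from Lucene. The hand-written source stream is only there to avoid depending on a particular tokenizer constructor, which differs between Lucene versions. Both stream classes are declared final because TokenStream checks for that when Java assertions are enabled.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class TokenFilterSketch {

  /** Hypothetical source stream: emits a fixed list of terms, one per call. */
  static final class ListTokenStream extends TokenStream {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final String[] terms;
    private int next = 0;

    ListTokenStream(String... terms) {
      this.terms = terms;
    }

    @Override
    public boolean incrementToken() {
      if (next >= terms.length) {
        return false;
      }
      clearAttributes(); // reset every attribute to its default before filling it
      termAtt.setEmpty().append(terms[next++]);
      return true;
    }

    @Override
    public void reset() throws IOException {
      super.reset();
      next = 0;
    }
  }

  /** Hypothetical filter: marks all-digit terms with the type "number". */
  static final class DigitTypeFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

    DigitTypeFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!input.incrementToken()) {
        return false; // upstream is exhausted
      }
      boolean allDigits = termAtt.length() > 0;
      for (int i = 0; i < termAtt.length(); i++) {
        if (!Character.isDigit(termAtt.charAt(i))) {
          allDigits = false;
          break;
        }
      }
      if (allDigits) {
        typeAtt.setType("number");
      }
      return true;
    }
  }

  /** Consumes the stream with the usual reset / incrementToken / end / close cycle. */
  static List<String> analyze(TokenStream stream) throws IOException {
    CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
    TypeAttribute typeAtt = stream.addAttribute(TypeAttribute.class);
    List<String> out = new ArrayList<String>();
    try {
      stream.reset();
      while (stream.incrementToken()) {
        out.add(termAtt.toString() + "/" + typeAtt.type());
      }
      stream.end();
    } finally {
      stream.close();
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    TokenStream stream = new DigitTypeFilter(new ListTokenStream("Lucene", "40", "rocks"));
    System.out.println(analyze(stream)); // prints [Lucene/word, 40/number, rocks/word]
  }
}
```

The filter and its input share one set of attribute instances, so the filter reads and modifies the current token in place instead of creating new token objects.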

Regards


On Sun, Dec 23, 2012 at 4:01 PM, Rafał Kuć <r....@solr.pl> wrote:

> Hello!
>
> The simplest way is to look at the Lucene javadoc and see which
> implementations of the Attribute interface there are:
> http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/util/Attribute.html
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>


-- 
Don't Grow Old, Grow Up... :-)

Re: how to implement a TokenFilter?

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

The simplest way is to look at the Lucene javadoc and see which
implementations of the Attribute interface there are:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/util/Attribute.html

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

> Thanks, I have read this already. It is useful, but it is too 'small'.
> For example, given this.charTermAttr = addAttribute(CharTermAttribute.class);
> I want to know which other attributes I need in order to implement
> my function. Where can I find a reference for these attributes? I looked on
> the Lucene and Solr wikis, but all I found was a list of the attribute
> names, nothing about what they are capable of.






Re: how to implement a TokenFilter?

Posted by Xi Shen <da...@gmail.com>.
Thanks, I have read this already. It is useful, but it is too 'small'.

For example, given this.charTermAttr = addAttribute(CharTermAttribute.class);

I want to know which other attributes I need in order to implement
my function. Where can I find a reference for these attributes? I looked on
the Lucene and Solr wikis, but all I found was a list of the attribute
names, nothing about what they are capable of.




On Sat, Dec 22, 2012 at 10:37 PM, Rafał Kuć <r....@solr.pl> wrote:

> Hello!
>
> A small example with some explanation can be found here:
> http://solr.pl/en/2012/05/14/developing-your-own-solr-filter/
>
> --
> Regards,
>  Rafał Kuć
>  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>



Re: how to implement a TokenFilter?

Posted by Rafał Kuć <r....@solr.pl>.
Hello!

A small example with some explanation can be found here: http://solr.pl/en/2012/05/14/developing-your-own-solr-filter/

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

> Hi,

> I need a guide to implement my own TokenFilter. I checked the wiki, but I
> could not find any useful guide :(


