
Posted to dev@lucene.apache.org by Koji Sekiguchi <ko...@r.email.ne.jp> on 2012/03/20 05:37:35 UTC

any general way of getting which attributes token stream has?

Is there any general way of getting/looking what attributes a token stream has?

I want to use the spell checker with a query analyzer that generates a
ReadingAttribute for each token, and I want to use those ReadingAttributes for
spell checking. I think I can write my own SpellingQueryConverter extension that
overrides the analyze method, but I saw this TODO comment in SpellingQueryConverter:

  protected void analyze(Collection<Token> result, Reader text, int offset) throws IOException {
    TokenStream stream = analyzer.reusableTokenStream("", text);
    // TODO: support custom attributes
    CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
    FlagsAttribute flagsAtt = stream.addAttribute(FlagsAttribute.class);
    TypeAttribute typeAtt = stream.addAttribute(TypeAttribute.class);
    PayloadAttribute payloadAtt = stream.addAttribute(PayloadAttribute.class);
    PositionIncrementAttribute posIncAtt = stream.addAttribute(PositionIncrementAttribute.class);
    OffsetAttribute offsetAtt = stream.addAttribute(OffsetAttribute.class);
       :

If we had a general way of getting such information, I think it would be helpful
not only for spell checking. (For example, SynonymFilter could add a PartOfSpeechAttribute
if the original token has one.)

koji
-- 
Query Log Visualizer for Apache Solr
http://soleami.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: any general way of getting which attributes token stream has?

Posted by Mikhail Khludnev <mk...@griddynamics.com>.

Hello Koji,

Can't it be done via tokenStream.reflectWith(AttributeReflector), with a
reflector that copies all attribute properties into a Token via reflection, or into
an AttributeSource?
WDYT?
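
A minimal sketch of the reflector idea, written as a toy model so that it runs
without Lucene on the classpath. Reflector, Attr, TermAttr, ReadingAttr and
snapshot are hypothetical stand-ins invented for illustration; they are not
Lucene classes and only mimic the visitor shape of reflectWith(AttributeReflector).
The point is that the caller never names a concrete attribute type, so a custom
attribute is picked up the same way as a built-in one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReflectorSketch {
    /** Hypothetical stand-in for a reflector callback (not Lucene's interface). */
    interface Reflector {
        void reflect(String attributeName, String key, Object value);
    }

    /** Hypothetical attribute that can report its own properties to a reflector. */
    interface Attr {
        void reflectWith(Reflector reflector);
    }

    /** Toy "built-in" attribute holding the term text. */
    static final class TermAttr implements Attr {
        final String term;
        TermAttr(String term) { this.term = term; }
        @Override public void reflectWith(Reflector r) { r.reflect("TermAttr", "term", term); }
    }

    /** Toy "custom" attribute holding a reading; the collector below never names it. */
    static final class ReadingAttr implements Attr {
        final String reading;
        ReadingAttr(String reading) { this.reading = reading; }
        @Override public void reflectWith(Reflector r) { r.reflect("ReadingAttr", "reading", reading); }
    }

    /** Visits every attribute of one token and records "Attribute#key" -> value. */
    static Map<String, Object> snapshot(List<Attr> tokenAttributes) {
        Map<String, Object> props = new LinkedHashMap<>();
        for (Attr attr : tokenAttributes) {
            attr.reflectWith((name, key, value) -> props.put(name + "#" + key, value));
        }
        return props;
    }

    public static void main(String[] args) {
        List<Attr> token = Arrays.<Attr>asList(new TermAttr("tokyo"), new ReadingAttr("toukyou"));
        System.out.println(snapshot(token));
    }
}
```

In this model, supporting a new attribute needs no change to snapshot; only the
attribute itself has to report its properties.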



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: any general way of getting which attributes token stream has?

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.

(12/03/20 13:47), Robert Muir wrote:
> I think we should probably change the QueryConverter api from:
>      public abstract Collection<Token>  convert(String original);
> to:
>      public abstract TokenStream convert(original)
>
> Currently attributes such as ReadingAttribute are lost...
>
> If we really want a Collection we could alternatively have
> Collection<AttributeSource>  which would also preserve attributes, but
> it seems silly when QueryConverter could just return a TokenStream.
>
> This makes SuggestQueryConverter extremely simple :)
> In fact SpellingQueryConvert could be simple too: I think its
> basically really just is a regex-tokenizer with a stopword list
> (OR/AND) ?

Hi Robert,

Thanks for the comment.

While investigating the Lucene spell checker for Japanese further,
I've realized that there is a more fundamental problem in it. I'll open a
JIRA ticket for it shortly. In that ticket, I'll change the API you mentioned
if needed.

koji
-- 
Query Log Visualizer for Apache Solr
http://soleami.com/


Re: any general way of getting which attributes token stream has?

Posted by Robert Muir <rc...@gmail.com>.

I think we should probably change the QueryConverter api from:
    public abstract Collection<Token> convert(String original);
to:
    public abstract TokenStream convert(String original);

Currently attributes such as ReadingAttribute are lost...

If we really want a Collection we could alternatively have
Collection<AttributeSource> which would also preserve attributes, but
it seems silly when QueryConverter could just return a TokenStream.

This makes SuggestQueryConverter extremely simple :)
In fact SpellingQueryConverter could be simple too: I think it's
basically just a regex tokenizer with a stopword list
(OR/AND)?
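
To illustrate why the return type matters, here is a toy model that runs
without Lucene. Everything in it is a hypothetical stand-in: FixedToken plays
the role of a token type with a fixed set of fields, each analyzed token is
modeled as a plain map of attribute name to value, and the "reading" is faked
by upper-casing the term. Under those assumptions, converting to fixed-field
tokens drops the reading, while returning per-token attribute snapshots keeps it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConverterSketch {
    /** Hypothetical fixed-field token: there is no slot for a custom attribute. */
    static final class FixedToken {
        final String term;
        final int startOffset;
        final int endOffset;
        FixedToken(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Toy analyzer: whitespace tokens, each carrying a fake custom "reading". */
    static List<Map<String, Object>> analyze(String text) {
        List<Map<String, Object>> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        while (m.find()) {
            Map<String, Object> attrs = new LinkedHashMap<>();
            attrs.put("term", m.group());
            attrs.put("startOffset", m.start());
            attrs.put("endOffset", m.end());
            attrs.put("reading", m.group().toUpperCase(Locale.ROOT)); // fake reading
            tokens.add(attrs);
        }
        return tokens;
    }

    /** Old-style conversion: copies only the known fields, so "reading" is lost. */
    static List<FixedToken> convertToFixedTokens(String original) {
        List<FixedToken> result = new ArrayList<>();
        for (Map<String, Object> attrs : analyze(original)) {
            result.add(new FixedToken((String) attrs.get("term"),
                    (Integer) attrs.get("startOffset"), (Integer) attrs.get("endOffset")));
        }
        return result;
    }

    /** Attribute-preserving conversion: snapshots every attribute of each token. */
    static List<Map<String, Object>> convertPreservingAttributes(String original) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> attrs : analyze(original)) {
            result.add(new LinkedHashMap<>(attrs));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(convertToFixedTokens("foo bar").size());
        System.out.println(convertPreservingAttributes("foo bar"));
    }
}
```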




-- 
lucidimagination.com
