Posted to dev@lucene.apache.org by Koji Sekiguchi <ko...@r.email.ne.jp> on 2012/03/20 05:37:35 UTC
any general way of getting which attributes token stream has?
Is there a general way to find out which attributes a token stream has?
I want to use the spell checker with a query analyzer that generates a
ReadingAttribute for each token, and I want to use those ReadingAttributes for
spell checking. I think I could write my own SpellingQueryConverter extension that
overrides the analyze method, but I saw this TODO comment in SpellingQueryConverter:
protected void analyze(Collection<Token> result, Reader text, int offset) throws IOException {
  TokenStream stream = analyzer.reusableTokenStream("", text);
  // TODO: support custom attributes
  CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
  FlagsAttribute flagsAtt = stream.addAttribute(FlagsAttribute.class);
  TypeAttribute typeAtt = stream.addAttribute(TypeAttribute.class);
  PayloadAttribute payloadAtt = stream.addAttribute(PayloadAttribute.class);
  PositionIncrementAttribute posIncAtt = stream.addAttribute(PositionIncrementAttribute.class);
  OffsetAttribute offsetAtt = stream.addAttribute(OffsetAttribute.class);
  :
If we had a general way of getting such information, I think it would be helpful
not only for spell checking. (For example, SynonymFilter could add a PartOfSpeechAttribute
if the original token has one.)
koji
--
Query Log Visualizer for Apache Solr
http://soleami.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: any general way of getting which attributes token stream has?
Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello Koji,
Can't it be done via tokenStream.reflectWith(AttributeReflector), with a
reflector that copies every attribute's properties into a Token (via
reflection) or into an AttributeSource?
WDYT?
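To make the idea concrete, here is a minimal sketch of the reflector approach. The types Reflector, Att, TermAtt, ReadingAtt and MiniAttributeSource are hypothetical stand-ins, written so the example runs without Lucene on the classpath; in Lucene the corresponding pieces would be AttributeSource.reflectWith(AttributeReflector) and the reflect(attClass, key, value) callback. The point is only that the caller never has to name ReadingAttribute up front.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's AttributeReflector callback.
interface Reflector {
    void reflect(Class<?> attClass, String key, Object value);
}

// Hypothetical stand-in for an attribute that can describe its own properties.
interface Att {
    void reflectWith(Reflector reflector);
}

final class TermAtt implements Att {
    String term = "";

    @Override
    public void reflectWith(Reflector r) {
        r.reflect(TermAtt.class, "term", term);
    }
}

// Plays the role of a custom attribute such as ReadingAttribute.
final class ReadingAtt implements Att {
    String reading = "";

    @Override
    public void reflectWith(Reflector r) {
        r.reflect(ReadingAtt.class, "reading", reading);
    }
}

// Hypothetical stand-in for AttributeSource: holds attributes in insertion order.
final class MiniAttributeSource {
    private final Map<Class<?>, Att> atts = new LinkedHashMap<>();

    void add(Att att) {
        atts.put(att.getClass(), att);
    }

    // Visits every property of every attribute, whatever its type.
    void reflectWith(Reflector reflector) {
        for (Att att : atts.values()) {
            att.reflectWith(reflector);
        }
    }
}

final class ReflectDemo {
    // Copies all attribute properties into a map keyed as "AttributeClass#property".
    static Map<String, Object> snapshot(MiniAttributeSource source) {
        Map<String, Object> out = new LinkedHashMap<>();
        source.reflectWith((attClass, key, value) ->
                out.put(attClass.getSimpleName() + "#" + key, value));
        return out;
    }

    public static void main(String[] args) {
        TermAtt term = new TermAtt();
        term.term = "tokyo";
        ReadingAtt reading = new ReadingAtt();
        reading.reading = "toukyou";

        MiniAttributeSource source = new MiniAttributeSource();
        source.add(term);
        source.add(reading);
        System.out.println(snapshot(source));
    }
}
```

A spell-checking converter built this way could store the snapshot alongside each token instead of hard-coding one addAttribute call per known attribute.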
--
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics
<http://www.griddynamics.com>
<mk...@griddynamics.com>
Re: any general way of getting which attributes token stream has?
Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/03/20 13:47), Robert Muir wrote:
> I think we should probably change the QueryConverter api from:
> public abstract Collection<Token> convert(String original);
> to:
> public abstract TokenStream convert(String original);
>
> Currently attributes such as ReadingAttribute are lost...
>
> If we really want a Collection we could alternatively have
> Collection<AttributeSource> which would also preserve attributes, but
> it seems silly when QueryConverter could just return a TokenStream.
>
> This makes SuggestQueryConverter extremely simple :)
> In fact SpellingQueryConverter could be simple too: I think it's
> basically just a regex tokenizer with a stopword list
> (OR/AND)?
Hi Robert,
Thanks for the comment.
As I investigate the Lucene spell checker for Japanese further,
I've realized that it has a more fundamental problem. I'll open a
JIRA ticket for it shortly. In that ticket, I'll change the API you mentioned
if needed.
koji
Re: any general way of getting which attributes token stream has?
Posted by Robert Muir <rc...@gmail.com>.
I think we should probably change the QueryConverter api from:
public abstract Collection<Token> convert(String original);
to:
public abstract TokenStream convert(String original);
Currently attributes such as ReadingAttribute are lost...
If we really want a Collection we could alternatively have
Collection<AttributeSource> which would also preserve attributes, but
it seems silly when QueryConverter could just return a TokenStream.
This makes SuggestQueryConverter extremely simple :)
In fact SpellingQueryConverter could be simple too: I think it's
basically just a regex tokenizer with a stopword list
(OR/AND)?
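A rough sketch of why the return type matters, under stated assumptions: FixedToken, TokenState and ConverterSketch below are hypothetical stand-ins, not Lucene or Solr classes. FixedToken models a Token-like result with a fixed set of fields, and TokenState models an open-ended per-token attribute state in the spirit of Collection<AttributeSource>. Copying into the fixed shape drops the reading; copying the whole state keeps it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical fixed-shape token: it has fields only for what its author anticipated.
final class FixedToken {
    final String term;
    final int startOffset;
    final int endOffset;

    FixedToken(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Hypothetical open-ended token state, standing in for a per-token AttributeSource.
final class TokenState {
    final Map<String, Object> attributes = new LinkedHashMap<>();

    TokenState with(String name, Object value) {
        attributes.put(name, value);
        return this;
    }
}

final class ConverterSketch {
    // Fixed-shape conversion: copies only the known fields, so "reading" is dropped.
    static List<FixedToken> toFixedTokens(List<TokenState> analyzed) {
        List<FixedToken> result = new ArrayList<>();
        for (TokenState state : analyzed) {
            result.add(new FixedToken(
                    (String) state.attributes.get("term"),
                    (Integer) state.attributes.get("startOffset"),
                    (Integer) state.attributes.get("endOffset")));
        }
        return result;
    }

    // Attribute-preserving conversion: copies whatever attributes each token carries.
    static List<TokenState> toStates(List<TokenState> analyzed) {
        List<TokenState> result = new ArrayList<>();
        for (TokenState state : analyzed) {
            TokenState copy = new TokenState();
            copy.attributes.putAll(state.attributes);
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<TokenState> analyzed = new ArrayList<>();
        analyzed.add(new TokenState()
                .with("term", "tokyo")
                .with("startOffset", 0)
                .with("endOffset", 5)
                .with("reading", "toukyou"));

        // The fixed-shape result has no place to keep the reading.
        System.out.println(toFixedTokens(analyzed).get(0).term);
        // The attribute-preserving result still carries it.
        System.out.println(toStates(analyzed).get(0).attributes.get("reading"));
    }
}
```

Returning the TokenStream itself would avoid even the copy, since the consumer could read attributes while iterating.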
--
lucidimagination.com