You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Shaun Senecal <sh...@lithium.com> on 2015/01/08 20:05:51 UTC

cloning a NumericTermAttributeImpl

It appears that NumericTermAttributeImpl doesn't support cloning.  Is this simply an oversight, or is there a reason why this isnt possible?


I'm trying to clone a State containing a NumericTermAttribute, and I end up with all of the BytesRefs being equal to the last instance, since the NumericTermAttributeImpl? isn't actually being cloned.  Is there a way around this?



Thanks


Shaun

RE: cloning a NumericTermAttributeImpl

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

I checked it out a second time. We *can* implement deep clone. Actually this is a bug from the time when we changed to BytesRefBuilder. I opened https://issues.apache.org/jira/browse/LUCENE-6173 about this. Thanks.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Saturday, January 10, 2015 1:24 PM
> To: java-user@lucene.apache.org
> Subject: RE: cloning a NumericTermAttributeImpl
> 
> Hi,
> 
> NumericTokenStream is an internal class to implement NumericField. It is
> only public for historical reasons (was used by Solr) and because it is in
> another package. The Attributes it uses do not implement clone, because
> they have some internal state. Cloning is also not needed, because in
> "standard" Analyzers you would not see those TokenStreams. Lucene also
> never adds TokenFilters on top.
> 
> Let me check if this TokenStream is really marked as @lucene.internal in
> Javadocs.
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
> > -----Original Message-----
> > From: Shaun Senecal [mailto:shaun.senecal@lithium.com]
> > Sent: Thursday, January 08, 2015 8:06 PM
> > To: java-user@lucene.apache.org
> > Subject: cloning a NumericTermAttributeImpl
> >
> > It appears that NumericTermAttributeImpl doesn't support cloning.  Is
> > this simply an oversight, or is there a reason why this isnt possible?
> >
> >
> > I'm trying to clone a State containing a NumericTermAttribute, and I
> > end up with all of the BytesRefs being equal to the last instance,
> > since the NumericTermAttributeImpl? isn't actually being cloned.  Is
> > there a way around this?
> >
> >
> >
> > Thanks
> >
> >
> > Shaun
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: cloning a NumericTermAttributeImpl

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

NumericTokenStream is an internal class to implement NumericField. It is only public for historical reasons (was used by Solr) and because it is in another package. The Attributes it uses do not implement clone, because they have some internal state. Cloning is also not needed, because in "standard" Analyzers you would not see those TokenStreams. Lucene also never adds TokenFilters on top.

Let me check if this TokenStream is really marked as @lucene.internal in Javadocs.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Shaun Senecal [mailto:shaun.senecal@lithium.com]
> Sent: Thursday, January 08, 2015 8:06 PM
> To: java-user@lucene.apache.org
> Subject: cloning a NumericTermAttributeImpl
> 
> It appears that NumericTermAttributeImpl doesn't support cloning.  Is this
> simply an oversight, or is there a reason why this isnt possible?
> 
> 
> I'm trying to clone a State containing a NumericTermAttribute, and I end up
> with all of the BytesRefs being equal to the last instance, since the
> NumericTermAttributeImpl? isn't actually being cloned.  Is there a way
> around this?
> 
> 
> 
> Thanks
> 
> 
> Shaun


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Finding a match for an automaton against a FST

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sat, Jan 10, 2015 at 8:23 AM, Olivier Binda <ol...@wanadoo.fr> wrote:
> On 01/10/2015 11:00 AM, Michael McCandless wrote:
>>
>> On Fri, Jan 9, 2015 at 6:42 AM, Olivier Binda <ol...@wanadoo.fr>
>> wrote:
>>>
>>> Hello.
>>>
>>> 1) What is the best way to check if an automaton (from a regex or a
>>> string
>>> with a wildcard)
>>> has at least 1 match against a FST (from a WFSTCompletionLookup) ?
>>
>> You need to implement "intersect".  We already have this method for
>> two automata (Operations.java); maybe you can start from that but
>> cutover to the FST APIs instead for the 2nd automaton?
>
>
> I looked a bit into this. This is complicated stuff :/

Sorry, yes it is.  If you have any ideas to simplify the APIs that
would be awesome :)

> I think I get what the nested loops in intersect() do  : transitions consist
> of a double dimension array, somehow those arays are intersected.
> I don't understand yet why is there a .min and a .max for a transition (why
> not just a codepoint ?)

Most automaton transitions cover a wide range of unicode characters,
so requiring a separate transition for each would be too costly (too
many objects / RAM).

> Fst and automaton (and maybee the lucene codec stuff) are 3 different
> implementations
> of finite state machine/transducers, right ?

I think we have only 2 implementations (FST, Automaton).

> How does regexQuery (automaton) match against an index ?
> Does it use intersect() internally ?  (if it does, maybee I could reuse that
> code too)

RegexpQuery in core (NOT to be confused with the much slower,
differing in name by only one letter, RegexQuery in sandbox) builds an
Automaton and then uses Terms.intersect API.

However, I would not look for inspiration from Terms.intersect: that
implementation (in block tree terms dict) works with the terms
dictionary data structures to perform a fast intersection and that
code is crazy complex.

Possibly a place to look for inspiration/poaching is
FSTUtil.intersectPrefixPaths: that intersects an automaton with an
FST.  It's used by the fuzzy suggester...

Mike McCandless

http://blog.mikemccandless.com



>
>
>>
>>> 2) Also, is there a simple/efficient way to find the lowest and the
>>> highest
>>> arcs of a FST that match against  an automaton ?
>>
>> Hmm arcs leaving which state?  The initial state?  You could simply
>> walk all arcs leaving the initial state from the FST and check if the
>> automaton accepts them leaving its initial state (assuming the
>> automaton has no dead states)?
>>
>> Or, if you are already doing an intersection here, just save this
>> information as a side effect since you will have already computed it.
>
>
> thanks for the tips, it helps.
> Olivier
>
>
>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Finding a match for an automaton against a FST

Posted by Olivier Binda <ol...@wanadoo.fr>.
On 01/10/2015 11:00 AM, Michael McCandless wrote:
> On Fri, Jan 9, 2015 at 6:42 AM, Olivier Binda <ol...@wanadoo.fr> wrote:
>> Hello.
>>
>> 1) What is the best way to check if an automaton (from a regex or a string
>> with a wildcard)
>> has at least 1 match against a FST (from a WFSTCompletionLookup) ?
> You need to implement "intersect".  We already have this method for
> two automata (Operations.java); maybe you can start from that but
> cutover to the FST APIs instead for the 2nd automaton?

I looked a bit into this. This is complicated stuff :/

I think I get what the nested loops in intersect() do  : transitions 
consist of a double dimension array, somehow those arays are intersected.
I don't understand yet why is there a .min and a .max for a transition 
(why not just a codepoint ?)

Fst and automaton (and maybee the lucene codec stuff) are 3 different 
implementations
of finite state machine/transducers, right ?

How does regexQuery (automaton) match against an index ?
Does it use intersect() internally ?  (if it does, maybee I could reuse 
that code too)



>
>> 2) Also, is there a simple/efficient way to find the lowest and the highest
>> arcs of a FST that match against  an automaton ?
> Hmm arcs leaving which state?  The initial state?  You could simply
> walk all arcs leaving the initial state from the FST and check if the
> automaton accepts them leaving its initial state (assuming the
> automaton has no dead states)?
>
> Or, if you are already doing an intersection here, just save this
> information as a side effect since you will have already computed it.

thanks for the tips, it helps.
Olivier



> Mike McCandless
>
> http://blog.mikemccandless.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Finding a match for an automaton against a FST

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Fri, Jan 9, 2015 at 6:42 AM, Olivier Binda <ol...@wanadoo.fr> wrote:
> Hello.
>
> 1) What is the best way to check if an automaton (from a regex or a string
> with a wildcard)
> has at least 1 match against a FST (from a WFSTCompletionLookup) ?

You need to implement "intersect".  We already have this method for
two automata (Operations.java); maybe you can start from that but
cutover to the FST APIs instead for the 2nd automaton?

> 2) Also, is there a simple/efficient way to find the lowest and the highest
> arcs of a FST that match against  an automaton ?

Hmm arcs leaving which state?  The initial state?  You could simply
walk all arcs leaving the initial state from the FST and check if the
automaton accepts them leaving its initial state (assuming the
automaton has no dead states)?

Or, if you are already doing an intersection here, just save this
information as a side effect since you will have already computed it.

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Finding a match for an automaton against a FST

Posted by Olivier Binda <ol...@wanadoo.fr>.
Hello.

1) What is the best way to check if an automaton (from a regex or a 
string with a wildcard)
has at least 1 match against a FST (from a WFSTCompletionLookup) ?

Is there a simple way to do that ?


2) Also, is there a simple/efficient way to find the lowest and the 
highest arcs of a FST that match against  an automaton ?

Best regards,
Olivier



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org