Posted to user@lucenenet.apache.org by Ben West <bw...@yahoo.com> on 2010/12/16 21:19:46 UTC

Concatenated TokenStreams

Hey All,

I want to make something roughly equivalent to Solr's copy fields. However, I can't just concatenate the values into one string and then put that into a field, because each source field needs its own analyzer.
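Here is a toy Python model of the problem. It is not Lucene.Net code. An "analyzer" is just a function from text to tokens, and the `sku` and `title` fields and both analyzers are hypothetical. If you run one analyzer over the joined string, the SKU gets split apart. If you analyze each value separately, the SKU stays intact.

```python
import re


def keyword_analyzer(text):
    # Hypothetical analyzer for an identifier field: the whole value is one token.
    return [text]


def simple_analyzer(text):
    # Hypothetical analyzer for prose: lowercase, split on non-alphanumeric runs.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def copy_field_per_source(values, analyzers):
    # Analyze each source value with its own analyzer, then concatenate the tokens.
    tokens = []
    for field, value in values.items():
        tokens.extend(analyzers[field](value))
    return tokens


def copy_field_naive(values, analyzer):
    # Join the raw strings first, then run a single analyzer over the result.
    return analyzer(" ".join(values.values()))


doc = {"sku": "AB-123", "title": "Red Running Shoe"}
analyzers = {"sku": keyword_analyzer, "title": simple_analyzer}
print(copy_field_per_source(doc, analyzers))  # ['AB-123', 'red', 'running', 'shoe']
print(copy_field_naive(doc, simple_analyzer))  # ['ab', '123', 'red', 'running', 'shoe']
```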

I know you can instantiate a field with a token stream as input, but I'm not sure how to combine multiple token streams into one. I tried to make a wrapper class to do this, but I can't figure out how, since with the attribute API I can't move the attributes the source streams already have onto a new stream.

I checked Solr's source, and as far as I could tell it doesn't allow each input field to be tokenized in its own way (though I could definitely be misreading it).

Does anyone know if this is possible?

Thanks,
-Ben


      

Re: Concatenated TokenStreams

Posted by Ben West <bw...@yahoo.com>.
This works perfectly, thanks Troy! I'm in your debt.

--- On Thu, 12/16/10, Troy Howard <th...@gmail.com> wrote:

> From: Troy Howard <th...@gmail.com>
> Subject: Re: Concatenated TokenStreams
> To: lucene-net-user@lucene.apache.org
> Date: Thursday, December 16, 2010, 9:36 PM

Re: Concatenated TokenStreams

Posted by Troy Howard <th...@gmail.com>.
Here's some example code that seems to work...

In a nutshell, I created a new kind of field and a new kind of
TokenStream (MultiField and MultiTokenStream), both of which just
handle the aggregation logic.

https://gist.github.com/744444
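The gist's code isn't quoted in this thread. What follows is a rough, language-neutral model of the aggregation idea, written in Python so it runs without Lucene.Net. It is not the gist's implementation. `AttributeSource`, `ListTokenStream`, `ConcatTokenStream`, `capture_state` and `restore_state` are hypothetical stand-ins that only mirror the roles of Lucene.Net's AttributeSource, CaptureState and RestoreState. Each sub-stream keeps its own attributes. The wrapper owns a separate set of attributes and copies each token's values into it.

```python
import copy


class Attr:
    """Mutable holder for one attribute value (toy model, not Lucene.Net's API)."""

    def __init__(self, value=None):
        self.value = value


class AttributeSource:
    """Hypothetical stand-in: named attributes plus capture/restore of their values."""

    def __init__(self):
        self.attrs = {}

    def add_attribute(self, name):
        # Reuse the existing instance if present, so consumers can cache references.
        return self.attrs.setdefault(name, Attr())

    def capture_state(self):
        # Snapshot (clone) every attribute value of the current token.
        return {name: copy.deepcopy(a.value) for name, a in self.attrs.items()}

    def restore_state(self, state):
        # Copy the snapshot into *this* source's own attribute instances.
        for name, value in state.items():
            if name not in self.attrs:
                raise ValueError("state has an attribute this source lacks: " + name)
            self.attrs[name].value = value


class ListTokenStream(AttributeSource):
    """Replays pre-analyzed terms; stands in for one field's analyzer output."""

    def __init__(self, terms):
        super().__init__()
        self.term = self.add_attribute("term")
        self.pos_inc = self.add_attribute("pos_inc")
        self._terms = iter(terms)

    def increment_token(self):
        term = next(self._terms, None)
        if term is None:
            return False
        self.term.value, self.pos_inc.value = term, 1
        return True


class ConcatTokenStream(AttributeSource):
    """Hypothetical wrapper that concatenates sub-streams token by token."""

    def __init__(self, streams):
        super().__init__()
        self._streams = list(streams)
        self._index = 0
        # restore_state only works if this source has every attribute the sub-streams use.
        for stream in self._streams:
            for name in stream.attrs:
                self.add_attribute(name)

    def increment_token(self):
        while self._index < len(self._streams):
            current = self._streams[self._index]
            if current.increment_token():
                self.restore_state(current.capture_state())
                return True
            self._index += 1  # current sub-stream is exhausted; move to the next one
        return False


multi = ConcatTokenStream([ListTokenStream(["AB-123"]),
                           ListTokenStream(["red", "running", "shoe"])])
term = multi.add_attribute("term")  # the consumer caches this reference once
seen = []
while multi.increment_token():
    seen.append(term.value)
print(seen)  # ['AB-123', 'red', 'running', 'shoe']
```

The consumer's `term` reference belongs to the wrapper and never changes. That means nothing has to be moved from one stream onto another: the wrapper copies only the values, one token at a time.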

This is a naive example, so you might need to do a bit more to toughen
it up for production use. It takes advantage of CaptureState and
RestoreState in the TokenStream API (inherited from AttributeSource).
Capturing a state clones every attribute, so this might not be the
best-performing code in the world, but it should point you in the
right direction.
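Here is one possible hardening step. It is an assumption, not something the gist is known to do. When sub-streams are concatenated, their tokens sit at adjacent positions, so a phrase query could match across the boundary between two source fields. You can prevent this by adding a position gap to the first token of each later sub-stream. This is similar in spirit to Analyzer.GetPositionIncrementGap for multi-valued fields. In the C# wrapper, that would mean increasing the PositionIncrementAttribute of that token. The Python below is a hypothetical model, and `concat_with_gap` and `phrase_matches` are invented helpers.

```python
def concat_with_gap(streams, gap=100):
    """Concatenate token lists into (term, position) pairs, inserting a
    position gap before the first token of each later sub-stream."""
    out = []
    position = -1
    for terms in streams:
        for j, term in enumerate(terms):
            increment = 1
            if j == 0 and out:
                increment += gap  # push this sub-stream away from the previous one
            position += increment
            out.append((term, position))
    return out


def phrase_matches(tokens, phrase):
    """True if the terms in `phrase` occur at consecutive positions."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    return any(
        all(start + k in positions.get(t, set()) for k, t in enumerate(phrase))
        for start in positions.get(phrase[0], set())
    )


streams = [["red", "running", "shoe"], ["ab", "123"]]
print(phrase_matches(concat_with_gap(streams, gap=0), ["shoe", "ab"]))    # True
print(phrase_matches(concat_with_gap(streams, gap=100), ["shoe", "ab"]))  # False
```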


Thanks,
Troy


On Thu, Dec 16, 2010 at 1:23 PM, Ben West <bw...@yahoo.com> wrote:

Re: Concatenated TokenStreams

Posted by Ben West <bw...@yahoo.com>.
Wow, yeah, that would be awesome.

I'm using 2.9.2.

Thanks,
-ben

--- On Thu, 12/16/10, Troy Howard <th...@gmail.com> wrote:

> From: Troy Howard <th...@gmail.com>
> Subject: Re: Concatenated TokenStreams
> To: lucene-net-user@lucene.apache.org
> Date: Thursday, December 16, 2010, 3:10 PM

Re: Concatenated TokenStreams

Posted by Troy Howard <th...@gmail.com>.
Ben,

This seems totally possible with a wrapper. I'll see if I can't mock
up a prototype.

I assume you're using 2.9.2?

Thanks,
Troy


On Thu, Dec 16, 2010 at 12:19 PM, Ben West <bw...@yahoo.com> wrote: