Posted to dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/06/16 03:48:03 UTC

Field.tokenStreamValue

The JavaDoc suggests that one can't have a tokenStreamValue and a
stringValue or binaryValue at the same time... is there any good
reason for this restriction?

  /** The value of the field as a String, or null.  If null, the Reader value,
   * binary value, or TokenStream value is used.  Exactly one of stringValue(),
   * readerValue(), getBinaryValue(), and tokenStreamValue() must be set. */

The indexing code looks like it should actually work, but Field
prevents setting a tokenStreamValue and having a stored field at
the same time.
Should we fix this?
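
To make the question concrete, here is a minimal sketch of the documented rule next to a relaxed one. It is a toy model under stated assumptions: FieldValueRule, strictOk and relaxedOk are hypothetical names for illustration, not Lucene API, and each boolean only says whether the corresponding value is set.

```java
/** Toy model of the Field value rule; hypothetical, not Lucene code. */
final class FieldValueRule {

    /** Rule as documented in the Javadoc: exactly one of the four values is set. */
    static boolean strictOk(boolean string, boolean reader, boolean binary, boolean tokenStream) {
        int set = (string ? 1 : 0) + (reader ? 1 : 0) + (binary ? 1 : 0) + (tokenStream ? 1 : 0);
        return set == 1;
    }

    /**
     * Relaxed rule (an assumption about how it might be worded): a TokenStream
     * may additionally be paired with a String or binary value that serves as
     * the stored value.
     */
    static boolean relaxedOk(boolean string, boolean reader, boolean binary, boolean tokenStream) {
        if (strictOk(string, reader, binary, tokenStream)) {
            return true;
        }
        // TokenStream plus exactly one storable value, and no Reader.
        return tokenStream && !reader && (string ^ binary);
    }

    public static void main(String[] args) {
        // TokenStream + String: rejected by the strict rule, accepted by the relaxed one.
        System.out.println(strictOk(true, false, false, true));
        System.out.println(relaxedOk(true, false, false, true));
    }
}
```

Everything the strict rule accepts is still accepted by the relaxed one, so existing fields would be unaffected in this model.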

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Field.tokenStreamValue

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Yep, it's also useful for pre-analyzing text.
Wish I had it way back when I started Solr (to avoid an unnecessary
pass through the analyzer, I actually stored and indexed the number in
transformed but untokenized form... not great for Luke :-)

-Yonik
http://www.lucidimagination.com

On Tue, Jun 16, 2009 at 6:18 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
> Yes, this is exactly what I need for NumericField! The numeric value gets indexed
> using the tokenStream, but an optional stored field value (e.g. the number
> as plain text or even prefixEncoded) would also be good. Currently the user
> must index both types separately (but can use the same field name). As far as
> I can see, this is not a problem with the current indexer. The indexer first
> tries tokenStreamValue() during indexing, but when saving the stored fields,
> it always uses stringValue()/getBinaryValue().



Re: Field.tokenStreamValue

Posted by Michael McCandless <lu...@mikemccandless.com>.
That sounds good.

Mike

On Tue, Jun 16, 2009 at 6:53 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
> Maybe we should also add ctors to Field that take a TokenStream plus a
> String/binary value and set Field.Store.YES (compress is deprecated, so
> there is no need to support it).



RE: Field.tokenStreamValue

Posted by Uwe Schindler <uw...@thetaphi.de>.
Maybe we should also add ctors to Field that take a TokenStream plus a
String/binary value and set Field.Store.YES (compress is deprecated, so
there is no need to support it).
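
A rough sketch of what such a constructor pair could look like, under loud assumptions: StoredTokenField is a hypothetical stand-alone class, not Lucene's Field, and a List<String> stands in for a TokenStream so the example runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical field class sketching the proposed constructor; not Lucene's Field. */
final class StoredTokenField {
    private final String name;
    private final List<String> tokens;   // stand-in for a TokenStream
    private final String storedValue;    // null when nothing is stored

    /** Existing-style constructor: tokens only, nothing stored. */
    StoredTokenField(String name, List<String> tokens) {
        this(name, tokens, null);
    }

    /** Proposed-style constructor: tokens plus a value, which implies "stored". */
    StoredTokenField(String name, List<String> tokens, String storedValue) {
        if (name == null || tokens == null) {
            throw new NullPointerException("name and tokens must not be null");
        }
        this.name = name;
        this.tokens = tokens;
        this.storedValue = storedValue;
    }

    String name() { return name; }
    List<String> tokens() { return tokens; }
    String storedValue() { return storedValue; }

    /** Derived from the presence of a value, mirroring "set Field.Store.YES". */
    boolean isStored() { return storedValue != null; }

    public static void main(String[] args) {
        // The token strings are made up for illustration.
        StoredTokenField f = new StoredTokenField("price", Arrays.asList("t1", "t2"), "42");
        System.out.println(f.name() + " stored=" + f.isStored() + " value=" + f.storedValue());
    }
}
```

Deriving the stored flag from the presence of a value means callers never pass a Store argument that could contradict the value they supplied.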

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Tuesday, June 16, 2009 12:48 PM
> To: java-dev@lucene.apache.org
> Subject: Re: Field.tokenStreamValue
> 
> OK let's do it then... Yonik, do you want to open an issue, patch, etc.?
> 
> We should spell out clearly in the javadocs that this case
> (tokenStream + string/binary value) is handled "specially", because
> it does break from Field's "normal" semantics.
> 
> Mike





Re: Field.tokenStreamValue

Posted by Michael McCandless <lu...@mikemccandless.com>.
OK let's do it then... Yonik, do you want to open an issue, patch, etc.?

We should spell out clearly in the javadocs that this case
(tokenStream + string/binary value) is handled "specially", because
it does break from Field's "normal" semantics.

Mike

On Tue, Jun 16, 2009 at 6:18 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
> Yes, this is exactly what I need for NumericField! The numeric value gets indexed
> using the tokenStream, but an optional stored field value (e.g. the number
> as plain text or even prefixEncoded) would also be good. Currently the user
> must index both types separately (but can use the same field name). As far as
> I can see, this is not a problem with the current indexer. The indexer first
> tries tokenStreamValue() during indexing, but when saving the stored fields,
> it always uses stringValue()/getBinaryValue().



RE: Field.tokenStreamValue

Posted by Uwe Schindler <uw...@thetaphi.de>.
Yes, this is exactly what I need for NumericField! The numeric value gets indexed
using the tokenStream, but an optional stored field value (e.g. the number
as plain text or even prefixEncoded) would also be good. Currently the user
must index both types separately (but can use the same field name). As far as
I can see, this is not a problem with the current indexer. The indexer first
tries tokenStreamValue() during indexing, but when saving the stored fields,
it always uses stringValue()/getBinaryValue().
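
The two code paths can be sketched roughly as follows. This is a toy model, not the actual indexer: TwoPathSketch, ToyField, termsToIndex and valueToStore are hypothetical names, a List<String> stands in for a TokenStream, whitespace splitting stands in for an analyzer, and the term strings are made up rather than real prefix-encoded values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the two code paths described above; hypothetical, not Lucene code. */
final class TwoPathSketch {

    /** Stand-in for a field: an optional string value plus optional pre-built tokens. */
    static final class ToyField {
        final String stringValue;   // may be null
        final List<String> tokens;  // stand-in for a TokenStream; may be null

        ToyField(String stringValue, List<String> tokens) {
            this.stringValue = stringValue;
            this.tokens = tokens;
        }
    }

    /** Inverting path: try the pre-built tokens first, else split the string on whitespace. */
    static List<String> termsToIndex(ToyField f) {
        if (f.tokens != null) {
            return new ArrayList<>(f.tokens);
        }
        if (f.stringValue == null || f.stringValue.trim().isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(f.stringValue.trim().split("\\s+")));
    }

    /** Stored-fields path: only the string value is consulted, never the tokens. */
    static String valueToStore(ToyField f) {
        return f.stringValue;
    }

    public static void main(String[] args) {
        // Made-up terms for the number 42, with "42" as the stored text.
        ToyField price = new ToyField("42", Arrays.asList("term-a", "term-b", "term-c"));
        System.out.println(termsToIndex(price));
        System.out.println(valueToStore(price));
    }
}
```

Because the two paths read different values, a field carrying both ends up with the tokens in the index and the plain text in the stored fields, which is the behavior being asked for.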

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Tuesday, June 16, 2009 12:13 PM
> To: java-dev@lucene.apache.org; yonik@lucidimagination.com
> Subject: Re: Field.tokenStreamValue
> 
> Seems reasonable?
> 
> So you're saying that if a Field has both TokenStream and some other
> value, the TokenStream gets indexed into postings & term vectors, but
> the other value gets stored?
> 
> Mike





Re: Field.tokenStreamValue

Posted by Michael McCandless <lu...@mikemccandless.com>.
Seems reasonable?

So you're saying that if a Field has both TokenStream and some other
value, the TokenStream gets indexed into postings & term vectors, but
the other value gets stored?

Mike

On Mon, Jun 15, 2009 at 9:48 PM, Yonik Seeley<yo...@lucidimagination.com> wrote:
> The JavaDoc suggests that one can't have a tokenStreamValue and a
> StringValue or binaryValue at the same time... any good reason for
> this restriction?
>
>  /** The value of the field as a String, or null.  If null, the Reader value,
>   * binary value, or TokenStream value is used.  Exactly one of stringValue(),
>   * readerValue(), getBinaryValue(), and tokenStreamValue() must be set. */
>
> The indexing code looks like it should actually work - but the Field
> restricts one setting a tokenStreamValue and having a stored field at
> the same time.
> Should we fix this?
>
> -Yonik
> http://www.lucidimagination.com
