Posted to solr-user@lucene.apache.org by Luis Cappa Banda <lu...@gmail.com> on 2015/05/14 13:17:51 UTC

Real-Time get and Dynamic Fields: possible bug.

Hi there,

I have the following dynamicFields definition in my schema.xml:


<!-- I18n DynamicFields -->
<dynamicField name="i18n*" type="string" indexed="true" stored="true" />

<!-- DynamicFields used typically for faceting issues by copying values from
other existing fields -->
<dynamicField name="*_facet" type="string" indexed="true" stored="true" multiValued="true" />


I've seen that when fetching documents with /select?q=id:whateverId, the
results returned include both the i18n* and *_facet fields, populated.
However, when using the real-time get request handler (/get?ids=whateverIds),
the results include only the i18n* dynamic fields; the *_facet ones are not
included.

My impression is that, in the /get RequestHandler, the server-side regular
expression used to match dynamic field names when building the returned
documents is wrong. On the client side, I've checked that the class
DocField.java, which binds a SolrDocument to a bean, uses the following
matcher:

} else if (annotation.value().indexOf('*') >= 0) { // dynamic fields are annotated as @Field("categories_*")
  // if the field was annotated as a dynamic field, convert the name into a pattern
  // the wildcard (*) is supposed to be either a prefix or a suffix, hence the use of replaceFirst
  name = annotation.value().replaceFirst("\\*", "\\.*");
  dynamicFieldNamePatternMatcher = Pattern.compile("^" + name + "$");
} else {
  name = annotation.value();
}

So maybe similar logic on the server side is wrong. That's the only
explanation I can find for why /select returns all the fields
but /get omits those that match the *_facet pattern.
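
The client-side conversion quoted above can be checked in isolation. What follows is only a minimal sketch, not SolrJ code: the class name DynamicFieldGlobCheck, the helper toPattern and the sample field names are made up for illustration. It applies the same replaceFirst call to both patterns from the schema.

```java
import java.util.regex.Pattern;

public class DynamicFieldGlobCheck {

    // Same conversion as the quoted snippet: the first '*' becomes ".*",
    // and the result is anchored with ^ and $.
    public static Pattern toPattern(String glob) {
        String name = glob.replaceFirst("\\*", "\\.*");
        return Pattern.compile("^" + name + "$");
    }

    public static void main(String[] args) {
        // The field names below are hypothetical examples.
        System.out.println(toPattern("i18n*").matcher("i18n_title").find());   // true
        System.out.println(toPattern("*_facet").matcher("path_facet").find()); // true
        System.out.println(toPattern("*_facet").matcher("i18n_title").find()); // false
    }
}
```

Under these assumptions, both the prefix and the suffix pattern are converted correctly, so this particular conversion does not by itself explain the difference between /select and /get.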

If you can confirm that this is a bug (maybe it is the expected
behavior, but after some years using Solr I think it is not), I can create
the JIRA issue and debug it more deeply in order to contribute a patch.


Regards,


-- 
- Luis Cappa

Re: Real-Time get and Dynamic Fields: possible bug.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Yep, but those dynamic fields have the field type "string", so the only
indexed term will be the entire field value, and the faceted terms counted
will match each field value exactly. That's why I was confused.
Typically I use faceting on non-tokenized string field values for simple
stats and that kind of thing.

Do you think the behavior I described (I mean, dynamic field values going
missing when using the real-time request handler) could be a bug? I don't
mind investigating it this weekend and trying to patch it.

2015-05-14 18:59 GMT+02:00 Yonik Seeley <ys...@gmail.com>:

> On Thu, May 14, 2015 at 12:49 PM, Luis Cappa Banda <lu...@gmail.com> wrote:
> > If you don't mark an indexed, 'facetable' field as stored, I
> > expected its values could not be returned, so faceting would make no sense.
>
> Faceting does not use or retrieve stored field values.  The labels
> faceting returns are from the indexed values.
>
> "If you want the value returned, it needs to be stored" only applies
> to fields in the main document list (the fields that are retrieved for
> the top ranked documents).
>
> -Yonik
>



-- 
- Luis Cappa

Re: Real-Time get and Dynamic Fields: possible bug.

Posted by Yonik Seeley <ys...@gmail.com>.
On Thu, May 14, 2015 at 12:49 PM, Luis Cappa Banda <lu...@gmail.com> wrote:
> If you don't mark an indexed, 'facetable' field as stored, I
> expected its values could not be returned, so faceting would make no sense.

Faceting does not use or retrieve stored field values.  The labels
faceting returns are from the indexed values.

"If you want the value returned, it needs to be stored" only applies
to fields in the main document list (the fields that are retrieved for
the top ranked documents).
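
To illustrate the distinction, here is a toy model only, with nothing taken from Solr's actual code: the class FacetToyModel and its methods are hypothetical. It treats the indexed side as a map from term to document ids and derives facet counts from that map alone, without consulting any stored values.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FacetToyModel {

    // Indexed terms of one hypothetical string field: term -> ids of documents containing it.
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Index one (untokenized) value for a document; nothing is "stored" here.
    public void index(String docId, String value) {
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    // Facet labels and counts come from the indexed terms alone.
    public Map<String, Integer> facetCounts() {
        Map<String, Integer> counts = new TreeMap<>();
        postings.forEach((term, ids) -> counts.put(term, ids.size()));
        return counts;
    }

    public static void main(String[] args) {
        FacetToyModel field = new FacetToyModel();
        field.index("1", "/docs/a");
        field.index("2", "/docs/a");
        field.index("3", "/docs/b");
        System.out.println(field.facetCounts()); // {/docs/a=2, /docs/b=1}
    }
}
```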

-Yonik

Re: Real-Time get and Dynamic Fields: possible bug.

Posted by Luis Cappa Banda <lu...@gmail.com>.
That is something I didn't know; I thought it was mandatory. I'll
try to explain step by step my (I think) logical way of understanding it:

   - If a field is indexed, you can search by it.
   - When faceting, you have to index the field (because it can be
   tokenized and then you would want to facet by its terms). So you need
   to mark as indexed those fields you want to facet by.
   - If you mark a field as stored, you can return the original value
   that was stored.
   - If you facet, you are searching, counting terms and returning values
   and their counts. That "returning values" step is where I
   thought 'stored=true' was necessary.

If you don't mark an indexed, 'facetable' field as stored, I
expected its values could not be returned, so faceting would make no sense.
That's what I thought, of course. If it is not necessary, that's perfect: the
lighter the data, the better, and that's one more thing I've learned. :-)

Anyway, I think the question is still open: both are dynamic fields,
stored (not necessary, OK) and indexed. With the real-time
requestHandler, the i18n* dynamic fields are returned but the *_facet ones
are not. However, with the default /select requestHandler, searching
by the document id, both the i18n* and *_facet fields are returned. You can
try it with Solr 5.1, the version I'm currently using.

The only differences between them are:

   - Name pattern: i18n* vs. *_facet
   - Multivalued: the *_facet fields are multivalued.


Regards,


- Luis Cappa

2015-05-14 18:32 GMT+02:00 Yonik Seeley <ys...@gmail.com>:

> On Thu, May 14, 2015 at 10:47 AM, Luis Cappa Banda <lu...@gmail.com> wrote:
> > Hi Yonik,
> >
> > Yes, they are the targets of copyFields in the schema.xml. These *_target
> > fields are supposed to be used for some specific searchable (thus,
> > tokenized) fields that are candidates to be faceted in the future to
> > return some stats. For example, imagine that you have a field storing a
> > directory path and you want to search by it. You may also want to facet
> > by the whole directory path value (not just its terms). That's why I'm
> > storing both field values: the searchable, tokenized one and the string
> > 'facet candidate' one.
>
> OK, but you don't need to *store* the values in _facet, right?
> -Yonik
>



-- 
- Luis Cappa

Re: Real-Time get and Dynamic Fields: possible bug.

Posted by Yonik Seeley <ys...@gmail.com>.
On Thu, May 14, 2015 at 10:47 AM, Luis Cappa Banda <lu...@gmail.com> wrote:
> Hi Yonik,
>
> Yes, they are the targets of copyFields in the schema.xml. These *_target
> fields are supposed to be used for some specific searchable (thus, tokenized)
> fields that are candidates to be faceted in the future to return some
> stats. For example, imagine that you have a field storing a directory path
> and you want to search by it. You may also want to facet by the whole
> directory path value (not just its terms). That's why I'm storing both
> field values: the searchable, tokenized one and the string 'facet candidate'
> one.

OK, but you don't need to *store* the values in _facet, right?
-Yonik

Re: Real-Time get and Dynamic Fields: possible bug.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Ehem, *_target ---> *_facet.

2015-05-14 16:47 GMT+02:00 Luis Cappa Banda <lu...@gmail.com>:

> Hi Yonik,
>
> Yes, they are the targets of copyFields in the schema.xml. These *_target
> fields are supposed to be used for some specific searchable (thus, tokenized)
> fields that are candidates to be faceted in the future to return some
> stats. For example, imagine that you have a field storing a directory path
> and you want to search by it. You may also want to facet by the whole
> directory path value (not just its terms). That's why I'm storing both
> field values: the searchable, tokenized one and the string 'facet candidate'
> one.
>
> What I do not understand is that both i18n* and *_target are dynamic,
> indexed and stored fields. The only difference is that the *_target one is
> multivalued. Does that make sense?
>
>
> Regards
>
>
> - Luis Cappa
>
> 2015-05-14 16:42 GMT+02:00 Yonik Seeley <ys...@gmail.com>:
>
>> Are the _facet fields the target of a copyField in the schema?
>> Realtime get either gets the values from the transaction log (and if
>> you didn't send it the values, they won't be there) or gets them from
>> the index to try and reconstruct what was sent in.
>>
>> It's generally not recommended to have copyField targets "stored", or
>> have a mix of explicitly set values and copyField values in the same
>> field.
>>
>> -Yonik
>
>
>
> --
> - Luis Cappa
>



-- 
- Luis Cappa

Re: Real-Time get and Dynamic Fields: possible bug.

Posted by Luis Cappa Banda <lu...@gmail.com>.
Hi Yonik,

Yes, they are the targets of copyFields in the schema.xml. These *_target
fields are supposed to be used for some specific searchable (thus, tokenized)
fields that are candidates to be faceted in the future to return some
stats. For example, imagine that you have a field storing a directory path
and you want to search by it. You may also want to facet by the whole
directory path value (not just its terms). That's why I'm storing both
field values: the searchable, tokenized one and the string 'facet candidate'
one.

What I do not understand is that both i18n* and *_target are dynamic,
indexed and stored fields. The only difference is that the *_target one is
multivalued. Does that make sense?


Regards


- Luis Cappa

2015-05-14 16:42 GMT+02:00 Yonik Seeley <ys...@gmail.com>:

> Are the _facet fields the target of a copyField in the schema?
> Realtime get either gets the values from the transaction log (and if
> you didn't send it the values, they won't be there) or gets them from
> the index to try and reconstruct what was sent in.
>
> It's generally not recommended to have copyField targets "stored", or
> have a mix of explicitly set values and copyField values in the same
> field.
>
> -Yonik



-- 
- Luis Cappa

Re: Real-Time get and Dynamic Fields: possible bug.

Posted by Yonik Seeley <ys...@gmail.com>.
Are the _facet fields the target of a copyField in the schema?
Realtime get either gets the values from the transaction log (and if
you didn't send it the values, they won't be there) or gets them from
the index to try and reconstruct what was sent in.

It's generally not recommended to have copyField targets "stored", or
have a mix of explicitly set values and copyField values in the same
field.
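
As a rough illustration of the behavior described above, here is a toy model only. It is not Solr's implementation, and the class RealTimeGetToyModel, its methods and the field names are all hypothetical. It keeps the documents as the client sent them separately from what the index stores after copyField rules are applied, and assumes that a /get-like lookup always aims to return the document as sent.

```java
import java.util.HashMap;
import java.util.Map;

public class RealTimeGetToyModel {

    // Hypothetical copyField rules: source field -> target field.
    private final Map<String, String> copyFields = new HashMap<>();
    // Documents exactly as sent by the client (stands in for the transaction log).
    private final Map<String, Map<String, Object>> sent = new HashMap<>();
    // Stored fields after copyField rules were applied (stands in for the index).
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    public void addCopyField(String source, String target) {
        copyFields.put(source, target);
    }

    public void add(String id, Map<String, Object> doc) {
        sent.put(id, new HashMap<>(doc));
    }

    // On commit, copyField targets are filled in and stored alongside the sent fields.
    public void commit() {
        for (Map.Entry<String, Map<String, Object>> e : sent.entrySet()) {
            Map<String, Object> stored = new HashMap<>(e.getValue());
            for (Map.Entry<String, String> cf : copyFields.entrySet()) {
                Object value = e.getValue().get(cf.getKey());
                if (value != null) {
                    stored.put(cf.getValue(), value);
                }
            }
            index.put(e.getKey(), stored);
        }
    }

    // /select-like lookup: returns every stored field, copyField targets included.
    public Map<String, Object> select(String id) {
        return index.get(id);
    }

    // /get-like lookup: returns the document as it was sent, without copyField targets.
    public Map<String, Object> realTimeGet(String id) {
        return sent.get(id);
    }

    public static void main(String[] args) {
        RealTimeGetToyModel solr = new RealTimeGetToyModel();
        solr.addCopyField("path", "path_facet");
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("path", "/docs/a");
        doc.put("i18n_title", "Hola");
        solr.add("1", doc);
        solr.commit();
        System.out.println(solr.select("1").containsKey("path_facet"));      // true
        System.out.println(solr.realTimeGet("1").containsKey("path_facet")); // false
        System.out.println(solr.realTimeGet("1").containsKey("i18n_title")); // true
    }
}
```

In this sketch a field sent explicitly by the client (i18n_title) comes back from both lookups, while a field that exists only as a copyField target (path_facet) comes back only from the /select-like one.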

-Yonik
