Posted to solr-user@lucene.apache.org by Alexandre Rafalovitch <ar...@gmail.com> on 2016/11/11 02:08:15 UTC

Is there a way to tell if multivalued field actually contains multiple values?

Hello,

Say I indexed a large dataset against a schemaless configuration. Now
I have a bunch of multivalued fields. Is there any way to tell which of
these (text) fields hold, for the given data, only single values? I know
I am supposed to look at the original data, and all that, but this is
more for debugging/troubleshooting.

Turning on termOffsets/termPositions would make it easy, but that's a
bit messy for troubleshooting purposes.

I was thinking that one giveaway is the positionIncrementGap causing
the second value's first token to start at a position above one
hundred. But I am not sure how to craft a query against a field to see
whether such a token is present in general.
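
To make the position arithmetic concrete, here is a minimal sketch that
mimics how positions advance across the values of one field. It is a toy
model, not Lucene code: the class and method names are hypothetical, the
tokenizer is a plain whitespace split, and the gap of 100 is an assumed
positionIncrementGap setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of token positions in a multivalued field (hypothetical, not Lucene code). */
final class PositionGapSketch {

    /** Returns the position of every token, adding `gap` after each value. */
    static List<Integer> positions(List<String> values, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = -1; // the first token of the first value lands on position 0
        for (String value : values) {
            for (String token : value.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // an empty value yields no tokens
                }
                position += 1; // assumed position increment of 1 per token
                out.add(position);
            }
            position += gap; // the gap is applied between values
        }
        return out;
    }

    /** True if two consecutive tokens are more than `gap` positions apart. */
    static boolean hasValueBoundary(List<Integer> positions, int gap) {
        for (int i = 1; i < positions.size(); i++) {
            if (positions.get(i) - positions.get(i - 1) > gap) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> multi = positions(Arrays.asList("red apple", "green"), 100);
        List<Integer> single = positions(Arrays.asList("red apple green"), 100);
        System.out.println(multi + " boundary=" + hasValueBoundary(multi, 100));
        System.out.println(single + " boundary=" + hasValueBoundary(single, 100));
    }
}
```

Looking for a jump between neighbouring positions, rather than for any
position above 100, avoids flagging a single long value that simply has
more than a hundred tokens.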


Any ideas?

Regards,
    Alex.

----
Solr Example reading group is starting November 2016, join us at
http://j.mp/SolrERG
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

Re: Is there a way to tell if multivalued field actually contains multiple values?

Posted by Erick Erickson <er...@gmail.com>.
I don't think so. Once things are indexed, they look just like a
regular text field with odd offsets for some of the terms. Of course
if you returned the stored form (assuming it's stored) it'd look
different, but that's messy too.

Best,
Erick


Re: Is there a way to tell if multivalued field actually contains multiple values?

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think you can use the term stats that Lucene tracks for each field.

Compare Terms.getSumTotalTermFreq and Terms.getDocCount.  If they are
equal, every document that has this field has exactly one token in it.
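
As a rough illustration of why that comparison works, the sketch below
models the two statistics over plain in-memory lists. It does not call
Lucene: the class and method names are hypothetical, and each inner list
stands for the tokens one document indexed into the field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of two per-field statistics (hypothetical, not Lucene code). */
final class FieldTokenStats {

    /** Models Terms.getSumTotalTermFreq(): all token occurrences in the field. */
    static long sumTotalTermFreq(List<List<String>> tokensPerDoc) {
        long total = 0;
        for (List<String> tokens : tokensPerDoc) {
            total += tokens.size();
        }
        return total;
    }

    /** Models Terms.getDocCount(): documents with at least one token in the field. */
    static int docCount(List<List<String>> tokensPerDoc) {
        int count = 0;
        for (List<String> tokens : tokensPerDoc) {
            if (!tokens.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Equal statistics mean every document that has the field has exactly one token. */
    static boolean everyDocHasOneToken(List<List<String>> tokensPerDoc) {
        return sumTotalTermFreq(tokensPerDoc) == docCount(tokensPerDoc);
    }

    public static void main(String[] args) {
        List<List<String>> oneTokenEach = Arrays.asList(
                Arrays.asList("alpha"),
                Arrays.asList("beta"),
                Collections.<String>emptyList()); // a document without the field
        List<List<String>> oneDocWithTwo = Arrays.asList(
                Arrays.asList("alpha", "gamma"),
                Arrays.asList("beta"));
        System.out.println(everyDocHasOneToken(oneTokenEach));
        System.out.println(everyDocHasOneToken(oneDocWithTwo));
    }
}
```

Note that the check counts tokens, not values, so it fits untokenized
(string-like) fields best; on a tokenized text field, a single value
with several words also pushes the token sum above the document count.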

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 11, 2016 at 5:50 AM, Mikhail Khludnev <mk...@apache.org> wrote:
> I suppose it's needless to remind you that norm(field) reflects (though, by
> default, not precisely) the number of tokens in a doc's field, not the
> number of actual text values.

Re: Is there a way to tell if multivalued field actually contains multiple values?

Posted by Mikhail Khludnev <mk...@apache.org>.
I suppose it's needless to remind you that norm(field) reflects (though, by
default, not precisely) the number of tokens in a doc's field, not the
number of actual text values.
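
A small sketch of that relationship, assuming the classic TF-IDF length
norm of 1/sqrt(number of tokens). The class and method names are
hypothetical, and the sketch ignores index-time boosts and the lossy
compact encoding used when norms are stored, which is why a token count
recovered from a real index is only approximate.

```java
/** Toy model of a length norm (hypothetical; assumes the classic 1/sqrt(n) formula). */
final class LengthNormSketch {

    /** Length norm for a field with the given number of tokens. */
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    /** Inverts the formula to estimate the token count from an unencoded norm. */
    static long estimateTokens(double norm) {
        return Math.round(1.0 / (norm * norm));
    }

    public static void main(String[] args) {
        for (int tokens : new int[] {1, 4, 100}) {
            double norm = lengthNorm(tokens);
            System.out.println(tokens + " tokens: norm=" + norm
                    + ", estimated tokens=" + estimateTokens(norm));
        }
    }
}
```

As with the term statistics, this separates one-token fields from longer
ones, but it cannot tell a two-word single value from two one-word values.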

-- 
Sincerely yours
Mikhail Khludnev