Posted to solr-user@lucene.apache.org by "Simon, Richard T" <Ri...@hms.harvard.edu> on 2011/06/15 20:21:28 UTC

getFieldValue always returns an ArrayList?

Hi - I am examining a SolrDocument I retrieved through a query. The field I am looking at is declared this way in my schema:

<field name="uri" type="string" indexed="true" stored="true" multivalued="false" required="true" />

I know multivalued defaults to false, but I set it explicitly because I'm seeing some unexpected behavior. I retrieve the value of the field like so:

final String resource = (String)document.getFieldValue("uri");


However, I get an exception because an ArrayList is returned. I confirmed that the returned ArrayList has one element with the correct value, but I thought getFieldValue would return a String if the field is single valued. When I index the document, I have some code that retrieves the same field in the same way from the SolrInputDocument, and that code works.

I looked at the code for SolrDocument.setField and it looks like the only way a field should be set to an ArrayList is if one is passed in by the code creating the SolrDocument. Why would it do that if the field is not multivalued?

Is this behavior expected?

-Rich
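
A caller that cannot be sure which shape comes back could normalize the value before casting. The sketch below is plain Java with no SolrJ dependency; the class `FieldValues` and its methods are hypothetical names, and `raw` stands in for whatever `document.getFieldValue("uri")` returned. SolrJ's `SolrDocument` also has a `getFirstValue(String)` method that appears to cover the same need.

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical helper, not part of SolrJ.
final class FieldValues {
    private FieldValues() {}

    /** Returns the value itself, or the first element when the value is a collection. */
    static Object firstValue(Object raw) {
        if (raw instanceof Collection) {
            Iterator<?> it = ((Collection<?>) raw).iterator();
            return it.hasNext() ? it.next() : null;
        }
        return raw;
    }

    /** Same as firstValue, but fails loudly when the result is not a String. */
    static String firstString(Object raw) {
        Object value = firstValue(raw);
        if (value == null || value instanceof String) {
            return (String) value;
        }
        throw new IllegalStateException("Expected a String but got " + value.getClass().getName());
    }
}
```

With this, `FieldValues.firstString(document.getFieldValue("uri"))` would yield the same String whether the field came back bare or wrapped in a one-element list.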

RE: getFieldValue always returns an ArrayList?

Posted by "Simon, Richard T" <Ri...@hms.harvard.edu>.
FYI: Using multiValued="false" for all string fields results in the following output:

####### Field uri is an instance of String.
####### Field entity_label is an instance of String.
####### Field institution_uri is an instance of String.
####### Field asserted_type_uri is an instance of String.
####### Field asserted_type_label is an instance of String.
####### Field provider_uri is an instance of String.
####### Field provider_label is an instance of String.

-Rich

-----Original Message-----
From: Simon, Richard T 
Sent: Thursday, June 16, 2011 10:08 AM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: RE: getFieldValue always returns an ArrayList?

Interesting. You guessed right. I changed "multivalued" to "multiValued" and all of a sudden I get Strings. But, doesn't multivalued default to false? In my schema, I originally did not set multivalued. I only put in multivalued="false" after I experienced this issue. 

-Rich

For the record, I had a number of fields which never had settings for multivalued because none of them were multivalued and I expected the default to be false. When I experienced this problem, I added multivalued="false" to all of them. I still had the problem. So, I added a method to deal with the returned ArrayLists:

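private Object getFieldValue(String field, SolrDocument document) {
    // Deliberately unchecked: the cast throws ClassCastException if the value is not an ArrayList.
    ArrayList<?> list = (ArrayList<?>) document.getFieldValue(field);
    return list.get(0);
}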


I deliberately did not test whether the returned Object was an ArrayList because I wanted to get an exception if any of them were Strings; I got no exceptions, so they were all returned as ArrayLists.

I then changed one of the fields to use multiValued="false", and I got an exception, trying to cast String to ArrayList! So, I changed all the troublesome fields to use multiValued, and changed my helper method to look like this:

private Object getFieldValue(String field, SolrDocument document) {
    Object o = document.getFieldValue(field);

    if (o instanceof ArrayList) {
        System.out.println("####### Field " + field + " is an instance of ArrayList.");
        // Reuse the value already fetched instead of calling getFieldValue a second time.
        return ((ArrayList<?>) o).get(0);
    } else {
        if (!(o instanceof String)) {
            System.out.println("###### ERROR");
        } else {
            System.out.println("####### Field " + field + " is an instance of String.");
        }
        return o;
    }
}


Here's the output, interspersed with the schema definitions of the fields:

<field name="uri" type="string" indexed="true" stored="true" multiValued="false" required="true" />
####### Field uri is an instance of String.

<field name="entity_label" type="string" indexed="false" stored="true" required="false" />
####### Field entity_label is an instance of ArrayList.

<field name="institution_uri" type="string" indexed="true" stored="true" required="false" />
####### Field institution_uri is an instance of ArrayList.

<field name="asserted_type_uri" type="string" indexed="true" stored="true" required="false" />    
####### Field asserted_type_uri is an instance of ArrayList.

<field name="asserted_type_label" type="text_eaglei" indexed="true" stored="true" required="false" />
####### Field asserted_type_label is an instance of ArrayList.

<field name="provider_uri" type="string" indexed="true" stored="true" multiValued="false" required="false" />
####### Field provider_uri is an instance of String.

<field name="provider_label" type="string" indexed="true" stored="true" multiValued="false" required="false" />
####### Field provider_label is an instance of String.


As you can see, the ones with no declaration for multivalued are returned as ArrayLists, while the ones with multiValued="false" are returned as Strings. 

So, it looks like there are two problems here: multivalued (small v) is not recognized, since using that in the schema still causes all fields to be returned as ArrayLists; and, multivalued does not default to false (or, at least, not setting it causes a field to be returned as an ArrayList, as though it were set to true).

-Rich


RE: getFieldValue always returns an ArrayList?

Posted by "Simon, Richard T" <Ri...@hms.harvard.edu>.
Ah! That was the problem. The version was 1.0. I'll change it to 1.2. Thanks!

-Rich

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Thursday, June 16, 2011 2:33 PM
To: Simon, Richard T
Cc: solr-user@lucene.apache.org
Subject: RE: getFieldValue always returns an ArrayList?


: We haven't changed Solr versions. We've been using 3.1.0 all along.

but that's not what i'm talking about.  I'm talking about the "schema 
version" ... a specific property declared in your schema.xml file.

did you check it?

(even when people start with Solr X, they sometimes are using schema.xml 
files provided by external packages -- Drupal, wordpress, etc... -- and 
don't notice that those are from older versions)

: Plus, I have some code that runs during indexing and retrieves the 
: fields from a SolrInputDocument, rather than a SolrDocument. That code 
: gets Strings without any problem, and always has, even without saying 
: multiValued="false".

SolrInputDocuments are irrelevant.  they are used to index data, but they
don't know anything about the schema.  A SolrInputDocument might be
completely invalid because of multiple values for single-valued fields, or
missing values for required fields, etc...   what comes back from a search
*is* consistent with the schema (even when there is only one value stored
in a multiValued field)

-Hoss

RE: getFieldValue always returns an ArrayList?

Posted by Chris Hostetter <ho...@fucit.org>.
: We haven't changed Solr versions. We've been using 3.1.0 all along.

but that's not what i'm talking about.  I'm talking about the "schema 
version" ... a specific property declared in your schema.xml file.

did you check it?

(even when people start with Solr X, they sometimes are using schema.xml 
files provided by external packages -- Drupal, wordpress, etc... -- and 
don't notice that those are from older versions)

: Plus, I have some code that runs during indexing and retrieves the 
: fields from a SolrInputDocument, rather than a SolrDocument. That code 
: gets Strings without any problem, and always has, even without saying 
: multiValued="false".

SolrInputDocuments are irrelevant.  they are used to index data, but they
don't know anything about the schema.  A SolrInputDocument might be
completely invalid because of multiple values for single-valued fields, or
missing values for required fields, etc...   what comes back from a search
*is* consistent with the schema (even when there is only one value stored
in a multiValued field)

-Hoss
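
One way to picture that distinction: an input document holds whatever the client put into it, while a search result is shaped by the field definition. The sketch below is a hypothetical model only (`ResultShaper` and `shape` are made-up names); it is not SolrJ or Solr code and makes no claim about how Solr builds its responses internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: the schema flag, not the number of stored values, decides the shape.
final class ResultShaper {
    private ResultShaper() {}

    /** Returns a list for multiValued fields and a bare value (or null) otherwise. */
    static Object shape(List<?> storedValues, boolean multiValued) {
        if (multiValued) {
            // Always a list, even when only one value is stored.
            return new ArrayList<Object>(storedValues);
        }
        if (storedValues.size() > 1) {
            throw new IllegalArgumentException("Multiple values for a single-valued field");
        }
        return storedValues.isEmpty() ? null : storedValues.get(0);
    }
}
```

Under this model, one stored value in a multiValued field still comes back as a one-element list, which is the behavior the cast to String tripped over.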

RE: getFieldValue always returns an ArrayList?

Posted by "Simon, Richard T" <Ri...@hms.harvard.edu>.
We haven't changed Solr versions. We've been using 3.1.0 all along.

Plus, I have some code that runs during indexing and retrieves the fields from a SolrInputDocument, rather than a SolrDocument. That code gets Strings without any problem, and always has, even without saying multiValued="false".

-Rich

-----Original Message-----
From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
Sent: Thursday, June 16, 2011 2:18 PM
To: solr-user@lucene.apache.org
Cc: Simon, Richard T
Subject: RE: getFieldValue always returns an ArrayList?


: and all of a sudden I get Strings. But, doesn't multivalued default to 
: false? In my schema, I originally did not set multivalued. I only put in 
: multivalued="false" after I experienced this issue.

That's dependent on the version of Solr, and this is where the
"version" property of the schema comes in.  (as the default behavior in 
solr changes, it does so dependent on what "version" you specify in your 
schema to prevent radical behavior changes if you upgrade but keep the 
same configs)...

<schema name="example" version="1.4">
  <!-- attribute "name" is the name of this schema and is only used for display purposes.
       Applications should change this to reflect the nature of the search collection.
       version="1.4" is Solr's version number for the schema syntax and semantics.  It should
       not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are multiValued by nature
       1.1: multiValued attribute introduced, false by default 
       1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.
       1.3: removed optional field compress feature
       1.4: default auto-phrase (QueryParser feature) to off
     -->



-Hoss

RE: getFieldValue always returns an ArrayList?

Posted by Chris Hostetter <ho...@fucit.org>.
: and all of a sudden I get Strings. But, doesn't multivalued default to 
: false? In my schema, I originally did not set multivalued. I only put in 
: multivalued="false" after I experienced this issue.

That's dependent on the version of Solr, and this is where the
"version" property of the schema comes in.  (as the default behavior in 
solr changes, it does so dependent on what "version" you specify in your 
schema to prevent radical behavior changes if you upgrade but keep the 
same configs)...

<schema name="example" version="1.4">
  <!-- attribute "name" is the name of this schema and is only used for display purposes.
       Applications should change this to reflect the nature of the search collection.
       version="1.4" is Solr's version number for the schema syntax and semantics.  It should
       not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are multiValued by nature
       1.1: multiValued attribute introduced, false by default 
       1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields.
       1.3: removed optional field compress feature
       1.4: default auto-phrase (QueryParser feature) to off
     -->



-Hoss
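
The version history in that comment can be read as a small decision rule. The sketch below models only what the comment states about `multiValued`; `SchemaDefaults` and `isMultiValued` are hypothetical names, and this is not Solr's actual schema-loading code. It also assumes that an explicit `multiValued` attribute wins over the default, which is consistent with the behavior reported in this thread.

```java
// Hypothetical model of the multiValued default described in the schema comment.
final class SchemaDefaults {
    private SchemaDefaults() {}

    /**
     * @param schemaVersion the "version" attribute of the schema element, e.g. 1.0f or 1.4f
     * @param explicit      the field's multiValued attribute, or null when it is not declared
     */
    static boolean isMultiValued(float schemaVersion, Boolean explicit) {
        if (explicit != null) {
            return explicit; // assumption: an explicit setting overrides the default
        }
        // Per the comment: before 1.1 every field is multiValued; from 1.1 on the default is false.
        return schemaVersion < 1.1f;
    }
}
```

With version 1.0, `isMultiValued(1.0f, null)` is true, which would explain undeclared fields coming back as lists, while `isMultiValued(1.0f, false)` and `isMultiValued(1.2f, null)` are both false.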

RE: getFieldValue always returns an ArrayList?

Posted by "Simon, Richard T" <Ri...@hms.harvard.edu>.
Interesting. You guessed right. I changed "multivalued" to "multiValued" and all of a sudden I get Strings. But, doesn't multivalued default to false? In my schema, I originally did not set multivalued. I only put in multivalued="false" after I experienced this issue. 

-Rich

For the record, I had a number of fields which never had settings for multivalued because none of them were multivalued and I expected the default to be false. When I experienced this problem, I added multivalued="false" to all of them. I still had the problem. So, I added a method to deal with the returned ArrayLists:

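private Object getFieldValue(String field, SolrDocument document) {
    // Deliberately unchecked: the cast throws ClassCastException if the value is not an ArrayList.
    ArrayList<?> list = (ArrayList<?>) document.getFieldValue(field);
    return list.get(0);
}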


I deliberately did not test whether the returned Object was an ArrayList because I wanted to get an exception if any of them were Strings; I got no exceptions, so they were all returned as ArrayLists.

I then changed one of the fields to use multiValued="false", and I got an exception, trying to cast String to ArrayList! So, I changed all the troublesome fields to use multiValued, and changed my helper method to look like this:

private Object getFieldValue(String field, SolrDocument document) {
    Object o = document.getFieldValue(field);

    if (o instanceof ArrayList) {
        System.out.println("####### Field " + field + " is an instance of ArrayList.");
        // Reuse the value already fetched instead of calling getFieldValue a second time.
        return ((ArrayList<?>) o).get(0);
    } else {
        if (!(o instanceof String)) {
            System.out.println("###### ERROR");
        } else {
            System.out.println("####### Field " + field + " is an instance of String.");
        }
        return o;
    }
}


Here's the output, interspersed with the schema definitions of the fields:

<field name="uri" type="string" indexed="true" stored="true" multiValued="false" required="true" />
####### Field uri is an instance of String.

<field name="entity_label" type="string" indexed="false" stored="true" required="false" />
####### Field entity_label is an instance of ArrayList.

<field name="institution_uri" type="string" indexed="true" stored="true" required="false" />
####### Field institution_uri is an instance of ArrayList.

<field name="asserted_type_uri" type="string" indexed="true" stored="true" required="false" />    
####### Field asserted_type_uri is an instance of ArrayList.

<field name="asserted_type_label" type="text_eaglei" indexed="true" stored="true" required="false" />
####### Field asserted_type_label is an instance of ArrayList.

<field name="provider_uri" type="string" indexed="true" stored="true" multiValued="false" required="false" />
####### Field provider_uri is an instance of String.

<field name="provider_label" type="string" indexed="true" stored="true" multiValued="false" required="false" />
####### Field provider_label is an instance of String.


As you can see, the ones with no declaration for multivalued are returned as ArrayLists, while the ones with multiValued="false" are returned as Strings. 

So, it looks like there are two problems here: multivalued (small v) is not recognized, since using that in the schema still causes all fields to be returned as ArrayLists; and, multivalued does not default to false (or, at least, not setting it causes a field to be returned as an ArrayList, as though it were set to true).

-Rich


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, June 15, 2011 4:25 PM
To: solr-user@lucene.apache.org
Subject: Re: getFieldValue always returns an ArrayList?

Hmmm, I admit I'm not using embedded, and I'm using 3.2, but I'm
not seeing the behavior you are.

My question about reindexing could have been better stated, I
was just making sure you didn't have some leftover cruft where
your field was multi-valued from previous experiments, but if
you're reindexing each time that's not the problem.

Arrrggggh, camel case may be striking again. Try multiValued, not
multivalued....

If that's still not it, can we see the code?

Best
Erick

Re: getFieldValue always returns an ArrayList?

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, I admit I'm not using embedded, and I'm using 3.2, but I'm
not seeing the behavior you are.

My question about reindexing could have been better stated, I
was just making sure you didn't have some leftover cruft where
your field was multi-valued from previous experiments, but if
you're reindexing each time that's not the problem.

Arrrggggh, camel case may be striking again. Try multiValued, not
multivalued....

If that's still not it, can we see the code?

Best
Erick
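
The camel-case point rests on XML attribute names being case-sensitive, so a reader looking for `multiValued` simply does not see `multivalued`. The sketch below shows that with the JDK's own DOM parser; `AttributeCaseDemo` and `attribute` are hypothetical names, and nothing here is Solr's schema parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical demo class; it uses only the JDK's DOM parser, not Solr.
final class AttributeCaseDemo {
    private AttributeCaseDemo() {}

    /** Parses a single XML element and returns the named attribute ("" when absent). */
    static String attribute(String xml, String name) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.getAttribute(name);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

For a field declared with `multivalued="false"`, asking for `multiValued` returns an empty string, so code that reads the camel-case name would fall back to whatever default applies.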

On Wed, Jun 15, 2011 at 3:47 PM, Simon, Richard T
<Ri...@hms.harvard.edu> wrote:
> We rebuild the index from scratch each time we start (for now). The fields in question are not multi-valued; in fact, I explicitly set multi-valued to false, just to be sure.
>
> Yes, this is SolrJ, using the embedded server, if that matters.
>
> Using Solr/Lucene 3.1.0.
>
> -Rich

RE: getFieldValue always returns an ArrayList?

Posted by "Simon, Richard T" <Ri...@hms.harvard.edu>.
We rebuild the index from scratch each time we start (for now). The fields in question are not multi-valued; in fact, I explicitly set multi-valued to false, just to be sure.

Yes, this is SolrJ, using the embedded server, if that matters.

Using Solr/Lucene 3.1.0.

-Rich

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, June 15, 2011 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: getFieldValue always returns an ArrayList?

Did you perhaps change the schema but not re-index? I'm grasping
at straws here, but something like this might happen if part of
your index has that field as a multi-valued field....

If that's not the problem, what version of solr are you using? I
presume this is SolrJ?

Best
Erick

Re: getFieldValue always returns an ArrayList?

Posted by Erick Erickson <er...@gmail.com>.
Did you perhaps change the schema but not re-index? I'm grasping
at straws here, but something like this might happen if part of
your index has that field as a multi-valued field....

If that's not the problem, what version of solr are you using? I
presume this is SolrJ?

Best
Erick

On Wed, Jun 15, 2011 at 2:21 PM, Simon, Richard T
<Ri...@hms.harvard.edu> wrote:
> Hi - I am examining a SolrDocument I retrieved through a query. The field I am looking at is declared this way in my schema:
>
> <field name="uri" type="string" indexed="true" stored="true" multivalued="false" required="true" />
>
> I know multivalued defaults to false, but I set it explicitly because I'm seeing some unexpected behavior. I retrieve the value of the field like so:
>
> final String resource = (String)document.getFieldValue("uri");
>
>
> However, I get an exception because an ArrayList is returned. I confirmed that the returned ArrayList has one element with the correct value, but I thought getFieldValue would return a String if the field is single valued. When I index the document, I have some code that retrieves the same field in the same way from the SolrInputDocument, and that code works.
>
> I looked at the code for SolrDocument.setField and it looks like the only way a field should be set to an ArrayList is if one is passed in by the code creating the SolrDocument. Why would it do that if the field is not multivalued?
>
> Is this behavior expected?
>
> -Rich
>