Posted to solr-user@lucene.apache.org by Bastien Latard - MDPI AG <la...@mdpi.com.INVALID> on 2016/04/15 13:51:48 UTC

Can a field be an array of fields?

Hi everybody!

I described a bit of what I found in another thread, but I would prefer to 
create a new thread for this specific question.
It is possible to create an array of strings by doing the following (incomplete example):
- in the data-conf.xml:
<entity name="solr_articles" query="select id, title, abstract from articles">
  <field column="title" name="title" />
  <field column="abstract" name="abstract" />

    <entity name="authors_array" query="select given_name, last_name
    from authors WHERE article_id='${solr_articles.id}'">
      <field column="given_name" name="given_name" />
      <field column="last_name" name="last_name" />
    </entity>

</entity>

- in schema.xml:
<field name="title" type="string" indexed="false" stored="true" 
required="false" multiValued="true" />
<field name="abstract" type="string" indexed="false" stored="true" 
required="false" multiValued="true" />
<field name="given_name" type="string" indexed="false" stored="true" 
required="false" multiValued="true" />
<field name="last_name" type="string" indexed="false" stored="true" 
required="false" multiValued="true" />

And this provides something like:

"docs":[
       {
[...]
	"given_name":["Bastien", "Matthieu", "Nino"],
	"last_name":["lastname1", "lastname2", "lastname3", "lastname4"],

[...]


Note: an author can have only a last_name, and then we are unable to 
tell which given_name belongs to which last_name.

My goal would be to get this as a result:

"docs":[
       {
[...]
    "authors_array":
     [
        {
        "given_name":["Bastien"],
        "last_name":["lastname1"]
        },
        {
        "last_name":["lastname2"]
        },
        {
        "given_name":["Matthieu"],
        "last_name":["lastname3"]
        },
        {
        "given_name":["Nino"],
        "last_name":["lastname4"]
        }
     ]
[...]


Is there any way to do this?
PS: I know that I could do 'select if(a.given_name is not null, 
a.given_name, '') as given_name, [...]' but I would like to get an 
array.
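
For illustration, the placeholder workaround from the PS can be sketched on the client side. This is only a rough Python sketch, assuming the SQL query returns an empty string instead of NULL and that Solr returns the stored values of both multivalued fields in insertion order. The sample rows and the helper names `to_parallel_fields` and `pair_authors` are hypothetical.

```python
# Hypothetical rows, as the inner SQL query might return them (None = NULL).
rows = [
    ("Bastien", "lastname1"),
    (None, "lastname2"),
    ("Matthieu", "lastname3"),
    ("Nino", "lastname4"),
]


def to_parallel_fields(rows, placeholder=""):
    """Build two equally long lists, replacing missing names with a placeholder."""
    given = [g if g is not None else placeholder for g, _ in rows]
    last = [n if n is not None else placeholder for _, n in rows]
    return {"given_name": given, "last_name": last}


def pair_authors(doc):
    """Re-pair the aligned lists by position on the client side."""
    return list(zip(doc["given_name"], doc["last_name"]))


doc = to_parallel_fields(rows)
authors = pair_authors(doc)
```

With the placeholder in place both lists have four entries, so position 1 in `given_name` is the empty string that belongs to "lastname2". The result is still two flat arrays, not the nested structure asked for.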

I tried adding something like the following to the schema.xml, but it 
doesn't work (perhaps it would need to be of some 'array' type):
<field name="authors_array" type="string" indexed="true" stored="true" 
required="false" multiValued="true"/>

Kind regards,
Bastien Latard
Web engineer
-- 
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
latard@mdpi.com
http://www.mdpi.com/


Re: Can a field be an array of fields?

Posted by Bastien Latard - MDPI AG <la...@mdpi.com.INVALID>.
Thank you Jack and Daniel, I somehow missed your answers.

Yes, I had already thought about the JSON possibility, but I am more 
interested in getting a structure like this in the result:

"docs":[
        {
[...]
     "authors_array":
      [
         {
         "given_name":["Bastien"],
         "last_name":["lastname1"]
         },
         {
         "last_name":["lastname2"]
         },
         {
         "given_name":["Matthieu"],
         "last_name":["lastname3"]
         },
         {
         "given_name":["Nino"],
         "last_name":["lastname4"]
         }
      ]
[...]


And being able to query like:
- q=authors_array.given_name:Nino
OR
- q=authors_array['given_name']:Nino

Is that possible?


Kind regards,
Bastien


On 15/04/2016 17:08, Jack Krupansky wrote:
> It all depends on what your queries look like - what input data does your
> application have and what data does it need to retrieve.
>
> My recommendation is that you store first name and last name as separate,
> multivalued fields if you indeed need to query by precisely a first or last
> name, but also store the full name as a separate multivalued text field. If
> you want to search by only first or last name, fine. If you want to search
> by full name or wildcards, etc., you can use the full name field, using
> phrase query. You can use an update request processor to combine first and
> last name into that third field. You could also store the full name in a
> fourth field as raw JSON if you really need structure in the result. The
> third field might have first and last name with a special separator such as
> "|", although a simple comma is typically sufficient.
>
>
> -- Jack Krupansky


Re: Can a field be an array of fields?

Posted by Jack Krupansky <ja...@gmail.com>.
It all depends on what your queries look like - what input data does your
application have and what data does it need to retrieve.

My recommendation is that you store first name and last name as separate,
multivalued fields if you indeed need to query by precisely a first or last
name, but also store the full name as a separate multivalued text field. If
you want to search by only first or last name, fine. If you want to search
by full name or wildcards, etc., you can use the full name field, using
phrase query. You can use an update request processor to combine first and
last name into that third field. You could also store the full name in a
fourth field as raw JSON if you really need structure in the result. The
third field might have first and last name with a special separator such as
"|", although a simple comma is typically sufficient.


-- Jack Krupansky
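
As a rough illustration of the field layout recommended above, here is a Python sketch that builds the two name fields plus a combined full-name field. It does the combining on the client rather than in an update request processor, and the field name `author_full_name`, the helper names, and the sample authors are assumptions made for the example.

```python
# Hypothetical author records; the second one has no given name.
authors = [
    {"given_name": "Bastien", "last_name": "lastname1"},
    {"last_name": "lastname2"},
    {"given_name": "Matthieu", "last_name": "lastname3"},
    {"given_name": "Nino", "last_name": "lastname4"},
]


def build_author_fields(authors, separator="|"):
    """Return the multivalued fields for one article document."""
    given_names = [a["given_name"] for a in authors if a.get("given_name")]
    last_names = [a["last_name"] for a in authors if a.get("last_name")]
    # One combined value per author, so the pairing survives in this field.
    full_names = [
        separator.join((a.get("given_name", ""), a.get("last_name", "")))
        for a in authors
    ]
    return {
        "given_name": given_names,
        "last_name": last_names,
        "author_full_name": full_names,
    }


def split_full_name(value, separator="|"):
    """Recover (given_name, last_name) from one combined value."""
    given, _, last = value.partition(separator)
    return given, last


fields = build_author_fields(authors)
```

The separate name fields serve exact first-name or last-name queries, while each value of the combined field keeps one author's names together and can be split again after retrieval.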

On Fri, Apr 15, 2016 at 10:58 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.davis@nih.gov> wrote:

> Short answer - JOINs, external query outside Solr, Elastic Search ;)
> Alternatives:
>   * You get back an id for each document when you query on "Nino".   You
> look up the last names in some other system that has the full list.
>   * You index the authors in another collection and use JOINs
>   * You store the author_array as formatted, escaped JSON, stored, but not
> indexed (or analyzed).   When you get the data back, you navigate the JSON
> to the author_array, get the value, and parse that value as JSON.   Now you
> have the full list.
>   * This is a sweet spot for Elastic Search, to be perfectly honest.

RE: Can a field be an array of fields?

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
Short answer - JOINs, external query outside Solr, Elastic Search ;)
Alternatives:
  * You get back an id for each document when you query on "Nino".   You look up the last names in some other system that has the full list.
  * You index the authors in another collection and use JOINs
  * You store the author_array as formatted, escaped JSON, stored, but not indexed (or analyzed).   When you get the data back, you navigate the JSON to the author_array, get the value, and parse that value as JSON.   Now you have the full list.
  * This is a sweet spot for Elastic Search, to be perfectly honest.
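
The stored-JSON alternative in the list above could look roughly like the following Python sketch. It assumes a single-valued, stored, non-indexed string field, here called `authors_json` (a made-up name), and it simulates the Solr JSON response with a plain dictionary instead of calling a server.

```python
import json


def encode_authors(authors):
    """Serialize the author list into one string for a stored-only field."""
    return json.dumps(authors)


def decode_authors(response_text, field="authors_json"):
    """Navigate the response JSON to each doc, then parse the field value as JSON."""
    response = json.loads(response_text)
    return [
        json.loads(doc[field])
        for doc in response["response"]["docs"]
        if field in doc
    ]


# Hypothetical author records for one article.
authors = [
    {"given_name": "Bastien", "last_name": "lastname1"},
    {"last_name": "lastname2"},
    {"given_name": "Matthieu", "last_name": "lastname3"},
    {"given_name": "Nino", "last_name": "lastname4"},
]

# Document as it would be sent for indexing.
doc = {"id": "1", "authors_json": encode_authors(authors)}

# Simulated response body; the field value comes back as an escaped JSON string.
response_text = json.dumps(
    {"response": {"numFound": 1, "start": 0, "docs": [doc]}}
)

decoded = decode_authors(response_text)
```

This keeps the full structure in the result, but since the field is not indexed it cannot be searched. Queries would still have to go through separate indexed fields.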


