Posted to solr-user@lucene.apache.org by Zac Tolley <za...@thetolleys.com> on 2011/08/24 16:53:59 UTC

Newbie Question, can I store structured sub elements?

I have a very simple scenario: a film and its showings. Each film has
multiple showings at set times on set channels, so I have:

Movie
-----
id
title
description
duration


Showing
-----
id
movie_id
starttime
channelname



Can I store this in Solr so that I keep this structure?

I did try to do an initial import with the DIH using this config:

<entity name="movie" query="SELECT * from movies">
   <field column="ID" name="id"/>
   <field column="TITLE" name="title"/>
   <field column="DESCRIPTION" name="description"/>

   <!-- Runs once per movie row and selects that movie's showings -->
   <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
     <field column="ID" name="id"/>
     <field column="STARTTIME" name="starttime"/>
     <field column="CHANNELNAME" name="channelname"/>
   </entity>
</entity>

I was hoping to get, for each movie, a sub-entity holding its showings, like:

<doc>
   <str name="title">.....</str>
   <showing>
     <str name="channelname".....



but instead all the fields are flattened down to the top level.

I know this must be easy, what am I missing... ?

Re: Newbie Question, can I store structured sub elements?

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Whether you use multi-valued fields or token streams, the question is search, not (de)serialization: the stored form is opaque to Solr, which will take it and give it back to you as needed.

paul


On 25 August 2011 at 21:24, Zac Tolley wrote:

> My search is very simple, mainly on titles, actors, show times and channels.
> Having multiple lists of values is probably better for that, and as the
> order is kept the same it's relatively simple to map the response back onto
> POJOs for my presentation layer.


Re: Newbie Question, can I store structured sub elements?

Posted by Zac Tolley <za...@thetolleys.com>.
My search is very simple, mainly on titles, actors, show times and channels.
Having multiple lists of values is probably better for that, and as the
order is kept the same it's relatively simple to map the response back onto
POJOs for my presentation layer.

On Thu, Aug 25, 2011 at 8:18 PM, Paul Libbrecht <pa...@hoplahup.net> wrote:

> Delimited text is the baby form of lists.
> Text can be made very very structured (think XML, ontologies...).
> I think the crux is your search needs.
>
> For example, with Lucene, I made a search for formulæ (including sub-terms)
> by converting the OpenMath-encoded terms into rows of tokens and querying
> with SpanQueries. Quite structured to my taste.
>
> What you don't have is the freedom of joins which brings a very flexible
> query mechanism almost independent of the schema... but this often can be
> circumvented by the flat solr and lucene storage whose performance is really
> amazing.
>
> paul

Re: Newbie Question, can I store structured sub elements?

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Delimited text is the baby form of lists.
Text can be made very very structured (think XML, ontologies...).
I think the crux is your search needs.

For example, with Lucene, I made a search for formulæ (including sub-terms) by converting the OpenMath-encoded terms into rows of tokens and querying with SpanQueries. Quite structured to my taste.

What you don't have is the freedom of joins, which bring a very flexible query mechanism almost independent of the schema... but this can often be circumvented by the flat Solr and Lucene storage, whose performance is really amazing.

paul


On 25 August 2011 at 21:07, Zac Tolley wrote:

> I have come to that conclusion, so I had to choose between multiple fields with
> multiple values or a field with delimited text. I've gone for the former.


Re: Newbie Question, can I store structured sub elements?

Posted by Zac Tolley <za...@thetolleys.com>.
I have come to that conclusion, so I had to choose between multiple fields with
multiple values or a field with delimited text. I've gone for the former.

On Thu, Aug 25, 2011 at 7:58 PM, Erick Erickson <er...@gmail.com> wrote:

> Nope, it's not easy. Solr docs are flat, flat, flat, with the tiny
> exception that multiValued fields are returned as lists.
>
> However, you can count on multi-valued fields being returned
> in the order they were added, so it might work out for you to
> treat these as parallel arrays in Solr documents.
>
> Best
> Erick

Re: Newbie Question, can I store structured sub elements?

Posted by Erick Erickson <er...@gmail.com>.
Nope, it's not easy. Solr docs are flat, flat, flat, with the tiny
exception that multiValued fields are returned as lists.

However, you can count on multi-valued fields being returned
in the order they were added, so it might work out for you to
treat these as parallel arrays in Solr documents.
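
As a rough illustration of the parallel-array idea, here is a minimal Python sketch that zips the two multi-valued fields of one returned document back into showing records. The `Showing` class, the `showings_from_doc` helper, and the sample values are hypothetical; the field names `starttime` and `channelname` come from the question, and the document is assumed to have been parsed into a plain dict already.

```python
from dataclasses import dataclass


@dataclass
class Showing:
    starttime: str
    channelname: str


def showings_from_doc(doc):
    """Pair up the i-th start time with the i-th channel name."""
    times = doc.get("starttime", [])
    channels = doc.get("channelname", [])
    # The pairing is only meaningful if both lists were filled in lockstep.
    if len(times) != len(channels):
        raise ValueError("parallel fields have different lengths")
    return [Showing(t, c) for t, c in zip(times, channels)]


# Hypothetical document, shaped like a parsed Solr response entry.
doc = {
    "title": "Alien",
    "starttime": ["2011-08-26T21:00:00Z", "2011-08-27T23:15:00Z"],
    "channelname": ["Film4", "Sky Movies"],
}

for showing in showings_from_doc(doc):
    print(showing.starttime, showing.channelname)
```

The length check matters: the approach relies on every showing contributing exactly one value to each field, so a showing with a missing channel would shift the later pairs out of alignment.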

Best
Erick

On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley <za...@thetolleys.com> wrote:
> I know I can have multi value on them but that doesn't let me see that
> a showing instance happens at a particular time on a particular
> channel, just that it shows on a range of channels at a range of times
>
> Starting to think I will have to either store a formatted string that
> combines them or keep it flat just for indexing, retrieve ids and use
> them to get data out of the RDBMS

Re: Newbie Question, can I store structured sub elements?

Posted by Zac Tolley <za...@thetolleys.com>.
I know I can make them multi-valued, but that doesn't let me see that
a particular showing happens at a particular time on a particular
channel, just that the film shows on a range of channels at a range of times.

Starting to think I will have to either store a formatted string that
combines them or keep it flat just for indexing, retrieve ids and use
them to get data out of the RDBMS
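
The "formatted string" option could look roughly like the Python sketch below: each showing is packed into one value before indexing and unpacked again when reading results. The `|` separator and the `encode_showing` / `decode_showing` helpers are assumptions made for illustration; nothing in the thread prescribes a particular format.

```python
SEP = "|"  # hypothetical delimiter; it must never occur inside a value


def encode_showing(starttime, channelname):
    """Pack one showing into a single string for a multi-valued field."""
    if SEP in starttime or SEP in channelname:
        raise ValueError("value contains the separator")
    return f"{starttime}{SEP}{channelname}"


def decode_showing(value):
    """Unpack a string produced by encode_showing."""
    starttime, channelname = value.split(SEP, 1)
    return {"starttime": starttime, "channelname": channelname}


packed = encode_showing("2011-08-26T21:00:00Z", "Film4")
print(packed)
print(decode_showing(packed))
```

A combined value like this keeps time and channel together, but it is harder to search on either part alone, which is the trade-off weighed elsewhere in this thread.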


On 24 Aug 2011, at 23:09, dan whelan <da...@adicio.com> wrote:

> You could change starttime and channelname to multiValued=true and use these fields to store all the values for those fields.
>
> showing.movie_id and showing.id probably aren't needed in a Solr record.

Re: Newbie Question, can I store structured sub elements?

Posted by dan whelan <da...@adicio.com>.
You could change starttime and channelname to multiValued=true and use 
these fields to store all the values for those fields.

showing.movie_id and showing.id probably aren't needed in a Solr record.
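
To make the shape of such a record concrete, here is a small Python sketch that builds one flat document per movie, with `starttime` and `channelname` as lists standing in for multiValued fields. The sample rows and the `build_flat_doc` helper are hypothetical; this only illustrates the resulting document, not how DIH produces it internally.

```python
# Hypothetical rows, shaped like the movies and showings tables in the question.
movie = {"id": 1, "title": "Alien", "description": "Sci-fi horror", "duration": 117}
showings = [
    {"id": 10, "movie_id": 1, "starttime": "2011-08-26T21:00:00Z", "channelname": "Film4"},
    {"id": 11, "movie_id": 1, "starttime": "2011-08-27T23:15:00Z", "channelname": "Sky Movies"},
]


def build_flat_doc(movie, showings):
    """Build one flat document per movie with multi-valued showing fields."""
    return {
        "id": movie["id"],
        "title": movie["title"],
        "description": movie["description"],
        # Each list stands in for a multiValued field; the showing's own
        # id and movie_id are deliberately left out of the document.
        "starttime": [s["starttime"] for s in showings],
        "channelname": [s["channelname"] for s in showings],
    }


doc = build_flat_doc(movie, showings)
print(doc["starttime"])
print(doc["channelname"])
```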