Posted to solr-user@lucene.apache.org by JaredM <em...@gmail.com> on 2009/12/31 16:26:28 UTC

Help with creating a solr schema

Hi,

I'm new to Solr but so far I think it's great.  I've spent 2 weeks reading
through the wiki and mailing list info.

I have a use case and I'm not sure what the best way is to implement it.  I
am keeping track of people's calendar schedules in a really simple way: each
user can log in and input a number of date ranges where they are available
(so for example, user Alice might be available 1-Jan-2010 to 15-Jan-2010,
20-Feb-2010 to 22-Feb-2010 and 1-Mar-2010 to 5-Mar-2010).

In my data model I have this modelled as a one-to-many with a User table
(consisting of username, some metadata) and an Availability table
(consisting of start date and end date).

Now I need to search for users who are available throughout a given date
range.  The bit I'm having trouble with is how to store multiple start & end
date pairs.  Can someone provide some guidance?
-- 
View this message in context: http://old.nabble.com/Help-with-creating-a-solr-schema-tp26979319p26979319.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help with creating a solr schema

Posted by Israel Ekpo <is...@gmail.com>.

I have done something similar to this before.

You will have to store the username, firstname and lastname as single-valued
fields:

<field name="username" type="string" indexed="true" stored="true"
required="true"/>
<field name="firstname" type="string" indexed="true" stored="true" />
<field name="lastname" type="string" indexed="true" stored="true" />
<field name="start_date" type="tint" indexed="true" stored="true"
multiValued="true"/>
<field name="end_date" type="tint" indexed="true" stored="true"
multiValued="true"/>

However, the start and end dates should be multivalued tint types.

I decided to store the dates as UNIX timestamps. Each start date is stored
as the UNIX timestamp for midnight (00:00:00) at the beginning of that day.

Each end date is stored as the UNIX timestamp for 23:59:59 on that day.

Storing the dates as Trie integers gave me faster range query results.

When searching, you will also have to convert the dates in the query to UNIX
timestamps using the same logic before using them in the Solr search query.
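
As a rough illustration of the conversion described above, here is a minimal
Python sketch. The function name day_bounds_utc is made up for this example,
and it assumes every date is interpreted in UTC; adjust the time zone if your
application stores local dates.

```python
from datetime import date, datetime, time, timezone
from typing import Tuple


def day_bounds_utc(d: date) -> Tuple[int, int]:
    """Return the UNIX timestamps for 00:00:00 and 23:59:59 of day d.

    Assumption: the day is interpreted in UTC.
    """
    first_second = datetime.combine(d, time(0, 0, 0), tzinfo=timezone.utc)
    last_second = datetime.combine(d, time(23, 59, 59), tzinfo=timezone.utc)
    return int(first_second.timestamp()), int(last_second.timestamp())


# A period from 1-Jan-2010 to 15-Jan-2010 becomes one start value and one end value.
start_ts, _ = day_bounds_utc(date(2010, 1, 1))
_, end_ts = day_bounds_utc(date(2010, 1, 15))
print(start_ts, end_ts)
```

The same function would be applied to the dates in a search request before
they are placed into the range query.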

You should use the username of the user as the uniqueKey.

If a user has multiple dates of availability, you will enter them like so:

<add>
<doc>
<field name="username">exampleun</field>
<field name="firstname">examplefn</field>
<field name="lastname">exampleln</field>
<field name="start_date">137865661</field>
<field name="start_date">137865662</field>
<field name="start_date">137865663</field>
<field name="end_date">137865681</field>
<field name="end_date">137865682</field>
<field name="end_date">137865683</field>
</doc>
</add>


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/

Re: Help with creating a solr schema

Posted by Lance Norskog <go...@gmail.com>.
Another option is to model this problem in Solr with an even more
denormalized schema: you have one document per person per day. So,
instead of:
id=0 user=Alice start_date:1-Jan-2010  end_date:5-Jan-2010
you have:
id=0 user=Alice date:1-Jan-2010
id=1 user=Alice date:2-Jan-2010
id=2 user=Alice date:3-Jan-2010
id=3 user=Alice date:4-Jan-2010
id=4 user=Alice date:5-Jan-2010

For convenience in searching, I would do this with useful id values:
id=Alice_1-Jan-2010 user=Alice date:1-Jan-2010
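
A minimal Python sketch of this per-day expansion follows. The function name
docs_per_day is hypothetical, and the sketch makes two assumptions that differ
from the listing above: ids use ISO dates (Alice_2010-01-01 rather than
Alice_1-Jan-2010) and the date value is written in the ISO 8601 form that
Solr's date type expects.

```python
from datetime import date, timedelta


def docs_per_day(user, start, end):
    """Yield one document (as a dict) for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield {
            "id": f"{user}_{day.isoformat()}",       # assumed id convention
            "user": user,
            "date": f"{day.isoformat()}T00:00:00Z",  # Solr date format, UTC assumed
        }
        day += timedelta(days=1)


docs = list(docs_per_day("Alice", date(2010, 1, 1), date(2010, 1, 5)))
for doc in docs:
    print(doc)
```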

Solr can handle hundreds of millions of documents quite well.

Also, using the date type for the dates allows you to use the date
range and facet options, which are more efficient than searching on
strings.

Lance

On Fri, Jan 1, 2010 at 9:38 PM, Israel Ekpo <is...@gmail.com> wrote:
> On Fri, Jan 1, 2010 at 9:47 PM, JaredM <em...@gmail.com> wrote:
>
>>
>> Thanks Ahmet and Israel.  I prefer Israel's approach since the amount of
>> metadata for the user is quite high but I'm not clear how to get around one
>> problem:
>>
>> If I had 2 availabilities (I've left them in human-readable form instead
>> of as UNIX timestamps only for ease of understanding):
>>
>> <field name="start_date">10-Jan-2010</field>
>> <field name="start_date">25-Jan-2010</field>
>> <field name="end_date">20-Jan-2010</field>
>> <field name="end_date">28-Jan-2010</field>
>>
>> and I wanted to query for availability between 12-Jan-2010 and 26-Jan-2010,
>> then wouldn't the above document be returned (even though the user would
>> not be available 20-25 Jan)?
>
> Unfortunately, for this particular use case, if you are using only the
> out-of-the-box features available in Solr 1.4 (that is, without a custom
> Solr plugin that uses a custom Lucene filter and some special value storage
> arrangement for the fields), you will have to store each start and end date
> pair as a separate document. So, there will be N separate documents for each
> username if that user has N distinct periods of availability. The start date
> and end date fields would also have to be single-valued instead of
> multi-valued as I specified in the earlier post.
>
> Sorry.



-- 
Lance Norskog
goksron@gmail.com

Re: Help with creating a solr schema

Posted by Israel Ekpo <is...@gmail.com>.

Unfortunately, for this particular use case, if you are using only the
out-of-the-box features available in Solr 1.4 (that is, without a custom
Solr plugin that uses a custom Lucene filter and some special value storage
arrangement for the fields), you will have to store each start and end date
pair as a separate document. So, there will be N separate documents for each
username if that user has N distinct periods of availability. The start date
and end date fields would also have to be single-valued instead of
multi-valued as I specified in the earlier post.

Sorry.

Re: Help with creating a solr schema

Posted by JaredM <em...@gmail.com>.
Thanks Ahmet and Israel.  I prefer Israel's approach since the amount of
metadata for the user is quite high but I'm not clear how to get around one
problem:

If I had 2 availabilities (I've left them in human-readable form instead of
as UNIX timestamps only for ease of understanding):

<field name="start_date">10-Jan-2010</field>
<field name="start_date">25-Jan-2010</field>
<field name="end_date">20-Jan-2010</field>
<field name="end_date">28-Jan-2010</field>

and I wanted to query for availability between 12-Jan-2010 and 26-Jan-2010,
then wouldn't the above document be returned (even though the user would
not be available 20-25 Jan)?
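
To make the concern concrete, here is a small Python simulation. It is not
Solr code; it only mimics the general behaviour that a range clause on a
multi-valued field matches when any one of the field's values falls in the
range. It assumes the two intended periods are 10-20 Jan and 25-28 Jan (so the
user is unavailable 20-25 Jan), and the function names are made up for this
sketch.

```python
from datetime import date

# One document with two availability periods stored in multi-valued fields.
doc = {
    "start_date": [date(2010, 1, 10), date(2010, 1, 25)],
    "end_date": [date(2010, 1, 20), date(2010, 1, 28)],
}


def multivalued_match(doc, q_start, q_end):
    """Each clause is checked on its own: any start value and any end value may match."""
    return (any(s <= q_start for s in doc["start_date"])
            and any(e >= q_end for e in doc["end_date"]))


def paired_match(doc, q_start, q_end):
    """What is actually wanted: a single period that covers the whole query range."""
    periods = zip(doc["start_date"], doc["end_date"])
    return any(s <= q_start and e >= q_end for s, e in periods)


query = (date(2010, 1, 12), date(2010, 1, 26))
print(multivalued_match(doc, *query))  # start 10-Jan and end 28-Jan come from different periods
print(paired_match(doc, *query))
```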
-- 
View this message in context: http://old.nabble.com/Help-with-creating-a-solr-schema-tp26979319p26990178.html


Re: Help with creating a solr schema

Posted by AHMET ARSLAN <io...@yahoo.com>.


Solr data normalization has been discussed many times on this list.
You need to flatten your data, which means some data will be repeated.

In your case, it could look like this:

fields:
id
user
end_date
start_date

id=0 user=Alice start_date:1-Jan-2010  end_date:15-Jan-2010 
id=1 user=Alice start_date:20-Feb-2010 end_date:22-Feb-2010
id=2 user=Alice start_date:1-Mar-2010  end_date:5-Mar-2010
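
The flattening itself can be sketched in a few lines of Python. This is only
an illustration: the function name flatten and the input shape (a dict mapping
each user to a list of start/end pairs) are assumptions, not part of any Solr
API.

```python
from itertools import count


def flatten(availability):
    """Turn {user: [(start, end), ...]} into one flat document per period."""
    ids = count()  # sequential ids: 0, 1, 2, ...
    return [
        {"id": next(ids), "user": user, "start_date": start, "end_date": end}
        for user, periods in availability.items()
        for start, end in periods
    ]


docs = flatten({
    "Alice": [
        ("1-Jan-2010", "15-Jan-2010"),
        ("20-Feb-2010", "22-Feb-2010"),
        ("1-Mar-2010", "5-Mar-2010"),
    ],
})
for doc in docs:
    print(doc)
```

The user name is repeated in every document, which is the repeated data
mentioned above.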


Your query can be: start_date:[* TO given_start_date] AND end_date:[given_end_date TO *]
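
A minimal sketch of building that query string in Python is shown here. The
function name availability_query is hypothetical, and the example values
assume the fields use Solr's date type, which takes ISO 8601 values; if the
fields hold UNIX timestamps, pass the integers as strings instead.

```python
def availability_query(given_start, given_end):
    """Match periods that begin on or before given_start and end on or after given_end."""
    return f"start_date:[* TO {given_start}] AND end_date:[{given_end} TO *]"


# Values must already be in the format the field type expects (assumed here: Solr dates).
q = availability_query("2010-01-12T00:00:00Z", "2010-01-14T23:59:59Z")
print(q)
```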

If you want an exclusive range query, you can use curly brackets { } instead of square brackets.

If one user may have overlapping availability periods, you can facet on the user field: &facet=true&facet.field=user&facet.mincount=1 This will give you distinct users.
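
As a sketch, the facet parameters above can be combined with a query and
URL-encoded using Python's standard library. The function name select_params
is made up for this example; only the parameter names come from the text
above.

```python
from urllib.parse import urlencode


def select_params(query):
    """Return a URL-encoded parameter string with faceting on the user field."""
    params = {
        "q": query,
        "facet": "true",
        "facet.field": "user",
        "facet.mincount": 1,
    }
    return urlencode(params)


# The result is appended to the select URL of your own Solr instance.
print(select_params("*:*"))
```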

If you are storing your data in a database, you can use http://wiki.apache.org/solr/DataImportHandler to index your data.

Hope this helps.