Posted to solr-user@lucene.apache.org by Eric Khoury <ek...@hotmail.com> on 2012/08/02 16:45:06 UTC

Solr 4.0 - Join performance






Hello all,

 

I’m testing out the new join feature, hitting some perf
issues, as described in Erick’s article (http://architects.dzone.com/articles/solr-experimenting-join).

Basically, I’m using 2 objects in solr (this is a simplified
view):

 

Item

- Id

- Name

 

Grant

- ItemId

- AvailabilityStartTime

- AvailabilityEndTime

 

Each item can have multiple grants attached to it.

 

The query I'm using is the following, to find items by
name, filtered by grants availability window:

 

solr/select?fq=Name:XXX&q={!join from=ItemId to=Id} AvailabilityStartTime:[* TO NOW] AND -AvailabilityEndTime:[* TO NOW]
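Written out as an actual HTTP request, the query above needs URL encoding (spaces, braces, brackets). A minimal sketch in Python, assuming a Solr instance at http://localhost:8983/solr; the host and the helper name build_join_url are hypothetical and not part of the original setup:

```python
from urllib.parse import urlencode


def build_join_url(base_url, name):
    """Build the select URL for the join query from the post, URL-encoded."""
    params = [
        # Filter the joined items by name (field names are taken from the post).
        ("fq", f"Name:{name}"),
        # Join from Grant.ItemId to Item.Id, keeping grants whose
        # availability window contains NOW.
        ("q", "{!join from=ItemId to=Id} "
              "AvailabilityStartTime:[* TO NOW] AND -AvailabilityEndTime:[* TO NOW]"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical base URL, for illustration only.
url = build_join_url("http://localhost:8983/solr", "XXX")
print(url)
```

The encoding only changes how the request is transmitted; the join itself is evaluated exactly as in the raw query.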

 

With a hundred thousand items, this query can take multiple seconds
to perform, due to the large number of ItemIds returned from the join query.

Has anyone come up with a better way to use joins for these types of queries?  Are there improvements planned in 4.0 rtm in this area?

 

Btw, I’ve explored simply adding Start-End times to items, but
the flat data model makes it hard to maintain start-end pairs.

 

Thanks for the help!

Eric.

 

 		 	   		  

Re: Solr 4.0 - Join performance

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Eric,

Unfortunately, the Solr guys are ignoring it.

On Tue, Aug 14, 2012 at 7:48 PM, Eric Khoury <ek...@hotmail.com> wrote:

>
> Hi Mikhail, I was trying to figure out if SOLR-3076 made it into the beta,
> but since the issue is still marked as open, I take it it hasn't yet?
> Thanks, Eric.




-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

RE: Solr 4.0 - Join performance

Posted by Eric Khoury <ek...@hotmail.com>.
Hi Mikhail, I was trying to figure out if SOLR-3076 made it into the beta, but since the issue is still marked as open, I take it it hasn't yet?
Thanks,
Eric.
 > From: mkhludnev@griddynamics.com
> Date: Fri, 3 Aug 2012 00:06:36 +0400
> Subject: Re: Solr 4.0 - Join performance
> To: ekhoury72@hotmail.com; solr-user@lucene.apache.org
> 
> Eric,
> 
> you can take the latest patch from SOLR-3076:
> SOLR-3076.patch (16/Jul/12 21:16)
> <https://issues.apache.org/jira/secure/attachment/12536717/SOLR-3076.patch>
> 
> You can also take it already applied from
> https://github.com/m-khl/solr-patches/tree/6611 . But the source code it is
> based on might be a little bit old.
> Regarding a nightly build, I'm not so optimistic - I can't attract
> a committer to review it.
> 
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
> 
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>
 		 	   		  

Re: Solr 4.0 - Join performance

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Eric,

you can take the latest patch from SOLR-3076:
SOLR-3076.patch (16/Jul/12 21:16)
<https://issues.apache.org/jira/secure/attachment/12536717/SOLR-3076.patch>

You can also take it already applied from
https://github.com/m-khl/solr-patches/tree/6611 . But the source code it is
based on might be a little bit old.
Regarding a nightly build, I'm not so optimistic - I can't attract
a committer to review it.

On Thu, Aug 2, 2012 at 11:51 PM, Eric Khoury <ek...@hotmail.com> wrote:

>  Wow, great work Mikhail, that's impressive.
> I don't currently build the dev tree; you wouldn't have a patch for
> the alpha build handy?
> If not, when do you think this'll be available in a nightly build?
> Thanks again,
> Eric.
> > From: mkhludnev@griddynamics.com
> > Date: Thu, 2 Aug 2012 22:38:13 +0400
> > Subject: Re: Solr 4.0 - Join performance
> > To: solr-user@lucene.apache.org
>
> >
> > Hello,
> >
> > You can check my record.
> >
> https://issues.apache.org/jira/browse/SOLR-3076?focusedCommentId=13415644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13415644
> >
> > I'm still working on precise performance measurement.
> >



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Solr 4.0 - Join performance

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello,

You can check my record.
https://issues.apache.org/jira/browse/SOLR-3076?focusedCommentId=13415644&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13415644

I'm still working on precise performance measurement.





-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Issue using SpatialRecursivePrefixTreeFieldType

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
Nice!

On Oct 17, 2012, at 10:50 AM, Eric Khoury [via Lucene] wrote:


I'm using the X axis for the availability start and end times (total minutes since Jan 2012); each asset can have multiple rectangles (multiple availability start and end pairs).  My original design had a bounding rectangle of 20 years (0 - 10,000,000 minutes), with certain assets available for the whole time.  Since I'm certain that all my data gets reindexed at least once a month, I changed the design to generate availability only for this month + next month, so rectangles are now (0 - 45,000 minutes).  For assets that are available for the complete month, which will be the case for a large percentage of assets, I just set a flag, which avoids creating a rectangle for that entry altogether.
Eric.





-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4014265.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Issue using SpatialRecursivePrefixTreeFieldType

Posted by Eric Khoury <ek...@hotmail.com>.
I'm using the X axis for the availability start and end times (total minutes since Jan 2012); each asset can have multiple rectangles (multiple availability start and end pairs).  My original design had a bounding rectangle of 20 years (0 - 10,000,000 minutes), with certain assets available for the whole time.  Since I'm certain that all my data gets reindexed at least once a month, I changed the design to generate availability only for this month + next month, so rectangles are now (0 - 45,000 minutes).  For assets that are available for the complete month, which will be the case for a large percentage of assets, I just set a flag, which avoids creating a rectangle for that entry altogether.
Eric.
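A rough sketch of that mapping in Python, under several assumptions that are not confirmed above: offsets are measured in minutes from the start of the indexed window, rectangles are written as "minX minY maxX maxY" with zero height, and the names minutes_between and index_fields are hypothetical:

```python
from datetime import datetime


def minutes_between(origin, t):
    """Whole minutes from origin to t."""
    return int((t - origin).total_seconds() // 60)


def index_fields(grants, window_start, window_end):
    """Return (always_available, rects) for one asset.

    grants is a list of (start, end) datetimes. An asset with a grant that
    covers the whole window gets only the flag and no rectangles; otherwise
    each grant is clipped to the window and emitted as a zero-height
    rectangle string "minX minY maxX maxY" (format is an assumption).
    """
    rects = []
    for start, end in grants:
        if start <= window_start and end >= window_end:
            return True, []  # flag only, no rectangle needed
        lo, hi = max(start, window_start), min(end, window_end)
        if lo < hi:  # skip grants that fall entirely outside the window
            rects.append(
                f"{minutes_between(window_start, lo)} 0 "
                f"{minutes_between(window_start, hi)} 0"
            )
    return False, rects


# Example window: one month, chosen only for illustration.
window = (datetime(2012, 10, 1), datetime(2012, 11, 1))
grants = [(datetime(2012, 10, 5), datetime(2012, 10, 10, 12))]
print(index_fields(grants, *window))
```

Clipping to the window keeps every rectangle small, which is the point of the workaround; the flag avoids indexing any shape at all for the common fully-available case.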
 > Date: Tue, 16 Oct 2012 13:00:45 -0700
> From: DSMILEY@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Issue using SpatialRecursivePrefixTreeFieldType
> 
> Eric,
>   Can you please elaborate on your workaround?  I'm not sure I get your drift.
> ~ David
 		 	   		  

Re: Issue using SpatialRecursivePrefixTreeFieldType

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
Eric,
  Can you please elaborate on your workaround?  I'm not sure I get your drift.
~ David
On Oct 16, 2012, at 12:54 PM, Eric Khoury [via Lucene] wrote:

> 
> Thanks for the help David, makes sense.  I found a workaround, creating much smaller rectangles and updating them more often.  Glad to have this functionality, thanks again!
> Eric.





-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4014070.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Issue using SpatialRecursivePrefixTreeFieldType

Posted by Eric Khoury <ek...@hotmail.com>.
Thanks for the help David, makes sense.  I found a workaround, creating much smaller rectangles and updating them more often.  Glad to have this functionality, thanks again!
Eric.
 > Date: Fri, 12 Oct 2012 21:06:52 -0700
> From: DSMILEY@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Issue using SpatialRecursivePrefixTreeFieldType
> 
 		 	   		  

Re: Issue using SpatialRecursivePrefixTreeFieldType

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
Hi again Eric,
  I could see this unusual use-case of Lucene/Solr spatial really stressing
it out.  When you say that you "create a rectangle", I figure you mean at
indexing time -- I'm pretty confident searches are going to be quick in
nearly any scenario.  At indexing time, yes, it'll basically overlay your
big rectangle on a stack of matrixes (i.e. grids) of different sizes.  In
the middle of your rectangle, there are going to be large indexed rectangles
that efficiently cover a lot of space, but there are going to be more and
more and more smaller rectangles towards the edge of your rectangle at
increasing precision.  For your case here, I wouldn't be surprised if it
wanted to generate hundreds of thousands of grid cells which isn't going to
scale at all.  "Normally" (in a geospatial context) distErrPct is non-zero
and so your shape is approximated -- it will only generate smaller and
smaller grid cells at the edges up to a threshold relative to the
overall size of the shape.  Approximating geospatial areas is normal, since
a polygon, say, is never truly as precise as the digits you give for its
coordinates.  But for you... well I don't know if your use case
allows the rectangle edges to be approximated but even if you said that'd be
fine, solr.SpatialRecursivePrefixTreeFieldType will use the same
approximation measure for both dimensions, meaning the max-y of 11 will
probably be pushed off into the tens or hundreds of thousands due to the
max-x being 5 million.
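To illustrate why a long, thin rectangle is expensive when no approximation is allowed, here is a simplified sketch of quad-tree cell covering in Python. It is not the Lucene implementation: the square power-of-two world, the function name count_cells, and the min_size parameter (standing in for the finest indexed precision) are all assumptions made for illustration.

```python
def count_cells(rect, cell, min_size):
    """Count the quad-tree cells needed to cover rect.

    rect and cell are (min_x, min_y, max_x, max_y) with half-open extents.
    A cell that lies fully inside rect is counted once; a cell that only
    partly overlaps is split into four until it reaches min_size.
    """
    rx0, ry0, rx1, ry1 = rect
    cx0, cy0, cx1, cy1 = cell
    # No overlap: this cell contributes nothing.
    if cx1 <= rx0 or rx1 <= cx0 or cy1 <= ry0 or ry1 <= cy0:
        return 0
    inside = rx0 <= cx0 and cx1 <= rx1 and ry0 <= cy0 and cy1 <= ry1
    # Fully covered, or already at the finest precision: one indexed cell.
    if inside or (cx1 - cx0) <= min_size:
        return 1
    mx, my = (cx0 + cx1) // 2, (cy0 + cy1) // 2
    quads = (
        (cx0, cy0, mx, my), (mx, cy0, cx1, my),
        (cx0, my, mx, cy1), (mx, my, cx1, cy1),
    )
    return sum(count_cells(rect, q, min_size) for q in quads)


WORLD = (0, 0, 65536, 65536)      # toy world, side is a power of two
STRIP = (10, 10, 50000, 11)       # long rectangle, one unit high

exact_cells = count_cells(STRIP, WORLD, 1)       # no approximation
coarse_cells = count_cells(STRIP, WORLD, 1024)   # coarse approximation
print(exact_cells, coarse_cells)
```

In this toy model a one-unit-high strip needs about one cell per unit of length, because no larger cell ever fits inside it, while a coarser min_size (roughly the effect of a non-zero distErrPct) cuts the count sharply.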
  So... I can think of how to solve this if I had lots of time to work on it
but I don't.  What I'm about to write is mostly for me if I look back on
this.  What I'd do is use solr.SpatialRecursivePrefixTreeFieldType with a
default approximating distErrPct such that this field acts as a fast
approximating filter.  But then I'd need something to weed out the
false-positive hits and that would be a job for a different spatial
strategy.  Over here: https://github.com/ryantxu/spatial-solr-sandbox  in
"LSE" there is a JtsGeometryStrategy which will do perfectly accurate
spatial matching, but it's not really "indexed" (it isn't fast at all). 
That one is a bit experimental now but I think it works ( I didn't write it
or use it, Ryan McKinley did ) or it at least should with minor
modification.  Simply using both strategies together isn't enough, since it
won't be fast enough if your query shapes are big. I think the technologies
in both of these could be combined such that only the approximating final
leaves of the prefix tree are then passed to the JTS strategy, as opposed to
all matching shapes in the middle of the query shape which are known to
match with confidence.
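The two-step idea described above (a fast approximate filter, then an exact check only for hits that matched on edge cells) can be sketched roughly as follows. This is a toy single-resolution grid in Python, not the prefix tree or the JtsGeometryStrategy; coarse_cells, intersects, search and cell_size are hypothetical names.

```python
def coarse_cells(rect, cell_size):
    """Grid cell ids (col, row) touched by a closed rectangle."""
    x0, y0, x1, y1 = rect
    return {
        (cx, cy)
        for cx in range(x0 // cell_size, x1 // cell_size + 1)
        for cy in range(y0 // cell_size, y1 // cell_size + 1)
    }


def intersects(a, b):
    """Exact intersection test for closed rectangles (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(docs, query, cell_size):
    """Return (hits, exact_checks) for integer rectangles keyed by doc id."""
    qx0, qy0, qx1, qy1 = query
    q_cells = coarse_cells(query, cell_size)
    # Cells lying entirely inside the query: any shape touching one of
    # these is a certain match and needs no exact check.
    interior = {
        (cx, cy) for cx, cy in q_cells
        if qx0 <= cx * cell_size and (cx + 1) * cell_size <= qx1
        and qy0 <= cy * cell_size and (cy + 1) * cell_size <= qy1
    }
    hits, exact_checks = [], 0
    for doc_id, rect in docs.items():
        cells = coarse_cells(rect, cell_size)
        if cells & interior:
            hits.append(doc_id)
        elif cells & q_cells:
            # Matched only on an edge cell: possible false positive.
            exact_checks += 1
            if intersects(rect, query):
                hits.append(doc_id)
    return hits, exact_checks


docs = {
    "a": (5, 5, 6, 6),          # inside an interior cell
    "b": (36, 36, 38, 38),      # shares an edge cell but misses the query
    "c": (33, 1, 34, 2),        # shares an edge cell and really intersects
    "d": (100, 100, 101, 101),  # nowhere near the query
}
hits, exact_checks = search(docs, (0, 0, 35, 35), 10)
print(hits, exact_checks)
```

Only the shapes that matched through edge cells pay for the slower exact test, which is what would keep the combination fast even for large query shapes.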

~ David



-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4013521.html
Sent from the Solr - User mailing list archive at Nabble.com.

Issue using SpatialRecursivePrefixTreeFieldType

Posted by Eric Khoury <ek...@hotmail.com>.
Hi David,

I'm defining my field as such:

<fieldType name="rectangle" class="solr.SpatialRecursivePrefixTreeFieldType" geo="false" distErrPct="0" maxDetailDist="1" worldBounds="0 0 10916173 200000"/>

When I create a large rectangle, say "10 10 5000000 11", Solr seems to freeze for quite some time.  I haven't looked at your code, but I can imagine the algorithm basically fills in some sort of indexing matrix, and that's what's taking so long for large rectangles?  Is there a limit to how big the worldBounds should be?

Thanks!
Eric.
 		 	   		  

RE: Solr 4.0 - Join performance

Posted by Eric Khoury <ek...@hotmail.com>.
Thanks David, will work around this issue for now, and will keep an eye out for changes to SOLR-3304.
Good luck with the rethink.
Eric.
 > Date: Wed, 29 Aug 2012 08:44:14 -0700
> From: DSMILEY@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4.0 - Join performance
> 
> The solr.GeoHashFieldType is useless; I'd like to see it deprecated then removed.  You'll need to go with unreleased code and apply patches or wait till Solr 4.
> 
> ~ David
> 
> On Aug 29, 2012, at 10:53 AM, Eric Khoury [via Lucene] wrote:
> 
> 
> Awesome, thanks David.  In the meantime, could I potentially use geohash, or something similar?  Geohash looks like it supports separate "lon" or "lat" range queries, which would help, but it's not a multivalued field, which I need.
>  > Date: Wed, 29 Aug 2012 07:20:42 -0700
> 
> > From: [hidden email]
> > To: [hidden email]
> > Subject: Re: Solr 4.0 - Join performance
> >
> > Solr 4 is certainly the goal.  There's a bit of a setback at the moment until some of the Lucene spatial API is re-thought.  I'm working heavily on such things this week.
> > ~ David
> >
> > On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:
> >
> >
> > David, Solr support for this will come in SOLR-3304 I suppose? http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> > Any idea if this is going to make it into Solr 4.0? Thanks, Eric.
> >  > Date: Wed, 15 Aug 2012 07:07:21 -0700
> >
> > > From: [hidden email]
> > > To: [hidden email]
> > > Subject: RE: Solr 4.0 - Join performance
> > >
> > > You would index rectangles of 0 height but that have a left edge 'x' of the
> > > start time and a right edge 'x' of your end time.  You can index a variable
> > > number of these per Solr document and then query by either a point or
> > > another rectangle to find documents which intersect your query shape.  It
> > > can't do a completely within based query, just intersection for now.  I
> > > really look forward to seeing this wrapped up in some sort of RangeFieldType
> > > so that users don't have to think in spatial terms.
> > >
> > >
> > >
> > > -----
> > >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> > > --
> > > View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4001404.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
> >
> >
> >
> >
> >
> > -----
> >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004035.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 
> 
> 
> 
> 
> -----
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004073.html
> Sent from the Solr - User mailing list archive at Nabble.com.
 		 	   		  

Re: Solr 4.0 - Join performance

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
Yes absolutely.  Since 4.0 hasn't been released, anything with a fix version to 4.0 basically implies trunk as well.  Also notice my comment "Committed to trunk & 4x" which is explicit.
~ David

On Sep 17, 2012, at 12:02 PM, Eric Khoury [via Lucene] wrote:


Hi David, I see that you committed the work for SOLR-3304 to the 4.x tree, which is great news, thanks.  I'm not fully familiar with the process; does that mean it's currently available in the nightly builds?
Eric.
 > Date: Wed, 29 Aug 2012 08:44:14 -0700

> From: [hidden email]
> To: [hidden email]
> Subject: Re: Solr 4.0 - Join performance
>
> The solr.GeoHashFieldType is useless; I'd like to see it deprecated then removed.  You'll need to go with unreleased code and apply patches or wait till Solr 4.
>
> ~ David
>
> On Aug 29, 2012, at 10:53 AM, Eric Khoury [via Lucene] wrote:
>
>
> Awesome, thanks David.  In the meantime, could I potentially use geohash, or something similar?  Geohash looks like it supports seperate "lon" or "lat" range queries which would help, but its not a multivalue field, which I need.
>  > Date: Wed, 29 Aug 2012 07:20:42 -0700
>
> > From: [hidden email]<x-msg://228/user/SendEmail.jtp?type=node&node=4004060&i=0>
> > To: [hidden email]<x-msg://228/user/SendEmail.jtp?type=node&node=4004060&i=1>
> > Subject: Re: Solr 4.0 - Join performance
> >
> > Solr 4 is certainly the goal.  There's a bit of a setback at the moment until some of the Lucene spatial API is re-thought.  I'm working heavily on such things this week.
> > ~ David
> >
> > On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:
> >
> >
> > David, Solr support for this will come in Solr-3304 I suppose?http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4Any idea if this is going to make it into Solr 4.0? Thanks,Eric.
> >  > Date: Wed, 15 Aug 2012 07:07:21 -0700
> >
> > > From: [hidden email]<x-msg://178/user/SendEmail.jtp?type=node&node=4003852&i=0>
> > > To: [hidden email]<x-msg://178/user/SendEmail.jtp?type=node&node=4003852&i=1>
> > > Subject: RE: Solr 4.0 - Join performance
> > >
> > > You would index rectangles of 0 height but that have a left edge 'x' of the
> > > start time and a right edge 'x' of your end time.  You can index a variable
> > > number of these per Solr document and then query by either a point or
> > > another rectangle to find documents which intersect your query shape.  It
> > > can't do a completely within based query, just intersection for now.  I
> > > really look forward to seeing this wrapped up in some sort of RangeFieldType
> > > so that users don't have to think in spatial terms.
> > >
> > >
> > >
> > > -----
> > >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> > > --
> > > View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4001404.html
> > > Sent from the Solr - User mailing list archive at Nabble.com<http://Nabble.com><http://Nabble.com<http://Nabble.com/>><http://Nabble.com<http://Nabble.com/><http://Nabble.com/>>.
> >
> >
> > ________________________________
> > If you reply to this email, your message will be added to the discussion below:
> >
> > NAML<<x-msg://228/>http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
> >
> >
> >
> >
> >
> > -----
> >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004035.html
> > Sent from the Solr - User mailing list archive at Nabble.com<http://Nabble.com><http://Nabble.com<http://Nabble.com/>>.
>
>
> ________________________________
> If you reply to this email, your message will be added to the discussion below:
>
> NAML<<x-msg://175/>http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>
>
>
>
>
> -----
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004073.html
> Sent from the Solr - User mailing list archive at Nabble.com<http://Nabble.com>.



RE: Using Solr-3304

Posted by Eric Khoury <ek...@hotmail.com>.
Thanks David, I'll play around with it.  I appreciate the help,
Eric.

RE: Using Solr-3304

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
When I said "boundary" I meant worldBounds.

Oh, and set distErrPct="0" to get precise shapes; the default is non-zero.
It'll use more disk space of course, which is all the more reason to choose
your world bounds carefully.




RE: Using Solr-3304

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
If you can stick to two dimensions then great.  Remember to set the boundary
attribute on the field type as I described so that spatial knows the
numerical boundaries that all the data must fit in.  e.g. boundary="0 0
100000 2.5" (substituting whatever appropriate number of time units you need
for 100000 there).
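
As a rough illustration of how the numbers in such a boundary could be derived, here is a minimal Python sketch. It assumes time is measured in whole minutes since an epoch of your choosing and that the four values are ordered minX minY maxX maxY, as in the "0 0 100000 2.5" example above. The function name world_bounds and the sample dates are hypothetical, not anything from Solr.

```python
from datetime import datetime, timezone


def world_bounds(epoch, horizon, max_y=2.5):
    """Return 'minX minY maxX maxY' with X in whole minutes since epoch.

    Hypothetical helper: epoch is the earliest time to recognize and
    horizon is the furthest time that will ever be indexed.
    """
    if horizon <= epoch:
        raise ValueError("horizon must be later than epoch")
    max_x = int((horizon - epoch).total_seconds() // 60)
    return f"0 0 {max_x} {max_y}"


if __name__ == "__main__":
    # Assumed example window: 20 years of minutes starting at 2012-01-01 UTC.
    start = datetime(2012, 1, 1, tzinfo=timezone.utc)
    end = datetime(2032, 1, 1, tzinfo=timezone.utc)
    print(world_bounds(start, end))
```

The resulting string would go into the field type attribute discussed in this thread; a tighter window gives the spatial grid less empty space to cover.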





RE: Using Solr-3304

Posted by Eric Khoury <ek...@hotmail.com>.
The requirements have evolved.  :-)  This is still the best solution for my needs, and I'm close; I believe this can work.  Removing quality from the equation, I have to deal with pairs of GroupIds and Times.  If I set the Y axis to 0, as you mentioned, can I create a pair of X values with the groupId as the whole part and the ticks as decimals?  In other words, <field name="rectangle">GroupId.StartTicks 0 GroupId.EndTicks 2.5</field>.
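
To make the proposed encoding concrete, here is a small Python sketch of how such a field value could be assembled. It only builds the string; whether the spatial field resolves this many decimal places is exactly the open question in this message. The helper names encode_x and rectangle_value and the fixed width of 9 digits are assumptions for illustration.

```python
def encode_x(group_id, ticks, digits=9):
    """Encode group id as the whole part and ticks as a fixed-width fraction."""
    if not 0 <= ticks < 10 ** digits:
        raise ValueError("ticks does not fit in the chosen number of digits")
    # Zero-padding keeps the fraction unambiguous: 42 becomes .000000042
    return f"{group_id}.{ticks:0{digits}d}"


def rectangle_value(group_id, start_ticks, end_ticks, max_y=2.5):
    """Build 'Xmin Ymin Xmax Ymax' for one group and one time window."""
    if end_ticks < start_ticks:
        raise ValueError("end_ticks must not precede start_ticks")
    x_min = encode_x(group_id, start_ticks)
    x_max = encode_x(group_id, end_ticks)
    return f"{x_min} 0 {x_max} {max_y}"


if __name__ == "__main__":
    print(rectangle_value(45, 634801234, 634805667))
```

The fixed width matters: without padding, ticks 5 and ticks 50 would both read as 0.5 after the decimal point, and two different times would collapse onto the same X value.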

Re: Using Solr-3304

Posted by "Smiley, David W." <ds...@mitre.org>.
Spatial doesn't (yet) support 3d.  If you have multi-value relationships across all 3 parameters you mentioned, then you're a bit stuck.  I thought you had 1d (time) multi-value ranges without needing to correlate that to other numeric ranges that are also multi-value.



RE: Using Solr-3304

Posted by Eric Khoury <ek...@hotmail.com>.
I have to deal with 3 parameters: time filtering, a groupid (1 to 2000) and a quality value (1 to 5).  I was hoping to use an X format of Group.Ticks and Y = quality level, where ticks is the number of ticks for a given time, rounded to the minute.  In other words, my field indexing would look like: <field name="RightsData2">45.634801234 1.5 45.634805667 2.5</field>.  I guess I'm missing something, as I thought that would define a rectangle.  Where do the min max values come into play?

Re: Using Solr-3304

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
For your use-case of time ranges, set geo="false" (as you've done).  At this point you have a quad tree but it doesn't (yet) work properly for the default min & max numbers that a double can store, so you need to specify the boundary rectangle explicitly and to the particular numbers for your use-case.  Use 0 for the 'Y's.  Think about the smallest granularity of time you need (a minute?) and what the earliest time you need to recognize and the furthest out as measured in your time granularity (in minutes?).  The boundary minX can be zero which will be your epoch, and the maxX will be the farthest out in time you can go -- who knows.  Set maxDetailDist to 1.
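
A minimal Python sketch of the advice above, assuming minute granularity and an arbitrarily chosen epoch of 2012-01-01 UTC. The value order Xmin Ymin Xmax Ymax follows what Eric reports from his experiments elsewhere in this thread. The names EPOCH, to_minutes and grant_rectangle are hypothetical.

```python
from datetime import datetime, timezone

# Assumed epoch: X = 0 corresponds to this instant.
EPOCH = datetime(2012, 1, 1, tzinfo=timezone.utc)


def to_minutes(t, epoch=EPOCH):
    """Whole minutes elapsed between epoch and t."""
    minutes = int((t - epoch).total_seconds() // 60)
    if minutes < 0:
        raise ValueError("time precedes the epoch, so X would be negative")
    return minutes


def grant_rectangle(start, end, epoch=EPOCH):
    """Zero-height rectangle 'Xmin 0 Xmax 0' for one availability window."""
    x_min = to_minutes(start, epoch)
    x_max = to_minutes(end, epoch)
    if x_max < x_min:
        raise ValueError("end must not precede start")
    return f"{x_min} 0 {x_max} 0"


if __name__ == "__main__":
    start = datetime(2012, 1, 1, 1, 0, tzinfo=timezone.utc)
    end = datetime(2012, 1, 2, tzinfo=timezone.utc)
    print(grant_rectangle(start, end))
```

With X values held to whole minutes, a detail distance of 1 lines up with the smallest step the data can take.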


RE: Using Solr-3304

Posted by Eric Khoury <ek...@hotmail.com>.
Thanks David, that's exactly what I needed.  One thing: from my experiments, the order seems to be Xmin Ymin Xmax Ymax for both the indexing and the query.
Eric.

RE: Using Solr-3304

Posted by Eric Khoury <ek...@hotmail.com>.
David, I tried increasing the maxDetailDist, as I need 9 decimal value precision.

<fieldType name="rectangle" class="solr.SpatialRecursivePrefixTreeFieldType" geo="false" distErrPct="0.025" maxDetailDist="0.000000001" />

But when I do, I get the following error:

SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=MV000000050000] Error adding field 'rectangle'='45.634801234 1.5 45.634805667 2.5' msg=Y values [-1.7976931348623157E308 to Infinity] not in boundary Rect(minX=-1.7976931348623157E308,maxX=1.7976931348623157E308,minY=-1.7976931348623157E308,maxY=1.7976931348623157E308)

Any ideas?
Eric.
PS: what does geo=true\false change?
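
One plausible reading of the "Infinity" in that message, not confirmed anywhere in this thread, is plain floating-point overflow: the default boundary spans from the most negative to the most positive double, and the width of that span cannot itself be represented as a finite double. The Python sketch below only demonstrates the arithmetic; it says nothing about what the Solr code actually does internally, and the function name span is hypothetical.

```python
import math
import sys


def span(lo, hi):
    """Width of the interval [lo, hi] in ordinary double arithmetic."""
    return hi - lo


if __name__ == "__main__":
    lo = -sys.float_info.max  # -1.7976931348623157e308, as in the error
    hi = sys.float_info.max

    width = span(lo, hi)
    # The width overflows, and adding it back to lo gives an infinite edge.
    print(math.isinf(width), math.isinf(lo + width))

    # With explicit, modest bounds the same arithmetic stays finite.
    print(span(0.0, 100000.0))
```

If this is indeed the cause, it would be consistent with David's advice in this thread to set the boundary rectangle explicitly rather than rely on the defaults.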

Re: Using Solr-3304

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
Definitely needs some updating; I will try to get to that this weekend.




Using Solr-3304

Posted by Eric Khoury <ek...@hotmail.com>.
Hi David, I've installed the latest nightly, and am trying to use the spatial queries.  I've defined a field called Rectangle as such:
<field name="Rectangle" type="location_rpt" indexed="true" stored="true" multiValued="true" />
Can you provide some guidance on how to index a field and how to query it?
Indexing: <field name="Rectangle">X1,Y1,X2,Y2</field>?
Querying: ?
Thanks!
Eric.

RE: Solr 4.0 - Join performance

Posted by Eric Khoury <ek...@hotmail.com>.
Hi David, I see that you committed the work for solr-3304 to the 4.x tree, which is great news, thanks.  I'm not fully familiar with the process; does that mean it's currently available in the nightly builds?
Eric.

Re: Solr 4.0 - Join performance

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
The solr.GeoHashFieldType is useless; I'd like to see it deprecated then removed.  You'll need to go with unreleased code and apply patches or wait till Solr 4.

~ David

On Aug 29, 2012, at 10:53 AM, Eric Khoury [via Lucene] wrote:


Awesome, thanks David.  In the meantime, could I potentially use geohash, or something similar?  Geohash looks like it supports separate "lon" or "lat" range queries which would help, but it's not a multivalue field, which I need.

> Date: Wed, 29 Aug 2012 07:20:42 -0700
> From: [hidden email]
> To: [hidden email]
> Subject: Re: Solr 4.0 - Join performance
>
> Solr 4 is certainly the goal.  There's a bit of a setback at the moment until some of the Lucene spatial API is re-thought.  I'm working heavily on such things this week.
> ~ David
>
> On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:
>
>
> David, Solr support for this will come in Solr-3304 I suppose?http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4Any idea if this is going to make it into Solr 4.0? Thanks,Eric.
>  > Date: Wed, 15 Aug 2012 07:07:21 -0700
>
> > From: [hidden email]<x-msg://178/user/SendEmail.jtp?type=node&node=4003852&i=0>
> > To: [hidden email]<x-msg://178/user/SendEmail.jtp?type=node&node=4003852&i=1>
> > Subject: RE: Solr 4.0 - Join performance
> >
> > You would index rectangles of 0 height but that have a left edge 'x' of the
> > start time and a right edge 'x' of your end time.  You can index a variable
> > number of these per Solr document and then query by either a point or
> > another rectangle to find documents which intersect your query shape.  It
> > can't do a completely within based query, just intersection for now.  I
> > really look forward to seeing this wrapped up in some sort of RangeFieldType
> > so that users don't have to think in spatial terms.
> >
> >
> >
> > -----
> >  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4001404.html
> > Sent from the Solr - User mailing list archive at Nabble.com<http://Nabble.com><http://Nabble.com<http://Nabble.com/>>.
>
>
> ________________________________
> If you reply to this email, your message will be added to the discussion below:
>
> NAML<<x-msg://228/>http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>
>
>
>
>
> -----
>  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004035.html
> Sent from the Solr - User mailing list archive at Nabble.com<http://Nabble.com>.


________________________________
If you reply to this email, your message will be added to the discussion below:
http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004060.html
To unsubscribe from Solr 4.0 - Join performance, click here<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=3998827&code=RFNNSUxFWUBtaXRyZS5vcmd8Mzk5ODgyN3wxMDE2NDI2OTUw>.
NAML<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>





-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004073.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr 4.0 - Join performance

Posted by Eric Khoury <ek...@hotmail.com>.
Awesome, thanks David.  In the meantime, could I potentially use geohash, or something similar?  Geohash looks like it supports separate "lon" or "lat" range queries, which would help, but it's not a multivalued field, which I need.
 > Date: Wed, 29 Aug 2012 07:20:42 -0700
> From: DSMILEY@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4.0 - Join performance
> 
> Solr 4 is certainly the goal.  There's a bit of a setback at the moment until some of the Lucene spatial API is re-thought.  I'm working heavily on such things this week.
> ~ David

Re: Solr 4.0 - Join performance

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
Solr 4 is certainly the goal.  There's a bit of a setback at the moment until some of the Lucene spatial API is re-thought.  I'm working heavily on such things this week.
~ David

On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:


> David, Solr support for this will come in SOLR-3304, I suppose? http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> Any idea if this is going to make it into Solr 4.0?
> Thanks,
> Eric.

RE: Solr 4.0 - Join performance

Posted by Eric Khoury <ek...@hotmail.com>.
David, Solr support for this will come in SOLR-3304, I suppose? http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
Any idea if this is going to make it into Solr 4.0?
Thanks,
Eric.
 > Date: Wed, 15 Aug 2012 07:07:21 -0700
> From: DSMILEY@mitre.org
> To: solr-user@lucene.apache.org
> Subject: RE: Solr 4.0 - Join performance
> 
> You would index rectangles of 0 height but that have a left edge 'x' of the
> start time and a right edge 'x' of your end time.  You can index a variable
> number of these per Solr document and then query by either a point or
> another rectangle to find documents which intersect your query shape.  It
> can't do a completely within based query, just intersection for now.  I
> really look forward to seeing this wrapped up in some sort of RangeFieldType
> so that users don't have to think in spatial terms.  

RE: Solr 4.0 - Join performance

Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
You would index rectangles of 0 height whose left edge 'x' is the
start time and whose right edge 'x' is your end time.  You can index a variable
number of these per Solr document and then query by either a point or
another rectangle to find documents which intersect your query shape.  It
can't do a "completely within" query, just intersection for now.  I
really look forward to seeing this wrapped up in some sort of RangeFieldType
so that users don't have to think in spatial terms.
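
To make the idea above concrete, here is a minimal sketch in plain Python. It is not Solr or Lucene-spatial code; the names Rect, intersects, doc_matches and point are made up for illustration. Each grant becomes a zero-height rectangle, and a document matches when any of its rectangles intersects the query shape.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle; a time range is a rectangle of zero height."""
    min_x: float          # start time
    max_x: float          # end time
    min_y: float = 0.0    # 'y' is unused, so it stays constant
    max_y: float = 0.0


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one point."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def doc_matches(rects: List[Rect], query: Rect) -> bool:
    """A document holding several rectangles matches if any one intersects."""
    return any(intersects(r, query) for r in rects)


def point(x: float) -> Rect:
    """A point query (for example 'now') is a degenerate rectangle."""
    return Rect(x, x)


# One item with two grants: available during [1, 3] and during [7, 9].
item_grants = [Rect(1, 3), Rect(7, 9)]
```

With this data, doc_matches(item_grants, point(8)) is true and doc_matches(item_grants, point(5)) is false. A query rectangle such as Rect(2, 8) also matches, because it only has to overlap a grant, not contain it, which is the "intersection only" limitation mentioned above.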




RE: Solr 4.0 - Join performance

Posted by Eric Khoury <ek...@hotmail.com>.
Thanks David. I'm not clear on how the X value of a range pair will help me filter on pairs of start-end times. Can you explain how that'd work?
Still, it seems like the ability to create subobjects in Solr is a huge feature; I'm hoping it'll eventually make it in.
Eric.
 > From: dsmiley@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4.0 - Join performance
> Date: Tue, 14 Aug 2012 20:50:43 +0000
> 
> This one should work for now:
> https://issues.apache.org/jira/browse/SOLR-3304
> If you're comfortable with checking out Lucene/Solr and applying a patch, then you can do it yourself and get it working without any real coding.  You'd have to use a dummy constant value for 'y' as you index rectangles, and you'd configure it for non-geospatial.  The unfortunate piece is that 'x' (nor 'y') can't be the full range of a double, and it's not oriented towards a 'long' time value.  There's no JIRA issue for a one-dimensional spatial field yet; that's pretty far down the priority list.  You are certainly not the first that could use this feature, though.
> 
> ~ David Smiley

Re: Solr 4.0 - Join performance

Posted by "Smiley, David W." <ds...@mitre.org>.
This one should work for now:
https://issues.apache.org/jira/browse/SOLR-3304
If you're comfortable with checking out Lucene/Solr and applying a patch, then you can do it yourself and get it working without any real coding.  You'd have to use a dummy constant value for 'y' as you index rectangles, and you'd configure it for non-geospatial.  The unfortunate piece is that neither 'x' nor 'y' can span the full range of a double, and it's not oriented towards a 'long' time value.  There's no JIRA issue for a one-dimensional spatial field yet; that's pretty far down the priority list.  You are certainly not the first who could use this feature, though.
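
One way to live with the bounded 'x' and the dummy 'y' is to rescale timestamps before indexing. The sketch below is only an illustration in plain Python: BASE_EPOCH, X_MAX, DUMMY_Y and the minute granularity are assumptions picked for the example, not values taken from SOLR-3304 or any Solr configuration.

```python
from datetime import datetime, timezone

# All constants here are hypothetical, chosen only for this example.
BASE_EPOCH = int(datetime(2012, 1, 1, tzinfo=timezone.utc).timestamp())
X_MIN = 0.0
X_MAX = 10_000_000.0   # assumed upper bound of the configured 'x' axis
DUMMY_Y = 0.0          # constant 'y' shared by every rectangle


def to_x(epoch_seconds: int) -> float:
    """Map a 'long' epoch time to minutes since BASE_EPOCH, within bounds."""
    x = (epoch_seconds - BASE_EPOCH) / 60.0
    if not X_MIN <= x <= X_MAX:
        raise ValueError(f"time {epoch_seconds} falls outside the x bounds")
    return x


def grant_to_rect(start_seconds: int, end_seconds: int):
    """Return (min_x, min_y, max_x, max_y) for one availability window."""
    return (to_x(start_seconds), DUMMY_Y, to_x(end_seconds), DUMMY_Y)


# A grant lasting one hour, starting one day after the base epoch.
rect = grant_to_rect(BASE_EPOCH + 86_400, BASE_EPOCH + 90_000)
```

The query side would apply the same to_x conversion to the query time, so that indexed and queried values share one coordinate system. Coarser granularity covers a longer time span inside the same bounds, at the cost of precision.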

~ David Smiley

On Aug 14, 2012, at 4:19 PM, Eric Khoury wrote:

> 
> Thanks David, that does indeed sound like it'll help.  Is there an issue number I can use to track development/availability?
> Eric.

RE: Solr 4.0 - Join performance

Posted by Eric Khoury <ek...@hotmail.com>.
Thanks David, that does indeed sound like it'll help.  Is there an issue number I can use to track development/availability?
Eric.
 > From: dsmiley@mitre.org
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 4.0 - Join performance
> Date: Tue, 14 Aug 2012 20:15:27 +0000
> 
> Stepping back a bit, the reason you are using multiple cores with a join is because Solr doesn't have a multi-valued numeric range type.  The spatial work I'm doing in Lucene-spatial does, and it's 2-dimensional for an x & y whereas your case calls for one dimension.  It's taking a bit of time, but when finished you should be able to use it for your use case ignoring the 'y'.  Eventually I'd like to develop  such a Solr field type for a numeric/time range to do it more natively but that's a ways off.
> 
> Cheers,
>   ~ David Smiley

Re: Solr 4.0 - Join performance

Posted by "Smiley, David W." <ds...@mitre.org>.
Stepping back a bit, the reason you are using multiple cores with a join is that Solr doesn't have a multi-valued numeric range type.  The spatial work I'm doing in Lucene-spatial does; it's 2-dimensional (x and y), whereas your case calls for one dimension.  It's taking a bit of time, but when finished you should be able to use it for your use case, ignoring the 'y'.  Eventually I'd like to develop such a Solr field type for a numeric/time range to do it more natively, but that's a ways off.
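
A small plain-Python sketch of why the missing range type matters (the Item class and both match functions are hypothetical, not Solr code): once the start and end times of several grants are flattened into two independent multi-valued fields, the pairing is lost, and a time that falls between two grants still matches.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    name: str
    grants: List[Tuple[int, int]]   # the real (start, end) pairs

    @property
    def starts(self) -> List[int]:
        """What a flat multi-valued 'start' field would hold."""
        return [start for start, _ in self.grants]

    @property
    def ends(self) -> List[int]:
        """What a flat multi-valued 'end' field would hold."""
        return [end for _, end in self.grants]


def flat_match(item: Item, t: int) -> bool:
    """Two independent range filters: any start <= t AND any end >= t."""
    return any(s <= t for s in item.starts) and any(e >= t for e in item.ends)


def paired_match(item: Item, t: int) -> bool:
    """What is actually wanted: one single grant that contains t."""
    return any(start <= t <= end for start, end in item.grants)


# Available during [1, 3] and [7, 9], but not at t = 5.
item = Item("demo", [(1, 3), (7, 9)])
```

Here flat_match(item, 5) is true, because the start of the first grant and the end of the second both satisfy their filters, while paired_match(item, 5) is false. Keeping each start-end pair together, as separate Grant documents reached through a join or as one indexed shape per grant, avoids that false positive.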

Cheers,
  ~ David Smiley
