Posted to solr-user@lucene.apache.org by mganeshs <mg...@live.in> on 2017/06/23 15:04:56 UTC

Using of Streaming to join between shards

Hi,

So far we have had only one shard, so joins work fine. Now that our data
is growing, we would like to add shards, and for various reasons we would
like to stay with the default sharding mechanism.

Because of this, joins will fail, as they are not supported when there is
more than one shard.

For this reason, we are planning to use streaming.

Can you tell us whether streaming can be used the way we used join before?
Will there be any penalty in terms of response time and CPU utilization?

Currently we are using a simple join, which is a sort of one-to-one mapping.
When I move to streaming, what kind of join should I go for: hashJoin,
leftOuterJoin, innerJoin, etc.?

Please advise.

--
View this message in context: http://lucene.472066.n3.nabble.com/Using-of-Streaming-to-join-between-shards-tp4342563.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Using of Streaming to join between shards

Posted by mganeshs <mg...@live.in>.
Hi Joel,

Thanks for confirming that streaming would be too costly for high-QPS loads.

Regards,

Re: Using of Streaming to join between shards

Posted by Joel Bernstein <jo...@gmail.com>.
I don't think the distributed joins are going to work for you in the ACL
use case you describe. I think the overhead of streaming the documents will
be too costly in this scenario. The distributed joins were designed more
for OLAP data warehousing use cases than for high-QPS loads.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Jun 27, 2017 at 7:51 AM, mganeshs <mg...@live.in> wrote:

> [...]

Re: Using of Streaming to join between shards

Posted by mganeshs <mg...@live.in>.
Hi Susheel,

Thanks for your reply; as you suggested, we will start with innerJoin.

But what I want to know is whether streaming can be used instead of the
normal default join.

For example, we currently fire a request every time a user clicks a menu item
on the page, to show the list of his documents using the default join, and it
works well without any issues with 100 concurrent users, or even higher
concurrency.

Can we do the same with a streaming join? I just want to know whether
concurrent streaming requests will put a heavy load on the Solr server, or
whether the load is the same as with the default join. What would be the
penalty of using streaming concurrently instead of the default join?

Kindly shed some light on this topic.

Re: Using of Streaming to join between shards

Posted by Susheel Kumar <su...@gmail.com>.
You may want to start with innerJoin, which is the simple, typical join of the
database world.
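A rough way to picture what innerJoin does: it is a sort-merge join, so both input streams must be sorted ascending on the join fields, and it emits one merged tuple for each left/right pair whose keys are equal. The Python below is a simplified in-memory model of that behavior (plain lists instead of lazy streams, a hypothetical function name `merge_inner_join`, made-up sample data), not Solr's implementation.

```python
from typing import Dict, Iterator, List


def merge_inner_join(left: List[Dict], right: List[Dict],
                     left_key: str, right_key: str) -> Iterator[Dict]:
    """Yield a merged tuple for every left/right pair with equal keys.

    Both lists must already be sorted ascending on their join key,
    mirroring the sort requirement innerJoin places on its streams.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1  # left key is behind: advance the left side
        elif lk > rk:
            j += 1  # right key is behind: advance the right side
        else:
            # Find the run of right tuples that share this key ...
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # ... and pair every left tuple with this key against that run.
            while i < len(left) and left[i][left_key] == lk:
                for r in right[j:j_end]:
                    yield {**left[i], **r}
                i += 1
            j = j_end


if __name__ == "__main__":
    # Hypothetical ACL tuples for user U1 and real documents, sorted by id.
    acl_u1 = [{"P_uniqueId": "d1"}, {"P_uniqueId": "d3"}]
    docs = [
        {"D_uniqueId": "d1", "D_documentname": "spec"},
        {"D_uniqueId": "d2", "D_documentname": "notes"},
        {"D_uniqueId": "d3", "D_documentname": "plan"},
    ]
    joined = merge_inner_join(acl_u1, docs, "P_uniqueId", "D_uniqueId")
    print([t["D_documentname"] for t in joined])  # ['spec', 'plan']
```

Because both sides are consumed in key order, the merge never needs to hold a whole stream in memory, which is why innerJoin insists on the sort.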

On Mon, Jun 26, 2017 at 1:46 AM, mganeshs <mg...@live.in> wrote:

> [...]

Re: Using of Streaming to join between shards

Posted by mganeshs <mg...@live.in>.
Hi Erick,

My scenario involves two kinds of Solr documents:

Document #1 - the real document
#D_uniqueId, #D_documentId (unique), #D_documentname, #D_documentdesc,
#D_documentinfo1, #D_documentInfo2, #D_documentInfo3, ...

Document #2 - holds the document's ACL
#P_uniqueId, #P_acl_perm (a multi-valued field containing user values
such as U1, U2, U3, U4, etc.)

Currently (we have only one shard for now), with a simple join, my query
looks like {!join from=P_uniqueId to=D_uniqueId}P_acl_perm:U1
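For reference, the semantics of that {!join} query on a single index can be sketched in three steps: run the inner query P_acl_perm:U1, collect the P_uniqueId values of the matches, and return the documents whose D_uniqueId is in that set. The model below (hypothetical function `join_query`, made-up sample documents) assumes every document sits in one index; as we understand it, once the collection has several shards each shard performs this locally, so an ACL document and the document it protects that land on different shards no longer match.

```python
from typing import Dict, List


def join_query(docs: List[Dict], from_field: str, to_field: str,
               inner_field: str, inner_value: str) -> List[Dict]:
    """Model {!join from=<from_field> to=<to_field>}<inner_field>:<inner_value>."""
    # 1. Run the inner query: here, a match on a multi-valued field.
    matched = [d for d in docs if inner_value in d.get(inner_field, [])]
    # 2. Collect the "from" values of every matching document.
    keys = {d[from_field] for d in matched if from_field in d}
    # 3. Return the documents whose "to" value is in that set.
    return [d for d in docs if d.get(to_field) in keys]


if __name__ == "__main__":
    index = [  # made-up sample documents using the field names from the thread
        {"D_uniqueId": "d1", "D_documentname": "spec"},
        {"D_uniqueId": "d2", "D_documentname": "notes"},
        {"P_uniqueId": "d1", "P_acl_perm": ["U1", "U2"]},
        {"P_uniqueId": "d2", "P_acl_perm": ["U3"]},
    ]
    hits = join_query(index, "P_uniqueId", "D_uniqueId", "P_acl_perm", "U1")
    print([d["D_documentname"] for d in hits])  # ['spec']
```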

The number of ACL values per document can grow to as many as 1M.

Now, as the number of documents is increasing, we are planning to add one
more shard by splitting the existing shard in two.

As join won't work with multiple shards, we are planning to use streams.

So what streaming query should replace this normal join query
({!join from=P_uniqueId to=D_uniqueId}P_acl_perm:U1)?
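One possible shape for a streaming equivalent of the {!join} query is an innerJoin over two /export searches, each sorted on its own join field. The helper below only builds the expression string. The collection name, the D_uniqueId:* query used to pick out the real documents, and the function names are assumptions for illustration; user values are assumed to be simple tokens like U1 that need no escaping; and the /export handler requires docValues on the fields used in fl and sort.

```python
from typing import List
from urllib.parse import urlencode


def acl_inner_join_expr(collection: str, user: str,
                        doc_fields: List[str]) -> str:
    """Build an innerJoin expression equivalent to the {!join} ACL query."""
    # Left: ACL documents that grant `user` (assumed to need no escaping),
    # exported sorted by P_uniqueId.
    left = (f'search({collection}, q="P_acl_perm:{user}", '
            f'fl="P_uniqueId", sort="P_uniqueId asc", qt="/export")')
    # Right: real documents (assumed to be those with a D_uniqueId),
    # exported sorted by D_uniqueId so the merge lines up with the left.
    fl = ",".join(["D_uniqueId"] + doc_fields)
    right = (f'search({collection}, q="D_uniqueId:*", '
             f'fl="{fl}", sort="D_uniqueId asc", qt="/export")')
    # Match P_uniqueId on the left against D_uniqueId on the right.
    return f'innerJoin({left}, {right}, on="P_uniqueId=D_uniqueId")'


def stream_form_body(expr: str) -> str:
    """Form-encoded body for a POST to /solr/<collection>/stream."""
    return urlencode({"expr": expr})


if __name__ == "__main__":
    # "docs" is a hypothetical collection name.
    print(acl_inner_join_expr("docs", "U1", ["D_documentname"]))
```

The expression would then be POSTed, form-encoded as the expr parameter, to the collection's /stream handler. Note that the right-hand search exports every real document in the collection on each request, which is one concrete source of the per-request cost raised in this thread.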

An early response would be much appreciated!

Regards,

Re: Using of Streaming to join between shards

Posted by Erick Erickson <er...@gmail.com>.
You've provided no information to help guide an answer, and even with
more information there are too many variables to say definitively.

There are quite a number of Streaming join options, see:
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions.
You'll have to do some exploration of the various ones mentioned on
that page as they pertain to your particular use case.
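To make the difference between two of those options concrete: hashJoin reads its hashed stream fully into memory and probes it with each tuple of the other stream, so the two streams need not share a sort order, while leftOuterJoin also emits left tuples that have no match. The Python below models only which tuples come out of each (hypothetical functions `hash_join` and `left_outer_join`, made-up data); it does not reproduce Solr's internals, and in Solr leftOuterJoin additionally requires both streams to be sorted on the join fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def _index(tuples: Iterable[Dict], key: str) -> Dict[object, List[Dict]]:
    """Group tuples by their join-key value (held fully in memory)."""
    table: Dict[object, List[Dict]] = defaultdict(list)
    for t in tuples:
        table[t[key]].append(t)
    return table


def hash_join(stream: Iterable[Dict], hashed: Iterable[Dict],
              stream_key: str, hashed_key: str) -> Iterator[Dict]:
    """Inner join: emit a merged tuple for each stream tuple with a match."""
    table = _index(hashed, hashed_key)
    for t in stream:  # output follows the order of `stream`
        for h in table.get(t[stream_key], []):
            yield {**t, **h}


def left_outer_join(left: Iterable[Dict], right: Iterable[Dict],
                    left_key: str, right_key: str) -> Iterator[Dict]:
    """Like hash_join, but left tuples without a match are emitted as-is."""
    table = _index(right, right_key)
    for t in left:
        matches = table.get(t[left_key], [])
        if not matches:
            yield dict(t)
        for r in matches:
            yield {**t, **r}


if __name__ == "__main__":
    docs = [{"D_uniqueId": "d1", "D_documentname": "spec"},
            {"D_uniqueId": "d2", "D_documentname": "notes"}]
    acl_u1 = [{"P_uniqueId": "d2"}]  # hypothetical ACL tuples for one user
    print(list(hash_join(docs, acl_u1, "D_uniqueId", "P_uniqueId")))
    print(list(left_outer_join(docs, acl_u1, "D_uniqueId", "P_uniqueId")))
```

For an ACL filter, the inner-join behavior (hashJoin or innerJoin) is the one that drops documents the user may not see; the outer join keeps them, which is presumably not what a permission check wants.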

Best,
Erick

On Fri, Jun 23, 2017 at 8:04 AM, mganeshs <mg...@live.in> wrote:
> [...]