Posted to solr-user@lucene.apache.org by "Chaushu, Shani" <sh...@intel.com> on 2015/03/31 08:31:29 UTC

Spark-Solr in python

Hi,
I saw there is a Java tool for reading Solr into a Spark RDD.
I want to do something similar in Python. Is there a Python package for reading Solr into a Spark RDD?

Thanks,
Shani


---------------------------------------------------------------------
Intel Electronics Ltd.

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

RE: Spark-Solr in python

Posted by "Chaushu, Shani" <sh...@intel.com>.
There is a Python package for SolrCloud:
https://pypi.python.org/pypi/solrcloudpy

but I don't know whether it can be connected to Spark.


-----Original Message-----
From: Timothy Potter [mailto:thelabdude@gmail.com] 
Sent: Tuesday, March 31, 2015 23:15
To: solr-user@lucene.apache.org
Subject: Re: Spark-Solr in python

You'll need a Python lib that uses a Python ZooKeeper client to be SolrCloud-aware so that you can do RDD-like things, such as reading from all shards in a collection in parallel. I'm not aware of any Solr Python libs that are cloud-aware yet, but it would be a good contribution to upgrade https://github.com/toastdriven/pysolr to be SolrCloud-aware.


RE: Spark-Solr in python

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
There is a pull request for that: https://github.com/toastdriven/pysolr/pull/138. Depending on how you install Python modules, you could grab the code for that feature and run that version.

-----Original Message-----
From: Timothy Potter [mailto:thelabdude@gmail.com] 
Sent: Tuesday, March 31, 2015 4:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Spark-Solr in python

You'll need a Python lib that uses a Python ZooKeeper client to be SolrCloud-aware so that you can do RDD-like things, such as reading from all shards in a collection in parallel. I'm not aware of any Solr Python libs that are cloud-aware yet, but it would be a good contribution to upgrade https://github.com/toastdriven/pysolr to be SolrCloud-aware.


Re: Spark-Solr in python

Posted by Timothy Potter <th...@gmail.com>.
You'll need a Python lib that uses a Python ZooKeeper client to be
SolrCloud-aware so that you can do RDD-like things, such as reading
from all shards in a collection in parallel. I'm not aware of any Solr
Python libs that are cloud-aware yet, but it would be a good contribution
to upgrade https://github.com/toastdriven/pysolr to be SolrCloud-aware.
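
A rough Python sketch of the "read from all shards in parallel" idea, not an existing library. Assumptions that are not from this thread: the cluster state JSON has the collection -> shards -> replicas layout that SolrCloud keeps in ZooKeeper, with base_url, core, and state on each replica; some ZooKeeper client has already fetched that JSON as a string; and shard_query_urls, fetch_docs, and solr_rdd are hypothetical helper names. Each shard is queried directly with distrib=false so it returns only its own documents, and Spark gets one partition per shard.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def shard_query_urls(cluster_state_json, collection, query="*:*", rows=100):
    """Build one /select URL per shard, each aimed at an active replica.

    cluster_state_json is assumed to be the JSON text read from ZooKeeper.
    """
    state = json.loads(cluster_state_json)
    urls = []
    for _name, shard in sorted(state[collection]["shards"].items()):
        active = [r for r in shard["replicas"].values()
                  if r.get("state") == "active"]
        if not active:
            continue  # no live replica for this shard, skip it
        replica = active[0]
        # distrib=false keeps the query on this core only
        params = urlencode({"q": query, "rows": rows,
                            "wt": "json", "distrib": "false"})
        urls.append("%s/%s/select?%s" % (
            replica["base_url"].rstrip("/"), replica["core"], params))
    return urls


def fetch_docs(url):
    """Fetch one shard's documents (first page only, paging omitted)."""
    with urlopen(url) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]["docs"]


def solr_rdd(sc, urls):
    """Turn per-shard URLs into an RDD of documents, one partition per shard.

    sc is an existing pyspark SparkContext (assumed, not created here).
    """
    return sc.parallelize(urls, max(1, len(urls))).flatMap(fetch_docs)
```

With a SparkContext sc in hand, solr_rdd(sc, shard_query_urls(state_json, "mycollection")) would give an RDD of Solr documents, where state_json and the collection name are placeholders. Only the first rows documents per shard are read; a real reader would page through each shard.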
