Posted to solr-user@lucene.apache.org by Pravin Karne <pr...@persistent.co.in> on 2009/10/07 06:29:48 UTC

Solr Quries

Hi,
I am new to Solr and have the following questions:

1. Does Solr work in a distributed environment? If yes, how do I configure it?

2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.)

3. I have 1 TB of employee information (id, name, address, cell no., personal info). To post (index) this data to a Solr server, do I have to create an XML file from it and then post that file? Or is there a better way? In future my data will grow up to 10 TB; how can I index it then? (Creating XML is a real headache.)





Thanks in advance

-Pravin




DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

RE: Solr Quries

Posted by Pravin Karne <pr...@persistent.co.in>.
Thanks for your help.
Could you please provide the detailed configuration for a distributed Solr environment?
How do I set up the master and slave? Which file(s) do I have to change for this?
What are the shard parameters?

Can we integrate ZooKeeper with this?

Please provide details for this.

Thanks in advance.
-Pravin




Re: Solr Quries

Posted by Sandeep Tagore <sa...@gmail.com>.
Hi Pravin,

> 1. Does Solr work in a distributed environment? If yes, how do I configure
> it?

Yes. You can achieve this with sharding.
For example, install and configure Solr on two machines and declare one of
them as the master. Supply the shard parameters when you index and search
your data.
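
For reference, in the distributed search described at http://wiki.apache.org/solr/DistributedSearch, `shards` is a request parameter sent along with the query. A minimal Python sketch of building such a request URL follows. The host names are the example servers used elsewhere in this thread, and the `name` field and the query value are assumptions made only for illustration.

```python
from urllib.parse import urlencode


def distributed_query_url(coordinator, shards, query):
    """Build a select URL asking `coordinator` to fan the query out to `shards`."""
    # Keep ':', '/' and ',' readable so the shard list looks like the wiki examples.
    params = urlencode({"shards": ",".join(shards), "q": query}, safe=":/,")
    return f"http://{coordinator}/select?{params}"


# Hypothetical hosts and field name, for illustration only.
url = distributed_query_url(
    "server1:8983/solr",
    ["server1:8983/solr", "server2:8983/solr"],
    "name:pravin",
)
print(url)
```

Any of the listed servers can receive the request; it queries every shard named in `shards` and merges the results.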

> 2. Does Solr have Hadoop support? If yes, how do I set it up with
> Hadoop/HDFS? (Note: I am familiar with Hadoop.)

Sorry, no idea.

> 3. I have 1 TB of employee information (id, name, address, cell no.,
> personal info). To post (index) this data to a Solr server, do I have to
> create an XML file from it and then post that file? Or is there a better
> way? In future my data will grow up to 10 TB; how can I index it then?
> (Creating XML is a real headache.)

I don't think XML is the best way, and I don't recommend it. If that 1 TB of
data is in a database, you can index it simply with the full-import command.
Configure your DB details in solrconfig.xml and data-config.xml, and add your
DB driver jar to the Solr lib directory. Then import the data in slices (say,
department-wise or by some other category). In future, you can import the
data from a DB, or you can index it directly using the client API with simple
Java beans.
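
The slice-by-slice import could be driven by a small script. The Python sketch below only builds the request URLs, one full-import per department. It assumes the DataImportHandler is registered at `/dataimport` in solrconfig.xml, and that the SQL query in data-config.xml reads a hypothetical `dept` request parameter (for example as `${dataimporter.request.dept}`). The base URL and department names are invented for illustration.

```python
from urllib.parse import urlencode


def full_import_urls(solr_base, departments):
    """Return one DataImportHandler full-import URL per department slice."""
    urls = []
    for dept in departments:
        params = {
            "command": "full-import",
            # clean defaults to true, which would delete the existing index
            # before each slice; turn it off so the slices accumulate.
            "clean": "false",
            "commit": "true",
            # Hypothetical custom parameter consumed by data-config.xml.
            "dept": dept,
        }
        urls.append(f"{solr_base}/dataimport?{urlencode(params)}")
    return urls


for u in full_import_urls("http://localhost:8983/solr", ["sales", "hr"]):
    print(u)
```

Each URL would be requested in turn, waiting for one import to finish before starting the next.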

Hope this info helps you.

Regards,
Sandeep Tagore
-- 
View this message in context: http://www.nabble.com/Solr-Quries-tp25780371p25783891.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Quries

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Oct 9, 2009 at 11:35 AM, Pravin Karne <pravin_karne@persistent.co.in> wrote:

> Thanks for your reply.
> I have one more question about the distributed Solr environment.
>
> I have configured Solr on two machines as per
> http://wiki.apache.org/solr/DistributedSearch
>
> But I have the following test case:
>
> Suppose I have two machines, Server1 and Server2.
>
> I posted a record with id 1 to Server1 and another record with the
> same id, i.e. 1, to Server2.
>
>
You are supposed to put disjoint sets of documents in the shards. If the same
uniqueKey is present on multiple shards, there is no guarantee which
document will be picked up.
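
One way to keep the shards disjoint is to let the indexing client decide which shard receives each document, based on its uniqueKey. The Python sketch below illustrates that idea; it is a client-side approach assumed for illustration, not something taken from this thread. The shard addresses are the example servers from the question and the function name is hypothetical.

```python
import hashlib

# Example shard addresses from the question above.
SHARDS = ["server1:8983/solr", "server2:8983/solr"]


def shard_for(unique_key, shards=SHARDS):
    """Map a uniqueKey to exactly one shard, the same one on every call."""
    # Use a stable hash (not Python's built-in hash(), which can vary
    # between runs) so a given id is always routed to the same shard.
    digest = hashlib.md5(str(unique_key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# A record with id 1 always goes to the same server, so it cannot end up
# on both shards.
print(shard_for(1))
```

Note that changing the number of shards changes the mapping, so with this simple scheme existing documents would have to be reindexed.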

-- 
Regards,
Shalin Shekhar Mangar.

RE: Solr Quries

Posted by Pravin Karne <pr...@persistent.co.in>.
Thanks for your reply.
I have one more question about the distributed Solr environment.

I have configured Solr on two machines as per http://wiki.apache.org/solr/DistributedSearch

But I have the following test case:

Suppose I have two machines, Server1 and Server2.

I posted a record with id 1 to Server1 and another record with the same id, i.e. 1, to Server2.

When I run a query like
http://server1:8983/solr/select?shards=server1:8983/solr,server2:8983/solr&q=1
it gives the result from Server1.

http://server2:8983/solr/select?shards=server2:8983/solr,server1/solr&q=1
This gives the result from Server2.

How can I solve this?

Is any other setting required for this?

Thanks in advance
-Pravin



Re: Solr Quries

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
First, please do not cross-post messages to both solr-dev and solr-user.
Solr-dev is only for development-related discussions.

Comments inline:

On Wed, Oct 7, 2009 at 9:59 AM, Pravin Karne <pr...@persistent.co.in> wrote:

> Hi,
> I am new to Solr and have the following questions:
>
>
> 1. Does Solr work in a distributed environment? If yes, how do I configure
> it?
>

Yes, Solr works in a distributed environment. See
http://wiki.apache.org/solr/DistributedSearch


>
>
>
> 2. Does Solr have Hadoop support? If yes, how do I set it up with
> Hadoop/HDFS? (Note: I am familiar with Hadoop.)
>
>
Not currently. There is some work going on at
https://issues.apache.org/jira/browse/SOLR-1457


>
>
> 3. I have 1 TB of employee information (id, name, address, cell no.,
> personal info). To post (index) this data to a Solr server, do I have to
> create an XML file from it and then post that file? Or is there a better
> way? In future my data will grow up to 10 TB; how can I index it then?
> (Creating XML is a real headache.)
>
>
XML is just one way. You could also use CSV. If you use the SolrJ Java
client with Solr 1.4 (soon to be released), it uses an efficient binary
format for posting data to Solr.
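
As an illustration of the CSV route, the Python sketch below turns employee records into a CSV payload with a header row of field names. The field names and sample records are invented for illustration, and the sketch assumes they match fields defined in the Solr schema. Posting the payload to Solr's CSV update handler (commonly mapped to /update/csv) is left out.

```python
import csv
import io


def employees_to_csv(records, fields):
    """Serialize records to CSV text with a header row naming the fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical field names and sample data.
FIELDS = ["id", "name", "address"]
sample = [
    {"id": "1", "name": "A. Kumar", "address": "12 MG Road, Pune"},
    {"id": "2", "name": "B. Rao", "address": "Baner"},
]
payload = employees_to_csv(sample, FIELDS)
print(payload)
```

The csv module quotes values that contain commas, so addresses like the first one survive intact. For terabytes of data, the records would be written and posted in batches rather than built up in memory at once.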

-- 
Regards,
Shalin Shekhar Mangar.