Posted to solr-user@lucene.apache.org by search engn dev <sa...@gmail.com> on 2014/07/07 10:23:32 UTC

Need of hadoop

Currently I am exploring Hadoop with Solr. Somewhere it is written: "This
does not use Hadoop Map-Reduce to process Solr data, rather it only uses the
HDFS filesystem for index and transaction log file storage."

What, then, is the advantage of using Hadoop over the local file system?
Will using HDFS improve overall search performance?

Any detailed pointers on this will help me understand it.



--
View this message in context: http://lucene.472066.n3.nabble.com/Need-of-hadoop-tp4145846.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Need of hadoop

Posted by Erick Erickson <er...@gmail.com>.
And that's exactly what it means. The HdfsDirectoryFactory is intended
to use the HDFS file system to store the Solr (well, actually Lucene)
index. By default, HDFS is triply redundant, which vastly reduces your
chances of losing your index to disk errors. That's what HDFS does.
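For reference, the cwiki page linked later in this thread describes the
setup; a minimal sketch of the relevant solrconfig.xml pieces looks
roughly like this (the hdfs://namenode:8020/solr path is a placeholder
for your own NameNode and directory):

```xml
<!-- Store the Lucene index and transaction log on HDFS instead of local disk. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- Placeholder: point this at your own NameNode and path. -->
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
</directoryFactory>

<!-- Inside <indexConfig>: HDFS has no native file locking, so use the hdfs lock type. -->
<lockType>hdfs</lockType>
```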

If that's not useful to you, then you shouldn't use it.

This is also what the MapReduceIndexerTool is written to work with.
You can spread your M/R indexing jobs across the whole cluster, whether
or not you're running Solr on all the nodes that can run M/R jobs. And
with the --go-live option, the final results can be merged into your
live Solr index on whichever nodes are running your Solr instances,
which are using HdfsDirectoryFactory.
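A sketch of what such an invocation can look like. The jar name varies by
Solr version and distribution, and the ZooKeeper address, collection name,
morphline file, and HDFS paths below are all placeholders, not a tested
recipe:

```shell
# Build per-shard Lucene indexes with MapReduce, then merge them into a
# live SolrCloud collection via --go-live.
# All hostnames, paths, and names below are placeholders for your cluster.
hadoop jar solr-map-reduce-*.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  --zk-host zk1:2181/solr \
  --collection my_collection \
  --morphline-file morphlines.conf \
  --output-dir hdfs://namenode:8020/tmp/mr-index-out \
  --go-live \
  hdfs://namenode:8020/input/docs
```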

There are anecdotal reports of using the MapReduceIndexerTool to spread
the indexing jobs out and then merging the result with a Solr index on a
local file system. However, this is not a use case supported by the
authors of the MapReduceIndexerTool, and I suspect (but don't know for
sure) that the native file system implementation doesn't, for instance,
make explicit use of the MMapDirectory wrapper in Lucene. That would be
a nice case to support, but it isn't high on the contributors' priority
list.

So the "bottom line" is that if the file redundancy (and associated
goodness of being able to access the data from anywhere) isn't
valuable to you, there's no particular reason to use the Solr index
over HDFS.

Best,
Erick


Re: Need of hadoop

Posted by search engn dev <sa...@gmail.com>.
It is written  here
<https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS>  




Re: Need of hadoop

Posted by Erick Erickson <er...@gmail.com>.
OK, _where_ is that written? The HdfsDirectoryFactory code?
Someone's blog somewhere? Your notes?

Ali has one part of the answer: using HDFS will store your index
redundantly, which is good.

Furthermore, the MapReduceIndexerTool (see the contribs)
_will_ use HDFS to do the classic M/R indexing process
for batch, and has a --go-live feature that allows you to merge
the results into a running SolrCloud. It was written with the
assumption that the Solr index was on HDFS.

FWIW,
Erick


Re: Need of hadoop

Posted by Ali Nazemian <al...@gmail.com>.
I think this will not improve indexing performance, but it could be a
solution for HDFS high availability via the replication factor. I am
not sure about that, though.





-- 
A.Nazemian