Posted to solr-user@lucene.apache.org by Greenhorn Techie <gr...@gmail.com> on 2018/06/07 12:41:32 UTC
Running Solr on HDFS - Disk space
Hi,
As HDFS has its own replication mechanism, with an HDFS replication
factor of 3 and a SolrCloud replication factor of 3, does that mean
each document will end up with around 9 copies stored in HDFS? If so,
is there a way to configure HDFS or Solr so that only three copies are
maintained overall?
Thanks
Re: Running Solr on HDFS - Disk space
Posted by Hendrik Haddorp <he...@gmx.net>.
The only options are to configure Solr with a replication factor of 1
or HDFS with no replication. I would go for the middle ground and
configure both to use a factor of 2. That way a single failure in
either HDFS or Solr is not a problem, whereas with a 1/3 or 3/1 setup
a single server failure would bring the collection down.
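The Solr side of that 2/2 setup can be set when the collection is created via the Collections API. A minimal sketch, assuming a default local Solr install on port 8983; the collection and config names below are illustrative placeholders, not from this thread:

```shell
# Create a collection with 2 Solr replicas per shard. Combined with an
# HDFS replication factor of 2, each document ends up with 4 physical
# copies (2 Solr replicas x 2 HDFS block copies).
# "mycollection" and "myconfig" are placeholder names.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=1&replicationFactor=2&collection.configName=myconfig"
```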
Setting the HDFS replication factor is a bit tricky, as Solr in some
places uses the default replication factor set on HDFS and sometimes
uses a default from the client side. HDFS allows you to set a
replication factor for every file individually.
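To illustrate that per-file control, the hadoop CLI can change the replication factor on existing files; a sketch assuming Solr's HDFS data lives under /solr (a placeholder path, not from this thread), with the cluster-wide default for new files coming from dfs.replication in hdfs-site.xml:

```shell
# Set replication factor 2 on an assumed Solr data path in HDFS;
# directories are handled recursively, and -w waits for the change
# to finish propagating. The /solr path is a placeholder.
hdfs dfs -setrep -w 2 /solr

# Verify: the replication factor appears in the file listing.
hdfs dfs -ls /solr
```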
regards,
Hendrik
On 07.06.2018 15:30, Shawn Heisey wrote:
> On 6/7/2018 6:41 AM, Greenhorn Techie wrote:
>> As HDFS has got its own replication mechanism, with a HDFS replication
>> factor of 3, and then SolrCloud replication factor of 3, does that mean
>> each document will probably have around 9 copies replicated
>> underneath of
>> HDFS? If so, is there a way to configure HDFS or Solr such that only
>> three
>> copies are maintained overall?
>
> Yes, that is exactly what happens.
>
> SolrCloud replication assumes that each of its replicas is a
> completely independent index. I am not aware of anything in Solr's
> HDFS support that can use one HDFS index directory for multiple
> replicas. At the most basic level, a Solr index is a Lucene index.
> Lucene goes to great lengths to make sure that an index *CANNOT* be
> used in more than one place.
>
> Perhaps somebody who is more familiar with HDFSDirectoryFactory can
> offer you a solution. But as far as I know, there isn't one.
>
> Thanks,
> Shawn
>
Re: Running Solr on HDFS - Disk space
Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/7/2018 6:41 AM, Greenhorn Techie wrote:
> As HDFS has got its own replication mechanism, with a HDFS replication
> factor of 3, and then SolrCloud replication factor of 3, does that mean
> each document will probably have around 9 copies replicated underneath of
> HDFS? If so, is there a way to configure HDFS or Solr such that only three
> copies are maintained overall?
Yes, that is exactly what happens.
SolrCloud replication assumes that each of its replicas is a completely
independent index. I am not aware of anything in Solr's HDFS support
that can use one HDFS index directory for multiple replicas. At the
most basic level, a Solr index is a Lucene index. Lucene goes to great
lengths to make sure that an index *CANNOT* be used in more than one place.
Perhaps somebody who is more familiar with HDFSDirectoryFactory can
offer you a solution. But as far as I know, there isn't one.
Thanks,
Shawn