Posted to solr-user@lucene.apache.org by Isaac Hebsh <is...@gmail.com> on 2013/09/30 20:30:34 UTC

Re: Data duplication using Cloud+HDFS+Mirroring

Hi Greg, did you get an answer?
I'm interested in the same question.

More generally, what are the benefits of HdfsDirectoryFactory, besides the
transparent restore of shard contents after a disk failure and the ability
to rebuild the index using MapReduce?
Is the following statement accurate? The HDFS blocks of a particular shard
that are replicated to another node will never be queried, since no Solr
core is configured to read them.
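
(For reference, a minimal solrconfig.xml sketch of the HdfsDirectoryFactory
setup being discussed; the namenode address, HDFS path and confdir below are
placeholders, not values taken from this thread:)

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <!-- root HDFS directory; each core creates its own index dir under it -->
    <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
    <!-- directory holding the Hadoop client configuration files -->
    <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
    <!-- block cache keeps hot index blocks in RAM to offset HDFS read latency -->
    <bool name="solr.hdfs.blockcache.enabled">true</bool>
  </directoryFactory>

  <!-- use the hdfs lock type instead of the default native file locks -->
  <lockType>hdfs</lockType>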


On Wed, Aug 7, 2013 at 8:46 PM, Greg Walters
<gw...@sherpaanalytics.com> wrote:

> While testing Solr's new ability to store data and transaction directories
> in HDFS, I added an additional core to one of my testing servers that was
> configured as a backup (active but not leader) core for a shard hosted
> elsewhere. It looks like this extra core copies the data into its own
> directory rather than just reading the existing directory that already
> holds the data.
>
> Since HDFS likely already has redundancy of the data covered via the
> replicationFactor, is there a reason for non-leader cores to create their
> own data directory rather than reading the existing master copy? I
> searched Jira for anything suggesting this behavior might change and
> didn't find any issues; is there any intent to address this?
>
> Thanks,
> Greg
>
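
(For reference, the HDFS-side redundancy mentioned above is the filesystem's
own block replication, which is configured independently of Solr's
replicationFactor; a sketch with a typical value of 3:)

  <!-- hdfs-site.xml: number of copies HDFS keeps of every block,
       separate from the number of Solr shard replicas -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>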