Posted to dev@lucene.apache.org by "Hari Sekhon (JIRA)" <ji...@apache.org> on 2015/04/15 13:04:00 UTC

[jira] [Updated] (SOLR-7393) HDFS poor bulk indexing performance

     [ https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated SOLR-7393:
------------------------------
    Summary: HDFS poor bulk indexing performance  (was: HDFS bulk indexing performance)

> HDFS poor bulk indexing performance
> -----------------------------------
>
>                 Key: SOLR-7393
>                 URL: https://issues.apache.org/jira/browse/SOLR-7393
>             Project: Solr
>          Issue Type: Bug
>          Components: Hadoop Integration, hdfs, SolrCloud
>    Affects Versions: 4.7.2, 4.10.3
>         Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
>            Reporter: Hari Sekhon
>            Priority: Critical
>
> When switching SolrCloud from a local dataDir to the HDFS directory factory, indexing performance falls through the floor.
> A Hive to SolrCloud online indexing job that previously took 2 hours for 620M rows was projected to take 20+ hours and never completed, usually breaking around the 16-17 hour mark when left overnight.
> It's worth noting that I had to disable the HDFS write cache, which was causing index corruption (SOLR-7255), on the advice of Mark Miller, who tells me this doesn't make much performance difference anyway.
> This is probably also related to SolrCloud not respecting the HDFS replication factor, effectively making 4 copies of the data instead of 2 (SOLR-6528), but that alone doesn't account for the massive performance drop going from vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
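
To put the reported figures in perspective, here is a rough back-of-envelope sketch in Python. The row count and runtimes come from the report above; treating "20+ hours" as exactly 20 hours, and assuming that doubling the number of stored copies (4 instead of 2) at most doubles write cost, are simplifying assumptions made only for illustration.

```python
# Back-of-envelope throughput comparison using the figures from the report.
ROWS = 620_000_000   # rows indexed by the Hive job (from the report)
LOCAL_HOURS = 2      # runtime with a local dataDir (from the report)
HDFS_HOURS = 20      # assumption: lower bound of the "20+ hours" projection


def rows_per_second(rows, hours):
    """Average indexing rate in rows per second."""
    return rows / (hours * 3600)


local_rate = rows_per_second(ROWS, LOCAL_HOURS)
hdfs_rate = rows_per_second(ROWS, HDFS_HOURS)
slowdown = local_rate / hdfs_rate

# Assumption: writing 4 copies instead of 2 costs at most twice as much.
copies_factor = 4 / 2
unexplained = slowdown / copies_factor

print(f"local dataDir: {local_rate:,.0f} rows/s")
print(f"HDFS:          {hdfs_rate:,.0f} rows/s")
print(f"slowdown:      {slowdown:.1f}x, of which roughly {unexplained:.1f}x "
      f"is not covered by the extra copies")
```

Under these assumptions the slowdown is at least tenfold, and the extra copies described in SOLR-6528 would cover only about a factor of two of it, which is consistent with the reporter's point that replication alone does not explain the drop.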


