Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/16 02:09:17 UTC

[GitHub] [druid] Maplejw opened a new issue #10399: index_hadoop failed when druid DetermineHashedPartitionsJob

Maplejw opened a new issue #10399:
URL: https://github.com/apache/druid/issues/10399


   I am ingesting data via Hadoop. After the task was posted and the MapReduce job had been running for a while, it reported an error.
   The MapReduce job completed, but DetermineHashedPartitionsJob failed.
   ```text
   2020-09-15T12:02:26,785 INFO [task-runner-0-priority-0] org.apache.druid.indexer.DetermineHashedPartitionsJob - Job completed, loading up partitions for intervals[Optional.absent()].
   2020-09-15T12:02:26,789 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.HadoopIndexTask - Got invocation target exception in run(), cause: 
   java.lang.RuntimeException: org.apache.druid.java.util.common.ISE: Path[var/druid/hadoop-tmp/cm_client_behavior/2020-09-15T120154.972Z_dfa47a061aa94dcfa0108e29a55f115f/intervals.json] didn't exist!?
   	at org.apache.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:224) ~[druid-indexing-hadoop-0.15.0-incubating-iap5.jar:0.15.0-incubating-iap5]
   	at org.apache.druid.indexer.JobHelper.runSingleJob(JobHelper.java:384) ~[druid-indexing-hadoop-0.15.0-incubating-iap5.jar:0.15.0-incubating-iap5]
   	at org.apache.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:58) ~[druid-indexing-hadoop-0.15.0-incubating-iap5.jar:0.15.0-incubating-iap5]
   	at org.apache.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessingRunner.runTask(HadoopIndexTask.java:617) ~[druid-indexing-service-0.15.0-incubating-iap5.jar:0.15.0-incubating-iap5]
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_211]
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_211]
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_211]
   	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_211]
   	at org.apache.druid.indexing.common.task.HadoopIndexTask.runInternal(HadoopIndexTask.java:309) ~[druid-indexing-service-0.15.0-incubating-iap5.jar:0.15.0-incubating-iap5]
   	at org.apache.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:244) [druid-indexing-service-0.15.0-incubating-iap5.jar:0.15.0-incubating-iap5]
   	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.15.0-incubating-iap5.jar:0.15.0-incubating-iap5]
   	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.15.0-incubating-iap5.jar:0.15.0-incubating-iap5]
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_211]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
   	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
   Caused by: org.apache.druid.java.util.common.ISE: Path[var/druid/hadoop-tmp/cm_client_behavior/2020-09-15T120154.972Z_dfa47a061aa94dcfa0108e29a55f115f/intervals.json] didn't exist!?
   	at org.apache.druid.indexer.DetermineHashedPartitionsJob.run(DetermineHashedPartitionsJob.java:155) 
   ```
   We use UTC. Here is the ingestion config:
   ```json
   {
     "type" : "index_hadoop",
     "spec" : {
       "dataSchema" : {
         "dataSource" : "cm_client_behavior",
         "parser" : {
           "type" : "hadoopyString",
           "parseSpec" : {
             "format" : "json",
             "dimensionsSpec" : {
               "dimensions" : [ "device_id", "os", "country", "sys_lang", "sys_version", "app_version", "channel", "device_model", "behavior_id", "behavior_desc", "is_new" ]
             },
             "timestampSpec" : {
               "column" : "druid_time",
               "format" : "yyyy-MM-dd HH:mm:ss"
             }
           }
         },
         "metricsSpec" : [ {
           "type" : "longSum",
           "name" : "behavior_total_times",
           "fieldName" : "behavior_total_times",
           "expression" : null
         } ],
         "granularitySpec" : {
           "type" : "uniform",
           "segmentGranularity" : "DAY",
           "queryGranularity" : "HOUR",
           "rollup" : true,
           "intervals" : null
         },
         "transformSpec" : {
           "filter" : null,
           "transforms" : [ ]
         }
       },
       "ioConfig" : {
         "type" : "hadoop",
         "inputSpec" : {
           "type" : "static",
           "paths" : "s3a://igg-rd8-data-project/cm/dw/dws/client_behavior/proc_date=20200909/*"
         },
         "metadataUpdateSpec" : null,
         "segmentOutputPath" : null
       },
       "tuningConfig" : {
         "type" : "hadoop",
         "workingPath" : null,
         "version" : "2020-09-15T12:01:54.972Z",
         "partitionsSpec" : {
           "type" : "hashed",
           "targetPartitionSize" : 5000000,
           "maxPartitionSize" : 7500000,
           "assumeGrouped" : false,
           "numShards" : -1,
           "partitionDimensions" : [ "behavior_id" ]
         },
         "shardSpecs" : { },
         "indexSpec" : {
           "bitmap" : {
             "type" : "concise"
           },
           "dimensionCompression" : "lz4",
           "metricCompression" : "lz4",
           "longEncoding" : "longs"
         },
         "maxRowsInMemory" : 1000000,
         "maxBytesInMemory" : 0,
         "leaveIntermediate" : false,
         "cleanupOnFailure" : true,
         "overwriteFiles" : false,
         "ignoreInvalidRows" : false,
         "jobProperties" : {
           "mapreduce.job.classloader" : "true",
           "mapreduce.job.user.classpath.first" : "true",
           "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
           "mapreduce.reduce.java.opts" : "-Xmx2457m -Xms2457m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
           "fs.s3a.access.key" : "xxxxxxxxxxxxxxxxxxxxxxxxx",
           "fs.s3a.secret.key" : "xxxxxxxxxxxxxxxxxxxxxxxxx",
           "fs.s3a.endpoint" : "s3.cn-north-1.amazonaws.com.cn"
         }
       }
     }
   }
   ```
   Versions:
   Druid: 0.15.0
   Hadoop: CDH-6.2.1
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] Maplejw commented on issue #10399: index_hadoop failed when druid DetermineHashedPartitionsJob

Posted by GitBox <gi...@apache.org>.
Maplejw commented on issue #10399:
URL: https://github.com/apache/druid/issues/10399#issuecomment-694113434


   It is resolved.
   The cause was the config druid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp: the Druid index task did not have permission to create this directory on HDFS. I changed the config to druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing, and the index task can now create the directory.
   So I think this error message is confusing. If the error said something like "do not have permission to create var/druid/hadoop-tmp", it would be better.
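   For reference, the fix amounts to pointing the working path at a directory the task's Hadoop user can actually create. A plausible explanation (an assumption, not confirmed in this thread) is that Hadoop resolves a relative path like var/druid/hadoop-tmp against the submitting user's HDFS home directory, which may not exist or be writable. The property would be set in the Druid common or Overlord/MiddleManager runtime.properties; the value below is the one the reporter used, so adjust it to your cluster:

   ```properties
   # Assumption: the relative path var/druid/hadoop-tmp was being resolved
   # against the task user's HDFS home directory, which the user could not
   # create. Use an absolute path the task user can write to instead.
   druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
   ```

   Note that the equivalent per-task setting is tuningConfig.workingPath in the ingestion spec (left null above, so the service-level property applies).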
   




[GitHub] [druid] Maplejw closed issue #10399: index_hadoop failed when druid DetermineHashedPartitionsJob

Posted by GitBox <gi...@apache.org>.
Maplejw closed issue #10399:
URL: https://github.com/apache/druid/issues/10399


   

