Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/06/24 10:01:11 UTC

[GitHub] [druid] tarekabouzeid opened a new issue #11384: Druid ingestion from S3 prefix did not catch S3 API call errors

tarekabouzeid opened a new issue #11384:
URL: https://github.com/apache/druid/issues/11384


   
   ### Affected Version
   
   0.21.1
   
   ### Description
   
   We were ingesting Parquet files from MinIO using an S3 prefix that points to a bucket. Druid estimated 998 tasks ("estimatedNumSucceededTasks[998]"). After 1000 files had been read, at task [125], the following error appeared in the MinIO logs:
   
   "
   API: ListObjectsV2
   Time: 23:28:23:0
   DeploymentID: 15313dfa-ae71-4b9d-a474-873786ed2b28
   RequestID: 168B59DD68270F74
   RemoteHost: xxx.xxx.xxx.xxx
   Host: xxxx.xxxx.xxx.xxxx:9000
   UserAgent: aws-sdk-java/1.11.199 Linux/3.10.0-1062.7.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.292-b10 java/1.8.0_292
   Error: remote listing canceled: file not found (*fmt.wrapError)
   cmd/metacache-set.go:541:cmd.(*erasureObjects).listPath()
   cmd/metacache-server-pool.go:174:cmd.(*erasureServerPools).listPath.func1()
   "
   
   However, the main index_parallel task did not detect that error. It marked the whole ingestion pipeline as a success even though the entire dataset had not been loaded from MinIO.
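
The behavior described above is consistent with a paginated listing that stops quietly when a later page fails. The following is a minimal, self-contained Python sketch of that pattern. It is not Druid's code: `list_page`, `ListingError`, and both lister functions are hypothetical names, and the page size of 1000 is an assumption based on the usual ListObjectsV2 page limit. It contrasts a lister that swallows the error (and returns a short result that looks successful) with one that lets the error propagate.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 1000  # assumption: typical ListObjectsV2 page limit


class ListingError(Exception):
    """Hypothetical stand-in for an S3 API error raised while listing."""


def list_page(
    keys: List[str], token: Optional[int], fail_at_page: Optional[int]
) -> Tuple[List[str], Optional[int]]:
    """Fake paginated listing: return one page and the next continuation token."""
    start = token or 0
    page_no = start // PAGE_SIZE
    if fail_at_page is not None and page_no == fail_at_page:
        raise ListingError("remote listing canceled: file not found")
    page = keys[start:start + PAGE_SIZE]
    next_token = start + PAGE_SIZE if start + PAGE_SIZE < len(keys) else None
    return page, next_token


def list_all_swallowing(keys: List[str], fail_at_page: Optional[int] = None) -> List[str]:
    """Problematic pattern: an error on a later page silently ends the listing."""
    found: List[str] = []
    token: Optional[int] = None
    while True:
        try:
            page, token = list_page(keys, token, fail_at_page)
        except ListingError:
            break  # the error is lost; the caller sees a short "successful" result
        found.extend(page)
        if token is None:
            break
    return found


def list_all_strict(keys: List[str], fail_at_page: Optional[int] = None) -> List[str]:
    """Safer pattern: any listing error propagates to the caller."""
    found: List[str] = []
    token: Optional[int] = None
    while True:
        page, token = list_page(keys, token, fail_at_page)
        found.extend(page)
        if token is None:
            return found


bucket = [f"part-{i:05d}.parquet" for i in range(7979)]
truncated = list_all_swallowing(bucket, fail_at_page=1)
print(len(truncated))  # 1000: only the first page survives
```

With the strict variant, the same failure on the second page raises `ListingError`, so a caller could mark the task as failed instead of publishing a partial dataset.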
   
   **Log snippet from the ingestion task** : 
   
   "
   2021-06-23T23:29:45,090 DEBUG [qtp1954745715-173] org.apache.druid.jetty.RequestLog - xxx.xxx.xxx.xxx POST //myhost:8100/druid/worker/v1/chat/index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21%3A46%3A23.677Z/report HTTP/1.1
   2021-06-23T23:29:48,042 INFO [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - [122/998] tasks succeeded
   2021-06-23T23:30:17,535 DEBUG [qtp1954745715-126] org.apache.druid.jetty.RequestLog - xxx.xxx.xxx.xxx POST //myhost:8100/druid/worker/v1/chat/index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21%3A46%3A23.677Z/report HTTP/1.1
   2021-06-23T23:30:21,035 INFO [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - [123/998] tasks succeeded
   2021-06-23T23:31:22,055 DEBUG [qtp1954745715-130] org.apache.druid.jetty.RequestLog - xxx.xxx.xxx.xxx POST //myhost:8100/druid/worker/v1/chat/index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21%3A46%3A23.677Z/report HTTP/1.1
   2021-06-23T23:31:23,035 INFO [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - [124/998] tasks succeeded
   2021-06-23T23:31:34,953 DEBUG [qtp1954745715-144] org.apache.druid.jetty.RequestLog - xxx.xxx.xxx.xxx POST //myhost:8100/druid/worker/v1/chat/index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21%3A46%3A23.677Z/report HTTP/1.1
   2021-06-23T23:31:37,035 INFO [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - [125/998] tasks succeeded
   2021-06-23T23:31:37,036 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - Cleaning up resources
   2021-06-23T23:31:37,036 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Stopped taskMonitor
   2021-06-23T23:31:37,787 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask - Published [372] segments
   2021-06-23T23:31:37,792 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
     "id" : "index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z",
     "status" : "SUCCESS",
     "duration" : 6303374,
     "errorMsg" : null,
     "location" : {
       "host" : null,
       "port" : -1,
       "tlsPort" : -1
     }
   }
   2021-06-23T23:31:37,802 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
   2021-06-23T23:31:37,805 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
   2021-06-23T23:31:37,815 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@7cf66cf9{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
   2021-06-23T23:31:37,815 INFO [main] org.eclipse.jetty.server.session - node0 Stopped scavenging
   2021-06-23T23:31:37,818 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@4dffa400{/,null,STOPPED}
   2021-06-23T23:31:37,830 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
   2021-06-23T23:31:37,831 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z].
   2021-06-23T23:31:37,832 INFO [main] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - Cleaning up resources
   2021-06-23T23:31:37,872 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore.
   2021-06-23T23:31:37,872 INFO [main] org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager - CoordinatorPollingBasicAuthorizerCacheManager is stopping.
   2021-06-23T23:31:37,873 INFO [main] org.apache.druid.security.basic.authorization.db.cache.CoordinatorPollingBasicAuthorizerCacheManager - CoordinatorPollingBasicAuthorizerCacheManager is stopped.
   2021-06-23T23:31:37,873 INFO [main] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - CoordinatorPollingBasicAuthenticatorCacheManager is stopping.
   2021-06-23T23:31:37,873 INFO [main] org.apache.druid.security.basic.authentication.db.cache.CoordinatorPollingBasicAuthenticatorCacheManager - CoordinatorPollingBasicAuthenticatorCacheManager is stopped.
   2021-06-23T23:31:37,875 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
   2021-06-23T23:31:37,879 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x300bcb2cd840004 closed
   2021-06-23T23:31:37,879 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x300bcb2cd840004
   2021-06-23T23:31:37,912 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$CloseableHandler - Closing object[org.asynchttpclient.DefaultAsyncHttpClient@66fff42f]
   2021-06-23T23:31:37,913 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
   "
   
   **Total Number of files in the bucket** : 7979 files
   **Each file size** : 123 MB
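
The figures in this report line up with a listing that stopped after a single page of results. The short Python check below makes the arithmetic explicit. It assumes a maximum split size of 1 GiB per subtask, which is an inference that happens to fit the reported numbers, not something stated in the logs.

```python
import math

total_files = 7979      # from this report
file_size_mb = 123      # from this report
estimated_tasks = 998   # from the task log
succeeded_tasks = 125   # last count logged before the task finished

split_limit_mb = 1024   # assumption: 1 GiB maximum split size per subtask
files_per_split = split_limit_mb // file_size_mb

print(files_per_split)                            # 8
print(math.ceil(total_files / files_per_split))   # 998
print(succeeded_tasks * files_per_split)          # 1000
```

Under that assumption, 8 files per subtask gives the estimated 998 subtasks for 7979 files, and the 125 subtasks that ran cover exactly 1000 files, which would correspond to one listing page.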
   
   **Below is the spec used:**
   
   
   "
   {
     "type": "index_parallel",
     "id": "index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z",
     "groupId": "index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z",
     "resource": {
       "availabilityGroup": "index_parallel_sparkpublic_trial_1_gdbmhfbg_2021-06-23T21:46:23.677Z",
       "requiredCapacity": 1
     },
     "spec": {
       "dataSchema": {
         "dataSource": "sparkpublic_trial_1",
         "timestampSpec": {
           "column": "timestamp_col",
           "format": "posix",
           "missingValue": "2020-01-01T00:00:00.000Z"
         },
         "dimensionsSpec": {
           "dimensions": [
             {
               "type": "long",
               "name": "c1",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "long",
               "name": "c2",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "long",
               "name": "c3",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "long",
               "name": "c4",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "long",
               "name": "c5",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "long",
               "name": "c6",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "long",
               "name": "c7",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "long",
               "name": "c8",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "long",
               "name": "c9",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             },
             {
               "type": "string",
               "name": "c10",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "c11",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "c12",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "c13",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "c14",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "c15",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "c16",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "c17",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "city",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "string",
               "name": "country",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": true
             },
             {
               "type": "long",
               "name": "id",
               "multiValueHandling": "SORTED_ARRAY",
               "createBitmapIndex": false
             }
           ],
           "dimensionExclusions": [
             "timestamp_col"
           ]
         },
         "metricsSpec": [],
         "granularitySpec": {
           "type": "uniform",
           "segmentGranularity": "HOUR",
           "queryGranularity": {
             "type": "all"
           },
           "rollup": false,
           "intervals": null
         },
         "transformSpec": {
           "filter": null,
           "transforms": []
         }
       },
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "s3",
           "uris": null,
           "prefixes": [
             "s3://sparkpublic"
           ],
           "objects": null,
           "properties": null
         },
         "inputFormat": {
           "type": "parquet",
           "flattenSpec": {
             "useFieldDiscovery": true,
             "fields": []
           },
           "binaryAsString": false
         },
         "appendToExisting": false
       },
       "tuningConfig": {
         "type": "index_parallel",
         "maxRowsPerSegment": 5000000,
         "appendableIndexSpec": {
           "type": "onheap"
         },
         "maxRowsInMemory": 1000000,
         "maxBytesInMemory": 0,
         "maxTotalRows": null,
         "numShards": null,
         "splitHintSpec": null,
         "partitionsSpec": {
           "type": "dynamic",
           "maxRowsPerSegment": 5000000,
           "maxTotalRows": null
         },
         "indexSpec": {
           "bitmap": {
             "type": "roaring",
             "compressRunOnSerialization": true
           },
           "dimensionCompression": "lz4",
           "metricCompression": "lz4",
           "longEncoding": "longs",
           "segmentLoader": null
         },
         "indexSpecForIntermediatePersists": {
           "bitmap": {
             "type": "roaring",
             "compressRunOnSerialization": true
           },
           "dimensionCompression": "lz4",
           "metricCompression": "lz4",
           "longEncoding": "longs",
           "segmentLoader": null
         },
         "maxPendingPersists": 0,
         "forceGuaranteedRollup": false,
         "reportParseExceptions": false,
         "pushTimeout": 0,
         "segmentWriteOutMediumFactory": null,
         "maxNumConcurrentSubTasks": 5,
         "maxRetry": 3,
         "taskStatusCheckPeriodMs": 1000,
         "chatHandlerTimeout": "PT10S",
         "chatHandlerNumRetries": 5,
         "maxNumSegmentsToMerge": 100,
         "totalNumMergeTasks": 10,
         "logParseExceptions": false,
         "maxParseExceptions": 2147483647,
         "maxSavedParseExceptions": 0,
         "maxColumnsToMerge": -1,
         "buildV9Directly": true,
         "partitionDimensions": []
       }
     },
     "context": {
       "forceTimeChunkLock": true
     },
     "dataSource": "sparkpublic_trial_1"
   }
   
   "
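
A possible way to make a listing failure visible before the task starts is to list the bucket outside Druid, verify the count, and pass explicit `objects` instead of `prefixes`. The Python sketch below is only an illustration: `build_s3_input_source` is a hypothetical helper, the keys are made-up placeholders, and the `bucket`/`path` shape of each entry should be verified against the S3 input source documentation for your Druid version.

```python
import json
from typing import Dict, List


def build_s3_input_source(bucket: str, keys: List[str], expected_count: int) -> Dict:
    """Build an inputSource with explicit objects, failing if the listing is short."""
    if len(keys) != expected_count:
        raise ValueError(f"listed {len(keys)} objects, expected {expected_count}")
    return {
        "type": "s3",
        # assumed entry shape: one {"bucket", "path"} pair per object
        "objects": [{"bucket": bucket, "path": key} for key in keys],
    }


# Placeholder keys; in practice these would come from your own bucket listing.
keys = [f"part-{i:05d}.parquet" for i in range(3)]
input_source = build_s3_input_source("sparkpublic", keys, expected_count=3)
print(json.dumps(input_source, indent=2))
```

The resulting dictionary would replace the `inputSource` block of the spec above. The count check is the point of the sketch: a truncated listing raises an error instead of producing a spec that silently covers only part of the bucket.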
   

