Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/06 15:53:08 UTC
[GitHub] [druid] suneet-pit opened a new issue #9623: IllegalArgumentException: Can not deserialize while ingesting parquet file from hdfs

URL: https://github.com/apache/druid/issues/9623
 
 
   Trying to load sample Parquet data from HDFS into Druid, but the task fails with an IllegalArgumentException:
   
   Can not deserialize Class com.fasterxml.jackson.annotation.JsonTypeInfo (of type annotation) as a Bean
   
   **2020-04-06T14:55:42,404 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1584920897538_0110_m_000000_2, Status : FAILED
   Error: java.lang.IllegalArgumentException: Can not deserialize Class com.fasterxml.jackson.annotation.JsonTypeInfo (of type annotation) as a Bean
   	at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2774)
   	at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:2700)**
   
   Version : druid-api-0.13.0-incubating
   
   Parquet extension in use: druid-parquet-extensions-0.12.3.jar
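
The two artifact names above carry different version numbers. Whether that difference is related to the failure is not established in this report; the following is only a minimal sketch of how the versions could be compared mechanically. The helper name `artifact_version` is hypothetical and is not part of Druid.

```python
import re

def artifact_version(name):
    """Return the first dotted numeric version (e.g. '0.13.0') found in an artifact name."""
    match = re.search(r"-(\d+(?:\.\d+)+)", name)
    if match is None:
        raise ValueError(f"no version found in {name!r}")
    return match.group(1)

# Artifact names copied from the report above.
core = artifact_version("druid-api-0.13.0-incubating")
extension = artifact_version("druid-parquet-extensions-0.12.3.jar")

# True only when the core and extension versions are identical.
versions_match = core == extension
print(core, extension, versions_match)
```

If the versions differ, aligning the extension jar with the core version is a reasonable first thing to try before deeper debugging.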
   
   Sample parquet data:
   
   2019-04-03 12:00:00,"druid","test_POC"
   
   Ingestion spec used:
   {
     "type" : "index_hadoop",
     "spec" : {
       "ioConfig" : {
         "type" : "hadoop",
         "inputSpec" : {
           "type" : "static",
           "inputFormat": "io.druid.data.input.parquet.DruidParquetInputFormat",
           "paths" : "hdfs://druid-m:8020/data/testoutput_parquet/part-00000-565e5f99-fa33-41e0-9b99-855323b05a76-c000.parquet"
         }
       },
       "dataSchema" : {
         "dataSource" : "sample_data",
         "parser" : {
           "type" : "parquet",
           "parseSpec" : {
             "format" : "parquet",
             "dimensionsSpec" : {
               "dimensions" : []
             },
             "columns" : ["date_ts","name","project"],
             "timestampSpec" : {
               "format" : "auto",
               "column" : "date_ts"
             }
           }
         },
         "metricsSpec" : [],
         "granularitySpec" : {
           "type" : "uniform",
           "segmentGranularity" : "day",
           "queryGranularity" : "none",
           "rollup" : false
   
         }
       },
   
       "tuningConfig" : {
         "type" : "hadoop",
         "partitionsSpec" : {
           "type" : "hashed",
           "targetPartitionSize" : 50
         },
         "forceExtendableShardSpecs" : true,
         "jobProperties" : {
           "fs.default.name" : "hdfs://druid-m",
           "fs.defaultFS" : "hdfs://druid-m",
           "dfs.datanode.address" : "druid-m",
           "dfs.client.use.datanode.hostname" : "true",
           "dfs.datanode.use.datanode.hostname" : "true",
           "yarn.resourcemanager.hostname" : "druid-m",
           "yarn.nodemanager.vmem-check-enabled" : "false",
           "mapreduce.job.classloader": "true",
           "mapreduce.map.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
           "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC -Dfile.encoding=UTF-8",
           "mapreduce.map.memory.mb" : 1024,
           "mapreduce.reduce.memory.mb" : 1024
         }
       }
     },
     "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.9.2"]
   }
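
Before submitting a spec like this one, it can help to confirm that it parses as a single JSON document and to print the class names it references. The sketch below is an assumption-laden illustration, not part of Druid: `parse_spec` is a hypothetical helper, and `sample` is an abbreviated copy of the spec with an extra closing brace appended on purpose to show how leftover text is reported.

```python
import json

def parse_spec(text):
    """Parse the first JSON document in text; return (spec, trailing leftover text)."""
    stripped = text.lstrip()
    spec, end = json.JSONDecoder().raw_decode(stripped)
    return spec, stripped[end:].strip()

# Abbreviated copy of the spec; the final extra brace is deliberate, for illustration.
sample = '''
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "io.druid.data.input.parquet.DruidParquetInputFormat"
      }
    }
  }
}
}
'''

spec, trailing = parse_spec(sample)
input_format = spec["spec"]["ioConfig"]["inputSpec"]["inputFormat"]
print("inputFormat:", input_format)
print("trailing text:", repr(trailing))
```

An empty `trailing` value means the text is exactly one JSON document; anything else points at stray characters after the closing brace.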
   
   Debugging already done:
   Tried different parseSpec formats (parquet, timeAndDims, timespecFormat)
   Tried different timestamp formats
   Tried specifying the dimensions explicitly
   Tried a Parquet file with a schema
   Tried a Parquet file without a schema
   
   Attached is the log file
   
   [log.txt](https://github.com/apache/druid/files/4439448/log.txt)
   
   
