Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/24 15:59:06 UTC

[GitHub] [incubator-druid] vogievetsky opened a new issue #7952: Transform specs are ignored if dimensions auto detection is used

URL: https://github.com/apache/incubator-druid/issues/7952
 
 
   The column auto-detection feature does not pick up new columns created by transforms.
   
   ### Affected Version
   
   All versions of Druid so far (up to 0.15.0)
   
   ### Description
   
   Say you have data:
   ```
   {"a":"hello","b":"world"}
   {"a":"where","c":"to go"}
   ```
   
   In a file that lives at: `/Users/vadim/Downloads/test-data.json`
   
   And you ingest it with:
   ```
   {
     "dataSchema": {
       "dataSource": "Downloads",
       "parser": {
         "type": "string",
         "parseSpec": {
           "format": "json",
           "timestampSpec": {
             "column": "!!!_no_such_column_!!!",
             "missingValue": "2010-01-01T00:00:00Z"
           },
           "dimensionsSpec": {}
         }
       },
       "metricsSpec": [
         {
           "name": "count",
           "type": "count"
         }
       ],
       "granularitySpec": {
         "type": "uniform",
         "segmentGranularity": "DAY",
         "queryGranularity": "HOUR",
         "rollup": true,
         "intervals": null
       },
       "transformSpec": {
         "filter": null,
         "transforms": [
           {
             "type": "expression",
             "name": "a_prime",
             "expression": "concat(\"a\",'_prime')"
           }
         ]
       }
     },
     "ioConfig": {
       "type": "index_parallel",
       "firehose": {
         "type": "local",
         "baseDir": "/Users/vadim/Downloads",
         "filter": "test-data.json"
       },
       "appendToExisting": false
     },
     "tuningConfig": {
       "type": "index_parallel",
       "maxRowsPerSegment": null,
       "maxRowsInMemory": 1000000,
       "maxBytesInMemory": 0,
       "maxTotalRows": null,
       "numShards": null,
       "indexSpec": {
         "bitmap": {
           "type": "concise"
         },
         "dimensionCompression": "lz4",
         "metricCompression": "lz4",
         "longEncoding": "longs"
       },
       "maxPendingPersists": 0,
       "forceGuaranteedRollup": false,
       "reportParseExceptions": false,
       "pushTimeout": 0,
       "segmentWriteOutMediumFactory": null,
       "maxNumSubTasks": 1,
       "maxRetry": 3,
       "taskStatusCheckPeriodMs": 1000,
       "chatHandlerTimeout": "PT10S",
       "chatHandlerNumRetries": 5,
       "logParseExceptions": false,
       "maxParseExceptions": 2147483647,
       "maxSavedParseExceptions": 0,
       "partitionDimensions": [],
       "buildV9Directly": true
     },
     "type": "index_parallel"
   }
   ```
   
   Notice how I am trying to create an `a_prime` column with a transform spec.
   
   The job succeeds, but when you query the data:
   
   ![image](https://user-images.githubusercontent.com/177816/60033822-28f8d880-965e-11e9-9cac-557fcc714b87.png)
   
   You see that there is no `a_prime` column.
   
   It would be great (and would make a ton more sense) if the columns created by transforms were added to the column list detected from the file.
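   
   A possible workaround, sketched here as an assumption rather than something verified on every affected version: list the dimensions explicitly, including the transform output `a_prime`, in place of the empty `"dimensionsSpec": {}` from the spec above. The column names are the ones from the sample data.
   ```
   "dimensionsSpec": {
     "dimensions": ["a", "b", "c", "a_prime"]
   }
   ```
   
   The trade-off is that an explicit list turns off auto detection, so any column that is not named in the list is not ingested as a dimension.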
