Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/10/05 20:08:49 UTC

[GitHub] [druid] techdocsmith opened a new issue, #13174: Add example for nested columns with streaming

techdocsmith opened a new issue, #13174:
URL: https://github.com/apache/druid/issues/13174

   A community member requested a streaming example with nested JSON, assuming streaming ingestion supports it.
   
   Per @gianm, usage with streaming would be similar to native batch: the dimensionsSpec and transformSpec work the same way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] sergioferragut commented on issue #13174: Add example for nested columns with streaming

Posted by GitBox <gi...@apache.org>.
sergioferragut commented on issue #13174:
URL: https://github.com/apache/druid/issues/13174#issuecomment-1270806973

   Just tested it by following the Kafka tutorial but replacing the Wikipedia data with the KTTM nested data.
   Steps:
   
   Create the topic:
   `./bin/kafka-topics.sh --create --topic kttm_nested --bootstrap-server localhost:9092`
   
   Get the nested data from the KTTM nested example:
   ```
   curl https://static.imply.io/example-data/kttm-nested-v2/kttm-nested-v2-2019-08-25.json.gz -o kttm-nested-data.json.gz
   gunzip -c kttm-nested-data.json.gz > kttm-nested-data.json
   ```
   
   Publish to the topic:
   ```
   export KAFKA_OPTS="-Dfile.encoding=UTF-8"
   ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kttm_nested < kttm-nested-data.json
   ```
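   
   To confirm the records reached the topic before starting ingestion, one option is to read a message back with the console consumer. This is only a sketch: `peek_topic` is a made-up helper name, and it assumes the same Kafka directory and broker address as the commands above.
   
   ```shell
   # Hypothetical helper (not part of Kafka or Druid): print the first N messages of a topic.
   # Assumes it is run from the Kafka installation directory and that the
   # broker listens on localhost:9092, as in the steps above.
   peek_topic() {
     local topic="$1"
     local count="${2:-1}"   # number of messages to print, default 1
     ./bin/kafka-console-consumer.sh \
       --bootstrap-server localhost:9092 \
       --topic "$topic" \
       --from-beginning \
       --max-messages "$count"
   }
   ```
   
   For example, `peek_topic kttm_nested 1` should print one JSON record if the publish step succeeded.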
   
   The "Load Data" UI does not automatically recognize the nested JSON columns in the parsing step.
   In the "Configure Schema" step, you can use "Add dimension", type the column name, and choose the type "json".
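   
   Since the columns have to be added by hand, it can help to list which top-level fields of a record are JSON objects. A rough sketch, assuming `jq` is installed; `nested_fields` is a made-up helper name, not something Druid provides.
   
   ```shell
   # Hypothetical helper: read JSON records on stdin and print the names of the
   # top-level fields whose values are objects (candidates for type "json").
   # Requires jq.
   nested_fields() {
     jq -r 'to_entries[] | select(.value | type == "object") | .key'
   }
   ```
   
   Running `head -n 1 kttm-nested-data.json | nested_fields` inspects the first record only. Fields that hold arrays or scalars are not listed.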
   
   The resulting Ingestion Spec:
   ```
   {
     "type": "kafka",
     "spec": {
       "ioConfig": {
         "type": "kafka",
         "consumerProperties": {
           "bootstrap.servers": "localhost:9092"
         },
         "topic": "kttm_nested",
         "inputFormat": {
           "type": "json"
         },
         "useEarliestOffset": true
       },
       "tuningConfig": {
         "type": "kafka"
       },
       "dataSchema": {
         "dataSource": "kttm_nested",
         "timestampSpec": {
           "column": "timestamp",
           "format": "iso"
         },
         "dimensionsSpec": {
           "dimensions": [
             "session",
             "number",
             "client_ip",
             "language",
             "adblock_list",
             "app_version",
             "path",
             "loaded_image",
             "referrer",
             "referrer_host",
             "server_ip",
             "screen",
             "window",
             {
               "type": "long",
               "name": "session_length"
             },
             "timezone",
             "timezone_offset",
             {
               "type": "json",
               "name": "event"
             },
             {
               "type": "json",
               "name": "agent"
             },
             {
               "type": "json",
               "name": "geo_ip"
             }
           ]
         },
         "granularitySpec": {
           "queryGranularity": "none",
           "rollup": false,
           "segmentGranularity": "hour"
         }
       }
     }
   }
   ```
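   
   The spec can also be submitted without the UI. A minimal sketch, assuming the spec above is saved as `kttm-nested-supervisor.json` (a made-up file name) and that the Overlord listens on `http://localhost:8081`, which is the quickstart default. `submit_supervisor` is a hypothetical helper; the endpoint is Druid's supervisor API.
   
   ```shell
   # Hypothetical helper: POST a supervisor spec file to the Druid Overlord.
   #   $1  path to the spec file (assumed name: kttm-nested-supervisor.json)
   #   $2  Overlord base URL (assumed quickstart default: http://localhost:8081)
   submit_supervisor() {
     local spec_file="$1"
     local overlord="${2:-http://localhost:8081}"
     curl -X POST \
       -H 'Content-Type: application/json' \
       -d @"$spec_file" \
       "$overlord/druid/indexer/v1/supervisor"
   }
   ```
   
   Usage: `submit_supervisor kttm-nested-supervisor.json`. Adjust the URL if your Overlord runs elsewhere.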
   
   @techdocsmith, this example works, but it requires the Kafka setup steps to run, so I'm not sure if it fits in the nested columns docs page as is. Perhaps adjust the Kafka tutorial so it uses this source instead? Let me know how else to help.
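   
   One way to check that the nested columns were ingested is to query a nested field through the Druid SQL API. This is a sketch: both helper names are made up, the Router address `http://localhost:8888` is the quickstart default, and the path `$.browser` in the usage line assumes the `agent` object in the KTTM data has a `browser` field.
   
   ```shell
   # Hypothetical helper: wrap a SQL string in the JSON body expected by the SQL API.
   # It does not escape double quotes or backslashes, so it only suits simple queries.
   sql_payload() {
     printf '{"query": "%s"}' "$1"
   }
   
   # Hypothetical helper: send a SQL query to Druid.
   #   $1  SQL text
   #   $2  Router base URL (assumed quickstart default: http://localhost:8888)
   query_druid() {
     curl -s -X POST \
       -H 'Content-Type: application/json' \
       -d "$(sql_payload "$1")" \
       "${2:-http://localhost:8888}/druid/v2/sql"
   }
   ```
   
   Usage: `query_druid "SELECT JSON_VALUE(agent, '\$.browser') AS browser, COUNT(*) AS cnt FROM kttm_nested GROUP BY 1 ORDER BY 2 DESC LIMIT 5"`.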
   




[GitHub] [druid] techdocsmith commented on issue #13174: Add example for nested columns with streaming

Posted by GitBox <gi...@apache.org>.
techdocsmith commented on issue #13174:
URL: https://github.com/apache/druid/issues/13174#issuecomment-1270900062

   @sergioferragut, it could potentially go in both places: on the nested columns page to show it's possible, and in the tutorial too. Thanks for sharing!




[GitHub] [druid] FrankChen021 closed issue #13174: Add example for nested columns with streaming

Posted by GitBox <gi...@apache.org>.
FrankChen021 closed issue #13174: Add example for nested columns with streaming
URL: https://github.com/apache/druid/issues/13174

