
Posted to dev@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/12/10 14:00:46 UTC

[GitHub] [carbondata] pratyakshsharma opened a new pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

pratyakshsharma opened a new pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243


    ### Why is this PR needed?
    
    
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@carbondata.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001940835


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/579/
   



[GitHub] [carbondata] asfgit closed pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

asfgit closed pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243


   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991114414


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6159/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991614577







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997387132


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6170/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-998513376


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6176/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991645706


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4417/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-994092913


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6165/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-993988013


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4422/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-995222533


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/557/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-998706543


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/569/
   



[GitHub] [carbondata] akashrn5 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000079551


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000311466


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4440/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000657172


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4441/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001109541


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4442/
   



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991566753


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997377955


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6169/
   



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-990997380


   @akashrn5 please take a look.



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-998512515


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4432/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-996597337


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6167/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1002070646


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/581/
   



[GitHub] [carbondata] akashrn5 commented on a change in pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on a change in pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#discussion_r769692957



##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -131,4 +131,88 @@ clauses can have at most one UPDATE and one DELETE action, These clauses have th
 
 * Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using APIs.
 * Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql. 
-* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
\ No newline at end of file
+* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
+
+### Streamer Tool
+
+Carbondata streamer tool is a very powerful tool for incrementally capturing change events from varied sources like kafka or DFS and merging them into target carbondata table. This essentially means one needs to integrate with external solutions like Debezium or Maxwell for moving the change events to kafka, if one wishes to capture changes from primary databases like mysql. The tool currently requires incoming data to be present in avro format and incoming schema to evolve in backwards compatible way.
+
+Below is a high level architecture of how the overall pipeline looks like -
+
+![Carbondata streamer tool pipeline](../docs/images/carbondata-streamer-tool-pipeline.png?raw=true)
+
+#### Configs
+
+Streamer tool exposes below configs for users to cater to their CDC use cases - 
+
+| Parameter                         | Default Value                                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|-----------------------------------|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| carbon.streamer.target.database   | (none)                                                     | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                                                                                                                                                                                          |

Review comment:
       In the Default Value column, please mention that it takes the current database from the Spark session.
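
       The fallback described in the `carbon.streamer.target.database` row above can be sketched roughly as below. This is only an illustration and not the streamer tool's actual code: the class `TargetDatabaseResolver`, its `resolve` method, and the plain `Map` standing in for the tool's configuration are hypothetical, and the current database of the Spark session is passed in as a string instead of being read from a real session.

```java
import java.util.Collections;
import java.util.Map;

public class TargetDatabaseResolver {
    // Property name taken from the configs table in the diff.
    static final String TARGET_DATABASE = "carbon.streamer.target.database";

    // Hypothetical helper: returns the configured database, or the session's
    // current database when the property is absent or blank.
    static String resolve(Map<String, String> config, String sessionCurrentDatabase) {
        String configured = config.get(TARGET_DATABASE);
        if (configured == null || configured.trim().isEmpty()) {
            return sessionCurrentDatabase;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        // "sales" and "default" are placeholder database names.
        Map<String, String> explicit = Collections.singletonMap(TARGET_DATABASE, "sales");
        Map<String, String> unset = Collections.emptyMap();
        System.out.println(resolve(explicit, "default"));
        System.out.println(resolve(unset, "default"));
    }
}
```

       If the behaviour is along these lines, the Default Value column could say "current database of the Spark session" instead of "(none)".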

##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -131,4 +131,88 @@ clauses can have at most one UPDATE and one DELETE action, These clauses have th
 
 * Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using APIs.
 * Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql. 
-* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
\ No newline at end of file
+* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
+
+### Streamer Tool
+
+Carbondata streamer tool is a very powerful tool for incrementally capturing change events from varied sources like kafka or DFS and merging them into target carbondata table. This essentially means one needs to integrate with external solutions like Debezium or Maxwell for moving the change events to kafka, if one wishes to capture changes from primary databases like mysql. The tool currently requires incoming data to be present in avro format and incoming schema to evolve in backwards compatible way.
+
+Below is a high level architecture of how the overall pipeline looks like -
+
+![Carbondata streamer tool pipeline](../docs/images/carbondata-streamer-tool-pipeline.png?raw=true)
+
+#### Configs
+
+Streamer tool exposes below configs for users to cater to their CDC use cases - 
+
+| Parameter                         | Default Value                                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|-----------------------------------|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| carbon.streamer.target.database   | (none)                                                     | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                                                                                                                                                                                          |
+| carbon.streamer.target.table      | (none)                                                     | The target carbondata table where the data has to be merged. If this is not configured by user, the operation will fail.                                                                                                                                                                                                                                                                                                                                                                                   |
+| carbon.streamer.source.type       | kafka                                                      | Streamer tool currently supports two types of data sources. One can ingest data from either kafka or DFS into target carbondata table using streamer tool.                                                                                                                                                                                                                                                                                                                                        |
+| carbon.streamer.dfs.input.path    | (none)                                                     | An absolute path on a given file system from where data needs to be read to ingest into the target carbondata table. Mandatory if the ingestion source type is DFS.                                                                                                                                                                                                                                                                                                                                        |
+| schema.registry.url               | (none)                                                     | Streamer tool supports 2 different ways to supply schema of incoming data. Schemas can be supplied using avro files (file based schema provider) or using schema registry. This property defines the url to connect to in case schema registry is used as the schema source.                                                                                                                                                                                                                               |
+| carbon.streamer.input.kafka.topic | (none)                                                     | This is a mandatory property to be set in case kafka is chosen as the source of data. This property defines the topics from where streamer tool will consume the data.                                                                                                                                                                                                                                                                                                                                     |
+| bootstrap.servers                 | (none)                                                     | This is another mandatory property in case kafka is chosen as the source of data. This defines the end points for kafka brokers.                                                                                                                                                                                                                                                                                                                                                                           |
+| auto.offset.reset | earliest                                                   | Streamer tool maintains checkpoints to keep a track of the incoming messages which are already consumed. In case of first ingestion using kafka source, this property defines the offset from where ingestion will start. This property can take only 2 valid values - `latest` and `earliest`                                                                                                                                                                                                             |
+| key.deserializer | `org.apache.kafka.common.serialization.StringDeserializer` | Any message in kafka is ultimately a key value pair in the form of serialized bytes. This property defines the deserializer to deserialize the key of a message.                                                                                                                                                                                                                                                                                                                                           |
+| value.deserializer | `io.confluent.kafka.serializers.KafkaAvroDeserializer`     | This property defines the class which will be used for deserializing the values present in kafka topic.                                                                                                                                                                                                                                                                                                                                                                                                    |

Review comment:
       Actually, the code uses neither `key.deserializer` nor `value.deserializer`; it uses Spark's built-in Avro deserializer instead. Can we remove these two properties from the code and from the doc as well?
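
       To make the mandatory properties in the table above concrete, here is a minimal sketch of a check on them, leaving out the two deserializer properties. It is an assumption-laden illustration, not the tool's validation logic: the class `StreamerConfigCheck`, the method `missingKeys`, the plain `Map` used as configuration, the case-insensitive match on the source type, and the placeholder table and topic names are all hypothetical. Only the property names and the kafka default come from the table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamerConfigCheck {
    // Hypothetical helper: returns the mandatory keys that are missing or
    // blank for the chosen source type, in the order they are checked.
    static List<String> missingKeys(Map<String, String> config) {
        List<String> required = new ArrayList<>();
        // The table says the operation fails without a target table.
        required.add("carbon.streamer.target.table");

        // The table lists kafka as the default source type.
        String sourceType = config.getOrDefault("carbon.streamer.source.type", "kafka");
        if ("kafka".equalsIgnoreCase(sourceType)) {
            required.add("carbon.streamer.input.kafka.topic");
            required.add("bootstrap.servers");
        } else if ("dfs".equalsIgnoreCase(sourceType)) {
            required.add("carbon.streamer.dfs.input.path");
        }

        List<String> missing = new ArrayList<>();
        for (String key : required) {
            String value = config.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("carbon.streamer.target.table", "orders");           // placeholder table
        config.put("carbon.streamer.input.kafka.topic", "orders_cdc");  // placeholder topic
        // bootstrap.servers is deliberately left out to show a missing key.
        System.out.println(missingKeys(config));
    }
}
```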

##########
File path: docs/configuration-parameters.md
##########
@@ -179,7 +179,6 @@ This section provides the details of all the configurations required for the Car
 | carbon.update.storage.level | MEMORY_AND_DISK | Storage level to persist dataset of a RDD/dataframe. Applicable when ***carbon.update.persist.enable*** is **true**, if user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or other storage level to correspond to different environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
 | carbon.update.check.unique.value | true | By default this property is true, so update will validate key value mapping. This validation might have slight degrade in performance of update query. If user knows that key value mapping is correct, can disable this validation for better update performance by setting this property to false. |
 
-

Review comment:
       Please revert this change if it is not needed.

##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -131,4 +131,88 @@ clauses can have at most one UPDATE and one DELETE action, These clauses have th
 
 * Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using APIs.
 * Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql. 
-* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
\ No newline at end of file
+* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
+
+### Streamer Tool
+
+Carbondata streamer tool is a very powerful tool for incrementally capturing change events from varied sources like kafka or DFS and merging them into target carbondata table. This essentially means one needs to integrate with external solutions like Debezium or Maxwell for moving the change events to kafka, if one wishes to capture changes from primary databases like mysql. The tool currently requires incoming data to be present in avro format and incoming schema to evolve in backwards compatible way.
+
+Below is a high level architecture of how the overall pipeline looks like -
+
+![Carbondata streamer tool pipeline](../docs/images/carbondata-streamer-tool-pipeline.png?raw=true)
+
+#### Configs
+
+The streamer tool exposes the following configs for CDC use cases:
+
+| Parameter                         | Default Value                                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|-----------------------------------|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| carbon.streamer.target.database   | (none)                                                     | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                                                                                                                                                                                          |
+| carbon.streamer.target.table      | (none)                                                     | The target carbondata table where the data has to be merged. If this is not configured by user, the operation will fail.                                                                                                                                                                                                                                                                                                                                                                                   |
+| carbon.streamer.source.type       | kafka                                                      | Streamer tool currently supports two types of data sources. One can ingest data from either kafka or DFS into target carbondata table using streamer tool.                                                                                                                                                                                                                                                                                                                                        |
+| carbon.streamer.dfs.input.path    | (none)                                                     | An absolute path on a given file system from where data needs to be read to ingest into the target carbondata table. Mandatory if the ingestion source type is DFS.                                                                                                                                                                                                                                                                                                                                        |
+| schema.registry.url               | (none)                                                     | Streamer tool supports 2 different ways to supply schema of incoming data. Schemas can be supplied using avro files (file based schema provider) or using schema registry. This property defines the url to connect to in case schema registry is used as the schema source.                                                                                                                                                                                                                               |
+| carbon.streamer.input.kafka.topic | (none)                                                     | This is a mandatory property to be set in case kafka is chosen as the source of data. This property defines the topics from where streamer tool will consume the data.                                                                                                                                                                                                                                                                                                                                     |
+| bootstrap.servers                 | (none)                                                     | This is another mandatory property in case kafka is chosen as the source of data. This defines the end points for kafka brokers.                                                                                                                                                                                                                                                                                                                                                                           |
+| auto.offset.reset | earliest                                                   | Streamer tool maintains checkpoints to keep track of the incoming messages which are already consumed. In case of first ingestion using kafka source, this property defines the offset from where ingestion will start. This property can take only 2 valid values - `latest` and `earliest`. |
+| key.deserializer | `org.apache.kafka.common.serialization.StringDeserializer` | Any message in kafka is ultimately a key value pair in the form of serialized bytes. This property defines the deserializer to deserialize the key of a message.                                                                                                                                                                                                                                                                                                                                           |
+| value.deserializer | `io.confluent.kafka.serializers.KafkaAvroDeserializer`     | This property defines the class which will be used for deserializing the values present in kafka topic.                                                                                                                                                                                                                                                                                                                                                                                                    |
+| enable.auto.commit | false                                                      | Kafka maintains an internal topic for storing offsets corresponding to the consumer groups. This property determines if kafka should actually go forward and commit the offsets consumed in this internal topic. We recommend to keep it as false since we use spark streaming checkpointing to take care of the same.                                                                                                                                                                                     |
+| group.id | (none)                                                     | Streamer tool is ultimately a consumer for kafka. This property determines the consumer group id streamer tool belongs to.                                                                                                                                                                                                                                                                                                                                                                                 |
+| carbon.streamer.input.payload.format | avro                                                       | This determines the format of the incoming messages from source. Currently only avro is supported. We have plans to extend this support to json as well in near future. Avro is the most preferred format for CDC use cases since it helps in making the message size very compact and has good support for schema evolution use cases as well.                                                                                                                                                            |
+| carbon.streamer.schema.provider | SchemaRegistry                                             | As discussed earlier, streamer tool supports 2 ways of supplying schema for incoming messages - schema registry and avro files. Confluent schema registry is the preferred way when using avro as the input format.                                                                                                                                                                                                                                                                                        |
+| carbon.streamer.source.schema.path | (none)                                                     | This property defines the absolute path where files containing schemas for incoming messages are present.                                                                                                                                                                                                                                                                                                                                                                                                  |
+| carbon.streamer.merge.operation.type | upsert                                                     | This defines the operation that needs to be performed on the incoming batch of data while writing it to target data set.                                                                                                                                                                                                                                                                                                                                                                                   |
+| carbon.streamer.merge.operation.field | (none)                                                     | This property defines the field in incoming schema which contains the type of operation performed at source. For example, Debezium includes a field called `op` when reading change events from primary database. Do not confuse this property with `carbon.streamer.merge.operation.type` which defines the operation to be performed on the incoming batch of data. However this property is needed so that streamer tool is able to identify rows deleted at source when the operation type is `upsert`. |
+| carbon.streamer.record.key.field | (none)                                                     | This defines the record key for a particular incoming record. This is used by the streamer tool for performing deduplication. In case this is not defined, operation will fail.                                                                                                                                                                                                                                                                                                                            |
+| carbon.streamer.batch.interval | 10                                                         | Minimum batch interval time between 2 continuous ingestion in continuous mode. Should be specified in seconds.                                                                                                                                                                                                                                                                                                                                                                                             |
+| carbon.streamer.source.ordering.field | (none)                                                     | Name of the field from source schema whose value can be used for picking the latest updates for a particular record in the incoming batch in case of multiple updates for the same record key. Useful if the write operation type is UPDATE or UPSERT. This will be used only if `carbon.streamer.upsert.deduplicate` is enabled. |
+| carbon.streamer.insert.deduplicate | false                                                      | This property specifies if the incoming batch needs to be deduplicated in case of INSERT operation type. If set to true, the incoming batch will be deduplicated against the existing data in the target carbondata table.                                                                                                                                                                                                                                                                                 |
+| carbon.streamer.upsert.deduplicate | true                                                       | This property specifies if the incoming batch needs to be deduplicated (when multiple updates for the same record key are present in the incoming batch) in case of UPSERT/UPDATE operation type. If set to true, the user needs to provide proper value for the source ordering field as well.                                                                                                                                                                                                            |
+| carbon.streamer.meta.columns | (none)                                                     | Generally when performing CDC operations on primary databases, a few metadata columns are added along with the actual columns for bookkeeping purposes. This property enables users to list all such metadata fields (comma separated) which should not be merged with the target carbondata table. |
+| carbon.enable.schema.enforcement | true                                                       | This flag decides if table schema needs to change as per the incoming batch schema. If set to true, incoming schema will be validated with existing table schema. If the schema has evolved, the incoming batch cannot be ingested and job will simply fail.                                                                                                                                                                                                                                               |
+
+#### Commands
+
+1. For kafka source - 
+
+```
+bin/spark-submit --class org.apache.carbondata.streamer.CarbonDataStreamer \
+--master spark://root1-ThinkPad-T490s:7077 \
+jars/apache-carbondata-2.3.0-SNAPSHOT-bin-spark2.4.5-hadoop2.7.2.jar \

Review comment:
       For the master URL, remove the specific address and replace it with a more general one. Also, CarbonData 2.3.0 is not yet released, so instead of giving the actual jar name, you can use placeholders such as `<carbondata assembly jar path>`, `<spark master url>`, and `<schema registry URL>`. This looks better, I guess.
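
As a rough illustration of this suggestion, the quoted lines of the command could be templated as below. This is only a sketch: the placeholder text comes from the comment above, the shell variable and function names are hypothetical, and the streamer tool's own arguments (which follow the jar in the full command) are outside the quoted hunk and are therefore not shown.

```shell
#!/bin/sh
# Placeholders named in the review comment; substitute real values before use.
SPARK_MASTER_URL='<spark master url>'
CARBONDATA_ASSEMBLY_JAR='<carbondata assembly jar path>'

# Hypothetical helper: prints the templated command instead of executing it.
# Arguments after the jar are not part of the quoted hunk, so none are added here.
streamer_command() {
  printf '%s\n' \
    'bin/spark-submit --class org.apache.carbondata.streamer.CarbonDataStreamer \' \
    "--master ${SPARK_MASTER_URL} \\" \
    "${CARBONDATA_ASSEMBLY_JAR}"
}

streamer_command
```

Printing the command rather than running it keeps the placeholders visible, so the documented example stays independent of any particular host name or release version.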





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-995231504


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6166/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-998719392


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6178/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000683667


   Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/576/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000145766


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6183/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1002057639


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4446/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-996609517


   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4424/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991054672


   Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4416/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991630549


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6160/
   



[GitHub] [carbondata] akashrn5 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1002137006


   LGTM, the CI failure is not related to this PR



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001933888


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4444/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001932080


   Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6188/
   



[GitHub] [carbondata] ydvpankaj99 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

ydvpankaj99 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001373413


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000676215


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6185/
   



[GitHub] [carbondata] akashrn5 commented on a change in pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on a change in pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#discussion_r770227892



##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -131,4 +131,88 @@ clauses can have at most one UPDATE and one DELETE action, These clauses have th
 
 * Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using APIs.
 * Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql. 
-* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
\ No newline at end of file
+* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
+
+### Streamer Tool
+
+The CarbonData streamer tool incrementally captures change events from sources such as Kafka or DFS and merges them into a target CarbonData table. To capture changes from a primary database such as MySQL, one needs to integrate with an external solution such as Debezium or Maxwell to move the change events to Kafka. The tool currently requires incoming data to be in Avro format and the incoming schema to evolve in a backward-compatible way.
+
+The high-level architecture of the overall pipeline is shown below:
+
+![Carbondata streamer tool pipeline](../docs/images/carbondata-streamer-tool-pipeline.png?raw=true)
+
+#### Configs
+
+The streamer tool exposes the following configs for CDC use cases:
+
+| Parameter                         | Default Value                                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|-----------------------------------|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| carbon.streamer.target.database   | (none)                                                     | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                                                                                                                                                                                          |
+| carbon.streamer.target.table      | (none)                                                     | The target carbondata table where the data has to be merged. If this is not configured by user, the operation will fail.                                                                                                                                                                                                                                                                                                                                                                                   |
+| carbon.streamer.source.type       | kafka                                                      | Streamer tool currently supports two types of data sources. One can ingest data from either kafka or DFS into target carbondata table using streamer tool.                                                                                                                                                                                                                                                                                                                                        |
+| carbon.streamer.dfs.input.path    | (none)                                                     | An absolute path on a given file system from where data needs to be read to ingest into the target carbondata table. Mandatory if the ingestion source type is DFS.                                                                                                                                                                                                                                                                                                                                        |
+| schema.registry.url               | (none)                                                     | Streamer tool supports 2 different ways to supply schema of incoming data. Schemas can be supplied using avro files (file based schema provider) or using schema registry. This property defines the url to connect to in case schema registry is used as the schema source.                                                                                                                                                                                                                               |
+| carbon.streamer.input.kafka.topic | (none)                                                     | This is a mandatory property to be set in case kafka is chosen as the source of data. This property defines the topics from where streamer tool will consume the data.                                                                                                                                                                                                                                                                                                                                     |
+| bootstrap.servers                 | (none)                                                     | This is another mandatory property in case kafka is chosen as the source of data. This defines the end points for kafka brokers.                                                                                                                                                                                                                                                                                                                                                                           |
+| auto.offset.reset | earliest                                                   | Streamer tool maintains checkpoints to keep a track of the incoming messages which are already consumed. In case of first ingestion using kafka source, this property defines the offset from where ingestion will start. This property can take only 2 valid values - `latest` and `earliest`                                                                                                                                                                                                             |
+| key.deserializer | `org.apache.kafka.common.serialization.StringDeserializer` | Any message in kafka is ultimately a key value pair in the form of serialized bytes. This property defines the deserializer to deserialize the key of a message.                                                                                                                                                                                                                                                                                                                                           |
+| value.deserializer | `io.confluent.kafka.serializers.KafkaAvroDeserializer`     | This property defines the class which will be used for deserializing the values present in kafka topic.                                                                                                                                                                                                                                                                                                                                                                                                    |

Review comment:
       yes





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991614577


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/551/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997388158


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4427/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997377506


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/560/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001118920


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/577/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001380833


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6187/
   



[GitHub] [carbondata] brijoobopanna commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

brijoobopanna commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-998691004


   retest this please
   



[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on a change in pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#discussion_r769919929



##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -131,4 +131,88 @@ clauses can have at most one UPDATE and one DELETE action, These clauses have th
 
 * Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using APIs.
 * Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql. 
-* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
\ No newline at end of file
+* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
+
+### Streamer Tool
+
+Carbondata streamer tool is a very powerful tool for incrementally capturing change events from varied sources like kafka or DFS and merging them into target carbondata table. This essentially means one needs to integrate with external solutions like Debezium or Maxwell for moving the change events to kafka, if one wishes to capture changes from primary databases like mysql. The tool currently requires incoming data to be present in avro format and incoming schema to evolve in backwards compatible way.
+
+Below is a high level architecture of how the overall pipeline looks like -
+
+![Carbondata streamer tool pipeline](../docs/images/carbondata-streamer-tool-pipeline.png?raw=true)
+
+#### Configs
+
+Streamer tool exposes below configs for users to cater to their CDC use cases - 
+
+| Parameter                         | Default Value                                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|-----------------------------------|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| carbon.streamer.target.database   | (none)                                                     | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                                                                                                                                                                                          |
+| carbon.streamer.target.table      | (none)                                                     | The target carbondata table where the data has to be merged. If this is not configured by user, the operation will fail.                                                                                                                                                                                                                                                                                                                                                                                   |
+| carbon.streamer.source.type       | kafka                                                      | Streamer tool currently supports two types of data sources. One can ingest data from either kafka or DFS into target carbondata table using streamer tool.                                                                                                                                                                                                                                                                                                                                        |
+| carbon.streamer.dfs.input.path    | (none)                                                     | An absolute path on a given file system from where data needs to be read to ingest into the target carbondata table. Mandatory if the ingestion source type is DFS.                                                                                                                                                                                                                                                                                                                                        |
+| schema.registry.url               | (none)                                                     | Streamer tool supports 2 different ways to supply schema of incoming data. Schemas can be supplied using avro files (file based schema provider) or using schema registry. This property defines the url to connect to in case schema registry is used as the schema source.                                                                                                                                                                                                                               |
+| carbon.streamer.input.kafka.topic | (none)                                                     | This is a mandatory property to be set in case kafka is chosen as the source of data. This property defines the topics from where streamer tool will consume the data.                                                                                                                                                                                                                                                                                                                                     |
+| bootstrap.servers                 | (none)                                                     | This is another mandatory property in case kafka is chosen as the source of data. This defines the end points for kafka brokers.                                                                                                                                                                                                                                                                                                                                                                           |
+| auto.offset.reset | earliest                                                   | Streamer tool maintains checkpoints to keep a track of the incoming messages which are already consumed. In case of first ingestion using kafka source, this property defines the offset from where ingestion will start. This property can take only 2 valid values - `latest` and `earliest`                                                                                                                                                                                                             |
+| key.deserializer | `org.apache.kafka.common.serialization.StringDeserializer` | Any message in kafka is ultimately a key value pair in the form of serialized bytes. This property defines the deserializer to deserialize the key of a message.                                                                                                                                                                                                                                                                                                                                           |
+| value.deserializer | `io.confluent.kafka.serializers.KafkaAvroDeserializer`     | This property defines the class which will be used for deserializing the values present in kafka topic.                                                                                                                                                                                                                                                                                                                                                                                                    |
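
The mandatory-property rules described in the table above can be sketched as a small pre-flight check. This is only an illustration built on `java.util.Properties`; it is not the streamer tool's own API. The class name `StreamerConfigSketch`, the helper methods, the sample values (`target_table`, `cdc_events`, `localhost:9092`) and the literal `dfs` used to select the DFS source are all assumptions made for the example. Only the property keys and default values are taken from the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of CarbonData: checks the rules stated in the config table.
public class StreamerConfigSketch {

    // Defaults copied from the "Default Value" column of the table.
    public static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("carbon.streamer.source.type", "kafka");
        p.setProperty("auto.offset.reset", "earliest");
        p.setProperty("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        p.setProperty("value.deserializer",
            "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        return p;
    }

    // Returns the keys that the table marks as mandatory but that are unset or blank.
    public static List<String> missingMandatory(Properties p) {
        List<String> missing = new ArrayList<>();
        // The target table is always required; the target database may be omitted.
        require(p, "carbon.streamer.target.table", missing);
        String source = p.getProperty("carbon.streamer.source.type", "kafka");
        if (source.equalsIgnoreCase("kafka")) {
            require(p, "carbon.streamer.input.kafka.topic", missing);
            require(p, "bootstrap.servers", missing);
        } else if (source.equalsIgnoreCase("dfs")) { // assumed spelling of the DFS value
            require(p, "carbon.streamer.dfs.input.path", missing);
        }
        return missing;
    }

    private static void require(Properties p, String key, List<String> missing) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            missing.add(key);
        }
    }

    public static void main(String[] args) {
        Properties p = defaults();
        p.setProperty("carbon.streamer.target.table", "target_table");    // hypothetical value
        p.setProperty("carbon.streamer.input.kafka.topic", "cdc_events"); // hypothetical value
        p.setProperty("bootstrap.servers", "localhost:9092");             // hypothetical value
        System.out.println("Missing mandatory properties: " + missingMandatory(p));
    }
}
```

Running the check before starting an ingestion would surface a missing `carbon.streamer.target.table` or Kafka endpoint up front; how the properties are actually handed to the streamer tool is not covered by this sketch.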

Review comment:
       So, from the user's point of view, we do not want to keep it configurable as of now?





[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on a change in pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#discussion_r769919929



##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -131,4 +131,88 @@ clauses can have at most one UPDATE and one DELETE action, These clauses have th
 
 * Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using APIs.
 * Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql. 
-* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
\ No newline at end of file
+* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
+
+### Streamer Tool
+
+Carbondata streamer tool is a very powerful tool for incrementally capturing change events from varied sources like kafka or DFS and merging them into target carbondata table. This essentially means one needs to integrate with external solutions like Debezium or Maxwell for moving the change events to kafka, if one wishes to capture changes from primary databases like mysql. The tool currently requires incoming data to be present in avro format and incoming schema to evolve in backwards compatible way.
+
+Below is a high level architecture of how the overall pipeline looks like -
+
+![Carbondata streamer tool pipeline](../docs/images/carbondata-streamer-tool-pipeline.png?raw=true)
+
+#### Configs
+
+Streamer tool exposes below configs for users to cater to their CDC use cases - 
+
+| Parameter                         | Default Value                                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|-----------------------------------|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| carbon.streamer.target.database   | (none)                                                     | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                                                                                                                                                                                          |
+| carbon.streamer.target.table      | (none)                                                     | The target carbondata table where the data has to be merged. If this is not configured by user, the operation will fail.                                                                                                                                                                                                                                                                                                                                                                                   |
+| carbon.streamer.source.type       | kafka                                                      | Streamer tool currently supports two types of data sources. One can ingest data from either kafka or DFS into target carbondata table using streamer tool.                                                                                                                                                                                                                                                                                                                                        |
+| carbon.streamer.dfs.input.path    | (none)                                                     | An absolute path on a given file system from where data needs to be read to ingest into the target carbondata table. Mandatory if the ingestion source type is DFS.                                                                                                                                                                                                                                                                                                                                        |
+| schema.registry.url               | (none)                                                     | Streamer tool supports 2 different ways to supply schema of incoming data. Schemas can be supplied using avro files (file based schema provider) or using schema registry. This property defines the url to connect to in case schema registry is used as the schema source.                                                                                                                                                                                                                               |
+| carbon.streamer.input.kafka.topic | (none)                                                     | This is a mandatory property to be set in case kafka is chosen as the source of data. This property defines the topics from where streamer tool will consume the data.                                                                                                                                                                                                                                                                                                                                     |
+| bootstrap.servers                 | (none)                                                     | This is another mandatory property in case kafka is chosen as the source of data. This defines the end points for kafka brokers.                                                                                                                                                                                                                                                                                                                                                                           |
+| auto.offset.reset | earliest                                                   | Streamer tool maintains checkpoints to keep a track of the incoming messages which are already consumed. In case of first ingestion using kafka source, this property defines the offset from where ingestion will start. This property can take only 2 valid values - `latest` and `earliest`                                                                                                                                                                                                             |
+| key.deserializer | `org.apache.kafka.common.serialization.StringDeserializer` | Any message in kafka is ultimately a key value pair in the form of serialized bytes. This property defines the deserializer to deserialize the key of a message.                                                                                                                                                                                                                                                                                                                                           |
+| value.deserializer | `io.confluent.kafka.serializers.KafkaAvroDeserializer`     | This property defines the class which will be used for deserializing the values present in kafka topic.                                                                                                                                                                                                                                                                                                                                                                                                    |

Review comment:
       So we do not want to keep it configurable as of now?





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997378756


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4426/
   



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997383843


   @ydvpankaj99 can you please check why this error keeps occurring? Can we do something to make the CI more stable?



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997383867


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001135110


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6186/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-999842371


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4438/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1002045309


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6190/
   



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991936752


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991967054


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6161/
   



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991566753


   retest this please



[GitHub] [carbondata] pratyakshsharma commented on a change in pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on a change in pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#discussion_r769867806



##########
File path: docs/scd-and-cdc-guide.md
##########
@@ -131,4 +131,88 @@ clauses can have at most one UPDATE and one DELETE action, These clauses have th
 
 * Please refer example class [MergeTestCase](https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/merge/MergeTestCase.scala) to understand and implement scd and cdc scenarios using APIs.
 * Please refer example class [DataMergeIntoExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataMergeIntoExample.scala) to understand and implement scd and cdc scenarios using sql. 
-* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
\ No newline at end of file
+* Please refer example class [DataUPSERTExample](https://github.com/apache/carbondata/blob/master/examples/spark/src/main/scala/org/apache/carbondata/examples/DataUPSERTExample.scala) to understand and implement cdc using UPSERT APIs.
+
+### Streamer Tool
+
+The CarbonData streamer tool incrementally captures change events from sources such as Kafka or DFS and merges them into a target CarbonData table. To capture changes from a primary database such as MySQL, you need an external solution such as Debezium or Maxwell to move the change events to Kafka. The tool currently requires incoming data to be in Avro format and the incoming schema to evolve in a backward-compatible way.
+
+The high-level architecture of the overall pipeline is shown below:
+
+![Carbondata streamer tool pipeline](../docs/images/carbondata-streamer-tool-pipeline.png?raw=true)
+
+#### Configs
+
+The streamer tool exposes the following configs for CDC use cases:
+
+| Parameter                         | Default Value                                              | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|-----------------------------------|------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| carbon.streamer.target.database   | (none)                                                     | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                                                                                                                                                                                          |
+| carbon.streamer.target.table      | (none)                                                     | The target carbondata table where the data has to be merged. If this is not configured by user, the operation will fail.                                                                                                                                                                                                                                                                                                                                                                                   |
+| carbon.streamer.source.type       | kafka                                                      | Streamer tool currently supports two types of data sources. One can ingest data from either kafka or DFS into target carbondata table using streamer tool.                                                                                                                                                                                                                                                                                                                                        |
+| carbon.streamer.dfs.input.path    | (none)                                                     | An absolute path on a given file system from where data needs to be read to ingest into the target carbondata table. Mandatory if the ingestion source type is DFS.                                                                                                                                                                                                                                                                                                                                        |
+| schema.registry.url               | (none)                                                     | Streamer tool supports 2 different ways to supply schema of incoming data. Schemas can be supplied using avro files (file based schema provider) or using schema registry. This property defines the url to connect to in case schema registry is used as the schema source.                                                                                                                                                                                                                               |
+| carbon.streamer.input.kafka.topic | (none)                                                     | This is a mandatory property to be set in case kafka is chosen as the source of data. This property defines the topics from where streamer tool will consume the data.                                                                                                                                                                                                                                                                                                                                     |
+| bootstrap.servers                 | (none)                                                     | This is another mandatory property in case kafka is chosen as the source of data. This defines the end points for kafka brokers.                                                                                                                                                                                                                                                                                                                                                                           |
+| auto.offset.reset | earliest                                                   | Streamer tool maintains checkpoints to keep a track of the incoming messages which are already consumed. In case of first ingestion using kafka source, this property defines the offset from where ingestion will start. This property can take only 2 valid values - `latest` and `earliest`                                                                                                                                                                                                             |
+| key.deserializer | `org.apache.kafka.common.serialization.StringDeserializer` | Any message in kafka is ultimately a key value pair in the form of serialized bytes. This property defines the deserializer to deserialize the key of a message.                                                                                                                                                                                                                                                                                                                                           |
+| value.deserializer | `io.confluent.kafka.serializers.KafkaAvroDeserializer`     | This property defines the class which will be used for deserializing the values present in kafka topic.                                                                                                                                                                                                                                                                                                                                                                                                    |
+| enable.auto.commit | false                                                      | Kafka maintains an internal topic for storing offsets corresponding to the consumer groups. This property determines if kafka should actually go forward and commit the offsets consumed in this internal topic. We recommend to keep it as false since we use spark streaming checkpointing to take care of the same.                                                                                                                                                                                     |
+| group.id | (none)                                                     | Streamer tool is ultimately a consumer for kafka. This property determines the consumer group id streamer tool belongs to.                                                                                                                                                                                                                                                                                                                                                                                 |
+| carbon.streamer.input.payload.format | avro                                                       | This determines the format of the incoming messages from source. Currently only avro is supported. We have plans to extend this support to json as well in near future. Avro is the most preferred format for CDC use cases since it helps in making the message size very compact and has good support for schema evolution use cases as well.                                                                                                                                                            |
+| carbon.streamer.schema.provider | SchemaRegistry                                             | As discussed earlier, streamer tool supports 2 ways of supplying schema for incoming messages - schema registry and avro files. Confluent schema registry is the preferred way when using avro as the input format.                                                                                                                                                                                                                                                                                        |
+| carbon.streamer.source.schema.path | (none)                                                     | This property defines the absolute path where files containing schemas for incoming messages are present.                                                                                                                                                                                                                                                                                                                                                                                                  |
+| carbon.streamer.merge.operation.type | upsert                                                     | This defines the operation that needs to be performed on the incoming batch of data while writing it to target data set.                                                                                                                                                                                                                                                                                                                                                                                   |
+| carbon.streamer.merge.operation.field | (none)                                                     | This property defines the field in incoming schema which contains the type of operation performed at source. For example, Debezium includes a field called `op` when reading change events from primary database. Do not confuse this property with `carbon.streamer.merge.operation.type` which defines the operation to be performed on the incoming batch of data. However this property is needed so that streamer tool is able to identify rows deleted at source when the operation type is `upsert`. |
+| carbon.streamer.record.key.field | (none)                                                     | This defines the record key for a particular incoming record. This is used by the streamer tool for performing deduplication. In case this is not defined, operation will fail.                                                                                                                                                                                                                                                                                                                            |
+| carbon.streamer.batch.interval | 10                                                         | Minimum batch interval time between 2 continuous ingestion in continuous mode. Should be specified in seconds.                                                                                                                                                                                                                                                                                                                                                                                             |
+| carbon.streamer.source.ordering.field | (none)                                                     | Name of the field from source schema whose value can be used for picking the latest updates for a particular record in the incoming batch in case of multiple updates for the same record key. Useful if the write operation type is UPDATE or UPSERT. This will be used only if `carbon.streamer.upsert.deduplicate` is enabled.                                                                                                                                                                          |
+| carbon.streamer.insert.deduplicate | false                                                      | This property specifies if the incoming batch needs to be deduplicated in case of INSERT operation type. If set to true, the incoming batch will be deduplicated against the existing data in the target carbondata table.                                                                                                                                                                                                                                                                                 |
+| carbon.streamer.upsert.deduplicate | true                                                       | This property specifies if the incoming batch needs to be deduplicated (when multiple updates for the same record key are present in the incoming batch) in case of UPSERT/UPDATE operation type. If set to true, the user needs to provide proper value for the source ordering field as well.                                                                                                                                                                                                            |
+| carbon.streamer.meta.columns | (none)                                                     | When performing CDC operations on primary databases, a few metadata columns are generally added alongside the actual columns for bookkeeping purposes. This property lets users list all such metadata fields (comma separated) that should not be merged into the target carbondata table.                                                                                                                                                                                                                |
+| carbon.enable.schema.enforcement | true                                                       | This flag decides if table schema needs to change as per the incoming batch schema. If set to true, incoming schema will be validated with existing table schema. If the schema has evolved, the incoming batch cannot be ingested and job will simply fail.                                                                                                                                                                                                                                               |
+
+#### Commands
+
+1. For kafka source - 
+
+```
+bin/spark-submit --class org.apache.carbondata.streamer.CarbonDataStreamer \
+--master spark://root1-ThinkPad-T490s:7077 \
+jars/apache-carbondata-2.3.0-SNAPSHOT-bin-spark2.4.5-hadoop2.7.2.jar \

Review comment:
       my bad, missed it completely. Thank you for pointing this out. 
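
   The mandatory-property rules stated in the configs table of the diff above can be sketched as a small pre-flight check. This is only an illustrative sketch in Python, not the streamer tool's actual validation logic: the helper name `missing_streamer_props` is hypothetical, the lowercase value `dfs` for the DFS source type is an assumption, and requiring the ordering field whenever upsert deduplication is enabled is a simplification of the table's wording.

```python
from typing import Dict, List

# Defaults copied from the configs table in the diff above.
DEFAULTS = {
    "carbon.streamer.source.type": "kafka",
    "carbon.streamer.upsert.deduplicate": "true",
}


def missing_streamer_props(conf: Dict[str, str]) -> List[str]:
    """Return the mandatory property names that are absent or empty in conf.

    Hypothetical helper for illustration; the real tool may validate differently.
    """
    merged = {**DEFAULTS, **conf}

    # The table says the operation fails without a target table or a record key field.
    required = [
        "carbon.streamer.target.table",
        "carbon.streamer.record.key.field",
    ]

    source = merged["carbon.streamer.source.type"].lower()
    if source == "kafka":
        # Both are described as mandatory when kafka is the source.
        required += ["carbon.streamer.input.kafka.topic", "bootstrap.servers"]
    elif source == "dfs":  # assumed spelling of the DFS source type value
        required.append("carbon.streamer.dfs.input.path")

    # With upsert deduplication enabled, a source ordering field must be supplied.
    if merged["carbon.streamer.upsert.deduplicate"].lower() == "true":
        required.append("carbon.streamer.source.ordering.field")

    return [key for key in required if not merged.get(key)]


if __name__ == "__main__":
    example = {
        "carbon.streamer.target.table": "target_table",
        "carbon.streamer.record.key.field": "id",
    }
    # Lists whatever is still missing for the default kafka source.
    print(missing_streamer_props(example))
```

   Running a check like this before `spark-submit` would surface a missing topic or input path early, rather than after the Spark job has started.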





[GitHub] [carbondata] kunal642 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

kunal642 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997661243


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997767753


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4429/
   



[GitHub] [carbondata] akashrn5 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001108555


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000141602


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/574/
   



[GitHub] [carbondata] akashrn5 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000637520


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001382980


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4443/
   



[GitHub] [carbondata] ydvpankaj99 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

ydvpankaj99 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001951104


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-994057697


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/556/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991968793


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4418/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991959164


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/552/
   



[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

Indhumathi27 commented on a change in pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#discussion_r767598589



##########
File path: docs/configuration-parameters.md
##########
@@ -179,6 +179,33 @@ This section provides the details of all the configurations required for the Car
 | carbon.update.storage.level | MEMORY_AND_DISK | Storage level to persist dataset of a RDD/dataframe. Applicable when ***carbon.update.persist.enable*** is **true**, if user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or other storage level to correspond to different environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
 | carbon.update.check.unique.value | true | By default this property is true, so update will validate key value mapping. This validation might have slight degrade in performance of update query. If user knows that key value mapping is correct, can disable this validation for better update performance by setting this property to false. |
 
+## Streamer tool Configuration
+| Parameter                         | Default Value | Description                                                                                                                                                                                                                                                                                                                                     |
+|-----------------------------------|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | carbon.streamer.target.database   | <none> | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                               |

Review comment:
       Looks like <none> is not displayed in the rendered document. Can change it to (none). Please handle this in the other places as well.

##########
File path: docs/configuration-parameters.md
##########
@@ -179,6 +179,33 @@ This section provides the details of all the configurations required for the Car
 | carbon.update.storage.level | MEMORY_AND_DISK | Storage level to persist dataset of a RDD/dataframe. Applicable when ***carbon.update.persist.enable*** is **true**, if user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or other storage level to correspond to different environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
 | carbon.update.check.unique.value | true | By default this property is true, so update will validate key value mapping. This validation might have slight degrade in performance of update query. If user knows that key value mapping is correct, can disable this validation for better update performance by setting this property to false. |
 
+## Streamer tool Configuration
+| Parameter                         | Default Value | Description                                                                                                                                                                                                                                                                                                                                     |
+|-----------------------------------|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | carbon.streamer.target.database   | <none> | The database name where the target table is present to merge the incoming data. If not given by user, system will take the current database in the spark session.                                                                                                                                                                               |
+ | carbon.streamer.target.table      | <none> | The target carbondata table where the data has to be merged. If this is not configured by user, the operation will fail.                                                                                                                                                                                                                        |
+ | carbon.streamer.source.type       | kafka | Streamer tool currently supports 2 different types of data sources. One can ingest data from either kafka or DFS into target carbondata table using streamer tool.                                                                                                                                                                              |

Review comment:
   ```suggestion
    | carbon.streamer.source.type       | kafka | Streamer tool currently supports two types of data sources. One can ingest data from either kafka or DFS into target carbondata table using streamer tool.                                                                                                                                                                              |
   ```
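
   For readers following the hunk above, the three documented properties can be read as a small resolution rule: the target table is mandatory, the target database falls back to the current Spark session database, and the source type defaults to kafka. The sketch below is only an illustration of that rule in Python. The function name `resolve_streamer_config`, the lowercase spellings of the source types, and the error messages are assumptions for this example and are not taken from the streamer tool's code.

   ```python
   TARGET_DATABASE = "carbon.streamer.target.database"
   TARGET_TABLE = "carbon.streamer.target.table"
   SOURCE_TYPE = "carbon.streamer.source.type"

   # Assumption: the docs only name "kafka" and "DFS"; the accepted spellings
   # and the case-insensitive comparison are illustrative choices.
   SUPPORTED_SOURCE_TYPES = ("kafka", "dfs")


   def resolve_streamer_config(props, current_database):
       """Hypothetical helper: apply the documented defaults to user properties."""
       table = props.get(TARGET_TABLE)
       if not table:
           # Documented behaviour: the operation fails without a target table.
           raise ValueError(f"{TARGET_TABLE} must be configured")

       # Documented behaviour: fall back to the current session database.
       database = props.get(TARGET_DATABASE) or current_database

       # Documented default source type is kafka.
       source_type = props.get(SOURCE_TYPE, "kafka").lower()
       if source_type not in SUPPORTED_SOURCE_TYPES:
           raise ValueError(f"{SOURCE_TYPE} must be one of {SUPPORTED_SOURCE_TYPES}")

       return {
           TARGET_DATABASE: database,
           TARGET_TABLE: table,
           SOURCE_TYPE: source_type,
       }


   if __name__ == "__main__":
       print(resolve_streamer_config({TARGET_TABLE: "sales"}, "default"))
   ```

   Passing only the target table and a session database named `default` yields a configuration with the database set to `default` and the source type set to `kafka`; omitting the table raises an error, which mirrors the "operation will fail" wording in the table.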





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-998717489


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4434/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-998513625


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/567/
   



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997375302


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997785562


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6172/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997809907


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/563/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000107617


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4439/
   



[GitHub] [carbondata] ydvpankaj99 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

ydvpankaj99 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001870287


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000374128


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/575/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-999908520


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/573/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-995186200


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4423/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-996591621


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/558/
   



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-996572571


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-997388197


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/561/
   



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1001383575


   Build Failed  with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/578/
   



[GitHub] [carbondata] akashrn5 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-999575634


   @pratyakshsharma can you please fix the compilation issues



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-999891555


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6182/
   



[GitHub] [carbondata] akashrn5 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

akashrn5 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000269198


   retest this please



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-1000356245


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/6184/
   



[GitHub] [carbondata] pratyakshsharma commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

pratyakshsharma commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-999805561


   @akashrn5 Fixed. 



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4243: [CARBONDATA-4308]: added docs for streamer tool configs

Posted by GitBox <gi...@apache.org>.

CarbonDataQA2 commented on pull request #4243:
URL: https://github.com/apache/carbondata/pull/4243#issuecomment-991136565


   Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/550/
   

