Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/06/14 07:15:07 UTC

[GitHub] [druid] bananaaggle opened a new pull request #11360: Add thrift input format

bananaaggle opened a new pull request #11360:
URL: https://github.com/apache/druid/pull/11360


   Because parseSpec is deprecated, I developed ThriftInputFormat for the new interface. It supports stream ingestion of Thrift-encoded data.
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] bananaaggle commented on pull request #11360: Add thrift input format

Posted by GitBox <gi...@apache.org>.
bananaaggle commented on pull request #11360:
URL: https://github.com/apache/druid/pull/11360#issuecomment-892599164


   Hi, @clintropolis! I think Thrift is the last extension that uses `parser`. Once this `inputFormat` is finished, we can remove the parser implementations from the code and fix all the documentation about them. Do you think we should open an issue for it?




[GitHub] [druid] clintropolis commented on a change in pull request #11360: Add thrift input format

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11360:
URL: https://github.com/apache/druid/pull/11360#discussion_r656090144



##########
File path: docs/ingestion/data-formats.md
##########
@@ -223,6 +223,41 @@ The Parquet `inputFormat` has the following components:
 |flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract nested values from a Parquet file. Note that only 'path' expression are supported ('jq' is unavailable).| no (default will auto-discover 'root' level properties) |
 | binaryAsString | Boolean | Specifies if the bytes parquet column which is not logically marked as a string or enum type should be treated as a UTF-8 encoded string. | no (default = false) |
 
+### Thrift Stream

Review comment:
       Hmm, I'm not sure we have any other 'contrib' extensions described in this section, so it might be best if this lives in https://github.com/apache/druid/blob/master/docs/development/extensions-contrib/thrift.md for now. On the other hand, I think Thrift is the only data format that isn't a core extension (maybe in the future we should consider adding integration tests and making it a core extension?), so maybe it is OK for it to be here. @techdocsmith, do you have any thoughts?
   
   Also, looking closer at the code, I guess this might work with batch ingestion too, since the deserializer detects the format from the bytes given to it, though I haven't personally used this extension or tested this scenario. I'll see if I can find some time to pull your branch and test it out.

##########
File path: extensions-contrib/thrift-extensions/pom.xml
##########
@@ -141,6 +141,36 @@
       <artifactId>hamcrest-core</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>com.google.code.findbugs</groupId>
+      <artifactId>jsr305</artifactId>
+      <version>2.0.1</version>
+      <scope>provided</scope>
+    </dependency>
+    <dependency>
+      <groupId>commons-io</groupId>
+      <artifactId>commons-io</artifactId>
+      <version>2.9.0</version>
+      <scope>provided</scope>
+    </dependency>
+    <dependency>
+      <groupId>joda-time</groupId>
+      <artifactId>joda-time</artifactId>
+      <version>2.10.5</version>
+      <scope>provided</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.fasterxml.jackson.core</groupId>
+      <artifactId>jackson-core</artifactId>
+      <version>2.10.2</version>

Review comment:
       I think the versions for a lot of these should already be defined in the top-level pom (the dependency checker in Travis sometimes suggests more than is necessary to fix the issue).
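
       As a rough illustration only: if the top-level pom does manage the jackson-core version through its `<dependencyManagement>` section (an assumption, not verified here), the entry above could drop its `<version>` element and inherit it from the parent. The same pattern would apply to the other dependencies only where the parent actually pins them.

```xml
<!-- Sketch only: assumes the parent pom's <dependencyManagement> already pins this version. -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <!-- No <version> element here: Maven resolves it from the parent pom. -->
  <scope>provided</scope>
</dependency>
```

       If the parent does not manage a given artifact, Maven fails the build with a missing-version error, so any dependency the parent does not pin would still need its explicit version.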






[GitHub] [druid] techdocsmith commented on a change in pull request #11360: Add thrift input format

Posted by GitBox <gi...@apache.org>.
techdocsmith commented on a change in pull request #11360:
URL: https://github.com/apache/druid/pull/11360#discussion_r710360863



##########
File path: docs/ingestion/data-formats.md
##########
@@ -223,6 +223,41 @@ The Parquet `inputFormat` has the following components:
 |flattenSpec| JSON Object |Define a [`flattenSpec`](#flattenspec) to extract nested values from a Parquet file. Note that only 'path' expression are supported ('jq' is unavailable).| no (default will auto-discover 'root' level properties) |
 | binaryAsString | Boolean | Specifies if the bytes parquet column which is not logically marked as a string or enum type should be treated as a UTF-8 encoded string. | no (default = false) |
 
+### Thrift Stream

Review comment:
       +1 for @clintropolis's suggestion to keep the doc in /docs/development/extensions-contrib/thrift.md until the extension is made core.



