You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "cbalci (via GitHub)" <gi...@apache.org> on 2023/02/23 01:40:36 UTC

[GitHub] [pinot] cbalci opened a new pull request, #10321: Refactor Spark Connector into two modules for reusability

cbalci opened a new pull request, #10321:
URL: https://github.com/apache/pinot/pull/10321

   This is the first of a two PR changes which will add Spark3 support for 'pinot-spark-connector'
   
   **Background**
   Apache Spark has [changed](https://blog.madhukaraphatak.com/spark-3-datasource-v2-part-3) the Datasource interface significantly between Spark2 and Spark3 so `pinot-spark-connector` doesn't work for Spark3. We can implement a new connector for Spark3 as a separate module, however about half of the logic/code under the existing connector is independent of the interface and can potentially be reused across Spark2 and Spark3 connectors. For this, I'm restructuring the packages similar to what was done for batch ingestion in https://github.com/apache/pinot/pull/8560.
   
   **Change**
   In this PR, I'm refactoring Spark Connector into two packages as:
   `pinot-spark-connector` --> ( `pinot-spark-common` +  `pinot-spark-2-connector` )
   
   This is mostly a mechanical refactoring which moves packages around and renames fields/classes for clarity. Only addition is the backported (from Spark) `CaseInsensitiveStringMap` to make `PinotDataSourceReadOptions` reusable across implementations (see comment below).
   
   **Testing**
   Usage and functionality of the Spark2 connector should be completely unchanged except for the renaming of the maven module. All the unit tests are preseved to ensure previous assumptions. I also ran the integration tests under `ExampleSparkPinotConnectorTest` to verify expected behavior.
   
   
   To preview the full changes including the Spark3 Connector implementation you can check [this diff](https://github.com/apache/pinot/compare/master...cbalci:pinot:pinot-spark3-connector#diff-8733e0d7481c08afa1005dfcebcf93aae07cfdc996c82f809bef22fcb2061e0eR41).
   
   `refactor` `cleanup`
   `release-notes` ('Pinot Spark Connector' module is renamed to 'Pinot Spark 2 Connector' for clarity)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps merged pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "KKcorps (via GitHub)" <gi...@apache.org>.
KKcorps merged PR #10321:
URL: https://github.com/apache/pinot/pull/10321


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10321:
URL: https://github.com/apache/pinot/pull/10321#issuecomment-1441136929

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#10321](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (94be858) into [master](https://codecov.io/gh/apache/pinot/commit/0f2d51cc80e2bdfcd73bbf8cc75a1b2ab214e722?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0f2d51c) will **decrease** coverage by `56.60%`.
   > The diff coverage is `34.28%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #10321       +/-   ##
   =============================================
   - Coverage     70.42%   13.82%   -56.60%     
   + Complexity     5103      243     -4860     
   =============================================
     Files          2017     1986       -31     
     Lines        109181   107846     -1335     
     Branches      16602    16447      -155     
   =============================================
   - Hits          76887    14912    -61975     
   - Misses        26901    91737    +64836     
   + Partials       5393     1197     -4196     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `13.82% <34.28%> (+0.12%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...pache/pinot/connector/spark/common/HttpUtils.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL0h0dHBVdGlscy5zY2FsYQ==) | `0.00% <ø> (ø)` | |
   | [.../apache/pinot/connector/spark/common/Logging.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL0xvZ2dpbmcuc2NhbGE=) | `6.66% <ø> (ø)` | |
   | [...ot/connector/spark/common/PinotClusterClient.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL1Bpbm90Q2x1c3RlckNsaWVudC5zY2FsYQ==) | `5.15% <0.00%> (ø)` | |
   | [...ache/pinot/connector/spark/common/exceptions.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL2V4Y2VwdGlvbnMuc2NhbGE=) | `20.00% <ø> (ø)` | |
   | [.../apache/pinot/connector/spark/common/package.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL3BhY2thZ2Uuc2NhbGE=) | `0.00% <0.00%> (ø)` | |
   | [...pinot/connector/spark/common/query/ScanQuery.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL3F1ZXJ5L1NjYW5RdWVyeS5zY2FsYQ==) | `57.14% <0.00%> (ø)` | |
   | [...k/common/reader/PinotAbstractPartitionReader.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL3JlYWRlci9QaW5vdEFic3RyYWN0UGFydGl0aW9uUmVhZGVyLnNjYWxh) | `0.00% <0.00%> (ø)` | |
   | [...ark/common/reader/PinotGrpcServerDataFetcher.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL3JlYWRlci9QaW5vdEdycGNTZXJ2ZXJEYXRhRmV0Y2hlci5zY2FsYQ==) | `0.00% <0.00%> (ø)` | |
   | [...r/spark/common/reader/PinotServerDataFetcher.scala](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vc2NhbGEvb3JnL2FwYWNoZS9waW5vdC9jb25uZWN0b3Ivc3BhcmsvY29tbW9uL3JlYWRlci9QaW5vdFNlcnZlckRhdGFGZXRjaGVyLnNjYWxh) | `0.00% <0.00%> (ø)` | |
   | [...nnector/spark/common/CaseInsensitiveStringMap.java](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29ubmVjdG9ycy9waW5vdC1zcGFyay1jb21tb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L2Nvbm5lY3Rvci9zcGFyay9jb21tb24vQ2FzZUluc2Vuc2l0aXZlU3RyaW5nTWFwLmphdmE=) | `35.41% <35.41%> (ø)` | |
   | ... and [1635 more](https://codecov.io/gh/apache/pinot/pull/10321?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] cbalci commented on a diff in pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "cbalci (via GitHub)" <gi...@apache.org>.
cbalci commented on code in PR #10321:
URL: https://github.com/apache/pinot/pull/10321#discussion_r1117753596


##########
pinot-connectors/pinot-spark-common/src/main/java/org/apache/pinot/connector/spark/common/CaseInsensitiveStringMap.java:
##########
@@ -0,0 +1,201 @@
+/**

Review Comment:
   Thanks for the review @xiangfu0 !
   
   I think this class enables the simplest interface we can expose to the implementor: `Map[String,String]`. 
   If you take a look at the `PinotDataSourceReadOptions` below, it has two factory methods with following signatures:
   ```
   object PinotDataSourceReadOptions {
   ...
     private[pinot] def from(optionsMap: util.Map[String, String]): PinotDataSourceReadOptions
     ...
     private[pinot] def from(options: CaseInsensitiveStringMap): PinotDataSourceReadOptions
   ...
   }
   ```
   With this, the shared `PinotDataSourceOptions` object can be created either by passing a `CaseInsensitiveStringMap` or a plain old `Map` which internally will be converted to a `CaseInsensitiveStringMap`. I guess we can even drop the second method and only accept `Map[String, String]` for simplicity.
   
   Let me know if you meant something else.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] cbalci commented on a diff in pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "cbalci (via GitHub)" <gi...@apache.org>.
cbalci commented on code in PR #10321:
URL: https://github.com/apache/pinot/pull/10321#discussion_r1117753596


##########
pinot-connectors/pinot-spark-common/src/main/java/org/apache/pinot/connector/spark/common/CaseInsensitiveStringMap.java:
##########
@@ -0,0 +1,201 @@
+/**

Review Comment:
   I think this class enables the simplest interface we can expose to the implementor: `Map[String,String]`. 
   If you take a look at the `PinotDataSourceReadOptions` below, it has two factory methods with following signatures:
   ```
   object PinotDataSourceReadOptions {
   ...
     private[pinot] def from(optionsMap: util.Map[String, String]): PinotDataSourceReadOptions
     ...
     private[pinot] def from(options: CaseInsensitiveStringMap): PinotDataSourceReadOptions
   ...
   }
   ```
   With this, the shared `PinotDataSourceOptions` object can be created either by passing a `CaseInsensitiveStringMap` or a plain old `Map` which internally will be converted to a `CaseInsensitiveStringMap`. I guess we can even drop the second method and only accept `Map[String, String]` for simplicity.
   
   Let me know if you meant something else.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on code in PR #10321:
URL: https://github.com/apache/pinot/pull/10321#discussion_r1116165501


##########
pinot-connectors/pinot-spark-common/src/main/java/org/apache/pinot/connector/spark/common/CaseInsensitiveStringMap.java:
##########
@@ -0,0 +1,201 @@
+/**

Review Comment:
   how about make this an interface and have the wrapper of v2/v3 implementation ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps commented on pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "KKcorps (via GitHub)" <gi...@apache.org>.
KKcorps commented on PR #10321:
URL: https://github.com/apache/pinot/pull/10321#issuecomment-1458095557

   Will take a look at it today!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] cbalci commented on a diff in pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "cbalci (via GitHub)" <gi...@apache.org>.
cbalci commented on code in PR #10321:
URL: https://github.com/apache/pinot/pull/10321#discussion_r1115168268


##########
pinot-connectors/pinot-spark-common/src/main/java/org/apache/pinot/connector/spark/common/CaseInsensitiveStringMap.java:
##########
@@ -0,0 +1,201 @@
+/**

Review Comment:
   `org.apache.spark.sql.sources.v2.DataSourceOptions`, which is used by Spark to contain user specified options is removed in Spark3 in favor of a new and more capable container `CaseInsensitiveStringMap`, which is compatible with `DataSourceOption`. I copied this class from Spark codebase into our 'pinot-spark-common' package as a drop in replacement for the former and reworked the `PinotDataSourceReadOptions` accordingly. Now the options class can be reused by both implementations (v2 and v3) consistently.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] cbalci commented on pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "cbalci (via GitHub)" <gi...@apache.org>.
cbalci commented on PR #10321:
URL: https://github.com/apache/pinot/pull/10321#issuecomment-1458602363

   Thanks for the review @KKcorps ! Yes, please go ahead. I have a couple follow up PRs waiting for this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] KKcorps commented on pull request #10321: Refactor Spark Connector into two modules for reusability

Posted by "KKcorps (via GitHub)" <gi...@apache.org>.
KKcorps commented on PR #10321:
URL: https://github.com/apache/pinot/pull/10321#issuecomment-1458100596

   @cbalci Should I go ahead and merge this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org