You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/10/12 10:40:30 UTC

[GitHub] [pinot] richardstartin opened a new pull request #7565: allow queries to missing columns to fill in default value

richardstartin opened a new pull request #7565:
URL: https://github.com/apache/pinot/pull/7565


   ## Description
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   <!-- If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release. -->
   
   <!-- If you have a series of commits adding or enabling a feature, then
   add this section only in final commit that marks the feature completed.
   Refer to earlier release notes to see examples of text.
   -->
   ## Documentation
   <!-- If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941052012


   I think the failure is a flaky test https://github.com/apache/pinot/pull/7565/checks?check_run_id=3870807865#step:5:27182


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941184874


   I'm not sure pushing this out in to the query and making it the user's problem is particularly fruitful unless the end goal is just to be able to query outdated but not migrated segments. The goal I have in mind is to be able to handle variant types in semi-structured data eventually.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin closed pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin closed pull request #7565:
URL: https://github.com/apache/pinot/pull/7565


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941184874






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr edited a comment on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
walterddr edited a comment on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941155655


   > what do you think about introducing something like IFMISSING or IFNULL similar to this https://www.w3schools.com/sql/sql_isnull.asp
   > 
   > This will also allow us to return the right data type.
   
   Adding this would definition be a nice syntax sugar (and agree with @richardstartin that this is tangential). I think we can go with a more generic `COALESCE(expr, literal)` solution. what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941190546


   > I'm not sure pushing this out in to the query and making it the user's problem is particularly fruitful unless the end goal is just to be able to query outdated but not migrated segments. The goal I have in mind is to be able to handle variant types in semi-structured data eventually.
   
   Overall, I like this feature. I am ok with pushing the changes but I would like to postpone the decision on query semantics. It's too early to make a call and it might be hard to change/reverse later. I like that you have made it configurable for now. Let's go with that and revisit the query semantics later. I have some thoughts but it's beyond the scope of this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a change in pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
walterddr commented on a change in pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#discussion_r727208269



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/pruner/DataSchemaSegmentPruner.java
##########
@@ -35,7 +35,13 @@ public void init(PinotConfiguration config) {
 
   @Override
   public boolean prune(IndexSegment segment, QueryContext query) {
-    return !segment.getColumnNames().containsAll(query.getColumns());
+    for (String column : query.getColumns()) {
+      if (segment.getColumnNames().contains(column)) {
+        return false;
+      }
+    }
+    // don't prune unless the segment has none of the relevant columns
+    return !query.getColumns().isEmpty();

Review comment:
       how about directly using Collections.disjoint?
   ```suggestion
       return !query.getColumns().isEmpty() 
           && Collections.disjoint(segment.getColumnNames(), query.getColumns()).isEmpty()
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a change in pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
walterddr commented on a change in pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#discussion_r727277843



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/pruner/DataSchemaSegmentPruner.java
##########
@@ -35,7 +35,13 @@ public void init(PinotConfiguration config) {
 
   @Override
   public boolean prune(IndexSegment segment, QueryContext query) {
-    return !segment.getColumnNames().containsAll(query.getColumns());
+    for (String column : query.getColumns()) {
+      if (segment.getColumnNames().contains(column)) {
+        return false;
+      }
+    }
+    // don't prune unless the segment has none of the relevant columns
+    return !query.getColumns().isEmpty();

Review comment:
       i see. so for performance consideration we can probably use `com.google.common.collect.Sets.intersection`. It creates a lazy view and will immediately bail out if it finds intersection when `Sets.intersection(a, b).isEmpty`.
   
   the for loop also works just fine, the reason why I suggested this was that it's easier to reason the 2 conditions. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-940967006


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/7565?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7565](https://codecov.io/gh/apache/pinot/pull/7565?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (80a9090) into [master](https://codecov.io/gh/apache/pinot/commit/9ef1f490fbbbab645a467b4464cf6c7d05a88ae2?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9ef1f49) will **decrease** coverage by `57.45%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/7565/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/7565?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7565       +/-   ##
   =============================================
   - Coverage     72.67%   15.21%   -57.46%     
   + Complexity     3413       80     -3333     
   =============================================
     Files          1524     1479       -45     
     Lines         75702    74101     -1601     
     Branches      11039    10898      -141     
   =============================================
   - Hits          55017    11278    -43739     
   - Misses        16995    62011    +45016     
   + Partials       3690      812     -2878     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `15.21% <0.00%> (-0.06%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/7565?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...not/core/query/pruner/DataSchemaSegmentPruner.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9wcnVuZXIvRGF0YVNjaGVtYVNlZ21lbnRQcnVuZXIuamF2YQ==) | `0.00% <0.00%> (-75.00%)` | :arrow_down: |
   | [...ocal/indexsegment/immutable/EmptyIndexSegment.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0VtcHR5SW5kZXhTZWdtZW50LmphdmE=) | `0.00% <0.00%> (-33.34%)` | :arrow_down: |
   | [...l/indexsegment/immutable/ImmutableSegmentImpl.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvaW1tdXRhYmxlL0ltbXV0YWJsZVNlZ21lbnRJbXBsLmphdmE=) | `0.00% <0.00%> (-53.13%)` | :arrow_down: |
   | [...ocal/indexsegment/mutable/IntermediateSegment.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvbXV0YWJsZS9JbnRlcm1lZGlhdGVTZWdtZW50LmphdmE=) | `0.00% <0.00%> (-70.07%)` | :arrow_down: |
   | [...local/indexsegment/mutable/MutableSegmentImpl.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9pbmRleHNlZ21lbnQvbXV0YWJsZS9NdXRhYmxlU2VnbWVudEltcGwuamF2YQ==) | `0.00% <0.00%> (-58.27%)` | :arrow_down: |
   | [...ment/index/datasource/MissingColumnDataSource.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2RhdGFzb3VyY2UvTWlzc2luZ0NvbHVtbkRhdGFTb3VyY2UuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...al/segment/missing/MissingColumnReaderFactory.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L21pc3NpbmcvTWlzc2luZ0NvbHVtblJlYWRlckZhY3RvcnkuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ocal/segment/missing/MissingColumnValueReader.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L21pc3NpbmcvTWlzc2luZ0NvbHVtblZhbHVlUmVhZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...c/main/java/org/apache/pinot/common/tier/Tier.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ain/java/org/apache/pinot/core/data/table/Key.java](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL0tleS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1214 more](https://codecov.io/gh/apache/pinot/pull/7565/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/7565?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/7565?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [9ef1f49...80a9090](https://codecov.io/gh/apache/pinot/pull/7565?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941190546


   > I'm not sure pushing this out in to the query and making it the user's problem is particularly fruitful unless the end goal is just to be able to query outdated but not migrated segments. The goal I have in mind is to be able to handle variant types in semi-structured data eventually.
   
   Overall, I like this feature. I am ok with pushing the changes but I would like to postpone the decision on query semantics. It's too early to make a call and it might be hard to change/reverse later. I like that you have made it configurable for now. Let's go with that and revisit the query semantics later. I have some thoughts but it's beyond the scope of this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#discussion_r727314178



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/pruner/DataSchemaSegmentPruner.java
##########
@@ -35,7 +35,13 @@ public void init(PinotConfiguration config) {
 
   @Override
   public boolean prune(IndexSegment segment, QueryContext query) {
-    return !segment.getColumnNames().containsAll(query.getColumns());
+    for (String column : query.getColumns()) {
+      if (segment.getColumnNames().contains(column)) {
+        return false;
+      }
+    }
+    // don't prune unless the segment has none of the relevant columns
+    return !query.getColumns().isEmpty();

Review comment:
       I don't think this needs to change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
walterddr commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941155655


   > what do you think about introducing something like IFMISSING or IFNULL similar to this https://www.w3schools.com/sql/sql_isnull.asp
   > 
   > This will also allow us to return the right data type.
   
   Adding this would definition be a nice syntax sugar (and agree with @richardstartin that this is tangential). I think we can go with a more generic `COALESCE(identifier, literal)` solution. what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr edited a comment on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
walterddr edited a comment on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941155655


   > what do you think about introducing something like IFMISSING or IFNULL similar to this https://www.w3schools.com/sql/sql_isnull.asp
   > 
   > This will also allow us to return the right data type.
   
   Adding this would definition be a nice syntax sugar (and agree with @richardstartin that this is tangential). I think we can go with a more generic `COALESCE(expr, literal)` solution. what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
walterddr commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941155655


   > what do you think about introducing something like IFMISSING or IFNULL similar to this https://www.w3schools.com/sql/sql_isnull.asp
   > 
   > This will also allow us to return the right data type.
   
   Adding this would definition be a nice syntax sugar (and agree with @richardstartin that this is tangential). I think we can go with a more generic `COALESCE(identifier, literal)` solution. what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#discussion_r727215926



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/pruner/DataSchemaSegmentPruner.java
##########
@@ -35,7 +35,13 @@ public void init(PinotConfiguration config) {
 
   @Override
   public boolean prune(IndexSegment segment, QueryContext query) {
-    return !segment.getColumnNames().containsAll(query.getColumns());
+    for (String column : query.getColumns()) {
+      if (segment.getColumnNames().contains(column)) {
+        return false;
+      }
+    }
+    // don't prune unless the segment has none of the relevant columns
+    return !query.getColumns().isEmpty();

Review comment:
       I'm not sure I see the benefit, to be honest. It's also clear that the small set is in the outer loop the way it's written, whereas using that code literally, `Collections.disjoint` would iterate over potentially hundreds of columns. Of course, you can swap the arguments, but writing a very simple loop makes it obvious.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941087365


   what do you think about introducing something like IFMISSING or IFNULL similar to this https://www.w3schools.com/sql/sql_isnull.asp 
   
   This will also allow us to return the right data type.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#discussion_r727314178



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/pruner/DataSchemaSegmentPruner.java
##########
@@ -35,7 +35,13 @@ public void init(PinotConfiguration config) {
 
   @Override
   public boolean prune(IndexSegment segment, QueryContext query) {
-    return !segment.getColumnNames().containsAll(query.getColumns());
+    for (String column : query.getColumns()) {
+      if (segment.getColumnNames().contains(column)) {
+        return false;
+      }
+    }
+    // don't prune unless the segment has none of the relevant columns
+    return !query.getColumns().isEmpty();

Review comment:
       I don't think this needs to change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin removed a comment on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin removed a comment on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941052012


   I think the failure is a flaky test https://github.com/apache/pinot/pull/7565/checks?check_run_id=3870807865#step:5:27182


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941097749


   Providing SQL functions like that to empower the user would be very interesting, but tangential to what I want to achieve. The motivation for this draft is the ability to eventually support columns which contain typed entity-attribute-values for handling semi-structured data and the real blocker is that a `DataSource` _must_ have a single `DataType`, rather than determine it on the fly when dealing with semi-structured columns. Obviously, a typed EAV column isn't really in the spirit of Pinot's columnar architecture, and determining types row by row has a cost, but it opens up use cases with semi-structured data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a change in pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
walterddr commented on a change in pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#discussion_r727277843



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/pruner/DataSchemaSegmentPruner.java
##########
@@ -35,7 +35,13 @@ public void init(PinotConfiguration config) {
 
   @Override
   public boolean prune(IndexSegment segment, QueryContext query) {
-    return !segment.getColumnNames().containsAll(query.getColumns());
+    for (String column : query.getColumns()) {
+      if (segment.getColumnNames().contains(column)) {
+        return false;
+      }
+    }
+    // don't prune unless the segment has none of the relevant columns
+    return !query.getColumns().isEmpty();

Review comment:
       i see. so for performance consideration we can probably use `com.google.common.collect.Sets.intersection`. It creates a lazy view and will immediately bail out if it finds intersection when `Sets.intersection(a, b).isEmpty`.
   
   the for loop also works just fine, the reason why I suggested this was that it's easier to reason the 2 conditions. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin closed pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin closed pull request #7565:
URL: https://github.com/apache/pinot/pull/7565


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #7565: allow queries to missing columns to fill in default value

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #7565:
URL: https://github.com/apache/pinot/pull/7565#issuecomment-941208184


   The concerns expressed here fundamentally relate to the way values of different types are combined when results from different segments are merged, and handling this gracefully is a prerequisite for semi-structured data - typed EAVs can't be made to work without it. So this draft triggered useful discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org