You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/04 20:17:38 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new pull request #262: Implement select distinct

Dandandan opened a new pull request #262:
URL: https://github.com/apache/arrow-datafusion/pull/262


   # Which issue does this PR close?
   
   Closes #250
   
    # Rationale for this change
   Fix behavior and a simple implementation of select distinct.
   
   # What changes are included in this PR?
   
   An implementation of select distinct.
   
   # Are there any user-facing changes?
   
   A working select distinct


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #262: Implement select distinct

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #262:
URL: https://github.com/apache/arrow-datafusion/pull/262#issuecomment-832258802


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#262](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (6ea9c77) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/5f6024d570f4d2d5fd4c2187ff7dcb0cd389ac4b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5f6024d) will **increase** coverage by `0.06%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/262/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #262      +/-   ##
   ==========================================
   + Coverage   76.80%   76.87%   +0.06%     
   ==========================================
     Files         133      133              
     Lines       23284    23350      +66     
   ==========================================
   + Hits        17884    17950      +66     
     Misses       5400     5400              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvc3FsL3BsYW5uZXIucnM=) | `83.70% <100.00%> (+0.05%)` | :arrow_up: |
   | [datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi90ZXN0cy9zcWwucnM=) | `99.89% <100.00%> (+<0.01%)` | :arrow_up: |
   | [datafusion/src/datasource/csv.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvZGF0YXNvdXJjZS9jc3YucnM=) | `73.07% <0.00%> (-0.26%)` | :arrow_down: |
   | [...tafusion/src/physical\_plan/distinct\_expressions.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9kaXN0aW5jdF9leHByZXNzaW9ucy5ycw==) | `90.80% <0.00%> (-0.11%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/csv.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9jc3YucnM=) | `79.01% <0.00%> (-0.09%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/parquet.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wYXJxdWV0LnJz) | `87.55% <0.00%> (-0.02%)` | :arrow_down: |
   | [datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvc2NhbGFyLnJz) | `54.36% <0.00%> (+0.22%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/group\_scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9ncm91cF9zY2FsYXIucnM=) | `58.82% <0.00%> (+1.17%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/common.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9jb21tb24ucnM=) | `86.20% <0.00%> (+2.20%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [5f6024d...6ea9c77](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #262: Implement select distinct

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #262:
URL: https://github.com/apache/arrow-datafusion/pull/262#discussion_r626089767



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -649,20 +657,15 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
     /// necessary, i.e., when the input fields are different than the
     /// projection. Note that if the input fields are the same, but out of
     /// order, the projection will be applied.
-    fn project(
-        &self,
-        input: &LogicalPlan,
-        expr: Vec<Expr>,
-        force: bool,

Review comment:
       small drive-by simplification, an unused option




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan merged pull request #262: Implement select distinct

Posted by GitBox <gi...@apache.org>.
Dandandan merged pull request #262:
URL: https://github.com/apache/arrow-datafusion/pull/262


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #262: Implement select distinct

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #262:
URL: https://github.com/apache/arrow-datafusion/pull/262#issuecomment-832258802


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#262](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e4c9384) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/5f6024d570f4d2d5fd4c2187ff7dcb0cd389ac4b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (5f6024d) will **increase** coverage by `0.06%`.
   > The diff coverage is `95.77%`.
   
   > :exclamation: Current head e4c9384 differs from pull request most recent head 316c734. Consider uploading reports for the commit 316c734 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/262/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #262      +/-   ##
   ==========================================
   + Coverage   76.80%   76.87%   +0.06%     
   ==========================================
     Files         133      133              
     Lines       23284    23349      +65     
   ==========================================
   + Hits        17884    17949      +65     
     Misses       5400     5400              
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvc2NhbGFyLnJz) | `54.36% <0.00%> (+0.22%)` | :arrow_up: |
   | [...tafusion/src/physical\_plan/distinct\_expressions.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9kaXN0aW5jdF9leHByZXNzaW9ucy5ycw==) | `90.80% <87.50%> (-0.11%)` | :arrow_down: |
   | [datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvc3FsL3BsYW5uZXIucnM=) | `83.70% <100.00%> (+0.05%)` | :arrow_up: |
   | [datafusion/tests/sql.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi90ZXN0cy9zcWwucnM=) | `99.89% <100.00%> (+<0.01%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/group\_scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/262/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9ncm91cF9zY2FsYXIucnM=) | `58.82% <0.00%> (+1.17%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [5f6024d...316c734](https://codecov.io/gh/apache/arrow-datafusion/pull/262?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #262: Implement select distinct

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #262:
URL: https://github.com/apache/arrow-datafusion/pull/262#discussion_r626143799



##########
File path: datafusion/tests/sql.rs
##########
@@ -372,6 +372,81 @@ async fn csv_query_group_by_float32() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn select_all() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_simple_csv(&mut ctx)?;
+
+    let sql = "SELECT c1 FROM aggregate_simple order by c1";
+    let actual_no_all = execute(&mut ctx, sql).await;
+
+    let sql_all = "SELECT ALL c1 FROM aggregate_simple order by c1";
+    let actual_all = execute(&mut ctx, sql_all).await;
+
+    assert_eq!(actual_no_all, actual_all);
+
+    Ok(())
+}
+
+#[tokio::test]
+async fn select_distinct_all() -> Result<()> {

Review comment:
       ```suggestion
   async fn select_distinct() -> Result<()> {
   ```

##########
File path: datafusion/tests/sql.rs
##########
@@ -372,6 +372,81 @@ async fn csv_query_group_by_float32() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn select_all() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_simple_csv(&mut ctx)?;
+
+    let sql = "SELECT c1 FROM aggregate_simple order by c1";
+    let actual_no_all = execute(&mut ctx, sql).await;
+
+    let sql_all = "SELECT ALL c1 FROM aggregate_simple order by c1";
+    let actual_all = execute(&mut ctx, sql_all).await;
+
+    assert_eq!(actual_no_all, actual_all);
+
+    Ok(())
+}
+
+#[tokio::test]
+async fn select_distinct_all() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_simple_csv(&mut ctx)?;
+
+    let sql = "SELECT DISTINCT * FROM aggregate_simple";
+    let mut actual = execute(&mut ctx, sql).await;
+    actual.sort();
+
+    let mut dedup = actual.clone();
+    dedup.dedup();
+
+    assert_eq!(actual, dedup);
+
+    Ok(())
+}
+
+#[tokio::test]
+async fn select_distinct_simple() -> Result<()> {
+    let mut ctx = ExecutionContext::new();
+    register_aggregate_simple_csv(&mut ctx)?;
+
+    let sql = "SELECT DISTINCT c1 FROM aggregate_simple order by c1";
+    let actual = execute(&mut ctx, sql).await;
+
+    let expected = vec![

Review comment:
       I doubled checked that the input has duplicates (to show that the distinct has removed them):  https://github.com/apache/arrow-datafusion/blob/master/datafusion/tests/aggregate_simple.csv
   
   👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org