Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/26 09:58:17 UTC

[GitHub] [arrow-datafusion] silence-coding opened a new issue #2097: Query order

silence-coding opened a new issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097


   Why is the returned result out of order after I add a `WHERE` clause to the query?
   The order changes between RecordBatches, while the rows within each RecordBatch stay in order.
   ```
   Statement:  select * from table
   
   output:
   RecordBatch1
     row1
     row2
   RecordBatch2
     row3
     row4
   RecordBatch3
     row5
     row6
   
   Statement: select * from table where timestamp BETWEEN 1648196340000 and 1648287569004
   
   output:
   RecordBatch1
     row1
     row2
   RecordBatch3
     row5
     row6
   RecordBatch2
     row3
     row4
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #2097: 【Help】Query order

Dandandan edited a comment on issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097#issuecomment-1079658655


   Hi @silence-coding, thanks for opening an issue.
   
   Without an `ORDER BY` clause, the query engine is free to return the rows in any order. The rows come back in a different order because DataFusion executes queries in parallel: execution on one file/partition might finish sooner than on another, so its rows are returned earlier.
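   
   A toy model may make this explanation concrete. The Rust sketch below is not DataFusion code: it assumes each partition runs on its own thread and sends a single "batch" when it finishes, with artificial delays (the hypothetical `delays_ms` parameter) standing in for uneven filter or I/O cost. Batches arrive in completion order, which can vary between runs, whereas sorting on the row key, which is what `ORDER BY` requests, always gives the same result.
   
   ```rust
   use std::sync::mpsc;
   use std::thread;
   use std::time::Duration;
   
   /// Hypothetical stand-in for a RecordBatch: a partition id plus its (ordered) rows.
   #[derive(Debug, Clone, PartialEq)]
   struct Batch {
       partition: usize,
       rows: Vec<u64>,
   }
   
   /// Runs each partition on its own thread and returns batches in completion order.
   /// `delays_ms` simulates how long each partition takes to produce its batch.
   fn run_parallel(partitions: Vec<Vec<u64>>, delays_ms: Vec<u64>) -> Vec<Batch> {
       let (tx, rx) = mpsc::channel();
       let mut handles = Vec::new();
       for (i, (rows, delay)) in partitions.into_iter().zip(delays_ms).enumerate() {
           let tx = tx.clone();
           handles.push(thread::spawn(move || {
               thread::sleep(Duration::from_millis(delay));
               // Whichever partition finishes first sends its batch first.
               tx.send(Batch { partition: i, rows }).unwrap();
           }));
       }
       drop(tx); // the channel closes once every worker's sender is dropped
       let out: Vec<Batch> = rx.iter().collect();
       for h in handles {
           h.join().unwrap();
       }
       out
   }
   
   /// What ordering on the row key guarantees: one total order, independent of timing.
   fn order_by(batches: &[Batch]) -> Vec<u64> {
       let mut rows: Vec<u64> = batches.iter().flat_map(|b| b.rows.iter().copied()).collect();
       rows.sort_unstable();
       rows
   }
   
   fn main() {
       let partitions = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
       // Partition 1 (rows 3 and 4) is made the slowest, so it tends to arrive last.
       let batches = run_parallel(partitions, vec![10, 200, 50]);
       let arrival: Vec<usize> = batches.iter().map(|b| b.partition).collect();
       println!("arrival order of partitions: {:?}", arrival);
       println!("after sorting on the key: {:?}", order_by(&batches));
   }
   ```
   
   Only the sorted output is deterministic; the printed arrival order depends on thread scheduling.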





[GitHub] [arrow-datafusion] silence-coding commented on issue #2097: 【Help】Query order

silence-coding commented on issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097#issuecomment-1080097420


   Will a single Parquet file also be executed in parallel? Is there any other solution? I'm worried that `ORDER BY` will consume a lot of resources, for example by putting all query results into memory and then sorting them.
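   
   On the memory concern: a full sort of arbitrary input has to see every row before it can emit the first one. If, however, each partition already produces rows sorted on the key (as the RecordBatches in the issue appear to be), a k-way merge can restore a global order while holding only one pending row per partition. The sketch below shows that generic technique in plain Rust; it does not claim to be how DataFusion plans or executes `ORDER BY`, and `merge_sorted_partitions` is a hypothetical name.
   
   ```rust
   use std::cmp::Reverse;
   use std::collections::BinaryHeap;
   
   /// Merges partitions that are each already sorted into one globally sorted output.
   /// Apart from the output itself, the heap holds at most one row per partition,
   /// so the merge's working memory is O(number of partitions), not O(total rows).
   fn merge_sorted_partitions(partitions: Vec<Vec<u64>>) -> Vec<u64> {
       // One iterator per partition; a real engine would pull from streams instead.
       let mut iters: Vec<std::vec::IntoIter<u64>> =
           partitions.into_iter().map(|p| p.into_iter()).collect();
       // Min-heap of (next row value, partition index), via `Reverse`.
       let mut heap = BinaryHeap::new();
       for (i, it) in iters.iter_mut().enumerate() {
           if let Some(v) = it.next() {
               heap.push(Reverse((v, i)));
           }
       }
       let mut out = Vec::new();
       while let Some(Reverse((v, i))) = heap.pop() {
           out.push(v);
           // Refill from the partition that supplied the smallest row.
           if let Some(next) = iters[i].next() {
               heap.push(Reverse((next, i)));
           }
       }
       out
   }
   
   fn main() {
       // Each partition is sorted internally, like the ordered RecordBatches in the issue.
       let partitions = vec![vec![1, 4, 7], vec![2, 5, 8], vec![3, 6, 9]];
       println!("{:?}", merge_sorted_partitions(partitions)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
   }
   ```
   
   The saving depends entirely on the per-partition sortedness assumption; unsorted input still needs a full sort.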





[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #2097: 【Help】Query order

Dandandan edited a comment on issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097#issuecomment-1079658655


   Hi @silence-coding, thanks for opening an issue.
   
   Without an `ORDER BY` clause, the query engine is free to return the rows in any order. The rows come back in a different order because DataFusion executes queries in parallel: execution on one file/partition might finish sooner than on another, so its rows are returned earlier and the overall order changes.





[GitHub] [arrow-datafusion] silence-coding commented on issue #2097: 【Help】the order of query

silence-coding commented on issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097#issuecomment-1080278608


   If `target_partitions` is set to 1, or a `LIMIT` clause is added to the SQL statement, DataFusion does not execute the query concurrently, which keeps the queried data in order.
   
   Is there a better way? If not, I will close the issue in a few days!
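   
   One possible alternative to giving up parallelism is a reorder buffer: run partitions concurrently, tag each batch with its partition index, and emit batches strictly in index order, parking early finishers until their predecessors arrive. The sketch below illustrates the idea with std threads; `run_parallel_in_order` is hypothetical and is not a DataFusion API or setting. It only reproduces scan (partition) order, which is a weaker guarantee than `ORDER BY`, and in the worst case it buffers every batch that finishes ahead of a slow partition.
   
   ```rust
   use std::collections::BTreeMap;
   use std::sync::mpsc;
   use std::thread;
   use std::time::Duration;
   
   /// Runs partitions in parallel but returns their rows in partition order.
   /// Early finishers wait in `pending` until all lower-numbered partitions are emitted.
   /// `delays_ms` simulates uneven per-partition work.
   fn run_parallel_in_order(partitions: Vec<Vec<u64>>, delays_ms: Vec<u64>) -> Vec<u64> {
       let (tx, rx) = mpsc::channel::<(usize, Vec<u64>)>();
       let mut handles = Vec::new();
       for (i, (rows, delay)) in partitions.into_iter().zip(delays_ms).enumerate() {
           let tx = tx.clone();
           handles.push(thread::spawn(move || {
               thread::sleep(Duration::from_millis(delay));
               tx.send((i, rows)).unwrap();
           }));
       }
       drop(tx); // the loop below ends once every worker has sent and dropped its sender
   
       let mut next = 0; // index of the next partition allowed to be emitted
       let mut pending: BTreeMap<usize, Vec<u64>> = BTreeMap::new();
       let mut out = Vec::new();
       for (i, rows) in rx {
           pending.insert(i, rows);
           // Flush every batch that is now contiguous with what was already emitted.
           while let Some(ready) = pending.remove(&next) {
               out.extend(ready);
               next += 1;
           }
       }
       for h in handles {
           h.join().unwrap();
       }
       out
   }
   
   fn main() {
       let partitions = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
       let rows = run_parallel_in_order(partitions, vec![10, 200, 50]);
       println!("{:?}", rows); // [1, 2, 3, 4, 5, 6] regardless of finish order
   }
   ```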


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] silence-coding closed issue #2097: 【Help】the order of query

silence-coding closed issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097


   





[GitHub] [arrow-datafusion] Dandandan commented on issue #2097: 【Help】Query order

Dandandan commented on issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097#issuecomment-1079658655


   Hi @silence-coding, without an `ORDER BY` clause, the query engine is free to return the rows in any order. The rows come back in a different order because DataFusion executes queries in parallel: execution on one file/partition might finish sooner than on another, so its rows are returned earlier.

