Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/26 09:58:17 UTC
[GitHub] [arrow-datafusion] silence-coding opened a new issue #2097: Query order
silence-coding opened a new issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097
Why is the returned result out of order after I add a `WHERE` clause to the query?
The disorder is between RecordBatches; the rows within each RecordBatch are still in order.
```
Statement: select * from table
output:
RecordBatch1
row1
row2
RecordBatch2
row3
row4
RecordBatch3
row5
row6
Statement: select * from table where timestamp BETWEEN 1648196340000 and 1648287569004
output:
RecordBatch1
row1
row2
RecordBatch3
row5
row6
RecordBatch2
row3
row4
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] silence-coding commented on issue #2097: 【Help】Query order
Posted by GitBox <gi...@apache.org>.
silence-coding commented on issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097#issuecomment-1080097420
Will the same Parquet file be scanned in parallel? Is there any other solution? I'm worried that `ORDER BY` will use a lot of resources, for example by loading all query results into memory and then sorting them.
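On the memory concern: a general sort must see every row before it can emit the first one, so it has to buffer its input (in memory or on disk). When each partition's output is already sorted, though, the partitions can be combined with a k-way merge that only holds the next pending row of each input. The std-only Rust sketch below shows that general technique over plain integer streams; `KWayMerge` is a hypothetical name for illustration, it is not DataFusion's implementation, and whether DataFusion can plan a merge instead of a full sort depends on whether it knows the inputs are sorted.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Lazily merges already-sorted input streams into one sorted stream.
/// The heap holds at most one pending value per input, so memory use grows
/// with the number of inputs, not with the number of rows.
struct KWayMerge<I: Iterator<Item = i64>> {
    inputs: Vec<I>,
    // Min-heap of (next value, input index); `Reverse` flips BinaryHeap's max-heap order.
    heap: BinaryHeap<Reverse<(i64, usize)>>,
}

impl<I: Iterator<Item = i64>> KWayMerge<I> {
    fn new(mut inputs: Vec<I>) -> Self {
        let mut heap = BinaryHeap::new();
        // Prime the heap with the first value of every non-empty input.
        for (idx, input) in inputs.iter_mut().enumerate() {
            if let Some(v) = input.next() {
                heap.push(Reverse((v, idx)));
            }
        }
        KWayMerge { inputs, heap }
    }
}

impl<I: Iterator<Item = i64>> Iterator for KWayMerge<I> {
    type Item = i64;

    fn next(&mut self) -> Option<i64> {
        let Reverse((v, idx)) = self.heap.pop()?;
        // Refill from the input that just produced the smallest value.
        if let Some(next) = self.inputs[idx].next() {
            self.heap.push(Reverse((next, idx)));
        }
        Some(v)
    }
}

fn main() {
    // Each inner Vec stands in for one partition whose rows are already sorted.
    let partitions = vec![vec![1, 4, 7], vec![2, 5, 8], vec![3, 6, 9]];
    let merged: Vec<i64> =
        KWayMerge::new(partitions.into_iter().map(|p| p.into_iter()).collect()).collect();
    println!("{:?}", merged); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
}
```

The merge only works because each input is sorted; for unsorted partitions, each one still has to be sorted (and buffered) first.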
[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #2097: 【Help】Query order
Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097#issuecomment-1079658655
Hi @silence-coding, thanks for opening an issue.
Without an `ORDER BY` clause, the query engine is free to return the rows in any order. You see a different order here because DataFusion executes the query in parallel: execution on one file/partition might finish sooner than on another, so its rows are returned earlier and in a different order.
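The effect described above can be reproduced with a small std-only Rust model (not DataFusion code): each partition is a thread that filters its own rows and sends one batch over a shared channel, so the consumer receives batches in completion order rather than partition order. `Batch`, `run_partitions` and the artificial delays are hypothetical stand-ins for RecordBatches and real scan/filter work.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a RecordBatch: the partition it came from plus its rows.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    partition: usize,
    rows: Vec<i64>,
}

/// Spawns one thread per partition. Each thread waits `delays_ms[i]` (simulated
/// scan/filter cost), keeps the rows for which `keep` returns true, and sends its
/// batch on a shared channel. Batches are returned in arrival (completion) order.
fn run_partitions(partitions: Vec<Vec<i64>>, delays_ms: Vec<u64>, keep: fn(i64) -> bool) -> Vec<Batch> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for (partition, (rows, delay)) in partitions.into_iter().zip(delays_ms).enumerate() {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            thread::sleep(Duration::from_millis(delay));
            // Filtering preserves the order of rows *within* the batch.
            let rows: Vec<i64> = rows.into_iter().filter(|&r| keep(r)).collect();
            tx.send(Batch { partition, rows }).unwrap();
        }));
    }
    drop(tx); // close our sender so the receiver stops once all workers are done
    let arrived: Vec<Batch> = rx.into_iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    arrived
}

fn main() {
    let partitions = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
    // Partition 1 is made artificially slow, so it will usually arrive last.
    let batches = run_partitions(partitions, vec![0, 200, 50], |r| r > 0);
    for b in &batches {
        println!("partition {} -> {:?}", b.partition, b.rows);
    }
}
```

With these delays the output typically follows the 1, 3, 2 batch pattern from the report, but the order depends on thread scheduling and is not guaranteed; only the order of rows inside each batch is.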
[GitHub] [arrow-datafusion] silence-coding commented on issue #2097: 【Help】the order of query
Posted by GitBox <gi...@apache.org>.
silence-coding commented on issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097#issuecomment-1080278608
If `target_partitions` is set to 1, or a `LIMIT` clause is added to the SQL statement, DataFusion does not execute the query concurrently, so the queried data comes back in its original order.
Is there a better way? If not, I will close the issue in a few days!
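Setting the number of partitions to 1 gives up parallelism. A general alternative is to tag each batch with its position in the source before handing it to a worker, then reorder the (much shorter) list of batches by that tag at the end, so no individual rows are re-sorted. The std-only Rust sketch below illustrates that idea under stated assumptions: `TaggedBatch`, `filter_in_parallel` and the round-robin assignment are hypothetical and are not DataFusion APIs or configuration options.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical batch tagged with its position in the original scan.
struct TaggedBatch {
    seq: usize,
    rows: Vec<i64>,
}

/// Hands source batches to `workers` threads round-robin, filters them in
/// parallel with `keep`, then restores the original scan order by sorting
/// the *batches* on their sequence tag (rows are never re-sorted).
fn filter_in_parallel(source: Vec<Vec<i64>>, workers: usize, keep: fn(i64) -> bool) -> Vec<Vec<i64>> {
    assert!(workers > 0, "need at least one worker");
    // Round-robin assignment: batch `seq` goes to worker `seq % workers`.
    let mut assignments: Vec<Vec<TaggedBatch>> = (0..workers).map(|_| Vec::new()).collect();
    for (seq, rows) in source.into_iter().enumerate() {
        assignments[seq % workers].push(TaggedBatch { seq, rows });
    }
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = assignments
        .into_iter()
        .map(|batches| {
            let tx = tx.clone();
            thread::spawn(move || {
                for b in batches {
                    let rows = b.rows.into_iter().filter(|&r| keep(r)).collect();
                    tx.send(TaggedBatch { seq: b.seq, rows }).unwrap();
                }
            })
        })
        .collect();
    drop(tx); // the receiver finishes once every worker has dropped its sender
    // Batches arrive in completion order...
    let mut results: Vec<TaggedBatch> = rx.into_iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    // ...so put them back in scan order using their tags.
    results.sort_by_key(|b| b.seq);
    results.into_iter().map(|b| b.rows).collect()
}

fn main() {
    let source = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
    let out = filter_in_parallel(source, 3, |r| r % 2 == 0);
    println!("{:?}", out); // [[2], [4], [6]]
}
```

The final reorder costs one sort over the number of batches rather than the number of rows, but it still waits for every batch before emitting any output; a streaming variant would release batches as soon as the next expected tag has arrived.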
[GitHub] [arrow-datafusion] silence-coding closed issue #2097: 【Help】the order of query
Posted by GitBox <gi...@apache.org>.
silence-coding closed issue #2097:
URL: https://github.com/apache/arrow-datafusion/issues/2097