Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/26 03:40:29 UTC

[GitHub] [arrow-datafusion] yahoNanJing opened a new issue, #2964: Better not to display partitions info for ParquetExec

yahoNanJing opened a new issue, #2964:
URL: https://github.com/apache/arrow-datafusion/issues/2964

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Suppose a single SQL query needs to scan tens of thousands of files. The partitions info shown in the "Physical plan with metrics" output then becomes far too messy to read. It is also redundant, because we already have the filename for each specific ParquetExec task.
   
   **Describe the solution you'd like**
   
   Therefore, it's better not to show the partitions info.
   
   **Describe alternatives you've considered**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2964: Better not to display partitions info for ParquetExec

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2964:
URL: https://github.com/apache/arrow-datafusion/issues/2964#issuecomment-1195128121

   @alamb Do you think this is a reasonable change?



[GitHub] [arrow-datafusion] kmitchener commented on issue #2964: Better not to display partitions info for ParquetExec

Posted by GitBox <gi...@apache.org>.
kmitchener commented on issue #2964:
URL: https://github.com/apache/arrow-datafusion/issues/2964#issuecomment-1195669767

   FWIW, attempting to use DataFusion to query AWS VPC Flow Logs, which produces thousands of parquet files per day, makes it impossible to explain plans. Removing the partition list details or condensing the output to just the number of files being read would be great.
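   
   A minimal sketch of the condensed form in Rust, assuming the scan's files are available as one `Vec<String>` of paths per partition. The data shape, the `summarize_file_count` name and the output format are hypothetical illustrations, not DataFusion's actual API:
   
   ```rust
   /// Hypothetical helper (not a DataFusion API): condense a per-partition
   /// file listing into a one-line summary of group and file counts.
   fn summarize_file_count(file_groups: &[Vec<String>]) -> String {
       // Total number of files across all partitions.
       let total_files: usize = file_groups.iter().map(|g| g.len()).sum();
       // `{{` and `}}` emit literal braces around the summary.
       format!("file_groups={{{} groups, {} files}}", file_groups.len(), total_files)
   }
   
   fn main() {
       let groups = vec![
           vec!["day1/a.parquet".to_string(), "day1/b.parquet".to_string()],
           vec!["day2/c.parquet".to_string()],
       ];
       // Prints: ParquetExec: file_groups={2 groups, 3 files}
       println!("ParquetExec: {}", summarize_file_count(&groups));
   }
   ```
   
   However many files a day of flow logs produces, the plan line stays a fixed, short length.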



[GitHub] [arrow-datafusion] andygrove commented on issue #2964: Better not to display partitions info for ParquetExec

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2964:
URL: https://github.com/apache/arrow-datafusion/issues/2964#issuecomment-1195159747

   Perhaps we can make this configurable using the new config mechanism.



[GitHub] [arrow-datafusion] alamb commented on issue #2964: Better not to display partitions info for ParquetExec

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2964:
URL: https://github.com/apache/arrow-datafusion/issues/2964#issuecomment-1195938630

   > @alamb Do you think this is a reasonable change?
   
   @Ted-Jiang I do. Perhaps by default the explain plans could summarize the information (e.g. print out the first few parquet files or something).
   
   
   I like @andygrove's suggestion to make "show me the full details" a config option (defaulting to showing only the file summary).
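   
   A sketch of that approach in Rust, assuming a hypothetical boolean option (called `show_all_files` here, defaulting to `false`) plus a hypothetical per-group limit. Neither name nor the shape of the new config mechanism is settled in this thread. In summary mode only the first few files of each group are printed, followed by a count of the omitted ones:
   
   ```rust
   /// Hypothetical display options; in practice these would be read from
   /// the session configuration. Names are illustrative only.
   struct ExplainOptions {
       /// Print every file when true; otherwise print a truncated summary.
       show_all_files: bool,
       /// How many files per group to print in summary mode.
       max_files_per_group: usize,
   }
   
   impl Default for ExplainOptions {
       fn default() -> Self {
           // Default to the summary view, as suggested above.
           Self { show_all_files: false, max_files_per_group: 3 }
       }
   }
   
   /// Render one partition's file list, truncating it unless full output is requested.
   fn format_group(files: &[String], opts: &ExplainOptions) -> String {
       if opts.show_all_files || files.len() <= opts.max_files_per_group {
           return format!("[{}]", files.join(", "));
       }
       let shown = files[..opts.max_files_per_group].join(", ");
       let omitted = files.len() - opts.max_files_per_group;
       format!("[{}, ...{} more]", shown, omitted)
   }
   
   /// Render all groups, one bracketed list per partition.
   fn format_file_groups(groups: &[Vec<String>], opts: &ExplainOptions) -> String {
       let parts: Vec<String> = groups.iter().map(|g| format_group(g, opts)).collect();
       format!("file_groups={{{}}}", parts.join(", "))
   }
   
   fn main() {
       let files: Vec<String> = (0..10).map(|i| format!("f{}.parquet", i)).collect();
       let groups = vec![files];
       // Summary (default): the first 3 files, then a count of the remaining 7.
       println!("{}", format_file_groups(&groups, &ExplainOptions::default()));
       // Full listing when the option is enabled.
       let full = ExplainOptions { show_all_files: true, ..ExplainOptions::default() };
       println!("{}", format_file_groups(&groups, &full));
   }
   ```
   
   Keeping a few real paths in the default output (rather than only a count) still gives a hint of which directory or prefix the scan is reading, while the full list stays one setting away.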

