Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/10 14:28:55 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2502: [EPIC] Move Ballista to new arrow-ballista repo

andygrove opened a new issue, #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   This is a long-term goal, and there are several steps to get there, but I would like to discuss it with the community.
   
   ## Rationale for this
   
   - Decouple the release processes for DataFusion and Ballista
   - Allow each project to have top-level documentation and user guides that target the appropriate audience
   - Reduce issue tracking and PR review burden for DataFusion maintainers who are not as interested in Ballista
   - Help avoid accidental circular dependencies being introduced between the projects (such as https://github.com/apache/arrow-datafusion/issues/2433)
   - Help formalize the public API for DataFusion that other query engines should be using
   
   ## Steps
   
   - Define public API / contract between Ballista and DataFusion
   - Set up CI so that we find out quickly when changes to DataFusion break compatibility with Ballista and can address them
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1122855900

   @alamb @tustvold Here is another proposal that I would like to get your opinion on




[GitHub] [arrow-datafusion] gaojun2048 commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
gaojun2048 commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123188124

   That's great!




[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123782492

   Thanks for the encouraging feedback.
   
   I started a design doc where we can discuss the finer details. https://docs.google.com/document/d/1jNRbadyStSrV5kifwn0khufAwq6OnzGczG4z8oTQJP4/edit?usp=sharing




[GitHub] [arrow-datafusion] realno commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
realno commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123175255

   The proposal looks reasonable. +1. Looking forward to seeing the public API design.




[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1133656420

   The next step is to review & merge https://github.com/apache/arrow-datafusion/pull/2582




[GitHub] [arrow-datafusion] yahoNanJing commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1129113968

   Thanks @andygrove for the elaborate design document. The proposal covers many points that reduce the risk of breaking changes.
   
   One more suggestion: would it be possible to define the public API before moving Ballista to another top-level repository, or at least document the things that may break Ballista, so that both DataFusion and Ballista developers are aware when things change?




[GitHub] [arrow-datafusion] tustvold commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123209315

   No objections from me, but I'm possibly not the right person to ask... As you allude to, we will need to be more careful making breaking changes to DataFusion, but nothing insurmountable.
   
   > Reduce issue tracking and PR review burden for DataFusion maintainers who are not as interested in Ballista
   
   We will need to ensure there is still sufficient review capacity for Ballista to thrive; I don't have a good feel for whether this is a concern.
   
   > Allow each project to have top-level documentation and user guides that are targeting the appropriate audience
   
   I like this a lot; the audiences for the projects are likely rather different :+1:




[GitHub] [arrow-datafusion] alamb commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123720755

   I agree this would be a good step -- and help Ballista and DataFusion both to mature. I am fully supportive.
   
   Thank you for the offer @thinkharderdev 




[GitHub] [arrow-datafusion] thinkharderdev commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
thinkharderdev commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123217370

   Sounds like a good idea. I'm happy to spend time helping review PRs for Ballista if review capacity is an issue. 




[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1127783128

    @liukun4515 @yahoNanJing @mingmwang Please take a look and let us know if you have feedback. We have started a vote on the mailing list to move forward with this proposal.




[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1129048645

   Hi @yahoNanJing, thanks for the input. The plan (detailed in the [design document](https://docs.google.com/document/d/1jNRbadyStSrV5kifwn0khufAwq6OnzGczG4z8oTQJP4/edit?usp=sharing)) is for the DataFusion CI checks to pull the Ballista repo and run its tests, preventing DataFusion from making changes that break the Ballista tests.
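   A CI check of the kind described here (DataFusion CI pulling the Ballista repo and running its tests) could look roughly like the following. This is a minimal Rust sketch under stated assumptions, not the design document's actual implementation: it only renders the shell commands and a Cargo `[patch.crates-io]` table that points Ballista's DataFusion dependencies at the DataFusion checkout under test. The helper names `patch_section` and `ci_commands`, the sibling directory layout (`../arrow-datafusion`, `../arrow-ballista`), the crate subdirectories (`datafusion/core`, `datafusion/expr`), and the assumption that Ballista depends on crates.io releases of DataFusion with no existing `[patch.crates-io]` table in its workspace `Cargo.toml` are all hypothetical.

```rust
/// Hypothetical helper: render a Cargo `[patch.crates-io]` table that
/// redirects the given DataFusion crates to a local checkout.
/// `datafusion_dir` is interpreted relative to Ballista's workspace root.
fn patch_section(datafusion_dir: &str, crates: &[(&str, &str)]) -> String {
    let mut out = String::from("[patch.crates-io]\n");
    for (name, subdir) in crates {
        // One line per crate, e.g.
        // `datafusion = { path = "../arrow-datafusion/datafusion/core" }`
        out.push_str(&format!(
            "{name} = {{ path = \"{datafusion_dir}/{subdir}\" }}\n"
        ));
    }
    out
}

/// Hypothetical helper: the shell commands a DataFusion CI job could run
/// from the DataFusion checkout root to test Ballista against it.
fn ci_commands(ballista_url: &str, checkout_dir: &str, patch_file: &str) -> Vec<String> {
    vec![
        // Shallow clone of the Ballista repository.
        format!("git clone --depth 1 {ballista_url} {checkout_dir}"),
        // Append the generated patch table to Ballista's workspace manifest.
        format!("cat {patch_file} >> {checkout_dir}/Cargo.toml"),
        // Run Ballista's test suite against the patched DataFusion crates.
        format!("cargo test --manifest-path {checkout_dir}/Cargo.toml"),
    ]
}

fn main() {
    // Assumed layout: both repositories are checked out side by side.
    let patch = patch_section(
        "../arrow-datafusion",
        &[("datafusion", "datafusion/core"), ("datafusion-expr", "datafusion/expr")],
    );
    print!("{patch}");
    for cmd in ci_commands("https://github.com/apache/arrow-ballista", "../arrow-ballista", "patch.toml") {
        println!("{cmd}");
    }
}
```

   Using `[patch]` keeps Ballista's own dependency declarations untouched, so the same manifests work for normal releases. If Ballista instead depends on DataFusion via git, the table would need to be `[patch."https://github.com/apache/arrow-datafusion"]` rather than `[patch.crates-io]`.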




[GitHub] [arrow-datafusion] alamb commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1135894039

   I think it is merged -- I wonder if this epic is done?




[GitHub] [arrow-datafusion] yjshen commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
yjshen commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123152716

   Cc @liukun4515 @yahoNanJing @mingmwang @thinkharderdev @gaojun2048 @realno, as far as I can remember.




[GitHub] [arrow-datafusion] yahoNanJing commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1128350711

   Hi @andygrove, sorry for the late response. I'm not opposed to moving Ballista to a new repo. However, I still have some concerns.
   1. How should we manage features that need changes in both DataFusion and Ballista, such as the configuration refactoring?
   2. Some features, such as morsel-driven parallel execution, use the **partition** for morsel splitting. However, the **partition** is also used for task splitting in Ballista. If Ballista is moved out and new features are introduced to DataFusion, the two projects may end up using the same concept in conflicting ways. How should we avoid that?




[GitHub] [arrow-datafusion] alamb closed issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
alamb closed issue #2502: [EPIC] Move Ballista to new arrow-ballista repo
URL: https://github.com/apache/arrow-datafusion/issues/2502




[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1131740665

   Hi @yahoNanJing 
   
   Well, we already have a documented public API: it is the one that shows up in docs.rs today for all of the DataFusion crates. However, we will continue to add new logical expressions and operators, and those are often breaking changes. Also, there will likely be more changes to `ExecutionPlan` to support the new scheduler. I do think that we need to document this, and I can do that. We should also ask DataFusion maintainers to create corresponding Ballista PRs for any breaking API changes. Adding the CI checks will alert us to which DataFusion PRs would cause regressions in Ballista, so we can be careful not to merge them without reviewing (and testing) the corresponding Ballista PR.
   
   What do you think?




[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2502:
URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1131869405

   The new repo is created: https://github.com/apache/arrow-ballista 
   
   I pushed the arrow-datafusion repo as of commit a08d26eef39bcb2adac527e5c260d31f473fca79
   
   There is a PR up to remove the datafusion crates from the new repo: https://github.com/apache/arrow-ballista/pull/1
   
   

