Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/16 00:21:53 UTC

[GitHub] [arrow-rs] houqp commented on issue #1176: Discussion: relationship / unification of arrow-rs and arrow2 going forward

houqp commented on issue #1176:
URL: https://github.com/apache/arrow-rs/issues/1176#issuecomment-1013781098


   Thank you @alamb for starting and driving this discussion. Great summary of the current community consensus.
   
   > It is not clear to me if there is a consensus on:
   > How important the Apache Governance model is (please lend your opinions here!)
   
   Personally, I think the Apache model works better for relatively slow-moving monolithic projects, while arrow2/parquet2 are fast-evolving projects with a vision to be broken up into even smaller modular crates. @alamb has done exceptional work on driving the arrow-rs releases. But seeing how much effort and time it takes, I would consider it an unnecessary overhead for arrow2 at its current stage. @jorgecarleitao was able to react to user feedback quickly and release 3-4 new versions of arrow2 in a week; this is simply not possible with the Apache Governance model. That said, I think the Apache voting process is very useful when you need high confidence in the quality of every single release and have a large, diverse set of PMC members who can participate in the voting in a timely manner. But arrow2 still seems pretty far away from this.
   
   @andygrove brought up a good point that it might become an issue for large corporations with restrictive open source contribution guidelines. This is the first time I am aware of this issue; previously I was under the impression that the software license was all that mattered. On the other hand, I am guessing the ASF is not the only governance model that's allowed? Perhaps we could help @jorgecarleitao come up with a different, compatible governance model for arrow2 until it's ready to be contributed to the ASF? If Andy wants to contribute to arrow2 now but is blocked by the lack of governance, then I would consider this a serious issue that we should address. Otherwise I would optimize for iteration velocity over governance until it becomes a real problem.
   
   In short, from what I have seen so far, the upside of adopting the Apache governance model is unblocking potential contributions from big corporations. The downside is that it will slow down our iteration process and potentially even disincentivize @jorgecarleitao from actively working on the project. Reading his past emails, I get the feeling that he did try very hard to pass the IP clearance and donate arrow2 to the ASF last year, but got frustrated by the bureaucracy. I am personally much more concerned about the latter than the former.
   
   > How important the stability of APIs / the specific versioning scheme (0.x vs 1.x or later)
   
   IMHO, this is not important as long as it is well communicated to the users, i.e. we are explicit that we are special and ask them to treat our 8.x as 0.x until we say otherwise. But Jorge has a strong opinion on this and wants to strictly follow what the rest of the Rust ecosystem does. I also understand where he is coming from and respect his stance on this.
   
   > Switch datafusion to arrow2, making no changes to arrow-rs. It could be maintained by anyone who wished to contribute,
   
   I agree with @andygrove on this. As long as there is community interest in this, we should probably still keep arrow-rs open for contributions. This is not the result I want to see, but I have a feeling that this is likely what is going to happen :(
   
   > Start more actively porting the more ergonomic parts of arrow2 into arrow-rs 
   
   I think this is certainly doable, but I stand by my previous comment that it won't be a good use of our time unless there are fundamental design tradeoffs in arrow-rs that are not compatible with arrow2's design. Simply replicating another project's design is not a good reason to start a fork IMHO. I know @tustvold has a fairly strong opinion on this option and is more familiar with the parquet code base than I am, so perhaps he could help shed some light on this.
   
   > Option 2 leaves open the question of “how does arrow2 development move forward” – where would patches be sent, for example?
   
   Just throwing out a random idea here: one potential variant of option 2 is that we use arrow-rs as the place to maintain stable arrow2 branches and let arrow2 iterate as fast as it can without fear of introducing breaking changes. The stable branch would cherry-pick compatible commits for a specific 0.x release that we want to maintain for X months. This way, we can still direct all contributions back to arrow2. The downside is that I don't know how much interest the community has in a stable API, considering we just decided to stop maintaining stable releases for arrow-rs.
   


-- 
This is an automated message from the Apache Git Service.