Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/16 06:13:37 UTC

[GitHub] [arrow-julia] quinnj opened a new issue #284: Future of the Julia arrow implementation

quinnj opened a new issue #284:
URL: https://github.com/apache/arrow-julia/issues/284


   Hey all,
   
   I'm opening this issue to facilitate some discussion around the current state and future direction of the Julia arrow implementation (which lives in this repo). As the original primary code author and the main initiator of the repo's transfer from the JuliaData GitHub organization to the Apache organization, I feel somewhat responsible for giving an accounting and helping craft a plan for the future.
   
   Before transferring the repo, there were a couple key things I, and other Julia contributors, failed to realize/fully consider:
     * No existing Arrow.jl committers would retain any write permissions after the transfer until a loosely defined "amount of time" had passed, during which they showed consistent contributions and could be officially nominated as an Arrow PMC member
     * How much administrative overhead would be required by the [official Apache release policies](https://www.apache.org/legal/release-policy.html)
     
   The reality of the Julia arrow implementation is as follows:
     * None of the key contributors, especially me, work on this code in any kind of "full-time" capacity or even on a very regular cadence. I personally am the primary maintainer of some dozen "data" packages in the Julia ecosystem and tend to work in short "blitzes", working through as many open issues/enhancements as possible for a few weeks to push towards a bigger release, while fixing minor or urgent bugs in between blitzes, in addition to reviewing other contributions as they come in. In my understanding, this doesn't fit the traditional PMC member profile well, as there may be an entire year between major "blitzes", depending on my own personal time/energy and the demands of other projects.
     * This also compounds the second point above: the overhead of Apache release policies weighs more heavily on the small amount of contributor effort we're already trying to muster.
   
   In short, I think:
     * The Julia implementation might just be too small to realistically be an official Apache project due to the overhead of required apache policies
     * In addition, the current Apache release policies seem more geared to languages (C++, Java) that don't have existing infrastructure/processes around package release/maintenance, and to projects with more contributor resources
     * The Julia General registry, builtin Pkg.jl package manager, and ecosystem culture have evolved a pretty high-quality "package maintenance and deployment" infrastructure/process that means the Julia implementation just doesn't benefit as much from the rigorous Apache policies compared to other languages/projects (I can expound on specifics here if people are interested, but in short, many specific Apache release policies are already kind of "builtin" to the standard Julia package release process).
     * Lastly, it's been pretty demoralizing while trying to make an effort to "officially" join the broader arrow community, to then be personally "reset" to the status of a completely new contributor. The key bottleneck in open source is _always_ finding enough contributor effort, and yet this "contributor reset" policy seems to actively impede existing contributors.
   
   
   All of that said, I'd like to tentatively propose moving (or just forking) the Julia implementation back to the JuliaData GitHub organization. I think that would "unblock" the implementation from contributors' current inability to merge changes and cut releases, while hopefully allowing the project to grow and gain new contributor resources.
   
   I recognize this may lead to being classified as an "unofficial" implementation with regards to the arrow community, but personally, it won't change my desire to actively follow/contribute to arrow-dev mailing list discussions, join the biweekly community call, or contribute in other ways to the broader arrow community. I'm also still interested in improving the level of integration testing between the Julia implementation and other languages (via archery) and, in general, making the Julia implementation as high-quality as possible and adding to the overall value of the arrow project as a whole. I don't propose this break out of any negative feelings toward the arrow community or PMC members (they've only been helpful throughout this process), but rather from a realistic consideration of resource constraints, policy applicability, and what I believe will _currently_ foster the best context for the Julia implementation to grow/succeed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-julia] domoritz edited a comment on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
domoritz edited a comment on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1042480283


   Just adding two cents from the Arrow JS perspective. We are also a small developer group (3 contributors) working on Arrow in bursts and in no way full time. Overall, I'd say that being part of Apache adds some overhead but also gives us a lot more legitimacy and support. For example, there are awesome folks doing releases (including announcement blog posts), maintenance of the CI, and benchmarking.
   
   For a long time, I had no commit access to the main repo, but sending pull requests worked fine. The other committers often didn't even merge the PRs, but we had committers who don't work on JS jump in and merge approved PRs. Now I can also merge commits, but for me it didn't make a huge difference in day-to-day development.
   
   Maybe there is a way to fast-track people who contributed significantly to the Julia Arrow package, so that they are considered for a committer role right away without nomination. To reduce the overhead for releases, could the Julia release process be folded into the releases for https://github.com/apache/arrow?



[GitHub] [arrow-julia] quinnj commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
quinnj commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1059886912


   Thank you @kou for all your help. I've created a PR so we can try out an initial release with the new process: https://github.com/apache/arrow-julia/pull/299



[GitHub] [arrow-julia] alamb commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1042833415


   Speaking up from the Rust implementation's point of view:
   
   The Rust ecosystem has a similar distribution mechanism (crates.io) that most Rust projects rely on, so Apache's .tar.gz-based release process is also an impedance mismatch for us and adds some overhead.
   
   Also, we took several steps that have allowed us to remain under the Apache project's umbrella while moving more independently:
   1. Moved to a separate repo (https://github.com/apache/arrow-rs; you still need to be an Arrow committer to merge PRs there)
   2. Used GitHub issues (rather than JIRA tickets)
   3. Started releasing versions on the Rust implementation's own schedule rather than in lockstep with the `arrow` repo
   
   This is an evolving process that we have refined and will keep refining. While not everyone in the community agrees, I believe that the value gained from being part of the Apache Arrow umbrella and governance is worth the cost.
   
   The largest overhead from my perspective (as the one who does most of the arrow-rs releases) for the process @kou lays out, is actually getting 3 PMC +1 votes for each release.



[GitHub] [arrow-julia] kou commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
kou commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1042585103


   > To reduce the overhead for releases, could the Julia release process be folded into the releases for https://github.com/apache/arrow?
   
   @domoritz Thanks for your comment! But this may not suit the Julia implementation, because it doesn't want to wait 2-3 months for each release. On-demand releases are desired.



[GitHub] [arrow-julia] kou commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
kou commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1051280474


   I've implemented all the release-related scripts.
   We are ready to release a new version.



[GitHub] [arrow-julia] quinnj closed issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
quinnj closed issue #284:
URL: https://github.com/apache/arrow-julia/issues/284


   



[GitHub] [arrow-julia] kou commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
kou commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1048264914


   OK.
   
   Now @quinnj has become an Apache Arrow committer. @quinnj can merge pull requests for this repository.
   
   I created some issues (#286, #287 and #288) for tasks related to voting by PMC members. We can release a new version after we resolve them.



[GitHub] [arrow-julia] emkornfield commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1045353586


   I don't think this will solve the immediate pain, but I think some of the existing release workflows are outdated with respect to current tooling. I'm going to try to write something up that we can maybe raise with the board in our next report, or sooner.



[GitHub] [arrow-julia] quinnj commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
quinnj commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1046118519


   Thanks for everyone's comments. I appreciate the support and quick feedback. I do think it would make a big difference to have commit rights back; as I introspect, that's what makes me personally feel the most "stalled" on doing work. I appreciate @kou's suggestion and think it would work for moving forward on releases. I don't want to add too much to @kou's plate, so maybe much of it can be automated or simplified in the future.



[GitHub] [arrow-julia] kou commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
kou commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1042583219


   My understanding is that we have 2 problems to be fixed:
   
   1. Extra release work
   2. Current maintainers/developers don't have write access to apache/arrow-julia
   
   Right?
   
   It seems that we can fix 2. according to Wes' comment. (I didn't know about that approach. Sorry.)
   
   For 1., how about I do all of the extra release work?
   That means I prepare the tarball and signatures, upload them, and start a vote for each release.
   Current maintainers would just do the following, almost as usual:
   1. Test the revision to be released
   2. Increment the version number in `Project.toml`
   3. (This is the only new step.) Ping me and wait for the vote result (normally, a vote finishes in 24-72 hours)
   4. Use JuliaTagBot



[GitHub] [arrow-julia] jrevels commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
jrevels commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1041572288


   As a similarly "blitz"-y former contributor to Arrow.jl, I wholeheartedly echo @quinnj's sentiments.
   
   @kou and the Arrow community have been incredibly welcoming and I truly appreciate their efforts to integrate the Julia package with the overall Apache org. It's clearly a lot of work, and clearly born from an earnest desire to see the community grow and the software improve at a steady clip.
   
   However, I do think it was detrimental to the Julia package's health that the core maintainers lost commit access to the Julia package in the transfer to Apache. These former committers had the unique combo of context, skillset, and merit-driven community standing (within the package itself, and in the Julia community at large) to function as core maintainers of the package; stripping them of their access puts the package in an awkward state and goes against the merit-based philosophy that seems to be a shared core between the Julia and Apache communities.
   
   > The Julia General registry, builtin Pkg.jl package manager, and ecosystem culture have evolved a pretty high-quality "package maintenance and deployment" infrastructure/process that means the Julia implementation just doesn't benefit as much from the rigorous Apache policies compared to other languages/projects (I can expound on specifics here if people are interested, but in short, many specific Apache release policies are already kind of "builtin" to the standard Julia package release process).
   
   💯 I could not have said it better, and I think this might be a big driver of the impedance mismatch here. One of the nicest elements of Julia development is how the community's organically grown package development process achieves such a high degree of ergonomics and robustness simultaneously, both within a given package and across package ecosystems. It lowers the barrier immensely for new contributors without sacrificing quality or iteration time for maintainers, which (IMO) was a key driver of Julia's early success.
   
   It's pretty unfortunate, because I truly believe that close collaboration between the Arrow/Julia worlds could yield really cool capabilities - Julia's JIT/multiple dispatch plus Arrow's memory model constitutes an incredibly fun space to explore from a distributed computing/data engineering perspective.
   
   But I agree that if we can't resolve some of the current development pain points in this repository, a fork might make the most sense just to return the codebase to a healthier state so that progress can be made. Further echoing @quinnj, I hope that the cross-community collaboration continues to flourish regardless of the development path we take.



[GitHub] [arrow-julia] jonkeane commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
jonkeane commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1043155238


   And in the same vein as @nealrichardson: I'm not a PMC member, so I can't help on the vote front. But I am a committer, and I'm more than happy to be pinged on PRs that need merging (or even some reviewing; learning more about Julia sounds fun) to help out when and where I can.



[GitHub] [arrow-julia] domoritz commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
domoritz commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1042480283


   Just adding two cents from the Arrow JS perspective. We are also a small developer group (3 contributors) working on Arrow in bursts and in no way full time. Overall, I'd say that being part of Apache adds some overhead but also gives us a lot more legitimacy and support. For example, there are awesome folks doing releases (including announcement blog posts), maintenance of the CI, and benchmarking.
   
   For a long time, I had no commit access to the main repo, but sending pull requests worked fine. The other committers often didn't even merge the PRs, but we had committers who don't work on JS jump in and merge approved PRs. Now I can also merge commits, but honestly it didn't make a huge difference in day-to-day development. For me, the process of becoming a committer was a recommendation by an existing committer followed by approval by the PMC.
   
   Maybe there is a way to fast-track people who contributed significantly to the Julia Arrow package, so that they are considered for a committer role right away without nomination.



[GitHub] [arrow-julia] nealrichardson commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1043153335


   > The largest overhead from my perspective (as the one who does most of the arrow-rs releases) for the process @kou lays out, is actually getting 3 PMC +1 votes for each release.
   
   I'm personally more than happy to provide a vote on Julia release candidates (as I have done in the past for Rust when needed).
   
   These all sound like issues we can work together to resolve.



[GitHub] [arrow-julia] wesm commented on issue #284: Future of the Julia arrow implementation

Posted by GitBox <gi...@apache.org>.
wesm commented on issue #284:
URL: https://github.com/apache/arrow-julia/issues/284#issuecomment-1042470001


   It is often customary to grant commit access to the major contributors that are party to a large code donation through the IP clearance process. For example, many people become committers or PMC members of a project as the result of a code donation through the Apache Incubator process. I think that would be more than appropriate in this case, but you may have to bear with us for 72 hours to formalize this. 
   
   For the record, I do not think it was anyone's intention to take away anyone's agency as a result of the code donation / relocation to the ASF. So, regarding "Lastly, it's been pretty demoralizing while trying to make an effort to "officially" join the broader arrow community, to then be personally "reset" to the status of a completely new contributor.": this was certainly not our intent, as stated above. I apologize if a different impression was given.

