Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/08 05:26:19 UTC

[GitHub] [arrow] quinnj opened a new pull request #8393: ARROW-10228: Contribute Julia implementation

quinnj opened a new pull request #8393:
URL: https://github.com/apache/arrow/pull/8393


   This pull request includes the core code from
   [Arrow.jl](https://github.com/JuliaData/Arrow.jl), a pure Julia
   implementation supporting the Apache Arrow in-memory/IPC format.
   
   This implementation supports the 1.0 version of the specification, including support for:
     * All primitive data types
     * All nested data types
     * Dictionary encodings and messages, including nested dict encodings
     * Extension types
     * Compression reading/writing
     * Large lists reading/writing
     * Custom alignment
     * Reading "tables" as one whole, or iterating record batches
     * Streaming, file, record batch, and replacement and isdelta dictionary messages
   
   It currently doesn't include support for:
     * Tensors or sparse tensors
     * Flight RPC
     * C data interface
   
   For testing, the original repo was set up to use a .travis.yml file to
   run all tests. Typical testing of a Julia package just involves running
   `Pkg.test("Arrow")` from the Julia REPL. I can explore setting up a
   GitHub Action if that's traditional for individual implementation
   testing (it seemed so from a quick scan of the repository).
   
   In the test folder, the `arrowjson.jl` includes code to read from the
   JSON-formatted arrow files. The `integrationtest.jl` file includes a
   command to convert from arrow to json, json to arrow, or validate
   between the two, based off what other language integration testing files
   seemed to support. I've looked a little into how exactly to update
   archery to run the Julia integration tests, but was also told on the
   mailing list that it can be a follow up.
   
   It's unclear to me whether an IP clearance form needs to be made, but I am
   willing to make one if required.
   
   It's also unclear exactly how the release process will work once the
   code is merged in. I understand there's a fixed release plan of some
   kind (time-based, I believe?); in terms of individual language releases,
   are they expected to make a corresponding release to the language
   package manager at the same time? For example, with the upcoming 2.0
   release, and assuming the Julia code was all merged in, would it be
   expected that a 2.0 release of Arrow.jl be made in the [Julia General
   package
   registry](https://github.com/JuliaRegistries/General/tree/master/A/Arrow)?
   Or is the 2.0 tag for the Arrow _format_, with implementations
   releasing when their _support_ is up to date? I'm in communication with
   the Julia package manager devs on how best to proceed with the setup
   here and subsequent release process.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8393: ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-705339836


   https://issues.apache.org/jira/browse/ARROW-10228





[GitHub] [arrow] nealrichardson commented on pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-705695770


   @quinnj I can help you with the IP clearance process.
   
   To get started, here's a link to the Apache CLA: https://www.apache.org/licenses/contributor-agreements.html





[GitHub] [arrow] wesm commented on pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-705663389


   I think it would be best to conduct the IP clearance process for this codebase. The first step is to have a vote on the mailing list about accepting the code donation. We'll need to document the authors of the code (and collect CLAs from them -- note that CLAs are NOT necessary for contributions made directly to apache/arrow) and any third-party licenses involved as part of this.





[GitHub] [arrow] kszucs commented on pull request #8393: ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-705494845


   This looks great, thanks Jacob!
   
   I can help you with the CI configuration to align with Arrow's current setup right after the 2.0 release.





[GitHub] [arrow] pitrou commented on pull request #8393: ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-705429855


   To shed a bit more light on 2.0: we're thinking of releasing ideally tomorrow or next week, so it seems a bit too tight :-)





[GitHub] [arrow] quinnj commented on pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
quinnj commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-705688058


   > We'll need to document the authors of the code
   
   The code included in this PR was written exclusively by me; I initially put the code under the MIT license, but recently changed it to the Apache 2.0 license. The other Julia package dependencies used (for compression, Tables.jl API integration, etc.) all have the MIT license. I'm not aware of any other third-party licenses involved. Happy to do a CLA, just point me in the right direction.
   
   > I can help you with the CI configuration to align with Arrow's current setup right after the 2.0 release.
   
   Thanks @kszucs! Yeah, I've seen the recent activity towards 2.0, so definitely no rush here; just put up the PR here since it was mostly ready and I had a free night to dig in.





[GitHub] [arrow] quinnj commented on pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
quinnj commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-707347557


   Sounds great





[GitHub] [arrow] nealrichardson commented on pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-707315192


   Ok @quinnj should we close this one in favor of 8448 then?
   
   I'll start the IP clearance paperwork this week.





[GitHub] [arrow] quinnj commented on pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
quinnj commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-705764895


   > To get started, here's a link to the Apache CLA: https://www.apache.org/licenses/contributor-agreements.html
   
   Ok, filled out the form and emailed the CLA.





[GitHub] [arrow] quinnj closed pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
quinnj closed pull request #8393:
URL: https://github.com/apache/arrow/pull/8393


   





[GitHub] [arrow] quinnj commented on pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
quinnj commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-707289909


   I've received a response that my CLA was received and recorded for Apache Arrow. Happy to move forward on next steps.
   
   On my end, I've worked with one of the core package ecosystem devs on the best path for moving the "Arrow.jl" Julia package from its existing location to here. We're preparing a new pull request that includes the previous Julia package versions to preserve history (users can always install previous versions of a package as needed for reproducing old results of scripts/applications). The new pull request will be similar in code content, but will include 7-8 commits, one for each past version of the package; if these can be rebase-merged (when things are ready to merge, obviously), that will ensure the correct package history is preserved for Julia users. Happy to discuss more details here if things aren't clear.





[GitHub] [arrow] quinnj edited a comment on pull request #8393: [NEEDS IP CLEARANCE] ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
quinnj edited a comment on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-707289909


   UPDATE: new pull request: https://github.com/apache/arrow/pull/8448





[GitHub] [arrow] pitrou commented on pull request #8393: ARROW-10228: Contribute Julia implementation

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8393:
URL: https://github.com/apache/arrow/pull/8393#issuecomment-705435446


   > This implementation supports the 1.0 version of the specification, including support for: [...]
   
   That's already a lot, congratulations!
   
   > I can explore setting up a github action if that's traditional for individual implementation testing
   
   There's no preference. We tend to use GHA more than Travis because it offers more execution capacity, but we're using a lot of the latter already. IMHO you can choose whichever you're comfortable with.
   
   We try to trim CI jobs when the changes are unrelated (for example, don't run the C++ tests when only Julia changes are involved, or the converse). We use CI configuration for this where possible, and we also have a script `ci/detect_changes.py` that you should update for Julia.
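   
   As a rough illustration of the path-based trimming described above, here is a minimal sketch. It is not the contents of `ci/detect_changes.py`; the `cpp/` and `julia/` directory prefixes and the function names are assumptions made purely for the example.

```python
# Hypothetical sketch: decide which components' CI jobs to run from the
# list of changed file paths. The prefixes below are assumed, not taken
# from the real ci/detect_changes.py script.
COMPONENT_PREFIXES = {
    "cpp": ("cpp/",),
    "julia": ("julia/",),
}


def affected_components(changed_paths):
    """Return the sorted names of components touched by the changed paths."""
    affected = set()
    for path in changed_paths:
        for component, prefixes in COMPONENT_PREFIXES.items():
            # str.startswith accepts a tuple of candidate prefixes.
            if path.startswith(prefixes):
                affected.add(component)
    return sorted(affected)


def should_run(component, changed_paths):
    """True if the component's CI job should run for this change set."""
    return component in affected_components(changed_paths)
```

   With this sketch, a change that only touches files under the assumed `julia/` prefix would skip the C++ job, and the converse.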
   
   At some point it will be nice for Julia testing on Linux to use the same [docker-compose](https://github.com/apache/arrow/blob/master/docker-compose.yml) harness as other implementations. That can be deferred to a later task or PR, though.
   
   > I've looked a little into how exactly to update archery to run the Julia integration tests, but was also told on the mailing list that it can be a follow up.
   
   Definitely.
   
   > It's unclear to me whether an IP clearance form needs to be made, but I am willing to make one if required.
   
   I think so. @wesm can comment.
   
   >  I understand there's a fixed release plan of some kind (time-based, I believe?); in terms of individual language releases, are they expected to make a corresponding release to the language package manager at the same time?
   
   1) It's loosely time-based, yes (i.e. no hard calendar constraints, but once every ~2 months is the target).
   2) yes
   

