Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/10 14:58:36 UTC

[GitHub] [arrow] lidavidm opened a new pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

lidavidm opened a new pull request #7702:
URL: https://github.com/apache/arrow/pull/7702


   Adds the environment variable `ARROW_PRE_1_0_METADATA_VERSION`. I'm happy to rename it if anyone prefers a different name.
   
   I opted to remove `use_legacy_format` from the underscore-prefixed APIs, but kept it in the public APIs.
   
   Also makes the necessary changes to Flight.
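
A minimal sketch of how an environment variable like this could select the default metadata version. It is illustrative only and is not the code in this PR: the `MetadataVersion` enum, the `default_metadata_version` helper, and the set of accepted truthy values are all assumptions made for the example.

```python
import os
from enum import IntEnum


class MetadataVersion(IntEnum):
    """Stand-in for the Arrow metadata versions discussed in this thread."""
    V4 = 4
    V5 = 5


# Name taken from the PR description; everything else here is hypothetical.
ENV_VAR = "ARROW_PRE_1_0_METADATA_VERSION"


def default_metadata_version(environ=None):
    """Return V4 when the variable is set to a truthy value, else V5.

    The accepted values below are an assumption for this sketch.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR, "").strip().lower()
    if value in ("1", "true", "yes", "on"):
        return MetadataVersion.V4
    return MetadataVersion.V5
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test, for example `default_metadata_version({ENV_VAR: "1"})`.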


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-657290062


   The library should check that versions higher than V5 are rejected, though. We're going to add that check in C++, so we need a Java analogue of https://issues.apache.org/jira/browse/ARROW-9399





[GitHub] [arrow] BryanCutler commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-656949654


   Thanks for doing this, @lidavidm! I believe this will be necessary for a PySpark v2.4.x or v3.0.0 user who upgrades to PyArrow 1.0.0. It would be good to set up the Spark integration test to verify this too.





[GitHub] [arrow] wesm edited a comment on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-657289064


   V4 is backwards compatible so we read it as though it were V5 and only error on unions that have top level nulls
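
The reader-side rule described in this thread could be sketched roughly as follows. This is not the Arrow C++ or Java implementation: `check_readable` and its parameters are hypothetical names, and rejecting versions older than V4 is an assumption that the thread does not state.

```python
CURRENT_VERSION = 5   # V5, the current metadata version in this discussion
OLDEST_READABLE = 4   # V4 is read as though it were V5


def check_readable(version, has_union_with_top_level_nulls=False):
    """Raise ValueError if data with this metadata version should be rejected.

    version: integer metadata version found in the message.
    has_union_with_top_level_nulls: True if the data contains a union
        with top-level nulls (only relevant when reading V4).
    """
    if version > CURRENT_VERSION:
        # Versions newer than V5 are unknown to this reader.
        raise ValueError(f"unsupported future metadata version V{version}")
    if version < OLDEST_READABLE:
        # Assumption for this sketch: anything older than V4 is rejected.
        raise ValueError(f"unsupported old metadata version V{version}")
    if version == OLDEST_READABLE and has_union_with_top_level_nulls:
        # V4 is accepted except for unions that have top-level nulls.
        raise ValueError("V4 unions with top-level nulls cannot be read")
```

With this sketch, `check_readable(4)` and `check_readable(5)` return silently, while `check_readable(6)` and `check_readable(4, True)` raise `ValueError`.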





[GitHub] [arrow] BryanCutler commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
BryanCutler commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-657288029


   >The only place in Arrow/Java where the metadata version gets checked is in MessageSerializer.deserializeMessageBatch, which only gets used in a unit test and isn't used in Spark, so I suppose we got lucky ("lucky") here.
   
   Yeah, you're right. It looks like the `MetadataVersion` does not get checked when reading the file or stream formats. That should be fixed, right? The correct behavior is to not allow reading metadata from a version other than the current one, except in this case, where we allow V4 if it doesn't have unions. Is that what is done in C++?





[GitHub] [arrow] lidavidm commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-657099127


   I built Spark 3.0 with Arrow 0.15.1 (the default)
   -> Python tests pass with PyArrow 0.17.1
   -> Python tests pass with PyArrow from this PR with the environment variable set
   -> Python tests pass with PyArrow from this PR *without* the environment variable set
   
   The only place in Arrow/Java where the metadata version gets checked is in `MessageSerializer.deserializeMessageBatch`, which only gets used in a unit test and isn't used in Spark, so I suppose we got lucky ("lucky") here. Still, having the options will be useful for systems that may actually check.





[GitHub] [arrow] wesm commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-657289064


   V4 is backwards compatible so we read it in V5 and only error on unions that have top level nulls





[GitHub] [arrow] wesm closed pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7702:
URL: https://github.com/apache/arrow/pull/7702


   





[GitHub] [arrow] github-actions[bot] commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-656724147


   https://issues.apache.org/jira/browse/ARROW-9395





[GitHub] [arrow] lidavidm commented on pull request #7702: ARROW-9395: [Python] allow configuring MetadataVersion

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #7702:
URL: https://github.com/apache/arrow/pull/7702#issuecomment-656950069


   If that's https://github.com/apache/arrow/blob/master/ci/scripts/integration_spark.sh then I'll take a look when I get a chance, thanks.

