Posted to github@arrow.apache.org by "0x26res (via GitHub)" <gi...@apache.org> on 2023/04/13 17:43:12 UTC

[GitHub] [arrow] 0x26res opened a new pull request, #35113: GH-35112: [Python] Expose keys_sorted in python MapType

0x26res opened a new pull request, #35113:
URL: https://github.com/apache/arrow/pull/35113

   
   ### Rationale for this change
   
   It is not possible to read `keys_sorted` in the Python API.
   
   ### What changes are included in this PR?
   
   - Expose `keys_sorted` in `cdef class MapType` (`types.pxi`)
   - Add tests
   
   ### Are these changes tested?
   
   yes
   
   ### Are there any user-facing changes?
   
   We're exposing `keys_sorted`, but I guess the documentation will update itself from the `"""` docstring (?)
   
   This is not an API-breaking change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] 0x26res commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "0x26res (via GitHub)" <gi...@apache.org>.
0x26res commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1510902015

   Are we then saying that:
   - This field is pure metadata and it's left to the user to provide sorted keys?
   - It won't be enforced by Arrow?
   
   For context, I'm not really using that field. I just need to be able to access it in order to create slightly modified copies of schemas. For example, if I want to change the type of nested fields (int32 -> int64), I need to make copies of `pa.map_` types, preserving the `keys_sorted` metadata.
   
   > So maybe the question is do we want to add a check for it in PyArrow Array and Scalar?
   
   Maybe we should create a follow-up issue to do this. It would involve a change that may break existing code at runtime (if someone was previously providing unsorted data with `keys_sorted=True`).
   
   As far as this PR is concerned, I think we should just improve the doc for that field (and probably update the doc in the C++ MapType class).




[GitHub] [arrow] AlenkaF commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1510864909

   I am not sure. I think the type only defines `keys_sorted` as metadata. I do not see any reference to it in [pyarrow.MapArray](https://arrow.apache.org/docs/python/generated/pyarrow.MapArray.html#pyarrow.MapArray) or [pyarrow.MapScalar](https://arrow.apache.org/docs/python/generated/pyarrow.MapScalar.html#pyarrow.MapScalar).
   
   So maybe the question is do we want to add a check for it in PyArrow Array and Scalar?




[GitHub] [arrow] 0x26res commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "0x26res (via GitHub)" <gi...@apache.org>.
0x26res commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1512751044

   @AlenkaF thanks for the suggestion, I've updated the comment.




[GitHub] [arrow] ursabot commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1515651824

   Benchmark runs are scheduled for baseline = 9f852d46b95e00df5546847eccbe376eefcec857 and contender = 1deb740e02fa928e60ce611790d9dff2d1a6077e. 1deb740e02fa928e60ce611790d9dff2d1a6077e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/aa36bad2758f4b8283f36cae1e2f6045...73018541042d4808a6600e0cbd941064/)
   [Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/bcf25b3f9104425b97ab3b3a5fd226ab...068dddb20be048b0b054463e140c2917/)
   [Finished :arrow_down:7.65% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/2ec8119fbf004f9b8b8fcd9b50c700ce...9a08131ea23a49a5bf5f63355e5d7050/)
   [Finished :arrow_down:0.54% :arrow_up:0.09%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/7cecbacb719a4f1782d83767fa11bab7...adcce15e47534a55af077a02d287e697/)
   Buildkite builds:
   [Finished] [`1deb740e` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2741)
   [Failed] [`1deb740e` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2775)
   [Finished] [`1deb740e` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2739)
   [Finished] [`1deb740e` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2766)
   [Finished] [`9f852d46` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2740)
   [Failed] [`9f852d46` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2774)
   [Finished] [`9f852d46` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2738)
   [Finished] [`9f852d46` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2765)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] github-actions[bot] commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1507370829

   * Closes: #35112




[GitHub] [arrow] AlenkaF commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1512537787

   Yeah, the issue of `keys_sorted` being only metadata or not should be separate from this PR. We can open a new issue to start a discussion on that.
   
   I agree that for this PR the aim is to expose the parameter as a property so we are able to get the information from the Data Type.
   
   My suggestion for the docs would be to just make it explicit in PyArrow, for example: "_Should_ the entries be sorted according to keys."




[GitHub] [arrow] danepitkin commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1507571938

   LGTM! I'll let someone with committer rights finalize this review.
   
   I believe you can ignore the AppVeyor error, since there have been issues on main with pytests timing out.




[GitHub] [arrow] AlenkaF merged pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF merged PR #35113:
URL: https://github.com/apache/arrow/pull/35113




[GitHub] [arrow] AlenkaF commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1508145952

   The binding looks good to me too, +1. Thank you for the contribution!
   
   I do have a general, probably silly, question about the keyword. Looking at the C++ and the tests, it is meant as a "metadata" keyword and not a "check" that the data is actually sorted, right? What I mean is, you can have a `MapType` defined with `keys_sorted=True`, but when using it in Scalars, for example, the keys do not actually have to be sorted (ascending?):
   
   ```python
   >>> import pyarrow as pa
   >>> ty = pa.map_(pa.string(), pa.int8(), keys_sorted=True)
   >>> v = [('b', 2), ('a', 1)]
   >>> s = pa.scalar(v, type=ty)
   >>> s
   <pyarrow.MapScalar: [('b', 2), ('a', 1)]>
   ```
   
   And Dane is correct, the failing tests are not connected to this PR.
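
Since the flag is not enforced in the session above, a caller who wants a guarantee would have to check the entries themselves. The following is a minimal sketch of such a check in plain Python. `keys_are_sorted` is a hypothetical helper, not a PyArrow function, and it assumes that "sorted" means ascending order with duplicate keys allowed, which the thread leaves open.

```python
from typing import Any, Sequence, Tuple


def keys_are_sorted(entries: Sequence[Tuple[Any, Any]]) -> bool:
    """Return True if the keys of (key, value) pairs are in ascending order."""
    keys = [key for key, _ in entries]
    # Compare each key with its successor; an empty or single-entry map is sorted.
    return all(a <= b for a, b in zip(keys, keys[1:]))


unsorted_entries = [("b", 2), ("a", 1)]  # same pairs as in the session above
sorted_entries = [("a", 1), ("b", 2)]

print(keys_are_sorted(unsorted_entries))
print(keys_are_sorted(sorted_entries))
```

The same function could be applied to any list of key/value pairs extracted from a map value.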




[GitHub] [arrow] github-actions[bot] commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1507370860

   :warning: GitHub issue #35112 **has been automatically assigned in GitHub** to PR creator.




[GitHub] [arrow] danepitkin commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1509129582

   Good catch, @AlenkaF. Could it be a bug that the underlying C++ implementation doesn't maintain a sorted order?




[GitHub] [arrow] ursabot commented on pull request #35113: GH-35112: [Python] Expose keys_sorted in python MapType

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35113:
URL: https://github.com/apache/arrow/pull/35113#issuecomment-1515652552

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/2ec8119fbf004f9b8b8fcd9b50c700ce...9a08131ea23a49a5bf5f63355e5d7050/)
   

