Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/04 17:44:10 UTC

[GitHub] [arrow] sanjibansg opened a new pull request, #12791: ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded

sanjibansg opened a new pull request, #12791:
URL: https://github.com/apache/arrow/pull/12791

   This PR changes the `dictionaries` property so that it has one entry for every partition field. Each entry holds the dictionary values if the field is dictionary encoded and `None` otherwise, so the returned list always has the same length as the number of fields.
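
A minimal sketch of the behavior described above, written in plain Python rather than against pyarrow itself. The helper `align_dictionaries` and the field name `group` are hypothetical and only illustrate the "one entry per field, `None` where there is no dictionary" rule; they are not part of the PR.

```python
from typing import Dict, List, Optional, Sequence


def align_dictionaries(
    field_names: Sequence[str],
    known: Dict[str, List[str]],
) -> List[Optional[List[str]]]:
    """Return one entry per field: its dictionary values, or None.

    Hypothetical helper, not a pyarrow API.
    """
    # dict.get returns None for fields that have no dictionary,
    # so the result always has len(field_names) entries.
    return [known.get(name) for name in field_names]


# "group" is an assumed name for the field without a dictionary.
fields = ["group", "key"]
result = align_dictionaries(fields, {"key": ["first", "second", "third"]})
print(result)
```

Because positions in the result line up with positions in the field list, a caller can index it by field position, which is why the test in this PR moves from `dictionaries[0]` to `dictionaries[1]`.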


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] sanjibansg commented on a diff in pull request #12791: ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded

Posted by GitBox <gi...@apache.org>.
sanjibansg commented on code in PR #12791:
URL: https://github.com/apache/arrow/pull/12791#discussion_r842440686


##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -623,7 +626,7 @@ def test_partitioning():
         dictionaries={
             "key": pa.array(["first", "second", "third"]),
         })
-    assert partitioning.dictionaries[0].to_pylist() == [
+    assert partitioning.dictionaries[1].to_pylist() == [

Review Comment:
   Made the change.





[GitHub] [arrow] ursabot commented on pull request #12791: ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12791:
URL: https://github.com/apache/arrow/pull/12791#issuecomment-1088608878

   Benchmark runs are scheduled for baseline = 7616fba56ef4378d3eae5f5dd1196c693b4d4226 and contender = 85809a95419c246a28091aae85fa1da3e8dc5088. 85809a95419c246a28091aae85fa1da3e8dc5088 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/67fe07bb8def43d9b7a120632ad4df22...1c02c534cec94acbaab09cd8634d5757/)
   [Failed :arrow_down:0.29% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/4278d8b8f7824f579f2744111faf4419...c636de8c26ab42638fbd75f91f21b046/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/4c13d1fd0cca4d8c941631d3962ddd67...296d763fc4c2463db05b4bd6359a1273/)
   [Finished :arrow_down:0.04% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/657044bc99794065806ab1dee2a081e7...8632c79ff2b5449cbe02e204cbb02dfc/)
   Buildkite builds:
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/441| `85809a95` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/426| `85809a95` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/427| `85809a95` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/436| `85809a95` ursa-thinkcentre-m75q>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/440| `7616fba5` ec2-t3-xlarge-us-east-2>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/425| `7616fba5` test-mac-arm>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/426| `7616fba5` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/435| `7616fba5` ursa-thinkcentre-m75q>
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] jorisvandenbossche closed pull request #12791: ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche closed pull request #12791: ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded
URL: https://github.com/apache/arrow/pull/12791




[GitHub] [arrow] sanjibansg commented on a diff in pull request #12791: ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded

Posted by GitBox <gi...@apache.org>.
sanjibansg commented on code in PR #12791:
URL: https://github.com/apache/arrow/pull/12791#discussion_r842440413


##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -549,7 +549,8 @@ def test_partitioning():
             pa.field('key', pa.float64())
         ])
     )
-    assert len(partitioning.dictionaries) == 0
+    assert len(partitioning.dictionaries) == 2
+    assert all(x is None for x in partitioning.dictionaries) is True

Review Comment:
   Made the change. Thanks for the suggestion!





[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #12791: ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #12791:
URL: https://github.com/apache/arrow/pull/12791#discussion_r842409863


##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -549,7 +549,8 @@ def test_partitioning():
             pa.field('key', pa.float64())
         ])
     )
-    assert len(partitioning.dictionaries) == 0
+    assert len(partitioning.dictionaries) == 2
+    assert all(x is None for x in partitioning.dictionaries) is True

Review Comment:
   ```suggestion
       assert all(x is None for x in partitioning.dictionaries)
   ```
   
   One minor comment: when using `assert all(...)`, the `is True` is not needed. `all` returns `True` or `False`, and `assert` already checks that the value is true.
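
The point about `is True` can be checked with a small standalone sketch. The list `entries` is a stand-in for `partitioning.dictionaries` when no field is dictionary encoded; it is an assumption for illustration, not pyarrow output.

```python
# Stand-in for partitioning.dictionaries with two non-dictionary fields.
entries = [None, None]

# all() always returns a real bool, never some other truthy object.
check = all(x is None for x in entries)
assert isinstance(check, bool)

# Both forms pass or fail together; the second is just more verbose.
assert all(x is None for x in entries)
assert all(x is None for x in entries) is True

# A single non-None entry makes all() return False.
mixed = all(x is None for x in [None, ["first"]])
print(check, mixed)
```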



##########
python/pyarrow/tests/test_dataset.py:
##########
@@ -623,7 +626,7 @@ def test_partitioning():
         dictionaries={
             "key": pa.array(["first", "second", "third"]),
         })
-    assert partitioning.dictionaries[0].to_pylist() == [
+    assert partitioning.dictionaries[1].to_pylist() == [

Review Comment:
   Can you also assert here that the first element is `None` (`assert partitioning.dictionaries[0] is None`)?





[GitHub] [arrow] github-actions[bot] commented on pull request #12791: ARROW-16113: [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12791:
URL: https://github.com/apache/arrow/pull/12791#issuecomment-1087837606

   https://issues.apache.org/jira/browse/ARROW-16113

