Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/05 15:13:50 UTC

[GitHub] [arrow] bkietz opened a new pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

bkietz opened a new pull request #8343:
URL: https://github.com/apache/arrow/pull/8343


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

jorisvandenbossche commented on pull request #8343:
URL: https://github.com/apache/arrow/pull/8343#issuecomment-704765279


   Remaining failures look unrelated? 



[GitHub] [arrow] bkietz commented on a change in pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

bkietz commented on a change in pull request #8343:
URL: https://github.com/apache/arrow/pull/8343#discussion_r500413021



##########
File path: python/pyarrow/tests/test_dataset.py
##########
@@ -2124,6 +2124,23 @@ def test_dataset_project_only_partition_columns(tempdir):
     assert all_cols.column('part').equals(part_only.column('part'))
 
 
+@pytest.mark.parquet
+@pytest.mark.pandas
+def test_dataset_project_null_column(tempdir):
+    import pandas as pd
+    df = pd.DataFrame({"col": np.array([None, None, None], dtype='object')})
+
+    f = tempdir / "test_dataset_project_null_column.parquet"
+    df.to_parquet(f, engine="pyarrow")
+
+    import pyarrow as pa
+    import pyarrow.dataset as ds

Review comment:
       ```suggestion
       ```
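The test above exercises the data side of the change: pandas stores the all-`None` column with `object` dtype, and pyarrow infers Arrow's `null` type for such a column, which is the "null column" the test name refers to. The sketch below is a hypothetical, simplified Python model, not pyarrow code: `infer_type` and `project_column` are illustrative names, and types are plain strings. It shows why projecting a null column onto another type is cheap: every slot is already null, so the result is just a null-filled column of the requested type.

```python
# Toy mapping from Python value types to Arrow-like type names (assumption).
_PY_TO_ARROW = {bool: "bool", int: "int64", float: "double", str: "string"}


def infer_type(values):
    """Infer a type name for a column of Python values.

    A column holding only None carries no information about its values,
    so it gets the type "null" (mirroring pyarrow's handling of an
    all-None object column).
    """
    kinds = {type(v) for v in values if v is not None}
    if not kinds:
        return "null"
    if len(kinds) > 1:
        names = sorted(k.__name__ for k in kinds)
        raise TypeError(f"mixed column types: {names}")
    return _PY_TO_ARROW[kinds.pop()]  # KeyError for types the toy model lacks


def project_column(values, to_type, to_nullable=True):
    """Return (type, values) for `values` projected onto `to_type`."""
    from_type = infer_type(values)
    if from_type == "null":
        if not to_nullable:
            raise TypeError(f"cannot project a null column onto non-nullable {to_type}")
        # No conversion is needed: the result is a column of the target
        # type in which every slot is null.
        return to_type, [None] * len(values)
    if from_type != to_type:
        # Simplification: any other type change is rejected.
        raise TypeError(f"cannot project {from_type} onto {to_type}")
    return to_type, list(values)
```

For example, `project_column([None, None, None], "int64")` returns `("int64", [None, None, None])`, while asking for a non-nullable `int64` target raises `TypeError`.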





[GitHub] [arrow] bkietz commented on a change in pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

bkietz commented on a change in pull request #8343:
URL: https://github.com/apache/arrow/pull/8343#discussion_r499848398



##########
File path: cpp/src/arrow/dataset/projector.cc
##########
@@ -46,6 +46,15 @@ Status CheckProjectable(const Schema& from, const Schema& to) {
                                from);
     }
 
+    if (from_field->type()->id() == Type::NA) {
+      // promotion from null to any type is supported
+      if (to_field->nullable()) continue;
+
+      return Status::TypeError("field ", to_field->ToString(),
+                               " is not nullable and but has type ", NullType(),

Review comment:
       oops, copypasta error

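The hunk above adds a special case to `CheckProjectable`: when the source field has type `null`, projection to a target field of any type is accepted as long as the target field is nullable, and a type error is returned otherwise. The sketch below models that rule in plain Python. It is hypothetical and simplified: `Field`, `ProjectionTypeError` and `check_projectable` are illustrative names, not Arrow APIs; fields are matched by name; target fields missing from the source are skipped (an assumption); and every other type mismatch is simply rejected, since the hunk does not show the function's remaining checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for an Arrow field: name, type name, nullability."""
    name: str
    type: str
    nullable: bool = True


class ProjectionTypeError(TypeError):
    """Raised when a source field cannot be projected onto a target field."""


def check_projectable(from_fields, to_fields):
    """Sketch of the null -> any promotion rule added in the diff."""
    by_name = {f.name: f for f in from_fields}
    for to_field in to_fields:
        from_field = by_name.get(to_field.name)
        if from_field is None:
            # Assumption: fields absent from the source are not checked here.
            continue
        if from_field.type == "null":
            # Promotion from null to any type is allowed only when the
            # target field can hold nulls.
            if to_field.nullable:
                continue
            raise ProjectionTypeError(
                f"field {to_field.name} is not nullable but has type null")
        if from_field.type != to_field.type:
            # Simplification: every other mismatch is rejected outright.
            raise ProjectionTypeError(
                f"field {to_field.name} has type {from_field.type}, "
                f"expected {to_field.type}")
```

With `src = [Field("col", "null")]`, `check_projectable(src, [Field("col", "int64")])` returns without error, while `check_projectable(src, [Field("col", "int64", nullable=False)])` raises `ProjectionTypeError("field col is not nullable but has type null")`.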




[GitHub] [arrow] jorisvandenbossche commented on pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

jorisvandenbossche commented on pull request #8343:
URL: https://github.com/apache/arrow/pull/8343#issuecomment-704897806


   Thanks @bkietz !



[GitHub] [arrow] github-actions[bot] commented on pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

github-actions[bot] commented on pull request #8343:
URL: https://github.com/apache/arrow/pull/8343#issuecomment-703713112


   https://issues.apache.org/jira/browse/ARROW-9147



[GitHub] [arrow] pitrou closed pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

pitrou closed pull request #8343:
URL: https://github.com/apache/arrow/pull/8343


   



[GitHub] [arrow] pitrou commented on pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

pitrou commented on pull request #8343:
URL: https://github.com/apache/arrow/pull/8343#issuecomment-704824694


   Yes, they are.



[GitHub] [arrow] pitrou commented on a change in pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

pitrou commented on a change in pull request #8343:
URL: https://github.com/apache/arrow/pull/8343#discussion_r499739059



##########
File path: cpp/src/arrow/dataset/projector.cc
##########
@@ -46,6 +46,15 @@ Status CheckProjectable(const Schema& from, const Schema& to) {
                                from);
     }
 
+    if (from_field->type()->id() == Type::NA) {
+      // promotion from null to any type is supported
+      if (to_field->nullable()) continue;
+
+      return Status::TypeError("field ", to_field->ToString(),
+                               " is not nullable and but has type ", NullType(),

Review comment:
       "and but"?





[GitHub] [arrow] pitrou commented on a change in pull request #8343: ARROW-9147: [C++][Dataset] Support projection from null->any type

pitrou commented on a change in pull request #8343:
URL: https://github.com/apache/arrow/pull/8343#discussion_r500272157



##########
File path: python/pyarrow/tests/test_dataset.py
##########
@@ -2124,6 +2124,23 @@ def test_dataset_project_only_partition_columns(tempdir):
     assert all_cols.column('part').equals(part_only.column('part'))
 
 
+@pytest.mark.parquet
+@pytest.mark.pandas
+def test_dataset_project_null_column(tempdir):
+    import pandas as pd
+    df = pd.DataFrame({"col": np.array([None, None, None], dtype='object')})
+
+    f = tempdir / "test_dataset_project_null_column.parquet"
+    df.to_parquet(f, engine="pyarrow")
+
+    import pyarrow as pa
+    import pyarrow.dataset as ds

Review comment:
       These two imports should be unnecessary?



