Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/04 00:36:16 UTC

[GitHub] [arrow] alexdesiqueira opened a new pull request, #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

alexdesiqueira opened a new pull request, #13060:
URL: https://github.com/apache/arrow/pull/13060

   As @pitrou discusses in [JIRA#ARROW-16243](https://issues.apache.org/jira/browse/ARROW-16243), it seems that `ReadSchemaField()` doesn't do anything different from `ReadColumn()`. This PR removes it and its Python method.
   @jorisvandenbossche says that
   > (...) both are different, as Parquet counts the final child leaves, while Arrow counts the top-level parent leaves
   > The column indices you pass to eg ReadTable are parquet-based column indices.
   
   Please feel free to close this PR if it is not necessary.
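
The index mismatch quoted above (Parquet counts leaf columns, Arrow counts top-level fields) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the nested `schema` dict and the helpers `leaf_paths` and `leaves_for_field` are hypothetical names made up for illustration, and none of this uses the pyarrow or Parquet C++ API.

```python
# Toy schema: maps field name -> type; a nested dict stands in for a struct.
# Hypothetical model for illustration only, not the pyarrow/Parquet API.
schema = {
    "id": "int64",
    "point": {"x": "double", "y": "double"},
    "label": "string",
}


def leaf_paths(fields, prefix=""):
    """Return dotted paths of all leaf (primitive) columns, depth-first."""
    paths = []
    for name, typ in fields.items():
        path = prefix + name
        if isinstance(typ, dict):
            # A struct contributes one leaf column per primitive child.
            paths.extend(leaf_paths(typ, prefix=path + "."))
        else:
            paths.append(path)
    return paths


# Arrow-style indexing: one index per top-level field.
top_level_fields = list(schema)
# Parquet-style indexing: one index per leaf column.
leaf_columns = leaf_paths(schema)


def leaves_for_field(field_index):
    """Map a top-level field index to the leaf column indices it covers."""
    name = top_level_fields[field_index]
    return [
        i
        for i, path in enumerate(leaf_columns)
        if path == name or path.startswith(name + ".")
    ]


print(top_level_fields)
print(leaf_columns)
print(leaves_for_field(1))
```

In this toy schema the top-level field at index 2 (`label`) corresponds to leaf column index 3, because the struct `point` occupies two leaf slots. The two numbering schemes only agree when the schema has no nested fields.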


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

Posted by GitBox <gi...@apache.org>.
pitrou merged PR #13060:
URL: https://github.com/apache/arrow/pull/13060


[GitHub] [arrow] github-actions[bot] commented on pull request #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13060:
URL: https://github.com/apache/arrow/pull/13060#issuecomment-1116824762

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


[GitHub] [arrow] pitrou commented on a diff in pull request #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13060:
URL: https://github.com/apache/arrow/pull/13060#discussion_r876855711


##########
python/pyarrow/_parquet.pyx:
##########
@@ -1421,13 +1421,6 @@ cdef class ParquetReader(_Weakrefable):
                          .ReadColumn(column_index, &out))
         return pyarrow_wrap_chunked_array(out)
 
-    def read_schema_field(self, int field_index):

Review Comment:
   @jorisvandenbossche Ping.



[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #13060:
URL: https://github.com/apache/arrow/pull/13060#discussion_r877029737


##########
python/pyarrow/_parquet.pyx:
##########
@@ -1421,13 +1421,6 @@ cdef class ParquetReader(_Weakrefable):
                          .ReadColumn(column_index, &out))
         return pyarrow_wrap_chunked_array(out)
 
-    def read_schema_field(self, int field_index):

Review Comment:
   Sorry, missed this ping. Yes, I think it is fine to just remove this. The ParquetReader is indeed not meant as public API, although it is unfortunate (I see now ..) that we actually import this in the pyarrow.parquet namespace, instead of calling it as `_parquet.ParquetReader` (that is probably something else we could fix). As a user you should use `ParquetFile`; the `ParquetReader` is quite inconvenient to construct.



[GitHub] [arrow] github-actions[bot] commented on pull request #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13060:
URL: https://github.com/apache/arrow/pull/13060#issuecomment-1116824699

   https://issues.apache.org/jira/browse/ARROW-16243


[GitHub] [arrow] pitrou commented on a diff in pull request #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13060:
URL: https://github.com/apache/arrow/pull/13060#discussion_r865926081


##########
python/pyarrow/_parquet.pyx:
##########
@@ -1421,13 +1421,6 @@ cdef class ParquetReader(_Weakrefable):
                          .ReadColumn(column_index, &out))
         return pyarrow_wrap_chunked_array(out)
 
-    def read_schema_field(self, int field_index):

Review Comment:
   @jorisvandenbossche Do you think it's fine to remove this (undocumented) method? The ParquetReader class doesn't seem to be documented as a public API...



[GitHub] [arrow] alexdesiqueira commented on pull request #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

Posted by GitBox <gi...@apache.org>.
alexdesiqueira commented on PR #13060:
URL: https://github.com/apache/arrow/pull/13060#issuecomment-1117581669

   Thank you for taking a look at it, @pitrou!
   > Still, we should first deprecate the C++ API, not remove it immediately
   
   Of course, my bad :sweat_smile: I reverted the last commit, added `ARROW_DEPRECATED` to `ReadSchemaField` and removed `read_schema_field`. Let me know if I'm on the right track.
   Thanks again!


[GitHub] [arrow] pitrou commented on pull request #13060: ARROW-16243: [C++][Python] Remove Parquet ReadSchemaField method

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #13060:
URL: https://github.com/apache/arrow/pull/13060#issuecomment-1117015678

   For the record, `ReadSchemaField` was added in https://github.com/apache/parquet-cpp/pull/312.
   It was originally called by other functions in that PR, but that was later removed.
   
   Still, we should first deprecate the C++ API, not remove it immediately (you can look for the `ARROW_DEPRECATED` macro which will help you do that). Of course, the unused Cython declaration can be removed.

