Posted to issues@beam.apache.org by "Arwin S Tio (Jira)" <ji...@apache.org> on 2022/04/03 05:03:00 UTC

[jira] [Updated] (BEAM-14235) parquetio module does not parse PEP-440 compliant Pyarrow version

     [ https://issues.apache.org/jira/browse/BEAM-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arwin S Tio updated BEAM-14235:
-------------------------------
    Description: 
Since version 2.27.0 (introduced by this PR: [https://github.com/apache/beam/pull/13302/files#diff-33b0b6b112036df96f341aa83b88efba9215ec14dfabc9db9e9ffe66a23154a2R55]), the parquetio module parses the pyarrow version like this:
{code:python}
ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
{code}
(see [https://github.com/apache/beam/blob/v2.27.0/sdks/python/apache_beam/io/parquetio.py#L55])

 

This expression assumes exactly three dot-separated components, each a plain integer, so it does not support all PEP-440 compliant versions: [https://peps.python.org/pep-0440/]

 

For example, if pyarrow had a version like *1.0.0+abc.7*, importing this module would fail:
{code:none}
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/__init__.py", line 93, in <module>
    from apache_beam import io
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/__init__.py", line 28, in <module>
    from apache_beam.io.parquetio import *
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 53, in <module>
    ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
ValueError: invalid literal for int() with base 10: '0+abc.7'{code}
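
The failure can be reproduced without pyarrow installed by applying the same expression to a plain string. This is only a sketch: the helper name naive_major is hypothetical and the version strings are illustrative stand-ins for pa.__version__.

```python
def naive_major(version):
    """Hypothetical helper mirroring the parsing expression in parquetio.py."""
    # Works only when the string is exactly three integer components.
    major, _, _ = map(int, version.split('.'))
    return major


# A plain release version parses fine.
print(naive_major("1.0.0"))

# A version with a local label (the part after '+') does not.
try:
    naive_major("1.0.0+abc.7")
except ValueError as exc:
    print("ValueError:", exc)
```

Any version with more or fewer than three components, or with a non-numeric component, takes the same ValueError path.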
 

In practice, this fails whenever somebody forks pyarrow and gives the build a local version label (the part after the +), like yours truly.

 

We can fix this by using *pkg_resources.parse_version*, which is PEP-440 compliant starting with setuptools 6.0.
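
For illustration, the major version can also be extracted tolerantly with the standard library alone. This is a minimal sketch, not the pkg_resources approach proposed above and not the eventual Beam fix: the function name major_version is hypothetical, and the regular expression covers only the optional "v" prefix, the optional epoch, and the first release component described in PEP-440.

```python
import re


def major_version(version):
    """Return the first release component of a version string.

    Hypothetical helper: it skips an optional 'v' prefix and an optional
    'N!' epoch, then reads the leading integer of the release segment.
    Everything after that (pre/post/dev/local parts) is ignored.
    """
    match = re.match(r"\s*v?(?:\d+!)?(\d+)", version, re.IGNORECASE)
    if match is None:
        raise ValueError("cannot parse major version from %r" % (version,))
    return int(match.group(1))


# Illustrative version strings standing in for pa.__version__.
for candidate in ("1.0.0", "1.0.0+abc.7", "7.0.0.dev123", "1!2.0"):
    print(candidate, "->", major_version(candidate))
```

A full PEP-440 parser such as the one proposed above additionally validates the rest of the string, which this sketch deliberately does not do.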

 

If maintainers agree with this change, I would be willing to submit a PR.

 

> parquetio module does not parse PEP-440 compliant Pyarrow version
> -----------------------------------------------------------------
>
>                 Key: BEAM-14235
>                 URL: https://issues.apache.org/jira/browse/BEAM-14235
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-parquet
>    Affects Versions: 2.27.0
>            Reporter: Arwin S Tio
>            Priority: P3



--
This message was sent by Atlassian Jira
(v8.20.1#820001)