Posted to user@beam.apache.org by Wiśniowski Piotr <co...@gmail.com> on 2023/11/10 12:41:59 UTC

Fwd: [Python SDK] PyArrow Critical Vulnerability

Hi,

A few days ago this vulnerability was disclosed:
https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/

I see that Beam 2.51.0 has `pyarrow<=12.0.0` in its requirements.

1. Is there a reason for not allowing newer versions of pyarrow?

2. Is there any planned effort to update this to `14.0.1`? Would it be
possible to include the update in the `2.52.0` Beam release? I know the
release is almost out.

Best

Wiśniowski Piotr
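
For readers who want to check whether a given pyarrow version falls in the range affected by CVE-2023-47248, here is a minimal sketch. It assumes the affected range is 0.14.0 through 14.0.0 inclusive (fixed in 14.0.1, the version mentioned above). The helper names `parse_version` and `is_affected` are made up for illustration and are not part of pyarrow or Beam.

```python
import re


def parse_version(version: str) -> tuple:
    """Turn a version string such as '12.0.0' into a tuple of three ints.

    Only the leading digits of the first three dot-separated pieces are used,
    so suffixes such as 'rc1' are ignored (a simplification for this sketch).
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """Return True if the version lies in the assumed affected range."""
    return (0, 14, 0) <= parse_version(version) <= (14, 0, 0)
```

To check the version installed in the current environment, `importlib.metadata.version("pyarrow")` returns the version string that can be passed to `is_affected`.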


Re: [Python SDK] PyArrow Critical Vulnerability

Posted by Wiśniowski Piotr <co...@gmail.com>.
Hi Valentyn,

Thank you for the information and details. It all makes sense! I think we
can wait for the 2.53.0 release and apply the hotfix in the meantime.

Best

Wiśniowski Piotr

On 10.11.2023 20:27, Valentyn Tymofieiev via user wrote:
> From https://pypi.org/project/pyarrow-hotfix/ :
>
> pyarrow_hotfix must be imported in your application or library code 
> for it to take effect.
> Just installing the package is not sufficient:
>
> For Beam users, that means that the pipeline code running on the 
> workers would need to import this module on every worker, for example 
> by adding this line to DoFn.setup or in main session (if pipeline is 
> composed only from one file AND uses dill pickler with 
> --save_main_session flag).
>
> We will continue addressing this in 
> https://github.com/apache/beam/issues/29392.
>
> On Fri, Nov 10, 2023 at 10:23 AM Valentyn Tymofieiev 
> <va...@google.com> wrote:
>
>     Hi Piotr, thanks for bringing this to the list.
>
>     There is a FR to support pyarrow
>     https://github.com/apache/beam/issues/28410 . I looked into it
>     briefly in https://github.com/apache/beam/pull/28437 but saw some
>     test failures and it has been on back burner. Given the news about
>     vulnerability it would make sense to prioritize this.
>
>     I think we could decouple this from 2.52.0 release since:
>       1) there is a workaround
>       2) new versions of pyarrow haven't been fully tested with Beam
>       3) Beam 2.52.0 fixes some other issues that are known to
>     affecting users, e.g. https://github.com/apache/beam/issues/28246
>
>     From
>     https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
>     :
>       > If you cannot upgrade to PyArrow 14.0.1, you can use the
>     pyarrow-hotfix package to disable the vulnerability on older
>     versions of PyArrow. However, this is not a permanent solution,
>     and you should upgrade to PyArrow 14.0.1 as soon as possible. We
>     could consider adding pyarrow-hotfix to the containers for 2.52.0
>     release. CC: @Danny McCormick
>     <ma...@google.com> (release manager).
>
>     Beam users can also install this additional dependency via one of
>     the ways described in
>     https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>     .
>
>
>
>     On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr
>     <co...@gmail.com> wrote:
>
>         Hi,
>
>         Few days ago this one was detected:
>         https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
>
>         I do see that beam 2.51.0 does have `pyarrow<=12.0.0` in
>         requirements.
>
>         1. Is there a reason for not allowing newer versions of pyarrow?
>
>         2. Is there any planned effort on updating this to `14.0.1`?
>         Is it
>         possible to push the update to `2.52.0` beam release? I know
>         the beam
>         release is almost there.
>
>         Best
>
>         Wiśniowski Piotr
>

Re: [Python SDK] PyArrow Critical Vulnerability

Posted by Valentyn Tymofieiev via dev <de...@beam.apache.org>.
From  https://pypi.org/project/pyarrow-hotfix/ :

pyarrow_hotfix must be imported in your application or library code for it
to take effect.
Just installing the package is not sufficient.

For Beam users, that means that the pipeline code running on the workers
would need to import this module on every worker, for example by adding
`import pyarrow_hotfix` to DoFn.setup, or to the main session (if the
pipeline consists of a single file AND uses the dill pickler with the
--save_main_session flag).
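
As an illustration of the DoFn.setup approach described above, here is a minimal sketch. It assumes the pyarrow-hotfix package is installed on the workers. The names `apply_pyarrow_hotfix` and `HotfixedDoFn` are hypothetical, and the fallback base class exists only so the sketch can be read and exercised without Beam installed.

```python
import importlib

try:
    import apache_beam as beam
    DoFnBase = beam.DoFn
except ImportError:
    # Fallback so the sketch can be exercised without Beam; a real pipeline needs Beam.
    DoFnBase = object


def apply_pyarrow_hotfix() -> bool:
    """Import pyarrow_hotfix in the current process; return True on success."""
    try:
        importlib.import_module("pyarrow_hotfix")
    except ImportError:
        return False
    return True


class HotfixedDoFn(DoFnBase):  # hypothetical name, for illustration only
    def setup(self):
        # Beam calls setup() on the worker before the instance processes elements,
        # so the import takes effect in the worker process itself.
        if not apply_pyarrow_hotfix():
            raise RuntimeError("pyarrow_hotfix is not installed on this worker")

    def process(self, element):
        # Placeholder: real code would use pyarrow here.
        yield element
```

Raising in setup() is a design choice for this sketch: it makes a missing hotfix fail loudly instead of silently running unpatched.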

We will continue addressing this in
https://github.com/apache/beam/issues/29392.

On Fri, Nov 10, 2023 at 10:23 AM Valentyn Tymofieiev <va...@google.com>
wrote:

> Hi Piotr, thanks for bringing this to the list.
>
> There is a FR to support pyarrow
> https://github.com/apache/beam/issues/28410 . I looked into it briefly in
> https://github.com/apache/beam/pull/28437 but saw some test failures and
> it has been on back burner. Given the news about vulnerability it would
> make sense to prioritize this.
>
> I think we could decouple this from 2.52.0 release since:
>   1) there is a workaround
>   2) new versions of pyarrow haven't been fully tested with Beam
>   3) Beam 2.52.0 fixes some other issues that are known to affecting
> users, e.g. https://github.com/apache/beam/issues/28246
>
> From
> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
> :
>   > If you cannot upgrade to PyArrow 14.0.1, you can use the
> pyarrow-hotfix package to disable the vulnerability on older versions of
> PyArrow. However, this is not a permanent solution, and you should upgrade
> to PyArrow 14.0.1 as soon as possible. We could consider adding
> pyarrow-hotfix to the containers for 2.52.0 release. CC: @Danny McCormick
> <da...@google.com> (release manager).
>
> Beam users can also install this additional dependency via one of the ways
> described in
> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ .
>
>
>
> On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr <
> contact.wisniowskipiotr@gmail.com> wrote:
>
>> Hi,
>>
>> Few days ago this one was detected:
>>
>> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
>>
>> I do see that beam 2.51.0 does have `pyarrow<=12.0.0` in requirements.
>>
>> 1. Is there a reason for not allowing newer versions of pyarrow?
>>
>> 2. Is there any planned effort on updating this to `14.0.1`? Is it
>> possible to push the update to `2.52.0` beam release? I know the beam
>> release is almost there.
>>
>> Best
>>
>> Wiśniowski Piotr
>>
>>

Re: [Python SDK] PyArrow Critical Vulnerability

Posted by Valentyn Tymofieiev via dev <de...@beam.apache.org>.
Hi Piotr, thanks for bringing this to the list.

There is a feature request to support newer versions of pyarrow:
https://github.com/apache/beam/issues/28410 . I looked into it briefly in
https://github.com/apache/beam/pull/28437 but saw some test failures, and it
has been on the back burner. Given the news about the vulnerability, it would
make sense to prioritize this.

I think we could decouple this from the 2.52.0 release since:
  1) there is a workaround
  2) new versions of pyarrow haven't been fully tested with Beam
  3) Beam 2.52.0 fixes some other issues that are known to affect users,
e.g. https://github.com/apache/beam/issues/28246

From
https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
:
  > If you cannot upgrade to PyArrow 14.0.1, you can use the pyarrow-hotfix
  > package to disable the vulnerability on older versions of PyArrow. However,
  > this is not a permanent solution, and you should upgrade to PyArrow 14.0.1
  > as soon as possible.

We could consider adding pyarrow-hotfix to the containers for 2.52.0
release. CC: @Danny McCormick <da...@google.com> (release manager).

Beam users can also install this additional dependency via one of the ways
described in
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/ .



On Fri, Nov 10, 2023 at 4:42 AM Wiśniowski Piotr <
contact.wisniowskipiotr@gmail.com> wrote:

> Hi,
>
> Few days ago this one was detected:
>
> https://securityonline.info/cve-2023-47248-pyarrow-arbitrary-code-execution-vulnerability-a-critical-threat-to-data-analysts/
>
> I do see that beam 2.51.0 does have `pyarrow<=12.0.0` in requirements.
>
> 1. Is there a reason for not allowing newer versions of pyarrow?
>
> 2. Is there any planned effort on updating this to `14.0.1`? Is it
> possible to push the update to `2.52.0` beam release? I know the beam
> release is almost there.
>
> Best
>
> Wiśniowski Piotr
>
>
