Posted to dev@parquet.apache.org by "Katelman, Michael" <Mi...@CubistSystematic.com> on 2017/09/26 16:10:21 UTC
pyarrow hang
Hi,
I sometimes see pyarrow.parquet.write_table hang and was wondering if this is a known issue or specific to me. I usually call write_table like this:
pyarrow.parquet.write_table(pyarrow.Table.from_arrays(...), <output-path>, compression="SNAPPY")
and am running code built from apache-arrow-0.5.0 and apache-parquet-cpp-1.2.0-rc1. I have not found a single table on which the hang is consistently reproducible: it usually hangs non-deterministically, after write_table has been called several times. It also seems to occur only when the table being written contains at least one column with string data.
Any thoughts or suggestions would be appreciated.
-Mike
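A reproduction attempt along the lines described above might look like the sketch below. It is only an illustration under assumptions: the helper names, column names, row count, and number of iterations are hypothetical and do not come from the original report. It writes many tables that each contain a string column, because the hang is reported to appear only after several write_table calls.

```python
import os
import random
import string


def random_strings(n_rows, seed, max_len=12):
    """Build a list of pseudo-random ASCII strings (hypothetical test data)."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(string.ascii_letters) for _ in range(rng.randint(1, max_len)))
        for _ in range(n_rows)
    ]


def write_repeatedly(out_dir, n_tables=50, n_rows=10_000):
    """Write n_tables Parquet files, each with one int and one string column.

    Returns the list of paths written. pyarrow is imported here so that the
    data helper above can be used without it.
    """
    import pyarrow as pa
    import pyarrow.parquet as pq

    paths = []
    for i in range(n_tables):
        ids = pa.array(list(range(n_rows)))
        names = pa.array(random_strings(n_rows, seed=i))
        # Same call shape as in the report: Table.from_arrays + SNAPPY.
        table = pa.Table.from_arrays([ids, names], names=["id", "name"])
        path = os.path.join(out_dir, f"table_{i}.parquet")
        pq.write_table(table, path, compression="SNAPPY")
        paths.append(path)
    return paths
```

Calling write_repeatedly(tempfile.mkdtemp()) and watching whether it returns is one way to exercise the loop. Since the hang is described as non-deterministic, a clean run does not show that the problem is absent.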
DISCLAIMER: This e-mail message and any attachments are intended solely for the use of the individual or entity to which it is addressed and may contain information that is confidential or legally privileged. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, copying or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately and permanently delete this message and any attachments.
RE: pyarrow hang
Posted by "Katelman, Michael" <Mi...@CubistSystematic.com>.
Thanks, Wes.
-----Original Message-----
From: Wes McKinney [mailto:wesmckinn@gmail.com]
Sent: Tuesday, September 26, 2017 13:02
To: dev@parquet.apache.org
Subject: Re: pyarrow hang
Binaries for these are available using either
pip install pyarrow
or
conda install pyarrow -c conda-forge
Re: pyarrow hang
Posted by Wes McKinney <we...@gmail.com>.
Binaries for these are available using either
pip install pyarrow
or
conda install pyarrow -c conda-forge
On Tue, Sep 26, 2017 at 1:00 PM, Katelman, Michael
<Mi...@cubistsystematic.com> wrote:
> Thanks, Uwe. I really appreciate the response. I'll build one of the versions you mentioned.
>
> -----Original Message-----
> From: Uwe L. Korn [mailto:uwelk@xhochy.com]
> Sent: Tuesday, September 26, 2017 12:12
> To: dev@parquet.apache.org
> Subject: Re: pyarrow hang
>
> Hello Mike,
>
> This is a known issue with jemalloc. Using the Arrow 0.6.0/0.7.0 release should avoid this.
>
> Uwe
RE: pyarrow hang
Posted by "Katelman, Michael" <Mi...@CubistSystematic.com>.
Thanks, Uwe. I really appreciate the response. I'll build one of the versions you mentioned.
-----Original Message-----
From: Uwe L. Korn [mailto:uwelk@xhochy.com]
Sent: Tuesday, September 26, 2017 12:12
To: dev@parquet.apache.org
Subject: Re: pyarrow hang
Hello Mike,
This is a known issue with jemalloc. Using the Arrow 0.6.0/0.7.0 release should avoid this.
Uwe
Re: pyarrow hang
Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello Mike,
This is a known issue with jemalloc. Using the Arrow 0.6.0/0.7.0 release
should avoid this.
Uwe
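To confirm that an environment is already on one of the releases suggested above, a small version check might help. This is a sketch under assumptions: the function names are hypothetical, the 0.6.0 threshold is taken from the reply above, and a passing check only reports the installed version; it does not prove that the hang is gone.

```python
import re


def parse_version(version):
    """Extract the leading numeric components of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def has_suggested_release(version, minimum=(0, 6, 0)):
    """Return True if version is at least the 0.6.0 release suggested above."""
    return parse_version(version) >= minimum


def installed_pyarrow_version():
    """Return pyarrow.__version__, or None if pyarrow is not importable."""
    try:
        import pyarrow
    except ImportError:
        return None
    return pyarrow.__version__
```

For example, has_suggested_release(installed_pyarrow_version()) can be evaluated once installed_pyarrow_version() has returned a string rather than None.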