Posted to dev@parquet.apache.org by "Katelman, Michael" <Mi...@CubistSystematic.com> on 2017/09/26 16:10:21 UTC

pyarrow hang

Hi,

I sometimes see pyarrow.parquet.write_table hang and was wondering whether this is a known issue or specific to me. I usually call write_table like this:

pyarrow.parquet.write_table(pyarrow.Table.from_arrays(...), <output-path>, compression="SNAPPY")

and am running code built from apache-arrow-0.5.0 and apache-parquet-cpp-1.2.0-rc1. I have not found a single table on which the hang is consistently reproducible: usually, it hangs non-deterministically after write_table has been called multiple times. It also seems to occur only when the table being written contains at least one column with string data.

Any thoughts or suggestions would be appreciated.

-Mike
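
A non-deterministic hang like the one described above is easier to report if the process shows where it is stuck. The following is a minimal sketch using only the Python standard library; it is not part of pyarrow. The helper names hang_watchdog and write_repeatedly, the 60-second default and the example path are assumptions made for illustration. faulthandler prints Python-level stacks only, so a hang inside native code shows up as the Python frame that called into it.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(seconds, file=None):
    """Dump every thread's Python stack to `file` if the body runs longer than `seconds`."""
    target = sys.stderr if file is None else file
    # Arm a timer; exit=False means the process keeps running after the dump.
    faulthandler.dump_traceback_later(seconds, exit=False, file=target)
    try:
        yield
    finally:
        # Disarm the timer once the body returns or raises.
        faulthandler.cancel_dump_traceback_later()


def write_repeatedly(write_once, attempts, seconds=60.0, file=None):
    """Call write_once(i) `attempts` times, each call under its own watchdog."""
    for i in range(attempts):
        with hang_watchdog(seconds, file=file):
            write_once(i)
    return attempts


# Hypothetical usage with the call from the message above (requires pyarrow;
# the table contents and output path are made up for illustration):
#
# import pyarrow
# import pyarrow.parquet
# table = pyarrow.Table.from_arrays([pyarrow.array(["a", "b"])], names=["s"])
# write_repeatedly(
#     lambda i: pyarrow.parquet.write_table(
#         table, "/tmp/out-%d.parquet" % i, compression="SNAPPY"),
#     attempts=1000,
# )
```

If a write takes longer than the timeout, the dump shows which Python call each thread is blocked in, which can be attached to a bug report.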





DISCLAIMER: This e-mail message and any attachments are intended solely for the use of the individual or entity to which it is addressed and may contain information that is confidential or legally privileged. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, copying or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately and permanently delete this message and any attachments.




RE: pyarrow hang

Posted by "Katelman, Michael" <Mi...@CubistSystematic.com>.
Thanks, Wes.

-----Original Message-----
From: Wes McKinney [mailto:wesmckinn@gmail.com] 
Sent: Tuesday, September 26, 2017 13:02
To: dev@parquet.apache.org
Subject: Re: pyarrow hang

Binaries for these are available using either

pip install pyarrow

or

conda install pyarrow -c conda-forge


Re: pyarrow hang

Posted by Wes McKinney <we...@gmail.com>.
Binaries for these are available using either

pip install pyarrow

or

conda install pyarrow -c conda-forge

On Tue, Sep 26, 2017 at 1:00 PM, Katelman, Michael
<Mi...@cubistsystematic.com> wrote:
> Thanks, Uwe. I really appreciate the response. I'll build one of the versions you mentioned.

RE: pyarrow hang

Posted by "Katelman, Michael" <Mi...@CubistSystematic.com>.
Thanks, Uwe. I really appreciate the response. I'll build one of the versions you mentioned.

-----Original Message-----
From: Uwe L. Korn [mailto:uwelk@xhochy.com] 
Sent: Tuesday, September 26, 2017 12:12
To: dev@parquet.apache.org
Subject: Re: pyarrow hang

Hello Mike,

this is a known issue with jemalloc. Using the Arrow 0.6.0/0.7.0 release should avoid this.

Uwe










Re: pyarrow hang

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
Hello Mike,

This is a known issue with jemalloc. Using the Arrow 0.6.0/0.7.0 release
should avoid this.

Uwe
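
To check whether an environment already has one of the releases named above, something like the following sketch may help. It uses only the standard library (importlib.metadata, Python 3.8+). The function names are made up for illustration, and treating every version from 0.6.0 onward as unaffected is an assumption; the reply only names 0.6.0 and 0.7.0.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Leading numeric components of a version string, e.g. '0.7.0' -> (0, 7, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric release in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def has_suggested_release(version, minimum=(0, 6, 0)):
    """True if `version` is at least `minimum` (assumed cutoff: 0.6.0)."""
    parts = release_tuple(version)
    width = max(len(parts), len(minimum))
    # Pad with zeros so that '0.6' compares equal to '0.6.0'.
    padded = parts + (0,) * (width - len(parts))
    floor = tuple(minimum) + (0,) * (width - len(minimum))
    return padded >= floor


def installed_pyarrow_version():
    """Installed pyarrow version string, or None if pyarrow is not installed."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None
```

For example, has_suggested_release("0.5.0") is False, while has_suggested_release("0.7.0") is True.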
