Posted to issues@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/07/31 12:39:00 UTC

[jira] [Commented] (ARROW-6081) FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptb2ao6te_job_6e0a8ca1.parquet'

    [ https://issues.apache.org/jira/browse/ARROW-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897125#comment-16897125 ] 

Joris Van den Bossche commented on ARROW-6081:
----------------------------------------------

The final error comes from bigquery, so you might also want to report to them that arrow errors should be handled more clearly.

The underlying error raised from arrow is "pyarrow.lib.ArrowInvalid: Nested column branch had multiple children". It is hard to say for sure without a reproducible example, but I suppose this is related to the current limitation of the arrow parquet reader regarding nested columns with multiple children. See e.g. ARROW-1644 and ARROW-1599.
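
To illustrate how the FileNotFoundError ends up hiding the arrow error, here is a minimal sketch. It assumes the cleanup in load_table_from_dataframe runs unconditionally after the write (the traceback suggests this, but the bigquery source is not quoted here). WriteError, failing_write, load_masking and load_safe are hypothetical names for illustration only, not part of bigquery or pyarrow.

```python
import os
import tempfile


class WriteError(Exception):
    """Hypothetical stand-in for pyarrow.lib.ArrowInvalid."""


def failing_write(path):
    # Stand-in for dataframe.to_parquet(path) failing before any file is created.
    raise WriteError("Nested column branch had multiple children")


def load_masking(write):
    # Assumed pattern: cleanup takes for granted that the temporary file exists.
    with tempfile.TemporaryDirectory() as tmpdir:
        tmppath = os.path.join(tmpdir, "job.parquet")
        try:
            write(tmppath)
        finally:
            # Raises FileNotFoundError when the write never created the file,
            # which replaces WriteError as the exception the caller sees.
            os.remove(tmppath)


def load_safe(write):
    # Same flow, but cleanup tolerates a missing file so the original error surfaces.
    with tempfile.TemporaryDirectory() as tmpdir:
        tmppath = os.path.join(tmpdir, "job.parquet")
        try:
            write(tmppath)
        finally:
            if os.path.exists(tmppath):
                os.remove(tmppath)


def describe(load):
    # Return the name of the exception type that reaches the caller.
    try:
        load(failing_write)
    except Exception as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(describe(load_masking))
    print(describe(load_safe))
```

In the first variant the original error is still attached to the FileNotFoundError as its __context__, which is why the traceback in the report shows both, joined by "During handling of the above exception, another exception occurred".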

> FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptb2ao6te_job_6e0a8ca1.parquet'
> -----------------------------------------------------------------------------------------------
>
>                 Key: ARROW-6081
>                 URL: https://issues.apache.org/jira/browse/ARROW-6081
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: David Draper
>            Priority: Major
>
> Any idea on how to fix this error? 
>  
> Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 1530, in load_table_from_dataframe
>  dataframe.to_parquet(tmppath)
>  File "/usr/local/lib64/python3.6/site-packages/pandas/core/frame.py", line 2203, in to_parquet
>  partition_cols=partition_cols, **kwargs)
>  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parquet.py", line 252, in to_parquet
>  partition_cols=partition_cols, **kwargs)
>  File "/usr/local/lib64/python3.6/site-packages/pandas/io/parquet.py", line 122, in write
>  coerce_timestamps=coerce_timestamps, **kwargs)
>  File "/usr/local/lib64/python3.6/site-packages/pyarrow/parquet.py", line 1270, in write_table
>  writer.write_table(table, row_group_size=row_group_size)
>  File "/usr/local/lib64/python3.6/site-packages/pyarrow/parquet.py", line 426, in write_table
>  self.writer.write_table(table, row_group_size=row_group_size)
>  File "pyarrow/_parquet.pyx", line 1311, in pyarrow._parquet.ParquetWriter.write_table
>  File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Nested column branch had multiple children
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>  File "/var/cache/tomcat/temp/interpreter-2169813765840716657.tmp", line 84, in <module>
>  client.load_table_from_dataframe(appended_data, table_ref,job_config=job_config).result()
>  File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/client.py", line 1546, in load_table_from_dataframe
>  os.remove(tmppath)
> FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptb2ao6te_job_6e0a8ca1.parquet'



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)