Posted to issues@arrow.apache.org by "zpz (via GitHub)" <gi...@apache.org> on 2023/04/25 03:51:33 UTC

[GitHub] [arrow] zpz opened a new issue, #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

zpz opened a new issue, #35318:
URL: https://github.com/apache/arrow/issues/35318

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I posted the question on SO https://stackoverflow.com/questions/76012391/pyarrow-fails-to-create-parquetfile-from-blob-in-google-cloud-storage
   
   My guess is that the issue lies in either GcsFileSystem or its interaction with GCS. I don't have a code snippet to reproduce the issue. For me it happens after looping through 300+ files. After that, the issue seems to persist.
   
   The gist of it is using ``biglist.ParquetFileReader.load_file``
   
   - if ``lazy=False``, it works fine.
   - if ``lazy=True``, after 300+ files, it starts to fail with
   
       File "/usr/local/lib/python3.10/dist-packages/pyarrow/parquet/core.py", line 319, in __init__
         source = filesystem.open_input_file(source)
       File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
       File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
       File "pyarrow/error.pxi", line 138, in pyarrow.lib.check_status
     pyarrow.lib.ArrowException: Unknown error: google::cloud::Status(UNKNOWN: Permanent error GetObjectMetadata: WaitForHandles(): unexpected error code in curl_multi_*, [12]=Unrecoverable error in select/poll)
   
   
   ### Component(s)
   
   Parquet, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1552161651

   I run it within a Docker container. In the container I got
   
   ```
   $ ulimit -a
   core file size          (blocks, -c) 0
   data seg size           (kbytes, -d) unlimited
   scheduling priority             (-e) 0
   file size               (blocks, -f) unlimited
   pending signals                 (-i) 62474
   max locked memory       (kbytes, -l) 64
   max memory size         (kbytes, -m) unlimited
   open files                      (-n) 1048576
   pipe size            (512 bytes, -p) 8
   POSIX message queues     (bytes, -q) 819200
   real-time priority              (-r) 0
   stack size              (kbytes, -s) 8192
   cpu time               (seconds, -t) unlimited
   max user processes              (-u) unlimited
   virtual memory          (kbytes, -v) unlimited
   file locks                      (-x) unlimited
   ```
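
The `open files` value above is only the per-process limit. It can be compared with the number of descriptors the process actually holds. A minimal sketch, assuming a Unix system (the `resource` module) and, for the count, a Linux-style `/proc/self/fd`; the helper name `fd_usage` is hypothetical and not part of pyarrow:

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    open_fds is None when /proc/self/fd is unavailable (e.g. on macOS).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # Each entry in /proc/self/fd is one open descriptor of this process.
        # The count includes the descriptor used to list the directory itself.
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None
    return open_fds, soft, hard


if __name__ == "__main__":
    print(fd_usage())
```

Calling `fd_usage()` every few dozen files inside the loop would show whether the descriptor count keeps growing as files are opened.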




[GitHub] [arrow] pitrou commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1552745718

   Ok, I took a brief tour through the libcurl source code:
   * The "Unrecoverable error in select/poll" error is generated in `curl_multi_wait` if `Curl_poll` returns -1
   * `Curl_poll` (which, understandably, is a wrapper around `poll` on Unix) returns -1 in three situations:
     1. `nfds` is non-zero and `poll` returns an error that's not EINTR
     2. `nfds` is zero and the given timeout is negative
     3. `nfds` is zero and `poll` returns an error _including EINTR_
   
   Let's dive a bit into google-cloud-cpp. There are two similar functions named `WaitForHandles` (`CurlImpl::WaitForHandles` and `CurlDownloadRequest::WaitForHandles`). Both call `curl_multi_wait` with zero extra file descriptors and a hard-coded positive timeout. This eliminates the "negative timeout" situation above.
   
   We are left with an error returned from `poll`. According to the Linux man page, these can be:
   ```
          EFAULT fds points outside the process's accessible address space.  The
                 array given as argument was not contained in the  calling  pro‐
                 gram's address space.
   
          EINTR  A signal occurred before any requested event; see signal(7).
   
          EINVAL The nfds value exceeds the RLIMIT_NOFILE value.
   
          ENOMEM Unable to allocate memory for kernel data structures.
   ```
   
   We can eliminate EFAULT as `curl_poll` ensures the fds point to accessible memory.
   EINVAL is extremely unlikely given a limit of 1048576 open files in https://github.com/apache/arrow/issues/35318#issuecomment-1552161651 .
   ENOMEM cannot be ruled out, but I guess exhaustion of kernel data space would manifest randomly in other ways?
   
   This leaves us with EINTR, which can happen in the case that `curl_multi_wait` [doesn't find any file descriptors](https://github.com/curl/curl/blob/a9f8fe28481fef7c28d85b4a12a3a35521408eaf/lib/multi.c#L1185-L1207) to wait for.
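
The three failure situations listed in this comment can be written down as a small decision function. This is only a model of the description above, not libcurl's actual code; the function name `curl_poll_fails` and its parameters are hypothetical:

```python
import errno


def curl_poll_fails(nfds, timeout_ms, poll_errno=None):
    """Model of when `Curl_poll` reports an error, per the comment above.

    poll_errno is the errno from the underlying wait call, or None on success.
    """
    if nfds > 0:
        # Situation 1: real descriptors; any error except EINTR is fatal.
        return poll_errno is not None and poll_errno != errno.EINTR
    if timeout_ms < 0:
        # Situation 2: nothing to wait on and a negative timeout.
        return True
    # Situation 3: nothing to wait on; any error, EINTR included, is fatal.
    return poll_errno is not None


if __name__ == "__main__":
    # A signal arriving while descriptors are being polled is tolerated...
    print(curl_poll_fails(nfds=2, timeout_ms=1000, poll_errno=errno.EINTR))  # False
    # ...but the same signal with no descriptors is reported as an error.
    print(curl_poll_fails(nfds=0, timeout_ms=1000, poll_errno=errno.EINTR))  # True
```

The second call is the case the analysis ends on: no descriptors to wait for, a positive timeout, and an interrupting signal.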
   
   
   




[GitHub] [arrow] westonpace commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1524064808

   The error (`unrecoverable error from select/poll`) originates from curl.  So it seems this is related to curl / GCS.  Is there any possibility the connection is having issues?




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1521119807

   the ``load_file`` function is https://github.com/zpz/biglist/blob/c6c1eca5be99370f23b5fcef481b43af8125eecc/src/biglist/_parquet.py#L75




[GitHub] [arrow] pitrou commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1572635032

   Opened https://github.com/apache/arrow/issues/35879




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1596362014

   My working code is here: https://github.com/zpz/biglist/blob/main/src/biglist/_parquet.py#L86 . The pyarrow behavior here seems flawed in that pyarrow itself should take care of this. It has a context manager. However, in a case like this where the context manager doesn't do much, many applications may not use it, and the code should handle finalization regardless. This is the case in multiple places in the standard multiprocessing code.




[GitHub] [arrow] pitrou commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1544051602

   @coryan Have you already seen this error?




[GitHub] [arrow] pitrou commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1554766098

   Current status on this:
   * a fix was merged for `libcurl`: https://github.com/curl/curl/issues/11135
   * a workaround was merged for `google-cloud-cpp`: https://github.com/googleapis/google-cloud-cpp/issues/11647
   
   We'll have to bump our bundled version of `google-cloud-cpp` when a new release gets done.
   




[GitHub] [arrow] coryan commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "coryan (via GitHub)" <gi...@apache.org>.
coryan commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1544086324

   > Have you already seen this error?
   
   No, that is a new one for me.
   
   What version of Apache/Arrow is this running with? Does it include the fixes in #34051?
   
   If it does not include those fixes I speculate (as in "I am not sure, but maybe") that this is starting 300+ downloads.  That would consume about 600 sockets and exhaust some resource (e.g. file descriptors).
   




[GitHub] [arrow] pitrou commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1597127094

   @zpz the `ParquetFile` context manager should ensure that the reader is closed, does that not happen for you?
   https://github.com/apache/arrow/blob/e798e2a08c1bff3f62fa9b0fd10cd07a0488705b/python/pyarrow/parquet/core.py#L346-L350
   
   cc @jorisvandenbossche for the potential `ParquetFile` issue.




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1525076112

   In the same code, if I download the blob as bytes, there were no issues. So I doubt it's a connection issue. I don't know how `curl` is used; it's not used in my code. I feel the issue is some interaction between GcsFileSystem and the GCS service. Note that the issue happens only after processing a few hundred blobs, so there seems to be something building up.




[GitHub] [arrow] coryan commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "coryan (via GitHub)" <gi...@apache.org>.
coryan commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1544578553

   > The latest pypi published version, that is
   
   Okay.  12.0.0 was released on 2023-05-02, about a week after this issue report.  The fixes are *not* in the 11.0.0 release:
   
   https://github.com/apache/arrow/commit/771c37aab8757287b3fa9cfe1bfb87992126ee08
   
   I think it is worthwhile to try with 12.0.0.




[GitHub] [arrow] westonpace commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1553626461

   >  ENOMEM cannot be ruled out, but I guess exhaustion of kernel data space would manifest randomly in other ways?
   
   It should also be possible to monitor RAM usage during the execution.  Given that `overcommit_memory=1` I think we should only see `ENOMEM` if free memory is close to 0.  However, I agree this is not the likely culprit.




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1596352872

   While there is apparently a valid bug related to this, I should report that I found a bug in my code, which failed to `close` the `ParquetFile`. That led to buildup of memory consumption. After fixing that, my immediate problem seems to be solved.




[GitHub] [arrow] pitrou commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1552747735

   @zpz Perhaps you can try to use strace to see if your program is receiving any signals?
   See https://unix.stackexchange.com/a/372581




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1544538367

   The latest pypi published version, that is




[GitHub] [arrow] westonpace commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1548384073

   From https://github.com/curl/curl/issues/8921 it would seem that too many open file descriptors is indeed a very likely culprit.
   
   @zpz can you show what you get from `ulimit -a`?
   
   Another possible cause is that the kernel is running out of memory.  @zpz can you share the value of `cat /proc/sys/vm/overcommit_memory`?  If overcommit is disabled (i.e. if that command returns `2`) then it is possible the kernel will decide it is out of memory well before it actually uses all physical memory.
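
The overcommit setting asked about here can also be read from inside the Python process that fails. A minimal sketch, assuming Linux; the helper name `read_overcommit_mode` is hypothetical, and the mode descriptions follow the kernel's documented meanings for `vm.overcommit_memory`:

```python
from pathlib import Path

# Meaning of each value of the Linux vm.overcommit_memory setting.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the overcommit mode as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or unexpected file content.
        return None


if __name__ == "__main__":
    mode = read_overcommit_mode()
    print(mode, OVERCOMMIT_MODES.get(mode, "unknown or not Linux"))
```

A result of `2` corresponds to the "overcommit is disabled" case mentioned above.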




[GitHub] [arrow] kou closed issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou closed issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files
URL: https://github.com/apache/arrow/issues/35318




[GitHub] [arrow] pitrou commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1572632846

   A new `google-cloud-cpp` version has been released with the fix:
   https://github.com/googleapis/google-cloud-cpp/releases/tag/v2.11.0




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1545939249

   with pyarrow 12.0.0, I got this error after 333 files:
   
   ```
   error after 0.4718797499954235 seconds
   <class 'pyarrow.lib.ArrowException'>
   Unknown error: google::cloud::Status(UNKNOWN: Permanent error ReadObjectNotWrapped: WaitForHandles(): unexpected error code in curl_multi_*, [12]=Unrecoverable error in select/poll)
   ('Unknown error: google::cloud::Status(UNKNOWN: Permanent error ReadObjectNotWrapped: WaitForHandles(): unexpected error code in curl_multi_*, [12]=Unrecoverable error in select/poll)',)
   
   
   Traceback (most recent call last):
     File "/home/docker-user/sunny/tests/manual/parq.py", line 67, in <module>
       main()
     File "/home/docker-user/sunny/tests/manual/parq.py", line 41, in main
       n = len(batch)
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 171, in __len__
       return self.num_rows
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 203, in num_rows
       return self.metadata.num_rows
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 199, in metadata
       return self.file.metadata
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 194, in file
       self._file = self.load_file(self.path, lazy=self.lazy)
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 97, in load_file
       file = ParquetFile(pp, filesystem=ff)
     File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 334, in __init__
       self.reader.open(
     File "pyarrow/_parquet.pyx", line 1220, in pyarrow._parquet.ParquetReader.open
     File "pyarrow/error.pxi", line 138, in pyarrow.lib.check_status
   pyarrow.lib.ArrowException: Unknown error: google::cloud::Status(UNKNOWN: Permanent error ReadObjectNotWrapped: WaitForHandles(): unexpected error code in curl_multi_*, [12]=Unrecoverable error in select/poll)
   ```
   
   I'm looping through https://github.com/zpz/biglist/blob/7910c60524aeeee19a037245a61fc58d8638e600/src/biglist/_parquet.py#L49 objects each getting a GCS path. In the loop I call `len(obj)`, which calls its `load_file` with `lazy=True`




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1544532468

   It was using the latest version as of the time of the issue report




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1552185807

   My code does have the problem that, as I loop through the hundreds of files, previous files stay around. I have now avoided that situation, but I still got the error after 333 files:
   
   ```
   error after 0.4439328750013374 seconds
   <class 'pyarrow.lib.ArrowException'>
   Unknown error: google::cloud::Status(UNKNOWN: Permanent error ReadObjectNotWrapped: WaitForHandles(): unexpected error code in curl_multi_*, [12]=Unrecoverable error in select/poll)
   ('Unknown error: google::cloud::Status(UNKNOWN: Permanent error ReadObjectNotWrapped: WaitForHandles(): unexpected error code in curl_multi_*, [12]=Unrecoverable error in select/poll)',)
   
   .
   .
   .
   
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 171, in __len__
       return self.num_rows
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 203, in num_rows
       return self.metadata.num_rows
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 199, in metadata
       return self.file.metadata
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 194, in file
       self._file = self.load_file(self.path, lazy=self.lazy)
     File "/usr/local/lib/python3.10/site-packages/biglist/_parquet.py", line 97, in load_file
       file = ParquetFile(pp, filesystem=ff)
     File "/usr/local/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 334, in __init__
       self.reader.open(
     File "pyarrow/_parquet.pyx", line 1220, in pyarrow._parquet.ParquetReader.open
     File "pyarrow/error.pxi", line 138, in pyarrow.lib.check_status
   pyarrow.lib.ArrowException: Unknown error: google::cloud::Status(UNKNOWN: Permanent error ReadObjectNotWrapped: WaitForHandles(): unexpected error code in curl_multi_*, [12]=Unrecoverable error in select/poll)
   ```




[GitHub] [arrow] zpz commented on issue #35318: [Python][GcsFileSystem][Parquet] fails to create ParquetFile from GCS after a few hundred files

Posted by "zpz (via GitHub)" <gi...@apache.org>.
zpz commented on issue #35318:
URL: https://github.com/apache/arrow/issues/35318#issuecomment-1597401333

   My code does not use a context manager on this ParquetFile. My previous code had a bug in closing it.

