Posted to commits@airflow.apache.org by "jherrmannNetfonds (via GitHub)" <gi...@apache.org> on 2023/06/16 08:51:56 UTC

[GitHub] [airflow] jherrmannNetfonds commented on pull request #31798: fix spark-kubernetes-operator compatibility

jherrmannNetfonds commented on PR #31798:
URL: https://github.com/apache/airflow/pull/31798#issuecomment-1594346774

   Hi, thanks for the PR.
   I am testing this in production right now with Airflow 2.6.1 running on Kubernetes with the KubernetesExecutor. I encountered this error today:
   
   ```
   [2023-06-16, 09:44:04 CEST] {taskinstance.py:1824} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 761, in _update_chunk_length
       self.chunk_left = int(line, 16)
   ValueError: invalid literal for int() with base 16: b''
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
       yield
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 828, in read_chunked
       self._update_chunk_length()
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 765, in _update_chunk_length
       raise InvalidChunkLength(self, line)
   urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/opt/airflow/dags/repo/packages/local_copy_of_this_pr/spark_kubernetes.py", line 122, in execute
       for line in pod_log_stream:
     File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 165, in stream
       for line in iter_resp_lines(resp):
     File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
       for seg in resp.stream(amt=None, decode_content=False):
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 624, in stream
       for line in self.read_chunked(amt, decode_content=decode_content):
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 816, in read_chunked
       with self._error_catcher():
     File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
       self.gen.throw(typ, value, traceback)
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 461, in _error_catcher
       raise ProtocolError("Connection broken: %r" % e, e)
   ```
   This fails the task, even though the Spark application keeps running and succeeds. Maybe some of these error types should be caught instead of letting the pod fail. What do you think?
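
To make the suggestion concrete, here is a minimal sketch of how a broken log stream could be tolerated rather than failing the task. It is not the code from this PR. `stream_with_retry`, `open_stream`, `max_retries` and `backoff_seconds` are hypothetical names, and `FakeProtocolError` is only a stand-in so the example runs without urllib3 installed. In the operator, the exception to tolerate would presumably be `urllib3.exceptions.ProtocolError`, the one raised at the bottom of the traceback above.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, Type


def stream_with_retry(
    open_stream: Callable[[], Iterable[str]],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
) -> Iterator[str]:
    """Yield lines from open_stream(), reopening the stream on retryable errors.

    Hypothetical helper: open_stream must return a fresh iterable each time
    it is called. Where the reopened stream resumes is up to the caller.
    """
    failures = 0
    while True:
        try:
            for line in open_stream():
                yield line
            return  # the stream ended normally
        except retryable:
            failures += 1
            if failures > max_retries:
                raise  # give up and let the task fail as before
            time.sleep(backoff_seconds)


class FakeProtocolError(Exception):
    """Stand-in for urllib3.exceptions.ProtocolError (assumption for the demo)."""


def make_flaky_stream() -> Callable[[], Iterator[str]]:
    """Build an opener whose first stream breaks after one line."""
    calls = {"n": 0}

    def open_stream() -> Iterator[str]:
        calls["n"] += 1
        attempt = calls["n"]

        def gen() -> Iterator[str]:
            if attempt == 1:
                yield "line 1"
                raise FakeProtocolError("Connection broken")
            yield "line 2"
            yield "line 3"

        return gen()

    return open_stream


lines = list(stream_with_retry(make_flaky_stream(), retryable=(FakeProtocolError,)))
print(lines)
```

A real log stream would likely replay earlier lines after reconnecting unless the caller resumes from a known position, and the retry budget is a judgment call. Another option would be to stop streaming logs on such an error and rely only on polling the Spark application status.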

