Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/03 15:19:39 UTC

[GitHub] [airflow] DrTeja opened a new issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

DrTeja opened a new issue #16243:
URL: https://github.com/apache/airflow/issues/16243


   **Apache Airflow version**: 2.0
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):  NA
   
   **Environment**: macOS
   
   - **Cloud provider or hardware configuration**: NA
   - **OS** (e.g. from /etc/os-release): macOS Big Sur
   - **Kernel** (e.g. `uname -a`): local 20.4.0 Darwin Kernel Version 20.4.0
   - **Install tools**: NA
   - **Others**: NA
   
   **What happened**:
   
   We use Airflow jobs to upload data to BigQuery: we created Python operators and trigger them through DAGs. When we run a task manually using `airflow tasks test <dag_id> <task_id> <date>`, everything works fine, but when the same task is triggered via the UI, it fails with this error:
   
   ```
   *** Reading local file: /Users/rdoppalapudi/airflow_project//logs/ygrene_etl_process/run_main_etl_project/2021-06-03T14:55:02.676999+00:00/1.log
   [2021-06-03 10:55:07,575] {taskinstance.py:876} INFO - Dependencies all met for <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]>
   [2021-06-03 10:55:07,580] {taskinstance.py:876} INFO - Dependencies all met for <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]>
   [2021-06-03 10:55:07,580] {taskinstance.py:1067} INFO - 
   --------------------------------------------------------------------------------
   [2021-06-03 10:55:07,580] {taskinstance.py:1068} INFO - Starting attempt 1 of 1
   [2021-06-03 10:55:07,580] {taskinstance.py:1069} INFO - 
   --------------------------------------------------------------------------------
   [2021-06-03 10:55:07,586] {taskinstance.py:1087} INFO - Executing <Task(PythonOperator): run_main_etl_project> on 2021-06-03T14:55:02.676999+00:00
   [2021-06-03 10:55:07,589] {standard_task_runner.py:52} INFO - Started process 9133 to run task
   [2021-06-03 10:55:07,595] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'ygrene_etl_process', 'run_main_etl_project', '2021-06-03T14:55:02.676999+00:00', '--job-id', '16', '--pool', 'default_pool', '--raw', '--subdir', '/Users/rdoppalapudi/airflow_project/dags/etl_airflow.py', '--cfg-path', '/var/folders/5n/72l0n8zd261dnlkm3n902my0pmw_c2/T/tmp4fd_41fd', '--error-file', '/var/folders/5n/72l0n8zd261dnlkm3n902my0pmw_c2/T/tmpn67lg3s9']
   [2021-06-03 10:55:07,597] {standard_task_runner.py:77} INFO - Job 16: Subtask run_main_etl_project
   [2021-06-03 10:55:07,625] {logging_mixin.py:104} INFO - Running <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [running]> on host 1.0.0.127.in-addr.arpa
   [2021-06-03 10:55:07,649] {taskinstance.py:1280} INFO - Exporting the following env vars:
   AIRFLOW_CTX_DAG_OWNER=ygrene
   AIRFLOW_CTX_DAG_ID=ygrene_etl_process
   AIRFLOW_CTX_TASK_ID=run_main_etl_project
   AIRFLOW_CTX_EXECUTION_DATE=2021-06-03T14:55:02.676999+00:00
   AIRFLOW_CTX_DAG_RUN_ID=manual__2021-06-03T14:55:02.676999+00:00
   [2021-06-03 10:55:07,925] {etl_airflow.py:32} INFO - 
   run_id = manual__2021-06-03T14:55:02.676999+00:00 
    dag_id = DAG: ygrene_etl_process 
    task_id = Task(PythonOperator): run_main_etl_project
   [2021-06-03 10:55:08,246] {transport.py:1819} INFO - Connected (version 2.0, client OpenSSH_7.4)
   [2021-06-03 10:55:08,954] {transport.py:1819} INFO - Authentication (publickey) successful!
   [2021-06-03 10:55:14,328] {data_integration.py:29} INFO - Uploading data for projects 
   [2021-06-03 10:55:14,329] {data_integration.py:31} INFO - Creating bigq obj
   [2021-06-03 10:55:26,035] {bigquery_wrapper_apis.py:117} INFO - Got the original json to be uploaded
   [2021-06-03 10:55:27,451] {bigquery_wrapper_apis.py:102} INFO - Creating big client obj
   [2021-06-03 10:55:27,687] {local_task_job.py:151} INFO - Task exited with return code Negsignal.SIGSEGV
   ```
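
The last log line reports `Negsignal.SIGSEGV` rather than a numeric exit code. Our reading (an assumption, not something the log states) is that this is a negative return code rendered as a signal name: on POSIX systems a child process terminated by signal N reports a return code of -N, and SIGSEGV is signal 11 on Linux and macOS. The stdlib-only sketch below illustrates that mapping; `describe_return_code` and `run_snippet` are hypothetical helpers, not Airflow code.

```python
import signal
import subprocess
import sys


def describe_return_code(returncode: int) -> str:
    """Render a child's return code, naming the signal when it is negative.

    On POSIX, subprocess reports -N when the child was terminated by signal N.
    """
    if returncode < 0:
        try:
            return f"Negsignal.{signal.Signals(-returncode).name}"
        except ValueError:  # signal number unknown on this platform
            return f"terminated by signal {-returncode}"
    return str(returncode)


def run_snippet(code: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its return code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


if __name__ == "__main__":
    # The child sends itself SIGSEGV, mimicking a crash inside native code.
    segv = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    print(describe_return_code(run_snippet(segv)))    # Negsignal.SIGSEGV (Linux/macOS)
    print(describe_return_code(run_snippet("pass")))  # 0
```

Read this way, the task process itself crashed, usually in native (C/C++) code rather than in Python, which fits the task stopping right after `Creating big client obj` with no Python traceback.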
   
   **What you expected to happen**:
   
   Not sure exactly what is going wrong; the scheduler console shows some logs with this error:
   
   ```
   Running <TaskInstance: ygrene_etl_process.run_main_etl_project 2021-06-03T14:55:02.676999+00:00 [queued]> on host 1.0.0.127.in-addr.arpa
   The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
   Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
   [2021-06-03 10:55:28,046] {scheduler_job.py:1205} INFO - Executor reports execution of ygrene_etl_process.run_main_etl_project execution_date=2021-06-03 14:55:02.676999+00:00 exited with status success for try_number 1
   [2021-06-03 10:55:29,427] {dagrun.py:429} ERROR - Marking run <DagRun ygrene_etl_process @ 2021-06-03 14:55:02.676999+00:00: manual__2021-06-03T14:55:02.676999+00:00, externally triggered: True> failed
   [2021-06-03 10:56:10,676] {scheduler_job.py:1822} INFO - Resetting orphaned tasks for active dag runs
   [2021-06-03 11:01:10,846] {scheduler_job.py:1822} INFO - Resetting orphaned tasks for active dag runs
   ```
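
The CoreFoundation message is macOS reporting that a process which had forked, but had not called `exec()`, went on to use CoreFoundation. The task log above does show the task runner starting a separate process ("Started process 9133 to run task"). The pattern the message asks for is fork followed immediately by exec, so the child starts from a fresh program image instead of inheriting the parent's initialized library state. A minimal POSIX sketch for illustration only; `run_fork_then_exec` is a hypothetical helper and is not Airflow's task runner.

```python
import os
import sys


def run_fork_then_exec(code: str) -> int:
    """Fork, then exec a fresh Python interpreter in the child to run `code`.

    Returns the child's exit status, or -N if it was terminated by signal N.
    POSIX only (os.fork is unavailable on Windows).
    """
    pid = os.fork()
    if pid == 0:
        # Child: exec straight away, before touching any fork-unsafe library state.
        try:
            os.execv(sys.executable, [sys.executable, "-c", code])
        finally:
            os._exit(127)  # reached only if execv itself failed
    # Parent: wait for the child and decode how it ended.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_fork_then_exec("print('hello from a fresh interpreter')"))
```

In everyday Python code, `subprocess` already starts children as a separate program rather than as a bare fork of the current interpreter, so it is usually the simpler way to get the same isolation.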
   
   
   **How to reproduce it**:
   
   Not really sure how to reproduce this, as everything was working fine until last week.
   
   **Anything else we need to know**:
   
   
   The DAG task works fine when run manually. We are not sure why it fails only when the task is run from the UI, and there is no clear information about what is happening internally. After looking at related information on the web, we understand that the issue looks more generic and related to multiprocessing.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] DrTeja closed issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
DrTeja closed issue #16243:
URL: https://github.com/apache/airflow/issues/16243


   



[GitHub] [airflow] m1racoli commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
m1racoli commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-880816222


   We suspect something in native code related to the BQ job API.
   
   This is our pip freeze output:
   [pip.freeze.txt](https://github.com/apache/airflow/files/6824585/pip.freeze.txt)
   
   





[GitHub] [airflow] potiuk closed issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #16243:
URL: https://github.com/apache/airflow/issues/16243


   





[GitHub] [airflow] iostreamdoth edited a comment on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
iostreamdoth edited a comment on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-881909400


   I have encountered a similar error in Airflow 1.10.15 (on Google Cloud Composer) with `DataprocInstantiateWorkflowTemplateOperator`. I don't think it has anything to do with Airflow as such; it may be the Google APIs or some sort of library version mismatch. I have 17 tasks running in parallel with this operator, and 15 of the 17 fail with `Task exited with return code Negsignal.SIGSEGV`.
   
   I haven't been able to get more information, but here is what I did to get fewer errors.
   
   I overrode the `execute` method of the operator:
   ```python
   import time

   from airflow.providers.google.cloud.hooks.dataproc import DataprocHook
   from airflow.providers.google.cloud.operators import dataproc


   class InstantiateDataProcWorkflowTemplate(dataproc.DataprocInstantiateWorkflowTemplateOperator):
       """Variant of the stock operator that polls operation.done() instead of blocking on operation.result()."""

       def execute(self, context):
           hook = DataprocHook(gcp_conn_id=self.gcp_conn_id, impersonation_chain=self.impersonation_chain)
           self.log.info('Instantiating template %s', self.template_id)
           operation = hook.instantiate_workflow_template(
               project_id=self.project_id,
               location=self.region,
               template_name=self.template_id,
               version=self.version,
               request_id=self.request_id,
               parameters=self.parameters,
               retry=self.retry,
               timeout=self.timeout,
               metadata=self.metadata,
           )
           self.log.info('Template Started.')
           # Poll every 10 minutes instead of calling operation.result().
           # Note: nothing here calls operation.result(), so a workflow that
           # finishes with an error is not reported as a task failure.
           while not operation.done():
               self.log.info('Workflow template still running.')
               time.sleep(600)
           self.log.info('Template Finished.')
   ```
   With this change fewer tasks failed (around 3 of 17), but failures still occur for no apparent reason.
   
   If I remove the `while not operation.done():` loop, or the `operation.result()` call in the original [here](https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_modules/airflow/providers/google/cloud/operators/dataproc.html#DataprocInstantiateWorkflowTemplateOperator), the task runs successfully, but it no longer waits for the workflow to complete. This leads me to believe that there is a bug in the asynchronous operation call in `google.api_core`. Are most of the people facing this problem using the Google provider operators?
   





[GitHub] [airflow] iostreamdoth commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
iostreamdoth commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-882709930


   Well, it's a Google gRPC library issue:
   https://github.com/grpc/grpc/issues/23796
   
   This is not an Airflow or providers issue, so from what I understand this can be closed.





[GitHub] [airflow] potiuk commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-884333945


   I think the problem is that there is nothing we can do about it in Airflow.
   
   What would you suggest? Do you have a proposal for what more we could do?
   
   We cannot really control the environment variables set for external tools, or for Airflow itself. The grpc library might be used in various contexts, and at most it is an optional add-on to Airflow via the providers that use it. Just as we do not tell people how to configure their Celery, we do not tell them which variables to set for various cases either.
   
   I consider this an edge case, and more of a deployment issue than an Airflow one. The issue is public and indexed by Google, and thanks to your comment (thanks!) digesting the linked discussion, it now even includes your suggestion to try `GRPC_POLL_STRATEGY`. So if someone encounters a similar issue, they can find it here and try the different remedies suggested here or in the linked issue.
   
   I honestly think it's quite enough :).





[GitHub] [airflow] boring-cyborg[bot] commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-853951425


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-882726582


   Correct. Our constraints are set to the latest released version (1.38.0), so hopefully it is fixed there (it was broken in 1.31.0). Even if not, people can still downgrade to 1.30.0, because we do not pin the version; we only provide constraints.
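
To check which side of that boundary an environment is on, a small stdlib sketch (Python 3.8+ for `importlib.metadata`) can compare the installed grpcio release with 1.31.0, the version named above as broken. `parse_release`, `grpcio_predates_regression`, and `installed_grpcio` are hypothetical helpers; the parser only looks at the first three numeric components, and since the thread does not confirm that any later release is fixed, the check only answers "older than 1.31.0 or not".

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile
from typing import Optional, Tuple

# Per this thread: broken in 1.31.0; downgrading to 1.30.x is the workaround.
FIRST_AFFECTED = (1, 31, 0)


def parse_release(v: str) -> Tuple[int, int, int]:
    """'1.38.0' -> (1, 38, 0); '1.30' -> (1, 30, 0); suffixes like 'rc1' are ignored."""
    nums = []
    for piece in v.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        nums.append(int(digits) if digits else 0)
    nums += [0] * (3 - len(nums))
    return (nums[0], nums[1], nums[2])


def grpcio_predates_regression(v: str) -> bool:
    """True if version v is older than the first release reported as broken."""
    return parse_release(v) < FIRST_AFFECTED


def installed_grpcio() -> Optional[str]:
    """Installed grpcio version string, or None if grpcio is not installed."""
    try:
        return version("grpcio")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_grpcio()
    if v is None:
        print("grpcio is not installed")
    else:
        print(v, "predates 1.31.0:", grpcio_predates_regression(v))
```

If it prints `False` and you are seeing these crashes, the workaround discussed in this thread is to downgrade with `grpcio<=1.30`.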





[GitHub] [airflow] m1racoli edited a comment on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
m1racoli edited a comment on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-880812154


   We have been encountering a similar issue with [BigQueryInsertJobOperator](https://registry.astronomer.io/providers/google/modules/bigqueryinsertjoboperator) since yesterday. The BQ job keeps running without issues, while the task on the Airflow side gets killed with `SIGSEGV`.
   
   We're still trying to narrow down the root cause. It's hard to reproduce as it only happens with specific tasks, which do not differ significantly from others.





[GitHub] [airflow] m1racoli edited a comment on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
m1racoli edited a comment on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-884202100


   @iostreamdoth @potiuk Using `grpcio<=1.30` certainly helps, but I am not sure if the topic can be considered closed.
   
   Version 1.30.0 of grpc is over a year old, and there is currently no bugfix in sight. There is a lot of discussion in https://github.com/grpc/grpc/issues/23796 about potential issues when using multiprocessing with fork, and it has been recommended to set
   
   ```
   GRPC_POLL_STRATEGY=epoll1
   ```
   
   in that case.
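
gRPC reads `GRPC_POLL_STRATEGY` from the process environment, so, as far as we understand it, the variable has to be present before gRPC initializes in the process that uses it: for example, exported in the environment that starts the Celery worker so that every task process inherits it, rather than set from inside a task after the Google client libraries have been imported. Note also that epoll is a Linux-only API, so `epoll1` is only meaningful on Linux workers. The stdlib sketch below builds such an environment without mutating the current one and checks that a child process sees the value; `env_with_grpc_poll_strategy` and `child_sees` are hypothetical helpers.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def env_with_grpc_poll_strategy(
    strategy: str = "epoll1", base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with GRPC_POLL_STRATEGY set.

    Assumption: the variable must be in place before gRPC initializes in the
    process that will use it, so it is passed to the child at start-up.
    """
    env = dict(os.environ if base is None else base)
    env["GRPC_POLL_STRATEGY"] = strategy
    return env


def child_sees(var: str, env: Mapping[str, str]) -> str:
    """Start a fresh interpreter with `env` and return its value of `var` ('' if unset)."""
    probe = f"import os; print(os.environ.get({var!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=dict(env), capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    env = env_with_grpc_poll_strategy()
    print(child_sees("GRPC_POLL_STRATEGY", env))  # epoll1
```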
   
   Therefore it's fair to ask whether Airflow's setup (when using Celery workers) acts as a catalyst for this issue, and whether there is something to be done on Airflow's side.
   
   Edit: I might be wrong about Airflow using multiprocessing with fork.






[GitHub] [airflow] m1racoli commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
m1racoli commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-888140647


   @potiuk I see your point. Airflow after all is not responsible for this. :)





[GitHub] [airflow] DrTeja removed a comment on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
DrTeja removed a comment on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-853952206


   [](url)


[GitHub] [airflow] github-actions[bot] commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-876014019


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.


[GitHub] [airflow] iostreamdoth commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
iostreamdoth commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-881909400


   I have encountered a similar error in Airflow 1.10.15 (on Google Cloud Composer) with `DataprocInstantiateWorkflowTemplateOperator`. I think it has nothing to do with Airflow as such, but may be related to the Google APIs or some sort of library version mismatch. I have 40 tasks running in parallel for the above operator, and 15/17 fail with `Task exited with return code Negsignal.SIGSEGV`.
   
   I'm not able to get more information, but here is what I did to get fewer errors.
   
   I overrode the `execute` method of the above operator:
   ```
   import time
   
   from airflow.providers.google.cloud.hooks.dataproc import DataprocHook
   from airflow.providers.google.cloud.operators import dataproc
   
   
   class InstantiateDataProcWorkflowTemplate(dataproc.DataprocInstantiateWorkflowTemplateOperator):
       """Same as the stock operator, but polls the workflow instead of blocking on operation.result()."""
   
       def execute(self, context):
           hook = DataprocHook(gcp_conn_id=self.gcp_conn_id, impersonation_chain=self.impersonation_chain)
           self.log.info('Instantiating template %s', self.template_id)
           # Starts the workflow and returns a long-running operation (future) right away.
           operation = hook.instantiate_workflow_template(
               project_id=self.project_id,
               location=self.region,
               template_name=self.template_id,
               version=self.version,
               request_id=self.request_id,
               parameters=self.parameters,
               retry=self.retry,
               timeout=self.timeout,
               metadata=self.metadata,
           )
           self.log.info('Template Started.')
           # Check operation.done() every 10 minutes instead of calling operation.result().
           while not operation.done():
               self.log.info('Workflow template still running.')
               time.sleep(600)
           self.log.info('Template Finished.')
   ```
   Doing the above resulted in fewer failing tasks (around 3/17 failed), but failures still occur for no apparent reason.
   
   If I remove the `while not operation.done():` block, or the `operation.result()` call from the original [here](https://airflow.apache.org/docs/apache-airflow-providers-google/stable/_modules/airflow/providers/google/cloud/operators/dataproc.html#DataprocInstantiateWorkflowTemplateOperator), the task runs successfully but does not wait for completion. This led me to believe that there is some bug in the async operation call in `google.api_core`. Are the majority of people facing this problem using Google provider operators?
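The override above replaces the blocking `operation.result()` call with a `done()` polling loop. A generic version of that loop with an optional overall timeout might look like the sketch below. `wait_for_operation` is a hypothetical helper, and the only thing it assumes about `operation` is a `done()` method, as on the `google.api_core` operation future returned by the hook. It works around the symptom described here; it does not address the underlying gRPC crash.

```python
import time
from typing import Callable, Optional


def wait_for_operation(
    operation,
    poll_interval: float = 600.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
):
    """Poll ``operation.done()`` until it is True (hypothetical helper).

    Raises TimeoutError if ``timeout`` seconds elapse first. Like the override
    above, it never calls ``operation.result()``.
    """
    deadline = None if timeout is None else clock() + timeout
    while not operation.done():
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"operation not done after {timeout} seconds")
        sleep(poll_interval)
    return operation
```

Injecting `sleep` and `clock` keeps the loop testable without real ten-minute waits.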
   


[GitHub] [airflow] potiuk closed issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #16243:
URL: https://github.com/apache/airflow/issues/16243


   


[GitHub] [airflow] DrTeja commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
DrTeja commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-853952206




[GitHub] [airflow] eladkal commented on issue #16243: When triggering a job via airfow UI, job fails abruptly with Negsignal.SIGSEGV

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #16243:
URL: https://github.com/apache/airflow/issues/16243#issuecomment-855600229


   I'm not really sure what we can do if you are unable to provide a reproducible example.
   The description is not enough to understand what the issue is. Could this be more of a technical support question than an actual bug?

