You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/12 17:44:52 UTC

[GitHub] [airflow] roveo opened a new issue #11476: VerticaOperator exception handling

roveo opened a new issue #11476:
URL: https://github.com/apache/airflow/issues/11476


   <!--
   
   Welcome to Apache Airflow!  For a smooth issue process, try to answer the following questions.
   Don't worry if they're not all applicable; just try to include what you can :-)
   
   If you need to include code snippets or logs, please put them in fenced code
   blocks.  If they're super-long, please use the details tag like
   <details><summary>super-long log</summary> lots of stuff </details>
   
   Please delete these comment blocks before submitting the issue.
   
   -->
   
   <!--
   
   IMPORTANT!!!
   
   PLEASE CHECK "SIMILAR TO X EXISTING ISSUES" OPTION IF VISIBLE
   NEXT TO "SUBMIT NEW ISSUE" BUTTON!!!
   
   PLEASE CHECK IF THIS ISSUE HAS BEEN REPORTED PREVIOUSLY USING SEARCH!!!
   
   Please complete the next sections or the issue will be closed.
   These questions are the first thing we need to know to understand the context.
   
   -->
   
   **Apache Airflow version**:
   1.10.10
   
   **Environment**:
   
   Docker on Linux (but it doesn't really matter here).
   
   **What happened**:
   
   `VerticaOperator` uses `vertica_python`. By default, if there are multiple statements in the query, `vertica_python` will return only the first result set on `cursor.fetchall()` and raise exceptions associated with this result set. E.g. a task running this query:
   
   ```sql
   select 1;
   select throw_error('test');
   ```
   
   executed by the operator will be marked as "Success" and won't show any exceptions in the logs. This is a minimal example of an exception, but the same goes for any error on the side of Vertica (most notably, constraint violations).
   
   **What you expected to happen**:
   
   All exceptions raised properly.
   
   DAG:
   ```py
   from airflow import DAG
   from airflow.contrib.operators.vertica_operator import VerticaOperator
   from airflow.utils.dates import days_ago
   
   query = """
   select 1;
   select throw_error("test");
   """
   
   with DAG(dag_id="test", schedule_interval="@daily", start_date=days_ago(0)) as dag:
       VerticaOperator(task_id="test", sql=query, vertica_conn_id="dwh")
   ```
   
   Then run:
   ```sh
   airflow test test test 2020-01-01
   ```
   
   Result on my setup:
   ```airflow@d98a1f7740e6:/opt/airflow$ airflow test test test 2020-01-01
   [2020-10-12 17:34:34,381] {__init__.py:51} INFO - Using executor SequentialExecutor
   [2020-10-12 17:34:34,382] {dagbag.py:396} INFO - Filling up the DagBag from /opt/airflow/dags
   [2020-10-12 17:34:36,856] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: test.test 2020-01-01T00:00:00+00:00 [None]>
   [2020-10-12 17:34:37,038] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: test.test 2020-01-01T00:00:00+00:00 [None]>
   [2020-10-12 17:34:37,038] {taskinstance.py:879} INFO - 
   --------------------------------------------------------------------------------
   [2020-10-12 17:34:37,039] {taskinstance.py:880} INFO - Starting attempt 1 of 1
   [2020-10-12 17:34:37,040] {taskinstance.py:881} INFO - 
   --------------------------------------------------------------------------------
   [2020-10-12 17:34:37,042] {taskinstance.py:900} INFO - Executing <Task(VerticaOperator): test> on 2020-01-01T00:00:00+00:00
   [2020-10-12 17:34:37,349] {vertica_operator.py:47} INFO - Executing: 
   select 1;
   select throw_error("test");
   [2020-10-12 17:34:37,414] {base_hook.py:87} INFO - Using connection to: id: ***. Host: ***, Port: None, Schema: ***, Login: ***, Password: ***, extra: None
   [2020-10-12 17:34:37,506] {dbapi_hook.py:174} INFO - 
   select 1;
   select throw_error("test");
   [2020-10-12 17:34:37,618] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=test, task_id=test, execution_date=20200101T000000, start_date=20201012T173436, end_date=20201012T173437
   ```
   
   ** Fix ideas **
   
   Ideally, we should run some variation of this code either in the operator or in the hook:
   
   ```py
   cursor.execute(query)
   res = cursor.fetchone()
   while res is not None:
       cursor.nextset()
       res = cursor.fetchone()
   ```
   
   I can create a PR, but the main question is whether this is expected behaviour (well, I expected it and spent quite some time figuring out where the problem is) and where the fix should go.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-981952514


   If there is an actual suggestion how to handle this on Airflow side - Please open a PR
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] AlbertoCrespi commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
AlbertoCrespi commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-810429358


   Hi there, I'have the same issue.
   Basically I do agree with roveo. I have a files with many queries, and if in the middle something won't work properly, I still received a success task.
   
   Did I made some mistake? Do I need to split my queries in many vertica_operator?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-1013868245


   Since there is a debate if this is an airflow issue or vertica issue and due to the fact that no one raised PR to address it I'm converting this issue to discussion. If anyone encounter this problem and has a code suggestion to Airflow - feel free to open PR directly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JeffryMAC commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
JeffryMAC commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-707313976


   I think this is expected.
   Check their test for this behavior:
   https://github.com/vertica/vertica-python/blob/cd988c3d78618606f300f369b0f9ed80b07237c2/vertica_python/tests/integration_tests/test_cursor.py#L416
   
   Also:
   https://github.com/vertica/vertica-python/issues/255
   
   This seems more a feature request to vertical-python


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-707258912


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] roveo commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
roveo commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-707338622


   > I think this is expected.
   > Check their test for this behavior:
   
   This is expected in vertica-python because there is a way to actually retrieve the exceptions by using it properly. But not in Airflow: now the operator works "fire-and-forget"-style so you don't know if anything went wrong. Since one the purposes of Airflow is knowing the state of you tasks and automatically taking the necessary actions (retrying a task or skipping downstream tasks), this seems like a bug or at least unexpected behavior (of Airflow) to me.
   
   Maybe I'm wrong about this, but I think it should be discussed further at least.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] roveo edited a comment on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
roveo edited a comment on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-707260483


   I also created an issue in vertica-python repo some time ago, but I think this is not a bug in vertica-python, since there is a way to produce the exceptions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] roveo edited a comment on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
roveo edited a comment on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-707338622


   > I think this is expected.
   > Check their test for this behavior:
   
   This is expected in vertica-python because there is a way to actually retrieve the exceptions by using it properly. But not in Airflow: now the operator works "fire-and-forget"-style so you don't know if anything went wrong. Since one the purposes of Airflow is knowing the state of you tasks and automatically taking the necessary actions (retrying a task or skipping downstream tasks), this seems like a bug or at least unexpected behavior (of Airflow) to me.
   
   Maybe I'm wrong about this, or maybe I'm right, but the maintainers would consider this a breaking change. But I think it should be discussed further at least.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] JeffryMAC commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
JeffryMAC commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-707597690


   I have no strong feeling here I just say that in my opinion it's better to address this by PR in vertica-python for a way to execute the code and if exception raised then actually raise it. When this functionality added to the lib it will be easier to expose it in the operator.
   
   You can always submit pull request to airflow and see if it will be accepted.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] roveo commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
roveo commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-707260483


   I also created an issue in vertica-python repo some time before, but I think this is not a bug in vertica-python, since there is a way to produce the exceptions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] roveo commented on issue #11476: VerticaOperator exception handling

Posted by GitBox <gi...@apache.org>.
roveo commented on issue #11476:
URL: https://github.com/apache/airflow/issues/11476#issuecomment-810487322


   @AlbertoCrespi 
   
   > Do I need to split my queries in many vertica_operator?
   
   Yes, or use my code above in a subclass of VerticaOperator with overriden `execute` method.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org