Posted to dev@airflow.apache.org by Arthur Purvis <ap...@lumoslabs.com> on 2016/09/02 07:58:54 UTC

long running postgres queries timing out after exactly 2 hours?

Hi -

I am experiencing the following situation while evaluating airflow;
wondering if this rings any bells with anyone:


   1. *airflow 1.7.1.2* is running in a *docker* container on amazon *ECS*
   2. *dagrun_timeout* = timedelta(hours=24) for this DAG
   3. running Redshift queries with the PostgresOperator works fine

*except when the query takes over two hours*, at which point it is *cut
off exactly at the two-hour mark* with this message:

[2016-09-02 00:32:14,307] {models.py:1286} ERROR - server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line
1245, in run
    result = task_copy.execute(context=context)
  File
"/usr/local/lib/python2.7/dist-packages/airflow/operators/postgres_operator.py",
line 39, in execute
    self.hook.run(self.sql, self.autocommit, parameters=self.parameters)
  File
"/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line
132, in run
    cur.execute(s)
OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.



We are 100% sure that this account is able to run long queries in Redshift,
because we run these same queries outside Airflow from a regular AWS
EC2 instance (not a Docker container) without issue.

The *exactly* two hours says to me that there is some setting somewhere,
either in Airflow itself, in the PostgresOperator, or in the Docker TCP
settings (or somewhere else), that is going awry, but I cannot sort out
what it could be.  It doesn't seem to be an issue with Celery, because
other jobs run and poll for 6+ hours without trouble.

If this rings any bells for anyone I would love to hear about it.

Thanks for reading.
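[Editor's note: a cut-off at exactly two hours often points to an idle-TCP
timeout in a NAT or firewall between the client and Redshift. The sketch
below is illustrative, not part of the original thread: it assumes psycopg2
(the driver Airflow's PostgresHook uses) and the standard libpq keepalive
connection parameters; the numeric values are examples.]

```python
# Sketch: ask the OS to send TCP keepalive probes so an intermediate
# NAT/firewall never sees the connection as idle during a long query.
# These are standard libpq connection parameters; values are examples.

def keepalive_params(idle=120, interval=30, count=5):
    """Build connection kwargs enabling client-side TCP keepalives."""
    return {
        "keepalives": 1,                  # turn keepalives on
        "keepalives_idle": idle,          # seconds idle before first probe
        "keepalives_interval": interval,  # seconds between probes
        "keepalives_count": count,        # failed probes before giving up
    }

def connect_with_keepalives(dsn):
    """Open a psycopg2 connection with keepalives enabled (untested sketch)."""
    import psycopg2  # deferred so keepalive_params() stays importable alone
    return psycopg2.connect(dsn, **keepalive_params())
```

If the drop really is a NAT idle timeout, the same effect can also be had at
the OS level (e.g. net.ipv4.tcp_keepalive_time); the parameter names above
are libpq's, not Airflow's.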

Re: long running postgres queries timing out after exactly 2 hours?

Posted by J C Lawrence <cl...@kanga.nu>.
While this wasn't with Airflow, I saw similar connection failures with
Redshift on very long queries (6+ hours) when the client was not in EC2.
When I moved the client into EC2, those problems went away.

-- JCL

On Thu, 8 Sep 2016 10:45:58 -0700, David Klosowski
<da...@thinknear.com> wrote:


Re: long running postgres queries timing out after exactly 2 hours?

Posted by David Klosowski <da...@thinknear.com>.
We're running Airflow with Docker + ECS against EMR and Redshift and have
not noticed any issues with the PostgresOperator yet.  We don't have
anything running in production with Redshift, but we have evaluated a few
DAGs using it.  I would also look at the query in question in the Redshift
query console to check its status.  You can also check in with AWS support
to see if they have any guidance.

Cheers,
David
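[Editor's note: the console check above can also be done from code. This is
an illustrative sketch, not from the original thread: it assumes psycopg2,
Redshift's STV_RECENTS system table, and a user with access to system
tables; the DSN is a placeholder.]

```python
# Sketch: list currently running queries on a Redshift cluster via the
# STV_RECENTS system table, to see whether the long query is still alive.

RUNNING_SQL = (
    "SELECT pid, user_name, duration, TRIM(query) "
    "FROM stv_recents WHERE status = 'Running' "
    "ORDER BY duration DESC"
)

def fetch_running_queries(dsn):
    """Return (pid, user, duration, query) rows for in-flight queries."""
    import psycopg2  # deferred so the SQL constant stays importable alone
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(RUNNING_SQL)
            return cur.fetchall()
```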

On Thu, Sep 8, 2016 at 8:22 AM, Maxime Beauchemin <
maximebeauchemin@gmail.com> wrote:


Re: long running postgres queries timing out after exactly 2 hours?

Posted by Maxime Beauchemin <ma...@gmail.com>.
My intuition is that it comes from a Postgres setting or, less likely but
possible, an environmental setting (OS, network, ...) on your end. Clearly the
stack trace you shared says that the server closed the connection during the
cursor's execute method. I bet you can reproduce this with a vanilla
query using the same Python driver outside of Airflow, and most likely with
any other API in other languages.

Good luck,

Max
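[Editor's note: the repro suggested above can be sketched as a standalone
script. This is illustrative, not from the original thread: it assumes
psycopg2, and both the DSN and the multi-hour query are placeholders you
would substitute.]

```python
# Sketch: run the same long query with the same driver but no Airflow.
# If this also dies near the 7200-second mark, the problem sits below
# Airflow (driver, network, or cluster), not in the PostgresOperator.
import time

TWO_HOURS = 7200

def near_two_hours(elapsed, tolerance=120):
    """True if a failure happened suspiciously close to the 2h mark."""
    return abs(elapsed - TWO_HOURS) <= tolerance

def run_once(dsn, sql):
    import psycopg2  # deferred so near_two_hours() stays importable alone
    start = time.time()
    try:
        conn = psycopg2.connect(dsn)
        with conn.cursor() as cur:
            cur.execute(sql)  # substitute your own multi-hour query
        conn.close()
    finally:
        elapsed = time.time() - start
        print("elapsed: %.0fs, near 2h cut-off: %s"
              % (elapsed, near_two_hours(elapsed)))
```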

On Fri, Sep 2, 2016 at 12:58 AM, Arthur Purvis <ap...@lumoslabs.com>
wrote:
