Posted to users@airflow.apache.org by Reed Villanueva <rv...@ucera.org> on 2020/02/28 21:47:59 UTC

Airflow creating public tmp dirs for certain task processes?

Airflow (v1.10.7, running in LocalExecutor mode) appears to be automatically
creating publicly readable dirs in /tmp for certain task processes. The
files I've seen so far appear innocuous, but this seems like a security risk,
and I would like to know why it may be happening and how to stop it.

I have an Airflow task that runs a sqoop <https://sqoop.apache.org/> job.
It does this using a BashOperator that calls a bash script containing the
sqoop job logic. I recently noticed that the server's /tmp dir had a public
folder called "sqoop-airflow" whose contents look like...

[root@airflowetl sqoop-airflow]# cd /tmp/sqoop-airflow/compile/
[root@airflowetl compile]# ls
drwxrwxrwx 2 airflow airflows 4.0K Feb 19 20:35 004c815bc9a978acd0093069eefff28a
drwxrwxrwx 2 airflow airflows 4.0K Feb 20 21:35 58d38131dc0a3c433c27bf60570c0135
drwxrwxrwx 2 airflow airflows 4.0K Feb 26 19:35 afe2b89410fee2b4467178eced9d40a8
...
[root@airflowetl compile]# #selecting one of the folders here
[root@airflowetl compile]# cd 82298635a8574abd7a55b967cbc1bb64/
[root@airflowetl 82298635a8574abd7a55b967cbc1bb64]# ls
QueryResult_MY_TABLE$1.class  QueryResult_MY_TABLE$7.class
QueryResult_MY_TABLE$2.class  QueryResult_MY_TABLE$8.class
QueryResult_MY_TABLE$3.class  QueryResult_MY_TABLE.class
QueryResult_MY_TABLE$4.class  QueryResult_MY_TABLE$FieldSetterCommand.class
QueryResult_MY_TABLE$5.class  MY_TABLE.jar
QueryResult_MY_TABLE$6.class

Checking the scheduler logs for any reference to this folder shows
nothing...

[airflow@airflowetl airflow]$ cat airflow-scheduler.out | grep sqoop-airflow
[airflow@airflowetl airflow]$ cat airflow-scheduler.log | grep sqoop-airflow

The reason I strongly suspect this is caused by Airflow and not by
something within the bash script itself is that the folder being created in
/tmp is called "sqoop-*airflow*", and I don't know how this name is created:
it is not the name of the script or the Airflow task_id, nor is it a string
in any of my own code (though sqoop is the name of one of the commands run
within the script).

Does anyone know how this could be happening / where this comes from? Any
way to further debug for more clarity on this?
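
One way to get more clarity is to list every world-writable directory under the suspect path together with its owning user, since the owner usually points at the process that created it. The following is only a minimal sketch: the function name world_writable_dirs is made up for illustration, and the path passed in at the bottom is simply the folder mentioned above.

```python
import os
import pwd
import stat


def world_writable_dirs(root):
    """Yield (path, owner, mode) for each world-writable directory under root."""
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # entry vanished or cannot be inspected; skip it
            if not stat.S_ISDIR(info.st_mode):
                continue  # a symlink to a directory, not a real directory
            if info.st_mode & stat.S_IWOTH:
                try:
                    owner = pwd.getpwuid(info.st_uid).pw_name
                except KeyError:
                    owner = str(info.st_uid)  # uid with no passwd entry
                yield path, owner, stat.filemode(info.st_mode)


if __name__ == "__main__":
    # Path taken from the listing above; adjust as needed.
    for path, owner, mode in world_writable_dirs("/tmp/sqoop-airflow"):
        print(mode, owner, path)
```

Running this as root avoids permission errors on directories the current user cannot enter. Note that it reports who owns each directory, not which program created it.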

-- 
This electronic message is intended only for the named
recipient, and may contain information that is confidential or
privileged. If you are not the intended recipient, you are
hereby notified that any disclosure, copying, distribution or
use of the contents of this message is strictly prohibited. If
you have received this message in error or are not the named
recipient, please notify us immediately by contacting the
sender at the electronic mail address noted above, and delete
and destroy all copies of this message. Thank you.

Re: Airflow creating public tmp dirs for certain task processes?

Posted by Reed Villanueva <rv...@ucera.org>.
If nothing in the Airflow codebase would create such a thing, then the
airflow-user angle is the most likely explanation.
I didn't think about that, thanks for the catch.

On Fri, Feb 28, 2020 at 12:00 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> Looking through the code for Airflow 1.10.7 I can't see anything in
> Airflow that would create that folder, especially not containing class
> files and a jar! There doesn't seem to be anything in the Sqoop hook or
> operator that would do it either.
>
> Oh wait, BashOperator. The only files the BashOperator writes would be to
> /tmp/airflowtmp*/ -- so I don't know where "sqoop-airflow" is coming from,
> but it's not Airflow.
>
> A possible guess: are you running things as the "airflow" linux user
> perhaps?
>
> -ash

Re: Airflow creating public tmp dirs for certain task processes?

Posted by Ash Berlin-Taylor <as...@apache.org>.
Looking through the code for Airflow 1.10.7 I can't see anything in Airflow that would create that folder, especially not containing class files and a jar! There doesn't seem to be anything in the Sqoop hook or operator that would do it either.

Oh wait, BashOperator. The only files the BashOperator writes would be to /tmp/airflowtmp*/ -- so I don't know where "sqoop-airflow" is coming from, but it's not Airflow.
A possible guess: are you running things as the "airflow" linux user perhaps?
-ash
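
The guess above can be checked by comparing the part of the folder name after "sqoop-" with the Linux user that owns the folder. As far as I know, Sqoop by default puts its generated classes and jars under a per-user directory in /tmp, but treat that as an assumption to verify against the Sqoop documentation for your version. The helper names below (owner_name, suffix_matches_owner) are made up for illustration.

```python
import os
import pwd


def owner_name(path):
    """Return the user name owning path, or the numeric uid as a string."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # uid with no passwd entry


def suffix_matches_owner(path, prefix="sqoop-"):
    """True if the directory is named '<prefix><owner>' for its own owner."""
    name = os.path.basename(os.path.normpath(path))
    if not name.startswith(prefix):
        return False
    return name[len(prefix):] == owner_name(path)


if __name__ == "__main__":
    # Path taken from the original report; adjust as needed.
    target = "/tmp/sqoop-airflow"
    print(owner_name(target), suffix_matches_owner(target))
```

If this prints True, the "airflow" in the folder name is most likely just the user the task runs as, not something Airflow itself writes.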