Posted to commits@airflow.apache.org by "Rui Wang (JIRA)" <ji...@apache.org> on 2017/02/09 20:24:41 UTC

[jira] [Updated] (AIRFLOW-855) Security - Airflow SQLAlchemy PickleType Allows for Code Execution

     [ https://issues.apache.org/jira/browse/AIRFLOW-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated AIRFLOW-855:
-----------------------------
    Description: 
Impact: Anyone able to modify the application's underlying database, or a computer where certain DAG tasks are executed, may execute arbitrary code on the Airflow host.
Location: The XCom class in /airflow-internal-master/airflow/models.py
Description: Airflow uses the SQLAlchemy object-relational mapping (ORM) to allow database-agnostic, object-oriented manipulation of application data. Database tables and values are expressed as classes (Python classes, in this application), and the ORM transparently manipulates the underlying database when these structures are accessed programmatically.
Airflow defines the following class as the ORM model for an XCom:
{code}
class XCom(Base): 
  """
  Base class for XCom objects. 
  """
  __tablename__ = "xcom"
  id = Column(Integer, primary_key=True) 
  key = Column(String(512))
  value = Column(PickleType(pickler=dill)) 
  timestamp = Column(
    DateTime, default=func.now(), nullable=False) 
  execution_date = Column(DateTime, nullable=False)
{code}
XComs are used for inter-task communication. Their values are either defined in a DAG, or are the return value of the python_callable() function or the task's execute() method, executed on a remote host. According to this model, XCom values are of the PickleType, meaning that objects assigned to the value column are transparently serialized when written and deserialized when read. Deserializing user-controlled pickle objects allows for the execution of arbitrary code. This means that "slaves" (where DAG code is executed) can compromise "masters" (where DAGs are defined in code) by returning an object that, when serialized and subsequently deserialized, causes remote code execution. This can also be triggered by anyone who has write access to this portion of the database.
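The deserialization behavior described above can be illustrated with a minimal, harmless sketch. It uses the standard library pickle module rather than dill or Airflow itself, and the class name Payload is hypothetical. The object's __reduce__ method tells the pickler to store a callable and its arguments, so loading the bytes invokes that callable instead of rebuilding the object:

```python
import pickle


class Payload:
    """Hypothetical stand-in for a value returned by an untrusted task."""

    def __reduce__(self):
        # Instead of the object's state, the pickle stores the instruction
        # "call eval('6 * 7')". An attacker would put any callable here.
        return (eval, ("6 * 7",))


# What would be written to the value column.
blob = pickle.dumps(Payload())

# What would happen when the column is read back: the callable runs.
restored = pickle.loads(blob)
print(type(restored).__name__, restored)
```

The value that comes back is whatever the callable returned, not a Payload instance, which shows that the reader of the pickle runs code chosen by its writer.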
Note: NCC Group plans to meet with developers in the coming days to discuss this finding, and it will be updated to reflect any additional insight provided by this meeting.
Reproduction Steps:
1. Configure a local instance of Airflow.
2. Insert the attached DAG into your AIRFLOW_HOME/dags directory.
This example models a slave returning a malicious object to a task's python_callable by creating a portable object (with __reduce__) containing a reverse shell and pushing it as an XCom's value. This value is serialized upon xcom_push and deserialized upon xcom_pull.
In an actual exploit scenario, this value would be a DAG function's return value, as assigned by code within the function, executing on a malicious remote machine.
3. Start a netcat listener on your machine's port 4444.
4. Execute this task from the command line with airflow run push 2016-11-17. Note that your netcat listener has received a shell connect-back.
Remediation: Consider the use of a custom SQLAlchemy data type that performs this transparent serialization and deserialization, but with JSON (a text-based exchange format), rather than pickles (which may contain code).
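The following is a rough sketch of the serialization half of that remediation, not Airflow's implementation. The function names to_db and from_db and the class NotPlainData are hypothetical. In a custom SQLAlchemy type, logic like this would plausibly live in the process_bind_param and process_result_value hooks of a TypeDecorator subclass; that wiring is omitted here so the example needs only the standard library:

```python
import json


def to_db(value):
    """Serialize on write. json.dumps raises TypeError for arbitrary objects."""
    return json.dumps(value)


def from_db(text):
    """Deserialize on read. The result is only dicts, lists, strings,
    numbers, booleans, or None; no callable is ever invoked."""
    return json.loads(text)


class NotPlainData:
    """Hypothetical object that a pickle-based column would have accepted."""


stored = to_db({"rows": 3, "ok": True})
loaded = from_db(stored)

try:
    to_db(NotPlainData())
    rejected = False
except TypeError:
    rejected = True

print(stored, loaded, rejected)
```

The trade-off is that XCom values would be limited to JSON-representable data, so tasks that currently exchange arbitrary Python objects would need to convert them explicitly.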


> Security - Airflow SQLAlchemy PickleType Allows for Code Execution
> ------------------------------------------------------------------
>
>                 Key: AIRFLOW-855
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-855
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Rui Wang
>         Attachments: test_dag.txt
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)