Posted to commits@airflow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/09 16:57:00 UTC

[jira] [Commented] (AIRFLOW-3567) Logs are not fetched from S3 when using AIM policy

    [ https://issues.apache.org/jira/browse/AIRFLOW-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738432#comment-16738432 ] 

ASF GitHub Bot commented on AIRFLOW-3567:
-----------------------------------------

Penumbra69 commented on pull request #4467: [AIRFLOW-3567] Add archives config option to SparkSubmitOperator
URL: https://github.com/apache/airflow/pull/4467
 
 
   To enable the Spark behavior of transporting and extracting an archive
   on job launch, making the _contents_ of the archive available to the
   driver as well as the workers (not just the jar or archive as a zip
   file), this configuration attribute is necessary.
   
   This is required if you have no ability to modify the Python env on
   the worker / driver nodes, but you wish to use versions, modules, or
   features not installed.
   
   We transport a full Python 3.5 environment to our CDH cluster using
   this option and the alias "#PYTHON", paired with an additional
   configuration that tells Spark to use it:
   
     --archives "hdfs:///user/myuser/my_python_env.zip#PYTHON"
     --conf "spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PYTHON/python35/bin/python3"
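
   As a rough illustration of how the two options above end up on the
   spark-submit command line, here is a minimal sketch. The helper
   `build_spark_submit_args` and the application name `my_job.py` are
   hypothetical and are not part of Airflow or of this PR; the archive
   path and the conf value are the ones quoted above.

```python
from typing import Dict, List, Optional


def build_spark_submit_args(
    application: str,
    archives: Optional[str] = None,
    conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper)."""
    args = ["spark-submit"]
    # Each conf entry becomes its own "--conf key=value" pair.
    for key, value in (conf or {}).items():
        args += ["--conf", "{}={}".format(key, value)]
    # --archives takes a comma-separated list; the "#ALIAS" suffix names
    # the directory the archive is extracted into.
    if archives:
        args += ["--archives", archives]
    args.append(application)
    return args


cmd = build_spark_submit_args(
    application="my_job.py",  # placeholder application name (assumption)
    archives="hdfs:///user/myuser/my_python_env.zip#PYTHON",
    conf={
        "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "./PYTHON/python35/bin/python3",
    },
)
print(" ".join(cmd))
```

   The point of the sketch is only that the "#PYTHON" alias in the
   archive URI and the "./PYTHON/..." prefix in the conf value have to
   match, since the interpreter path is resolved inside the extracted
   archive directory.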
   
   Resolves: AIRFLOW-3567
   
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
     - https://issues.apache.org/jira/browse/AIRFLOW-XXX
     - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   
   To enable the Spark behavior of transporting and extracting an archive
   on job launch, making the _contents_ of the archive available to the
   driver as well as the workers (not just the jar or archive as a zip
   file), this configuration attribute is necessary.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   My PR adjusted existing tests to include the new attribute. If the original test was considered sufficient, I hope this addendum is as well.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.
     - All the public functions and the classes in the PR contain docstrings that explain what it does
   
   ### Code Quality
   
   - [x] Passes `flake8`
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Logs are not fetched from S3 when using AIM policy
> --------------------------------------------------
>
>                 Key: AIRFLOW-3567
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3567
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: logging
>    Affects Versions: 1.10.1
>            Reporter: Michael Spector
>            Priority: Major
>
> The remote logging is configured as follows:
> {code:python}
> remote_logging=True
> remote_log_conn_id=''
> remote_base_log_folder='s3://logs-bucket/airflow/analytics-airflow'
> encrypt_s3_logs=False
> {code}
> Logs on S3 are created successfully; however, when trying to access them via the UI, we get something like this:
> {code:bash}
> Log file does not exist: /var/lib/airflow/logs/jobname/taskname/2018-12-25T03:00:00+00:00/1.log
> Fetching from: http://10.118.19.142:50192/log/jobname/taskname/2018-12-25T03:00:00+00:00/1.log
> {code}
> This means that the log fetcher falls back to FileTaskHandler.
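
The fallback described in the issue can be pictured with a small sketch. This is not Airflow's actual code: the function names are hypothetical, and treating a failed remote lookup (for example, an access-denied error) the same as a missing log is an assumption made here for illustration.

```python
from typing import Callable


def remote_log_exists(get_key: Callable[[], object]) -> bool:
    """Return True only if the remote lookup succeeds and finds a key."""
    try:
        return get_key() is not None
    except Exception:
        # Assumption: a failed lookup is treated as "log not found".
        return False


def read_log(
    get_key: Callable[[], object],
    read_remote: Callable[[], str],
    read_fallback: Callable[[], str],
) -> str:
    """Prefer the remote log; otherwise use the file-based fallback."""
    if remote_log_exists(get_key):
        return read_remote()
    return read_fallback()


def denied() -> object:
    # Stand-in for a remote lookup rejected by an access policy.
    raise PermissionError("access denied")


result = read_log(denied, lambda: "remote log", lambda: "fallback log")
print(result)
```

Under these assumptions, a permissions problem on the read path would never surface as an error; the reader would only see the file-based fallback output, which is consistent with the symptom reported above.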



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)