You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/01/11 23:48:05 UTC

[GitHub] [airflow] larryzhu2018 opened a new pull request #7141: add log_id to end-of-file mark and also add an index config for logs

larryzhu2018 opened a new pull request #7141: add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141
 
 
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   - [ ] Description above provides context of the change
   The “end of log” marker does not include the aforementioned log_id. The issue is then airflow-web does not know when to stop tailing the logs. 
   - [ ] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [ ] Unit tests coverage for changes (not needed for documentation changes)
   - [ ] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [ ] Relevant documentation is updated including usage instructions.
   - [ ] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573571156
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=h1) Report
   > Merging [#7141](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/d654d69d7794a57c5c51685a8a96f1d7c38c2302?src=pr&el=desc) will **decrease** coverage by `0.12%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7141/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7141      +/-   ##
   ==========================================
   - Coverage   85.24%   85.12%   -0.13%     
   ==========================================
     Files         683      710      +27     
     Lines       39155    39483     +328     
   ==========================================
   + Hits        33378    33609     +231     
   - Misses       5777     5874      +97
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/config\_templates/airflow\_local\_settings.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb25maWdfdGVtcGxhdGVzL2FpcmZsb3dfbG9jYWxfc2V0dGluZ3MucHk=) | `70.21% <0%> (-1.53%)` | :arrow_down: |
   | [airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5) | `92.66% <80%> (-0.74%)` | :arrow_down: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | [airflow/security/kerberos.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `30.43% <0%> (-45.66%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `78.31% <0%> (-20.49%)` | :arrow_down: |
   | [airflow/gcp/operators/bigquery.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9nY3Avb3BlcmF0b3JzL2JpZ3F1ZXJ5LnB5) | `91.59% <0%> (-0.49%)` | :arrow_down: |
   | [airflow/contrib/operators/gcs\_list\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9nY3NfbGlzdF9vcGVyYXRvci5weQ==) | `100% <0%> (ø)` | :arrow_up: |
   | ... and [81 more](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=footer). Last update [d654d69...75be1e1](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] jward-bw commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
jward-bw commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r373462956
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -53,9 +53,12 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
     PAGE = 0
     MAX_LINE_PER_PAGE = 1000
 
+    # 16 is reasonable in this case
+    # pylint: disable-msg=too-many-arguments
     def __init__(self, base_log_folder, filename_template,
                  log_id_template, end_of_log_mark,
                  write_stdout, json_format, json_fields,
+                 index='*',
 
 Review comment:
   Just to check `*` will match all indexes? I am working with an ES cluster where we have a new index every day

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of end-of-log mark is already handled by Andrii's initial changes for the stdout support: https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
           # end_of_log_mark may contain characters like '\n' which is needed to
           # have the log uploaded but will not be stored in elasticsearch.
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
   
   then
           # If we hit the end of the log, remove the actual end_of_log message
           # to prevent it from showing in the UI.
           i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
           message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in the tests now.
   Can you please check if this is clear to you now?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-575254776
 
 
   > * g another ticket?
   Please see the test case, I need the index parameter so that I can ensure there is only one log line in the index and I use a separate index for that. So it would be hard not to have the index parameter.
   >   general note: i would like to see test case to proof bug and improve test coverage after this change.
   added test cases
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368101978
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   this works as long as you do not put white spaces into your end_of_log_mark. I think you would just ask for troubles by putting white space characters into the end-of-log mark.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368105431
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
-        self.es_task_handler.formatter = formatter
+    def test_close_with_log_id(self):
+        es_task_handler = ElasticsearchTaskHandler(
+            self.local_log_location,
+            self.filename_template,
+            self.log_id_template,
+            self.end_of_log_mark,
+            self.write_stdout,
+            True,  # json_format
 
 Review comment:
   when you work with elastic search, json formatter is the way to go. We do not have grok filter in the unit tests to parse and index this non-json format message and then index them and look up as I do in the test case here. Realistically we should probably remove the non-json format option, and it is not practical to deploy it correctly as far as I can tell.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372636506
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   > Hi @larryzhu2018 for your description:
   > ![image](https://user-images.githubusercontent.com/8662365/72946940-52353080-3d35-11ea-9c3b-47996b94f47f.png)
   > 
   > It looks like you have `write_stdout` as `true`, in fact, my pr: #7199 fixed the always `true` for `write_stdout`. could you please test the case when `write_stdout` is `False`.
   [larryzhu2018]: the test case I have in the PR, test_close_with_log_id, is setting write_stdout to False so please check.
   > 
   > Our set up is:
   > `write_stdout` is `False`
   > `json` is `False`
   > `END_OF_LOG_MARK = u'\u0004\n'`
   > 
   > This is what we see in the Kibana
   > ![image](https://user-images.githubusercontent.com/8662365/72947738-d8eb0d00-3d37-11ea-8eac-d341a2c66163.png)
   > 
   > I remember if the `end_of_log` is wrapped in a `logging.makeLogRecord` it will start with
   > 
   > ![image](https://user-images.githubusercontent.com/8662365/72947783-033cca80-3d38-11ea-9fc2-a31429464412.png)
   it is confusing in kibana because you use a white space as the mark. what do you think this would look like if you use "end_of_log_for_airflow_task_instance" as that is what I use in deployment.
   
   
   
   > There's a lot of disagreement going on here.
   > 
   > @larryzhu2018 Can you please give me DETAILED instructions on how to test this bug and your fix locally, assujming I have _no_ current elasticsearch or kibana set up currently. (Docker would be preferred.)
   
   Please note that I did not change the production code here. I only reverted Ping's recent fix that broke logging using elastic search. I also provided unit test case that tests the log_id logic, show how log_id is used and I also showed here the ingest processor we use so any one who has an elasticsearch can copy and paste and try it out.  Is this sufficient? I won't have time to add docker, and elastic ingestion nodes etc as I did not add the elastic-search logging support myself. I only reverted a recent change because the authors did not understand the code logic or how it is supposed to work? Do you see the community service I provided here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366648976
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
 
 Review comment:
   This is to make sure the "end-of-log" mark log record will always be in a separate line therefore a separate log record. as those like us work with elastic search for logging, for various reasons, it is not always easy to ensure log lines are not combined. Because the end-of-log mark is so special, it helps with the print() here to guarantee that we can always find the mark with the log-id from es_task_handler.py.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788887
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   I did observe that occasionally the end-of-log mark is not on its separate line in the console, then it can confuse elasticsearch. So I always add a print() in this PR right before the end-of-log print out and this is just a reliability fix given we need the end-of-log mark in a separate log line for this to work.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372652360
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
 Review comment:
   @pingzh you do not use log_id at all in your deployment. you need that in order to make the airflow webui to work as the web ui is searching for the log_id, without that it will keep spinning. Can you please review the comments I added here and check? 
   
   please come back to me here, and tell us if you can make this work with the non-json log format. I do not think you can construct the log_id that is needed for the web-ui using the non-json log format.
   Let me recap: IIUC the only reason that you think your deployment worked was because you missed out the log_id logic completely and I have explained how it should work, where it is used, and also add added a unit test case to show you how this can work. Please review my explanations here and see if it is clear to you now. Thanks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of end-of-log mark is already handled by Andrii's initial changes adding the stdout support: https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
           # end_of_log_mark may contain characters like '\n' which is needed to
           # have the log uploaded but will not be stored in elasticsearch.
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
   
   then
           # If we hit the end of the log, remove the actual end_of_log message
           # to prevent it from showing in the UI.
           i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
           message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in the tests now.
   Can you please check if this is clear to you now?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366228673
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Doesn't this change where the end of log mark goes?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of end-of-log mark is already handled by Andrii's initial changes for the stdout support so it won't be displayed to the users. Pleas see https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
           # end_of_log_mark may contain characters like '\n' which is needed to
           # have the log uploaded but will not be stored in elasticsearch.
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
   
   then
   
           # If we hit the end of the log, remove the actual end_of_log message
           # to prevent it from showing in the UI.
           i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
           message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in the tests now.
   Can you please check if this is clear to you now?
   
   Logid is constructed on the Elasticsearch but it needs the dag_id, task_id, execution_date and try_number to compute the log_id and that is why  you need to use emit() to include the information. In my test case, here is how I simulate the logic in the elastic search processors:
               msg['log_id'] = self.log_id_template.format(
                   dag_id=msg['dag_id'],
                   task_id=msg['task_id'],
                   execution_date=msg['execution_date'],
                   try_number=msg['try_number'])
               msg['message'] = msg['message'].strip()
               msg['offset'] = 100
   
   To do the same, the elastic search ingest processor pipeline looks like the following for me:
   {
       "description" : "cluster json log Pipeline",
       "processors" : [
         {
           "rename" : {
             "field" : "message",
             "target_field" : "json_msg"
           }
         },
         {
           "json" : {
             "field" : "json_msg",
             "add_to_root" : true
           }
         },
         {
           "rename" : {
             "field" : "message",
             "target_field" : "outter_msg"
           }
         },      
         {
           "grok" : {
             "field" : "outter_msg",
             "patterns" : [
               "%{DATA} {%{DATA}, \"message\": \"%{DATA:message}\", %{GREEDYDATA}}",
               "%{GREEDYDATA}"
             ]
           }      
         },
         {
           "set" : {
            "field" : "event.kind",
             "value" : "tasks",
             "if" : "ctx.message != null"          
           }
         },
         {
           "rename" : {
             "field" : "outter_msg",
             "target_field" : "message",
             "if" : "ctx.message == null"
           }
         },  
         {
           "remove" : {
             "field" : "outter_msg",
             "ignore_missing" : true
           }
         },      
         {
           "set" : {
             "field" : "event.dataset",
             "value" : "airflow",
             "if" : "ctx.dag_id != null && ctx.task_id != null"
           }
         },    
        {
           "set" : {
             "field" : "log_id",
             "value" : "{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}",
             "if" : "ctx.event?.dataset == 'airflow'"
           }
         },
        {
           "set" : {
             "field" : "offset",
             "value" : "{{log.offset}}",
             "if" : "ctx.event?.dataset == 'airflow'"
           }
         }       
       ],
       "on_failure" : [
         {
           "set" : {
             "field" : "error.message",
             "value" : "{{ _ingest.on_failure_message }}"
           }
         }
       ]
     }
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-612774103
 
 
   > Any updates on this? Not sure why, but this seems stuck. Any help wanted?
   
   can you please help to move this forward?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366645903
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   yes, I sent email to them and suggested those changes are bad and copied that message to you. Please see email I sent to you and authors of 5528.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] andriisoldatenko commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
andriisoldatenko commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366292158
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 Review comment:
   pls review [AIRFLOW-5528] (end_of_log_mark should not be a log record (#6159) 
   where @pingzh changed this line:
   ```diff
   - self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
   + self.handler.stream.write(self.end_of_log_mark)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368090344
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
-        self.es_task_handler.formatter = formatter
+    def test_close_with_log_id(self):
+        es_task_handler = ElasticsearchTaskHandler(
+            self.local_log_location,
+            self.filename_template,
+            self.log_id_template,
+            self.end_of_log_mark,
+            self.write_stdout,
+            True,  # json_format
 
 Review comment:
   can you also have another test with non json format? and set the formatter as `logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')` 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] derlaft commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
derlaft commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-607300663
 
 
   Any updates on this? Not sure why this seems stuck. Any help wanted?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366647600
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
 
 Review comment:
   it is import for us to find the log_id mark in the log line with "end_of_log_mark" from the elastic search cluster. What I observed was that I saw the end of log mark can end up with the same line of the previous log lines hence it would prevent us from finding the end-of-log mark in some cases console prints from random places without the newline. I am adding an obnoxious new line (print()) so as to guarantee that end-of-log mark is a separate log record. For any other log line it is actually benign to have two log lines to combine into one line in elastic search. Only the end-of-log mark absolutely need to be in its own line.  This is just to make the solution here more robust and decoupled from the rest of log lines. I understand this is a fix for reliability and it probably is not very clean.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574117895
 
 
   cc @andriisoldatenko @schnie 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-612774261
 
 
   Can you please help to move this forward?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372643591
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   > ```
   >  else logs[-1].message == self.end_of_log_mark.strip()
   > ```
   > 
   > I think this will not work if the `end_of_log_mark` is wrapped as a log record, as the `message` will be the format that Kevin mentioned and it also depends on the `log formatter`.
   I did not change any production code but did add a test case to show how this works. 
   
   in short, you will need to deploy an ingest log processor, and the ones we have look like this:
   
   
   "description" : "cluster json log Pipeline",
   "processors" : [
     {
       "rename" : {
         "field" : "message",
         "target_field" : "raw_message"
       }
     },
     {
       "json" : {
         "field" : "raw_message",
         "add_to_root" : false,
         "target_field" : "json_target"
       }
     },
     {
       "grok" : {
         "field" : "json_target.message",
         "patterns" : [
           "Job %{DATA:job_id}: Subtask %{DATA} %{GREEDYDATA:json_msg}",
           "%{GREEDYDATA}"
         ]
       }
     },
     {
       "json" : {
         "field" : "json_msg",
         "add_to_root" : true,
         "if" : "ctx.job_id != null"
       }
     },
     {
       "json" : {
         "field" : "raw_message",
         "add_to_root" : true,
         "if" : "ctx.job_id == null"
       }
     },
     {
       "remove" : {
         "field" : "json_msg",
         "ignore_missing" : true
       }
     },
     {
       "remove" : {
         "field" : "json_target"
       }
     },
     {
       "set" : {
         "field" : "event.kind",
         "value" : "tasks",
         "if" : "ctx.message != null"
       }
     },
     {
       "set" : {
         "field" : "event.dataset",
         "value" : "airflow",
         "if" : "ctx.dag_id != null && ctx.task_id != null"
       }
     },
     {
       "set" : {
         "field" : "log_id",
         "value" : "{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}",
         "if" : "ctx.event?.dataset == 'airflow'"
       }
     },
     {
       "set" : {
         "field" : "offset",
         "value" : "{{log.offset}}",
         "if" : "ctx.event?.dataset == 'airflow'"
       }
     }
   ],
   "on_failure" : [
     {
       "set" : {
         "field" : "error.message",
         "value" : "{{ _ingest.on_failure_message }}"
       }
     }
   ] 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574139586
 
 
    @larryzhu2018 Yup, this is two tickets as Andrii mentions.
   
   And this seems to revert the change from https://github.com/apache/airflow/pull/6159 -- so you will need to explain this change in much more detail as to why you think it your version is now right.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372643591
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   > ```
   >  else logs[-1].message == self.end_of_log_mark.strip()
   > ```
   > 
   > I think this will not work if the `end_of_log_mark` is wrapped as a log record, as the `message` will be the format that Kevin mentioned and it also depends on the `log formatter`.
   I did not change any production code but did add a test case to show how this works. 
   
   in short, you will need to deploy an ingest log processor, and the ones we have look like this:
   "description" : "cluster json log Pipeline",
   "processors" : [
     {
       "rename" : {
         "field" : "message",
         "target_field" : "raw_message"
       }
     },
     {
       "json" : {
         "field" : "raw_message",
         "add_to_root" : false,
         "target_field" : "json_target"
       }
     },
     {
       "grok" : {
         "field" : "json_target.message",
         "patterns" : [
           "Job %{DATA:job_id}: Subtask %{DATA} %{GREEDYDATA:json_msg}",
           "%{GREEDYDATA}"
         ]
       }
     },
     {
       "json" : {
         "field" : "json_msg",
         "add_to_root" : true,
         "if" : "ctx.job_id != null"
       }
     },
     {
       "json" : {
         "field" : "raw_message",
         "add_to_root" : true,
         "if" : "ctx.job_id == null"
       }
     },
     {
       "remove" : {
         "field" : "json_msg",
         "ignore_missing" : true
       }
     },
     {
       "remove" : {
         "field" : "json_target"
       }
     },
     {
       "set" : {
         "field" : "event.kind",
         "value" : "tasks",
         "if" : "ctx.message != null"
       }
     },
     {
       "set" : {
         "field" : "event.dataset",
         "value" : "airflow",
         "if" : "ctx.dag_id != null && ctx.task_id != null"
       }
     },
     {
       "set" : {
         "field" : "log_id",
         "value" : "{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}",
         "if" : "ctx.event?.dataset == 'airflow'"
       }
     },
     {
       "set" : {
         "field" : "offset",
         "value" : "{{log.offset}}",
         "if" : "ctx.event?.dataset == 'airflow'"
       }
     }
   ],
   "on_failure" : [
     {
       "set" : {
         "field" : "error.message",
         "value" : "{{ _ingest.on_failure_message }}"
       }
     }
   ] 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368103893
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
 
 Review comment:
   We use "end_of_log_for_airflow_task_instance" string literally as the mark in our deployment. As I mentioned earlier, the current code does not work if you have whitespace in the mark

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] derlaft edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
derlaft edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-607300663
 
 
   Any updates on this? Not sure why, but this seems stuck. Any help wanted?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] nuclearpinguin commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
nuclearpinguin commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-576040652
 
 
   > reviewers please help to check the latest iteration
   
   You have to add a description above `---` in Pr message:
   
   ```
   Your description goes here.
   ---
   Issue link: [AIRFLOW-6544](https://issues.apache.org/jira/browse/AIRFLOW-6544)
   ````

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372636506
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   > Hi @larryzhu2018 for your description:
   > ![image](https://user-images.githubusercontent.com/8662365/72946940-52353080-3d35-11ea-9c3b-47996b94f47f.png)
   > 
   > It looks like you have `write_stdout` as `true`, in fact, my pr: #7199 fixed the always `true` for `write_stdout`. could you please test the case when `write_stdout` is `False`.
   [larryzhu2018]: the test case I have in the PR, test_close_with_log_id, is setting write_stdout to False so please check.
   > 
   > Our set up is:
   > `write_stdout` is `False`
   > `json` is `False`
   > `END_OF_LOG_MARK = u'\u0004\n'`
   > 
   > This is what we see in the Kibana
   > ![image](https://user-images.githubusercontent.com/8662365/72947738-d8eb0d00-3d37-11ea-8eac-d341a2c66163.png)
   > 
   > I remember if the `end_of_log` is wrapped in a `logging.makeLogRecord` it will start with
   > 
   > ![image](https://user-images.githubusercontent.com/8662365/72947783-033cca80-3d38-11ea-9fc2-a31429464412.png)
   it is confusing in kibana because you use a white space as the mark. what do you think this would look like if you use "end_of_log_for_airflow_task_instance" as that is what I use in deployment.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573571156
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=h1) Report
   > Merging [#7141](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/d654d69d7794a57c5c51685a8a96f1d7c38c2302?src=pr&el=desc) will **decrease** coverage by `0.12%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7141/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7141      +/-   ##
   ==========================================
   - Coverage   85.24%   85.12%   -0.13%     
   ==========================================
     Files         683      710      +27     
     Lines       39155    39483     +328     
   ==========================================
   + Hits        33378    33609     +231     
   - Misses       5777     5874      +97
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/config\_templates/airflow\_local\_settings.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb25maWdfdGVtcGxhdGVzL2FpcmZsb3dfbG9jYWxfc2V0dGluZ3MucHk=) | `70.21% <0%> (-1.53%)` | :arrow_down: |
   | [airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5) | `92.66% <80%> (-0.74%)` | :arrow_down: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | [airflow/security/kerberos.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `30.43% <0%> (-45.66%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `78.31% <0%> (-20.49%)` | :arrow_down: |
   | [airflow/gcp/operators/bigquery.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9nY3Avb3BlcmF0b3JzL2JpZ3F1ZXJ5LnB5) | `91.59% <0%> (-0.49%)` | :arrow_down: |
   | [airflow/contrib/operators/gcs\_list\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9nY3NfbGlzdF9vcGVyYXRvci5weQ==) | `100% <0%> (ø)` | :arrow_up: |
   | ... and [81 more](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=footer). Last update [d654d69...75be1e1](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368107397
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -53,9 +53,11 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
     PAGE = 0
     MAX_LINE_PER_PAGE = 1000
 
+    # 16 is reasonable in this case
+    # pylint: disable-msg=too-many-arguments
     def __init__(self, base_log_folder, filename_template,
                  log_id_template, end_of_log_mark,
-                 write_stdout, json_format, json_fields,
 
 Review comment:
   the default right now is "*" which has the same semantic as before and it is backward compatible.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-579991950
 
 
   Ping has been terse. I have been going out of my way explaining it, writing test case to show how it is expected to work. I hope once Ping reviews my explanations and unit tests, it would resolve the disagreements. Thanks.
   
   Get Outlook for iOS<https://aka.ms/o0ukef>
   ________________________________
   From: Ash Berlin-Taylor <no...@github.com>
   Sent: Wednesday, January 29, 2020 2:22:36 PM
   To: apache/airflow <ai...@noreply.github.com>
   Cc: larryzhu2018 <la...@live.com>; Mention <me...@noreply.github.com>
   Subject: Re: [apache/airflow] [AIRFLOW-6544] add log_id to end_of_log mark log record (#7141)
   
   
   I only reverted Ping's recent fix that broke logging using elastic search.
   
   The disagreement is wether it's even broken or not. You say it is, Ping says it isn't.
   
   If Ping/Kevin don't respond soon then I'll try and follow your ingest instructions and check our the before and after.
   
   —
   You are receiving this because you were mentioned.
   Reply to this email directly, view it on GitHub<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fpull%2F7141%3Femail_source%3Dnotifications%26email_token%3DAKTDZPHJB6JTFQBRABJMQXDRAH6SZA5CNFSM4KFUTS22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKI65GI%23issuecomment-579989145&data=02%7C01%7C%7Cb408b76ec8e546fea0bb08d7a509bf67%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637159333589542743&sdata=EHRK5cN35zxsMFf8RgyUO1zPbSDmabPiDz1t4bj2KKs%3D&reserved=0>, or unsubscribe<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAKTDZPDRGG225V2BGULDNI3RAH6SZANCNFSM4KFUTS2Q&data=02%7C01%7C%7Cb408b76ec8e546fea0bb08d7a509bf67%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637159333589552754&sdata=AUalojADZWGighgNd1Dpfa7aCPwvU0iqAmjf64A60MU%3D&reserved=0>.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368092985
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   ```
    else logs[-1].message == self.end_of_log_mark.strip()
   ```
   I think this will not work if the `end_of_log_mark` is wrapped as a log record, as the `message` will be the format that Kevin mentioned and it also depends on the `log formatter`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372636506
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   > Hi @larryzhu2018 for your description:
   > ![image](https://user-images.githubusercontent.com/8662365/72946940-52353080-3d35-11ea-9c3b-47996b94f47f.png)
   > 
   > It looks like you have `write_stdout` as `true`, in fact, my pr: #7199 fixed the always `true` for `write_stdout`. could you please test the case when `write_stdout` is `False`.
   [larryzhu2018]: the test case I have in the PR, test_close_with_log_id, is setting write_stdout to False so please check.
   > 
   > Our set up is:
   > `write_stdout` is `False`
   > `json` is `False`
   > `END_OF_LOG_MARK = u'\u0004\n'`
   > 
   > This is what we see in the Kibana
   > ![image](https://user-images.githubusercontent.com/8662365/72947738-d8eb0d00-3d37-11ea-8eac-d341a2c66163.png)
   > 
   > I remember if the `end_of_log` is wrapped in a `logging.makeLogRecord` it will start with
   > 
   > ![image](https://user-images.githubusercontent.com/8662365/72947783-033cca80-3d38-11ea-9fc2-a31429464412.png)
   it is confusing in kibana because you use a white space as the mark. what do you think this would look like if you use "end_of_log_for_airflow_task_instance" as that is what I use in deployment.
   
   
   
   > There's a lot of disagreement going on here.
   > 
   > @larryzhu2018 Can you please give me DETAILED instructions on how to test this bug and your fix locally, assujming I have _no_ current elasticsearch or kibana set up currently. (Docker would be preferred.)
   
   Please note that I did not change the production code here. I only reverted Ping's recent fix that broke logging using elastic search. I also provided unit test case that tests the log_id logic, show how log_id is used and I also showed here the ingest processor we use so any one who has an elasticsearch can copy and paste and try it out.  Is this sufficient? I won't have time to add docker, and elastic ingestion nodes etc as I did not add the elastic-search logging support myself. I only reverted a recent change because the authors did not understand the code logic or how it is supposed to work. Do you see the community service I provided here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] boring-cyborg[bot] commented on issue #7141: add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #7141: add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573364933
 
 
   Congratulations on your first Pull Request and welcome to the Apache Airflow community!
   If you have any issues or are unsure about any anything please check our
   Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits](
   https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks)
   will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory).
   Adding a new operator? Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing
   locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from
   Committers.
   
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://apache-airflow-slack.herokuapp.com/
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r373888033
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -53,9 +53,12 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
     PAGE = 0
     MAX_LINE_PER_PAGE = 1000
 
+    # 16 is reasonable in this case
+    # pylint: disable-msg=too-many-arguments
     def __init__(self, base_log_folder, filename_template,
                  log_id_template, end_of_log_mark,
                  write_stdout, json_format, json_fields,
+                 index='*',
 
 Review comment:
   yes

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368107102
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -81,6 +85,17 @@ def setUp(self):
         self.ti.try_number = 1
         self.ti.state = State.RUNNING
         self.addCleanup(self.dag.clear)
+        self.index_name2 = "test_index2"
+        self.es_task_handler2 = ElasticsearchTaskHandler(
+            self.local_log_location,
 
 Review comment:
   can you please suggest anything more meaningful. I would be happy to accommodate. it is literally handler2 with test_index2. Because the index name is part of the handler, we need to use a separate log handler here in order to use a different index name.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-579752700
 
 
   There's a lot of disagreement going on here.
   
   @larryzhu2018 Can you please give me DETAILED instructions on how to test this bug and your fix locally, assujming I have _no_ current elasticsearch or kibana set up currently. (Docker would be preferred.)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366655381
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   from 5528:
        When the end_of_log_mark is wrapped in a log record, the end_of_log_mark can no longer be 
        able to match the log line in _read:
   
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
        It leads to the UI keeps calling backend and generates lots of load to ES.
   By removing the log_id from the end-of-log mark, it would make it worse as the ui would continue to try to find the end-of-log mark and it won't ever find it as it searches the end-of-log mark by log_id.
   
   I am not sure what  the sentence mean by "When the end_of_log_mark is wrapped in a log record". I also observed that the end-of-log mark might end up within the same line of other log lines and it would prevent us from find the end-of-log mark in those cases. To address that, I always add an obnoxious print right in front of the end-of-log mark line, to ensure the "end-of-log" mark is always in a separate line when printing to console. This is import for filebeat/logstash on kubernetes to pick up the end-of-log mark log line in a separate document. 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367792823
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   with regarding to request to split this change into two PRs, can you please check my test case test_close_with_log_id. I need to use the separate index parameter  to safely test the log_id logic correctly as I showed above, and ensure the end-of-log mark is removed correctly and the # of messages are expected etc. It is needed in this PR to improve testability of the code. Can you please check if this is reasonable?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-579987877
 
 
   > There's a lot of disagreement going on here.
   > 
   > @larryzhu2018 Can you please give me DETAILED instructions on how to test this bug and your fix locally, assujming I have _no_ current elasticsearch or kibana set up currently. (Docker would be preferred.)
   
   Please note that I did not change the production code here. I only reverted Ping's recent fix that broke logging using elastic search. I also provided unit test case that tests the log_id logic, show how log_id is used and I also showed here the ingest processor we use so any one who has an elasticsearch can copy and paste and try it out. Is this sufficient? I won't have time to add docker, and elastic ingestion nodes etc as I did not add the elastic-search logging support myself. I only reverted a recent change because the authors did not understand the code logic or how it is supposed to work. Do you see the community service I provided here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573571156
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=h1) Report
   > Merging [#7141](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/d654d69d7794a57c5c51685a8a96f1d7c38c2302?src=pr&el=desc) will **decrease** coverage by `0.14%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7141/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7141      +/-   ##
   ==========================================
   - Coverage   85.24%   85.09%   -0.15%     
   ==========================================
     Files         683      723      +40     
     Lines       39155    39550     +395     
   ==========================================
   + Hits        33378    33656     +278     
   - Misses       5777     5894     +117
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/config\_templates/airflow\_local\_settings.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb25maWdfdGVtcGxhdGVzL2FpcmZsb3dfbG9jYWxfc2V0dGluZ3MucHk=) | `70.21% <0%> (-1.53%)` | :arrow_down: |
   | [airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5) | `93.57% <80%> (+0.18%)` | :arrow_up: |
   | [airflow/contrib/hooks/azure\_data\_lake\_hook.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F6dXJlX2RhdGFfbGFrZV9ob29rLnB5) | `0% <0%> (-93.11%)` | :arrow_down: |
   | [airflow/contrib/sensors/azure\_cosmos\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvYXp1cmVfY29zbW9zX3NlbnNvci5weQ==) | `0% <0%> (-81.25%)` | :arrow_down: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | [airflow/security/kerberos.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `30.43% <0%> (-45.66%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `78.31% <0%> (-20.49%)` | :arrow_down: |
   | ... and [163 more](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=footer). Last update [d654d69...fb27397](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372374981
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
 Review comment:
   This comment seems to still be an issue @larryzhu2018 - pingzh is saying that they use ElasticSearch logs without the JSON format enabled.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] andriisoldatenko commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
andriisoldatenko commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574136436
 
 
   @ash IMO better to split 2 unrelated changes into 2 tickets: 
   - add index (good idea)
   - fix bug another ticket?
   general note: i would like to see test case to proof bug and improve test coverage after this change.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372636506
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   > Hi @larryzhu2018 for your description:
   > ![image](https://user-images.githubusercontent.com/8662365/72946940-52353080-3d35-11ea-9c3b-47996b94f47f.png)
   > 
   > It looks like you have `write_stdout` as `true`, in fact, my pr: #7199 fixed the always `true` for `write_stdout`. could you please test the case when `write_stdout` is `False`.
   [larryzhu2018]: the test case I have in the PR, test_close_with_log_id, is setting write_stdout to False so please check.
   > 
   > Our set up is:
   > `write_stdout` is `False`
   > `json` is `False`
   > `END_OF_LOG_MARK = u'\u0004\n'`
   > 
   > This is what we see in the Kibana
   > ![image](https://user-images.githubusercontent.com/8662365/72947738-d8eb0d00-3d37-11ea-8eac-d341a2c66163.png)
   > 
   > I remember if the `end_of_log` is wrapped in a `logging.makeLogRecord` it will start with
   > 
   > ![image](https://user-images.githubusercontent.com/8662365/72947783-033cca80-3d38-11ea-9fc2-a31429464412.png)
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574446137
 
 
   > Also regarding adding index my **5c**:
   > Currently we have something like:
   > `GET /search?q=user:kimchy` (To search all indices in a cluster, omit the parameter.)
   > New logic:
   > `GET /kimchy,elasticsearch/_search?q=user:kimchy` (search in 1 or several incedes), which means all old code won't work.
   > 
   > I like idea to add index explicitly in search param, but it should be optional than required.
   
   I just set the default index "*" so that by default you do not need to set an index name.
    I am not sure I understand the queries above but to search in a specific index, you just need to run <index name>/_search?q=user:kimchy, instead of /_search?q=user:kimchy

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of end-of-log mark is already handled by Andrii's initial changes for the stdout support: https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
           # end_of_log_mark may contain characters like '\n' which is needed to
           # have the log uploaded but will not be stored in elasticsearch.
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
   
   then
   
           # If we hit the end of the log, remove the actual end_of_log message
           # to prevent it from showing in the UI.
           i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
           message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in the tests now.
   Can you please check if this is clear to you now?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574447922
 
 
   > @ashb IMO better to split 2 unrelated changes into 2 tickets:
   > 
   > * add index (good idea)
   > * fix bug another ticket?
   >   general note: i would like to see test case to proof bug and improve test coverage after this change.
   
   The fix for "5528" removed the log_id from the end-of-log mark log record. It broke the end  to end scenarios. Because the current elastic search mock does not add log-id to the log records, so it cannot test the end-to-end scenarios where in the real elastic search cluster we rely on the log_id to find the logs for airflow.  
   I added tests for the log_id using elastic search mock just now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574451555
 
 
   > Also regarding adding index my **5c**:
   > Currently we have something like:
   > `GET /search?q=user:kimchy` (To search all indices in a cluster, omit the parameter.)
   > New logic:
   > `GET /kimchy,elasticsearch/_search?q=user:kimchy` (search in 1 or several incedes), which means all old code won't work.
   > 
   > I like idea to add index explicitly in search param, but it should be optional than required.
   
   I changed the current test cases to cover this case by using the index "test_index" in the test cases through the new parameter to __init()

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-580890427
 
 
   Correct
   
   Get Outlook for iOS<https://aka.ms/o0ukef>
   ________________________________
   From: Jacob Ward <no...@github.com>
   Sent: Friday, January 31, 2020 4:48:04 AM
   To: apache/airflow <ai...@noreply.github.com>
   Cc: larryzhu2018 <la...@live.com>; Mention <me...@noreply.github.com>
   Subject: Re: [apache/airflow] [AIRFLOW-6544] add log_id to end_of_log mark log record (#7141)
   
   
   @jward-bw commented on this pull request.
   
   ________________________________
   
   In airflow/utils/log/es_task_handler.py<https://eur05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fpull%2F7141%23discussion_r373462956&data=02%7C01%7C%7C24f613f05baf4de6c9ed08d7a64bd10f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637160716865134070&sdata=EvdnTxre77C5OZpQJzvmBIVkth5qZw1vzy22fINVj8Q%3D&reserved=0>:
   
   >      def __init__(self, base_log_folder, filename_template,
                     log_id_template, end_of_log_mark,
                     write_stdout, json_format, json_fields,
   +                 index='*',
   
   
   Just to check * will match all indexes? I am working with an ES cluster where we have a new index every day
   
   —
   You are receiving this because you were mentioned.
   Reply to this email directly, view it on GitHub<https://eur05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fpull%2F7141%3Femail_source%3Dnotifications%26email_token%3DAKTDZPFA4BDA56DCQ42ITO3RAQMYJA5CNFSM4KFUTS22YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCTZTE6Q%23pullrequestreview-351482490&data=02%7C01%7C%7C24f613f05baf4de6c9ed08d7a64bd10f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637160716865144075&sdata=Y1PssHEI6dm4Iiyrsb30xlYcv1F%2BtvSq3TDzCCd6els%3D&reserved=0>, or unsubscribe<https://eur05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAKTDZPHMHVLZQP5THM7F6LLRAQMYJANCNFSM4KFUTS2Q&data=02%7C01%7C%7C24f613f05baf4de6c9ed08d7a64bd10f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637160716865144075&sdata=ZITGP%2FIYp%2BFvhLMXTdCxvq2pFqSxZoLOLu1Xqrdx%2B2c%3D&reserved=0>.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] KevinYang21 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
KevinYang21 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r369425422
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   sry I was get lost in my inbox, yes it is very clear to me now. Thank you for filling me in with details!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368208678
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -53,9 +53,11 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
     PAGE = 0
     MAX_LINE_PER_PAGE = 1000
 
+    # 16 is reasonable in this case
+    # pylint: disable-msg=too-many-arguments
     def __init__(self, base_log_folder, filename_template,
                  log_id_template, end_of_log_mark,
-                 write_stdout, json_format, json_fields,
 
 Review comment:
   i mean a default value in the class constructor. although you have added an index in the airflow_local_settings, users might already have their own customized airflow_local_settings. If you don't put a default value in the index, it will break their code as the constructor requires `index`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368096806
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
 Review comment:
   and your test only focus on `json format`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368095723
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
 Review comment:
   i think we should not remove this test as it is still a valid test case with the formatter set

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573571156
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=h1) Report
   > Merging [#7141](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/d654d69d7794a57c5c51685a8a96f1d7c38c2302?src=pr&el=desc) will **decrease** coverage by `0.11%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7141/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7141      +/-   ##
   ==========================================
   - Coverage   85.24%   85.13%   -0.12%     
   ==========================================
     Files         683      753      +70     
     Lines       39155    39713     +558     
   ==========================================
   + Hits        33378    33809     +431     
   - Misses       5777     5904     +127
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/config\_templates/airflow\_local\_settings.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb25maWdfdGVtcGxhdGVzL2FpcmZsb3dfbG9jYWxfc2V0dGluZ3MucHk=) | `70.21% <0%> (-1.53%)` | :arrow_down: |
   | [airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5) | `93.57% <80%> (+0.18%)` | :arrow_up: |
   | [...rflow/contrib/sensors/sagemaker\_training\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvc2FnZW1ha2VyX3RyYWluaW5nX3NlbnNvci5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/contrib/hooks/azure\_data\_lake\_hook.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F6dXJlX2RhdGFfbGFrZV9ob29rLnB5) | `0% <0%> (-93.11%)` | :arrow_down: |
   | [airflow/contrib/sensors/azure\_cosmos\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvYXp1cmVfY29zbW9zX3NlbnNvci5weQ==) | `0% <0%> (-81.25%)` | :arrow_down: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | [airflow/security/kerberos.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `30.43% <0%> (-45.66%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | ... and [227 more](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=footer). Last update [d654d69...b695a43](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368087731
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
 
 Review comment:
   what is your end_of_log_mark? we are using `END_OF_LOG_MARK = u'\u0004\n'` can you try this end of log?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368094796
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -53,9 +53,11 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
     PAGE = 0
     MAX_LINE_PER_PAGE = 1000
 
+    # 16 is reasonable in this case
+    # pylint: disable-msg=too-many-arguments
     def __init__(self, base_log_folder, filename_template,
                  log_id_template, end_of_log_mark,
-                 write_stdout, json_format, json_fields,
 
 Review comment:
   we may want to add a default value of `index` to be backward compatible. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
codecov-io commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573571156
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=h1) Report
   > Merging [#7141](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/d654d69d7794a57c5c51685a8a96f1d7c38c2302?src=pr&el=desc) will **decrease** coverage by `0.26%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7141/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7141      +/-   ##
   ==========================================
   - Coverage   85.24%   84.98%   -0.27%     
   ==========================================
     Files         683      707      +24     
     Lines       39155    39259     +104     
   ==========================================
   - Hits        33378    33364      -14     
   - Misses       5777     5895     +118
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/config\_templates/airflow\_local\_settings.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb25maWdfdGVtcGxhdGVzL2FpcmZsb3dfbG9jYWxfc2V0dGluZ3MucHk=) | `70.21% <0%> (-1.53%)` | :arrow_down: |
   | [airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5) | `92.66% <80%> (-0.74%)` | :arrow_down: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | [...rflow/contrib/operators/kubernetes\_pod\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9rdWJlcm5ldGVzX3BvZF9vcGVyYXRvci5weQ==) | `78.31% <0%> (-20.49%)` | :arrow_down: |
   | [airflow/jobs/backfill\_job.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzL2JhY2tmaWxsX2pvYi5weQ==) | `90.43% <0%> (-1.45%)` | :arrow_down: |
   | [airflow/contrib/sensors/hdfs\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvaGRmc19zZW5zb3IucHk=) | `100% <0%> (ø)` | :arrow_up: |
   | [airflow/sensors/hdfs\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZW5zb3JzL2hkZnNfc2Vuc29yLnB5) | `100% <0%> (ø)` | :arrow_up: |
   | ... and [50 more](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=footer). Last update [d654d69...18a8785](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r368753289
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
 Review comment:
   sorry, i did not get it, `non-json format does not really work in the context of elastic search`. We don't use json format and it works fine in our production. 
   
   could you setup be related to this? https://github.com/apache/airflow/pull/7199

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of end-of-log mark is already handled by Andrii's initial changes for the stdout support so it won't be displayed to the users. Pleas see https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
           # end_of_log_mark may contain characters like '\n' which is needed to
           # have the log uploaded but will not be stored in elasticsearch.
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
   
   then
   
           # If we hit the end of the log, remove the actual end_of_log message
           # to prevent it from showing in the UI.
           i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
           message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in the tests now.
   Can you please check if this is clear to you now?
   
   Log_id is constructed on the Elasticsearch but it needs the dag_id, task_id, execution_date and try_number to compute the log_id and that is why  you need to use emit() to include the information. In my test case, here is how I simulate the logic in the elastic search processors:
   
               msg['log_id'] = self.log_id_template.format(
                   dag_id=msg['dag_id'],
                   task_id=msg['task_id'],
                   execution_date=msg['execution_date'],
                   try_number=msg['try_number'])
               msg['message'] = msg['message'].strip()
               msg['offset'] = 100
   
   To do the same, the elastic search ingest processor pipeline looks like the following for me:
   
   
   
       "description" : "cluster json log Pipeline",
       "processors" : [
         {
           "rename" : {
             "field" : "message",
             "target_field" : "raw_message"
           }
         },
         {
           "json" : {
             "field" : "raw_message",
             "add_to_root" : false,
             "target_field" : "json_target"
           }
         },
         {
           "grok" : {
             "field" : "json_target.message",
             "patterns" : [
               "Job %{DATA:job_id}: Subtask %{DATA} %{GREEDYDATA:json_msg}",
               "%{GREEDYDATA}"
             ]
           }
         },
         {
           "json" : {
             "field" : "json_msg",
             "add_to_root" : true,
             "if" : "ctx.job_id != null"
           }
         },
         {
           "json" : {
             "field" : "raw_message",
             "add_to_root" : true,
             "if" : "ctx.job_id == null"
           }
         },
         {
           "remove" : {
             "field" : "json_msg",
             "ignore_missing" : true
           }
         },
         {
           "remove" : {
             "field" : "json_target"
           }
         },
         {
           "set" : {
             "field" : "event.kind",
             "value" : "tasks",
             "if" : "ctx.message != null"
           }
         },
         {
           "set" : {
             "field" : "event.dataset",
             "value" : "airflow",
             "if" : "ctx.dag_id != null && ctx.task_id != null"
           }
         },
         {
           "set" : {
             "field" : "log_id",
             "value" : "{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}",
             "if" : "ctx.event?.dataset == 'airflow'"
           }
         },
         {
           "set" : {
             "field" : "offset",
             "value" : "{{log.offset}}",
             "if" : "ctx.event?.dataset == 'airflow'"
           }
         }
       ],
       "on_failure" : [
         {
           "set" : {
             "field" : "error.message",
             "value" : "{{ _ingest.on_failure_message }}"
           }
         }
       ]
     
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366227340
 
 

 ##########
 File path: airflow/config_templates/default_airflow.cfg
 ##########
 @@ -698,6 +698,9 @@ json_format = False
 # Log fields to also attach to the json output, if enabled
 json_fields = asctime, filename, lineno, levelname, message
 
+# Index used to store airflow logs
+index = filebeat-*
 
 Review comment:
   Please don't change the default

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372652360
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
 Review comment:
   @pingzh you do not use log_id at all in your deployment. you need that in order to make the airflow webui to work as the web ui is searching for the log_id, without that it will keep spinning. Can you please review the comments I added here and check? 
   
   please come back to me here, and tell us if you can make this work with the non-json log format. I do not think you can construct the log_id that is needed for the web-ui using the non-json log format.
   Let me recap: IIUC the only reason that you think your deployment worked was because you missed out the log_id logic completely and I have explained how it should work, where it is used, and also add added a unit test case to show you how this can work. Please reviews my explanations here and see if it is clear to you now. Thanks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r369883128
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   > this works as long as you do not put white spaces into your end_of_log_mark. I think you would just ask for troubles by putting white space characters into the end-of-log mark.
   
   I am not sure why using white space will cause troubles.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366655381
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   from 5528:
        When the end_of_log_mark is wrapped in a log record, the end_of_log_mark can no longer be 
        able to match the log line in _read:
   
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
        It leads to the UI keeps calling backend and generates lots of load to ES.
   By removing the log_id from the end-of-log mark, it would make it worse as the ui would continue to try to find the end-of-log mark and it won't ever find it as it searches the end-of-log mark by log_id.
   
   I am not sure what  the sentence mean by "When the end_of_log_mark is wrapped in a log record". I also observed that the end-of-log mark might end up within the same line of other log lines and it would prevent us from finding the end-of-log mark in those cases. To address that, I always add an obnoxious print right in front of the end-of-log mark line, to ensure the "end-of-log" mark is always in a separate line when printing to console. This is import for filebeat/logstash on kubernetes to pick up the end-of-log mark log line in a separate document. 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of end-of-log mark is already handled by Andrii's initial changes for the stdout support so it won't be displayed to the users. Pleas see https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
           # end_of_log_mark may contain characters like '\n' which is needed to
           # have the log uploaded but will not be stored in elasticsearch.
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
   
   then
   
           # If we hit the end of the log, remove the actual end_of_log message
           # to prevent it from showing in the UI.
           i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
           message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in the tests now.
   Can you please check if this is clear to you now?
   
   Log_id is constructed on the Elasticsearch but it needs the dag_id, task_id, execution_date and try_number to compute the log_id and that is why  you need to use emit() to include the information. In my test case, here is how I simulate the logic in the elastic search processors:
               msg['log_id'] = self.log_id_template.format(
                   dag_id=msg['dag_id'],
                   task_id=msg['task_id'],
                   execution_date=msg['execution_date'],
                   try_number=msg['try_number'])
               msg['message'] = msg['message'].strip()
               msg['offset'] = 100
   
   To do the same, the elastic search ingest processor pipeline looks like the following for me:
   
   
   
   `{
       "description" : "cluster json log Pipeline",
       "processors" : [
         {
           "rename" : {
             "field" : "message",
             "target_field" : "json_msg"
           }
         },
         {
           "json" : {
             "field" : "json_msg",
             "add_to_root" : true
           }
         },
         {
           "rename" : {
             "field" : "message",
             "target_field" : "outter_msg"
           }
         },      
         {
           "grok" : {
             "field" : "outter_msg",
             "patterns" : [
               "%{DATA} {%{DATA}, \"message\": \"%{DATA:message}\", %{GREEDYDATA}}",
               "%{GREEDYDATA}"
             ]
           }      
         },
         {
           "set" : {
            "field" : "event.kind",
             "value" : "tasks",
             "if" : "ctx.message != null"          
           }
         },
         {
           "rename" : {
             "field" : "outter_msg",
             "target_field" : "message",
             "if" : "ctx.message == null"
           }
         },  
         {
           "remove" : {
             "field" : "outter_msg",
             "ignore_missing" : true
           }
         },      
         {
           "set" : {
             "field" : "event.dataset",
             "value" : "airflow",
             "if" : "ctx.dag_id != null && ctx.task_id != null"
           }
         },    
        {
           "set" : {
             "field" : "log_id",
             "value" : "{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}",
             "if" : "ctx.event?.dataset == 'airflow'"
           }
         },
        {
           "set" : {
             "field" : "offset",
             "value" : "{{log.offset}}",
             "if" : "ctx.event?.dataset == 'airflow'"
           }
         }       
       ],
       "on_failure" : [
         {
           "set" : {
             "field" : "error.message",
             "value" : "{{ _ingest.on_failure_message }}"
           }
         }
       ]
     }
   `
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ash commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
ash commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574136742
 
 
   If you can spell @ashb correctly, that would be great.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] andriisoldatenko commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
andriisoldatenko commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574137300
 
 
   > If you can spell @ashb correctly, that would be great.
   
   sorry my mistake :(

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372376263
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
 
 Review comment:
   This needs a comment saying what it's for then please.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366652482
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   sorry I meant that I sent the email to Andrii and the two authors of 5528. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] KevinYang21 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
KevinYang21 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367675716
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   It means the last line would be something like `[2020-01-16 07:58:32,712] {es_task_handler.py:XXX} INFO [end_of_log_mark]` and thus made the reader unable to understand it.
   
   I'm a bit lost how did this removed the log id from the end_of_log_mark. Isn't the log_id we constructed in this file only for log fetching? My understanding is that the log_id is determined when we upload the log, e.g. when we pipe stdout to logstash or when we upload file through filebeat to logstash.
   
   Maybe I was understanding this wrong and there is indeed a bug. In that case I would agree on spliting this change into two PRs for sanity purpose.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r372638312
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   > > this works as long as you do not put white spaces into your end_of_log_mark. I think you would just ask for troubles by putting white space characters into the end-of-log mark.
   > 
   > I am not sure why using white space will cause troubles.
   see the .strip() call as I quoted earlier. This basically means if you have empty spaces in your end-of-log mark, it won't work and it will confuse the users.
   
    else logs[-1].message == self.end_of_log_mark.strip()
   
   Does this make sense now?
   
   But please do not remove .strip() call to "fix" this. It is hard to guarantee the event pipelines preserve whitespaces so it is best to not use whitespace in your end-of-log mark.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368107637
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 
 Review comment:
   as I mentioned non-json format does not really work in the context of elastic search. I suspect no one can deploy it correctly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r367788223
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Kevin, the removal of end-of-log mark is already handled by Andrii's initial changes for the stdout support so it won't be displayed to the users. Pleas see https://github.com/apache/airflow/commit/0da976a0e1e28e2c0cd274d7384cf2976db6deec#diff-485751b55125e8a90050d22f69e8467c
   
   
           # end_of_log_mark may contain characters like '\n' which is needed to
           # have the log uploaded but will not be stored in elasticsearch.
           metadata['end_of_log'] = False if not logs \
               else logs[-1].message == self.end_of_log_mark.strip()
   
   then
   
           # If we hit the end of the log, remove the actual end_of_log message
           # to prevent it from showing in the UI.
           i = len(logs) if not metadata['end_of_log'] else len(logs) - 1
           message = '\n'.join([log.message for log in logs[0:i]])
   
   Please see my test case test_close_with_log_id that exercises this logic in the tests now.
   Can you please check if this is clear to you now?
   
   Log_id is constructed on the Elasticsearch but it needs the dag_id, task_id, execution_date and try_number to compute the log_id and that is why  you need to use emit() to include the information. In my test case, here is how I simulate the logic in the elastic search processors:
               msg['log_id'] = self.log_id_template.format(
                   dag_id=msg['dag_id'],
                   task_id=msg['task_id'],
                   execution_date=msg['execution_date'],
                   try_number=msg['try_number'])
               msg['message'] = msg['message'].strip()
               msg['offset'] = 100
   
   To do the same, the elastic search ingest processor pipeline looks like the following for me:
   
   
   
       "description" : "cluster json log Pipeline",
       "processors" : [
         {
           "rename" : {
             "field" : "message",
             "target_field" : "raw_message"
           }
         },
         {
           "json" : {
             "field" : "raw_message",
             "add_to_root" : false,
             "target_field" : "json_target"
           }
         },
         {
           "grok" : {
             "field" : "json_target.message",
             "patterns" : [
               "Job %{DATA:job_id}: Subtask %{DATA} %{GREEDYDATA:json_msg}",
               "%{GREEDYDATA}"
             ]
           }
         },
         {
           "json" : {
             "field" : "json_msg",
             "add_to_root" : true,
             "if" : "ctx.job_id != null"
           }
         },
         {
           "json" : {
             "field" : "raw_message",
             "add_to_root" : true,
             "if" : "ctx.job_id == null"
           }
         },
         {
           "remove" : {
             "field" : "json_msg",
             "ignore_missing" : true
           }
         },
         {
           "remove" : {
             "field" : "json_target"
           }
         },
         {
           "set" : {
             "field" : "event.kind",
             "value" : "tasks",
             "if" : "ctx.message != null"
           }
         },
         {
           "set" : {
             "field" : "event.dataset",
             "value" : "airflow",
             "if" : "ctx.dag_id != null && ctx.task_id != null"
           }
         },
         {
           "set" : {
             "field" : "log_id",
             "value" : "{{dag_id}}-{{task_id}}-{{execution_date}}-{{try_number}}",
             "if" : "ctx.event?.dataset == 'airflow'"
           }
         },
         {
           "set" : {
             "field" : "offset",
             "value" : "{{log.offset}}",
             "if" : "ctx.event?.dataset == 'airflow'"
           }
         }
       ],
       "on_failure" : [
         {
           "set" : {
             "field" : "error.message",
             "value" : "{{ _ingest.on_failure_message }}"
           }
         }
       ]
     
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574448853
 
 
   > @larryzhu2018 Yup, this is two tickets as Andrii mentions.
   > 
   > And this seems to revert the change from #6159 -- so you will need to explain this change in much more detail as to why you think it your version is now right.
   
   This change reverts #6159 and I validated that after reverting that, the end-to-end scenarios work and airflow-web can find the end-of-log mark using the log-id.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368094024
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -81,6 +85,17 @@ def setUp(self):
         self.ti.try_number = 1
         self.ti.state = State.RUNNING
         self.addCleanup(self.dag.clear)
+        self.index_name2 = "test_index2"
+        self.es_task_handler2 = ElasticsearchTaskHandler(
+            self.local_log_location,
 
 Review comment:
   can you rename this to be more specific `es_task_handler2`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368237068
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -53,9 +53,11 @@ class ElasticsearchTaskHandler(FileTaskHandler, LoggingMixin):
     PAGE = 0
     MAX_LINE_PER_PAGE = 1000
 
+    # 16 is reasonable in this case
+    # pylint: disable-msg=too-many-arguments
     def __init__(self, base_log_folder, filename_template,
                  log_id_template, end_of_log_mark,
-                 write_stdout, json_format, json_fields,
 
 Review comment:
   done. Fixed as you suggested. Thanks.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573571156
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=h1) Report
   > Merging [#7141](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/d654d69d7794a57c5c51685a8a96f1d7c38c2302?src=pr&el=desc) will **increase** coverage by `0.76%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7141/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff            @@
   ##           master   #7141      +/-   ##
   =========================================
   + Coverage   85.24%     86%   +0.76%     
   =========================================
     Files         683     866     +183     
     Lines       39155   40559    +1404     
   =========================================
   + Hits        33378   34884    +1506     
   + Misses       5777    5675     -102
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/config\_templates/airflow\_local\_settings.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb25maWdfdGVtcGxhdGVzL2FpcmZsb3dfbG9jYWxfc2V0dGluZ3MucHk=) | `70.21% <0%> (-1.53%)` | :arrow_down: |
   | [airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5) | `93.57% <80%> (+0.18%)` | :arrow_up: |
   | [airflow/operators/postgres\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvcG9zdGdyZXNfb3BlcmF0b3IucHk=) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/operators/mysql\_to\_hive.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvbXlzcWxfdG9faGl2ZS5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
   | [...rflow/contrib/sensors/sagemaker\_training\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvc2FnZW1ha2VyX3RyYWluaW5nX3NlbnNvci5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/contrib/operators/snowflake\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL29wZXJhdG9ycy9zbm93Zmxha2Vfb3BlcmF0b3IucHk=) | `0% <0%> (-95.84%)` | :arrow_down: |
   | [airflow/operators/s3\_to\_hive\_operator.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvczNfdG9faGl2ZV9vcGVyYXRvci5weQ==) | `0% <0%> (-93.97%)` | :arrow_down: |
   | [airflow/contrib/hooks/azure\_data\_lake\_hook.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F6dXJlX2RhdGFfbGFrZV9ob29rLnB5) | `0% <0%> (-93.11%)` | :arrow_down: |
   | [airflow/contrib/hooks/grpc\_hook.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2dycGNfaG9vay5weQ==) | `0% <0%> (-91.94%)` | :arrow_down: |
   | [airflow/contrib/sensors/azure\_cosmos\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvYXp1cmVfY29zbW9zX3NlbnNvci5weQ==) | `0% <0%> (-81.25%)` | :arrow_down: |
   | ... and [722 more](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=footer). Last update [d654d69...1bb12f3](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368105431
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -262,21 +277,55 @@ def test_set_context_w_json_format_and_write_stdout(self):
         self.es_task_handler.json_format = True
         self.es_task_handler.set_context(self.ti)
 
-    def test_close(self):
-        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
-        self.es_task_handler.formatter = formatter
+    def test_close_with_log_id(self):
+        es_task_handler = ElasticsearchTaskHandler(
+            self.local_log_location,
+            self.filename_template,
+            self.log_id_template,
+            self.end_of_log_mark,
+            self.write_stdout,
+            True,  # json_format
 
 Review comment:
   when you work with elastic, json formatter is the way to go. We do not have grok filter in the unit tests to parse and index this non-json format message and then index them and look up as I do in the test case here. Realistically we should probably remove the non-json format option, and it is not practical to deploy it correctly as far as I can tell.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573571156
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=h1) Report
   > Merging [#7141](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/d654d69d7794a57c5c51685a8a96f1d7c38c2302?src=pr&el=desc) will **decrease** coverage by `0.11%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7141/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7141      +/-   ##
   ==========================================
   - Coverage   85.24%   85.13%   -0.12%     
   ==========================================
     Files         683      753      +70     
     Lines       39155    39713     +558     
   ==========================================
   + Hits        33378    33809     +431     
   - Misses       5777     5904     +127
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/config\_templates/airflow\_local\_settings.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb25maWdfdGVtcGxhdGVzL2FpcmZsb3dfbG9jYWxfc2V0dGluZ3MucHk=) | `70.21% <0%> (-1.53%)` | :arrow_down: |
   | [airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5) | `93.57% <80%> (+0.18%)` | :arrow_up: |
   | [...rflow/contrib/sensors/sagemaker\_training\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvc2FnZW1ha2VyX3RyYWluaW5nX3NlbnNvci5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/contrib/hooks/azure\_data\_lake\_hook.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F6dXJlX2RhdGFfbGFrZV9ob29rLnB5) | `0% <0%> (-93.11%)` | :arrow_down: |
   | [airflow/contrib/sensors/azure\_cosmos\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvYXp1cmVfY29zbW9zX3NlbnNvci5weQ==) | `0% <0%> (-81.25%)` | :arrow_down: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | [airflow/security/kerberos.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `30.43% <0%> (-45.66%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | ... and [227 more](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=footer). Last update [d654d69...b695a43](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368107102
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -81,6 +85,17 @@ def setUp(self):
         self.ti.try_number = 1
         self.ti.state = State.RUNNING
         self.addCleanup(self.dag.clear)
+        self.index_name2 = "test_index2"
+        self.es_task_handler2 = ElasticsearchTaskHandler(
+            self.local_log_location,
 
 Review comment:
   can you please suggest anything more meaningful. I would be happy to accommodate. it is literally handler2 with test_index2. Because the index name is part of the handler, we need to use a separate log handler here in order to use a different index name here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r369881195
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Hi @larryzhu2018 for your description: 
   ![image](https://user-images.githubusercontent.com/8662365/72946940-52353080-3d35-11ea-9c3b-47996b94f47f.png)
   
   It looks like you have `write_stdout` as `true`, in fact, my pr: https://github.com/apache/airflow/pull/7199 fixed the always `true` for `write_stdout`.  could you please test the case when `write_stdout` is `False`.
   
   Our set up is:
   `write_stdout` is `False`
   `json` is `False`
   `END_OF_LOG_MARK = u'\u0004\n'`
   
   This is what we see in the Kibana
   ![image](https://user-images.githubusercontent.com/8662365/72947738-d8eb0d00-3d37-11ea-8eac-d341a2c66163.png)
   
   I remember if the `end_of_log` is wrapped in a `logging.makeLogRecord` it will start with
   
   ![image](https://user-images.githubusercontent.com/8662365/72947783-033cca80-3d38-11ea-9fc2-a31429464412.png)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] andriisoldatenko commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
andriisoldatenko commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574141215
 
 
   Also regarding adding index my **5c**:
   Currently we have something like:
   `GET /search?q=user:kimchy` (To search all indices in a cluster, omit the <index> parameter.)
   New logic:
   `GET /kimchy,elasticsearch/_search?q=user:kimchy` (search in 1 or several incedes), which means all old code won't work.
   
   I like idea to add index explicitly in search param, but it should be optional than required.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r368107102
 
 

 ##########
 File path: tests/utils/log/test_es_task_handler.py
 ##########
 @@ -81,6 +85,17 @@ def setUp(self):
         self.ti.try_number = 1
         self.ti.state = State.RUNNING
         self.addCleanup(self.dag.clear)
+        self.index_name2 = "test_index2"
+        self.es_task_handler2 = ElasticsearchTaskHandler(
+            self.local_log_location,
 
 Review comment:
   can you please suggest anything more meaningful. I would be happy to accommodate. it is literally handler2 with test_index2. Because the index name is part of the handler, we need to use a separate index in order to use a different index name here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#discussion_r366227710
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
 
 Review comment:
   What's this for?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
pingzh commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r369881195
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   Hi @larryzhu2018 for your description: 
   ![image](https://user-images.githubusercontent.com/8662365/72946940-52353080-3d35-11ea-9c3b-47996b94f47f.png)
   
   It looks like you have `write_stdout` as `true`, in fact, my pr: https://github.com/apache/airflow/pull/7199 fixed the always `true` for `write_stdout`.  could you please test the case when `write_stdout` is `False`.
   
   Our set up is:
   `write_stdout` is `False`
   `json` is `False`
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-576042219
 
 
   > > reviewers please help to check the latest iteration
   > 
   > You have to add a description above `---` in Pr message:
   > 
   > ```
   > Your description goes here.
   > ---
   > Issue link: [[AIRFLOW-6544](https://issues.apache.org/jira/browse/AIRFLOW-6544)](https://issues.apache.org/jira/browse/[AIRFLOW-6544](https://issues.apache.org/jira/browse/AIRFLOW-6544))
   > ```
   
   is it correct now?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-573571156
 
 
   # [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=h1) Report
   > Merging [#7141](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=desc) into [master](https://codecov.io/gh/apache/airflow/commit/d654d69d7794a57c5c51685a8a96f1d7c38c2302?src=pr&el=desc) will **decrease** coverage by `0.11%`.
   > The diff coverage is `66.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/airflow/pull/7141/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #7141      +/-   ##
   ==========================================
   - Coverage   85.24%   85.13%   -0.12%     
   ==========================================
     Files         683      753      +70     
     Lines       39155    39713     +558     
   ==========================================
   + Hits        33378    33809     +431     
   - Misses       5777     5904     +127
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [airflow/config\_templates/airflow\_local\_settings.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb25maWdfdGVtcGxhdGVzL2FpcmZsb3dfbG9jYWxfc2V0dGluZ3MucHk=) | `70.21% <0%> (-1.53%)` | :arrow_down: |
   | [airflow/utils/log/es\_task\_handler.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9sb2cvZXNfdGFza19oYW5kbGVyLnB5) | `93.57% <80%> (+0.18%)` | :arrow_up: |
   | [...rflow/contrib/sensors/sagemaker\_training\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvc2FnZW1ha2VyX3RyYWluaW5nX3NlbnNvci5weQ==) | `0% <0%> (-100%)` | :arrow_down: |
   | [airflow/contrib/hooks/azure\_data\_lake\_hook.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL2hvb2tzL2F6dXJlX2RhdGFfbGFrZV9ob29rLnB5) | `0% <0%> (-93.11%)` | :arrow_down: |
   | [airflow/contrib/sensors/azure\_cosmos\_sensor.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9jb250cmliL3NlbnNvcnMvYXp1cmVfY29zbW9zX3NlbnNvci5weQ==) | `0% <0%> (-81.25%)` | :arrow_down: |
   | [airflow/kubernetes/volume\_mount.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZV9tb3VudC5weQ==) | `44.44% <0%> (-55.56%)` | :arrow_down: |
   | [airflow/kubernetes/volume.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3ZvbHVtZS5weQ==) | `52.94% <0%> (-47.06%)` | :arrow_down: |
   | [airflow/kubernetes/pod\_launcher.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3BvZF9sYXVuY2hlci5weQ==) | `45.25% <0%> (-46.72%)` | :arrow_down: |
   | [airflow/security/kerberos.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9zZWN1cml0eS9rZXJiZXJvcy5weQ==) | `30.43% <0%> (-45.66%)` | :arrow_down: |
   | [airflow/kubernetes/refresh\_config.py](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree#diff-YWlyZmxvdy9rdWJlcm5ldGVzL3JlZnJlc2hfY29uZmlnLnB5) | `50.98% <0%> (-23.53%)` | :arrow_down: |
   | ... and [227 more](https://codecov.io/gh/apache/airflow/pull/7141/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=footer). Last update [d654d69...b695a43](https://codecov.io/gh/apache/airflow/pull/7141?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] KevinYang21 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
KevinYang21 commented on a change in pull request #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#discussion_r369425422
 
 

 ##########
 File path: airflow/utils/log/es_task_handler.py
 ##########
 @@ -255,7 +256,9 @@ def close(self):
 
         # Mark the end of file using end of log mark,
         # so we know where to stop while auto-tailing.
-        self.handler.stream.write(self.end_of_log_mark)
+        if self.write_stdout:
+            print()
+        self.handler.emit(logging.makeLogRecord({'msg': self.end_of_log_mark}))
 
 
 Review comment:
   sry I was get lost in my inbox, yes it is very clear to me now. Thank you for filling me in with details!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
larryzhu2018 commented on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574447922
 
 
   > @ashb IMO better to split 2 unrelated changes into 2 tickets:
   > 
   > * add index (good idea)
   > * fix bug another ticket?
   >   general note: i would like to see test case to proof bug and improve test coverage after this change.
   
   The fix for "5528" removed the log_id from the end-of-log mark log record. It broke the end  to end scenarios. Because the current elastic search mock does not add log-id to the log records, so it cannot test the end-to-end scenarios where in the real elastic search cluster we rely on the log_id to find the logs for airflow.  Since you were the developer who added log_id support, perhaps you can help to update the elastic search mock to add the log id in the log records, so that the scenarios won't be broken in the future, e.g. by another fix along the line of 5528?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] andriisoldatenko edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs

Posted by GitBox <gi...@apache.org>.
andriisoldatenko edited a comment on issue #7141: [AIRFLOW-6544] add log_id to end-of-file mark and also add an index config for logs
URL: https://github.com/apache/airflow/pull/7141#issuecomment-574136436
 
 
   @ashb  IMO better to split 2 unrelated changes into 2 tickets: 
   - add index (good idea)
   - fix bug another ticket?
   general note: i would like to see test case to proof bug and improve test coverage after this change.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] ashb commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #7141: [AIRFLOW-6544] add log_id to end_of_log mark log record
URL: https://github.com/apache/airflow/pull/7141#issuecomment-579989145
 
 
   
   > I only reverted Ping's recent fix that broke logging using elastic search.
   
   The disagreement is wether it's even broken or not. You say it is, Ping says it isn't.
   
   If Ping/Kevin don't respond soon then I'll try and follow your ingest instructions and check our the before and after.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services