You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/12/28 19:07:13 UTC

[GitHub] [airflow] potiuk commented on a change in pull request #20473: SSHOperator.execute() returns str regardless of enable_xcom_pickling

potiuk commented on a change in pull request #20473:
URL: https://github.com/apache/airflow/pull/20473#discussion_r776038635



##########
File path: airflow/providers/ssh/operators/ssh.py
##########
@@ -218,10 +215,7 @@ def execute(self, context=None) -> Union[bytes, str]:
                 result = self.run_ssh_client_command(ssh_client, self.command)
         except Exception as e:
             raise AirflowException(f"SSH operator error: {str(e)}")
-        enable_pickling = conf.getboolean('core', 'enable_xcom_pickling')
-        if not enable_pickling:
-            result = b64encode(result).decode('utf-8')
-        return result
+        return result.decode('utf-8')

Review comment:
       How do we know we can decode using utf-8 here? I think the main problem here is that the ssh command might simply return (depend which command you run - very raw bytes, not even encoded-string. For example when you run 
   `ssh root@host 'cat /bin/bash'` without allocating terminal, you will get content of the file returned. This will work in the old SSH in case of  both flags are set:
   
   - with pickling - content of the file will be saver as byte array in xcom
   - without pickling - the file content will be base64encoded and then converted to the utf8-string
   
   After the change, this will fail and there is no way to send binary, non-encoded data over SSH.
   
   I can't exactly remember, I tried to find it but I was unable to. However  I vaguely recall a discussion that someone had quite a legitimate reason for this - it could be kerberos  keytab sent over ssh connection automatically when authorized or something equally "convoluted". There are some "security-ish" approaches where you want to utilize ssh authentication, connect to a server and you receieve binary data. 
   
   And the current implementation actually handles this case very well.
   
   Whether it's a legitimate case, that is a different story, but I think we cannot just "scrap" the behaviour. And for sure assuming that the data returned by SSH connection will be utf-8 encoded is wrong.
   
   If we really want to store strings, I'd say we should simply add a "result_encoding" parameter and if it is set, we cold do what you did. But this will add even more complexity to what could be expected from ssh operator. 
   
   So I am not really sure if we want to do anyting here :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org