You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/13 12:25:46 UTC

[GitHub] [airflow] potiuk commented on a change in pull request #16954: PythonVritualenvOperator - Feature: Add support for private pypi repo packages in requirements

potiuk commented on a change in pull request #16954:
URL: https://github.com/apache/airflow/pull/16954#discussion_r668703521



##########
File path: tests/operators/test_python.py
##########
@@ -817,6 +822,31 @@ def f():
 
         self._run_as_operator(f, requirements=['funcsigs', 'dill'], system_site_packages=False)
 
+    def test_private_repo_requirements(self):
+
+        with create_session() as session:
+            self._clean_stop_pypi_server()
+            self._initiate_pypi_server()

Review comment:
       This could be pytest fixture or at the very least it should be either in try/finally or in setUp()/tearDown(). otherwise there is a risk the server stays running after test throws an exception.

##########
File path: airflow/utils/python_virtualenv.py
##########
@@ -35,12 +37,35 @@ def _generate_virtualenv_cmd(tmp_dir: str, python_bin: str, system_site_packages
     return cmd
 
 
-def _generate_pip_install_cmd(tmp_dir: str, requirements: List[str]) -> Optional[List[str]]:
+def _generate_pip_install_cmd(tmp_dir: str,
+                              requirements: List[str],
+                              connection_id: Optional[str] = None
+                              ) -> Optional[List[str]]:
     if not requirements:
         return None
-    # direct path alleviates need to activate
-    cmd = [f'{tmp_dir}/bin/pip', 'install']
-    return cmd + requirements
+
+    if connection_id:
+        con: Connection = BaseHook.get_connection(connection_id)
+        user = con.login
+        schema = con.schema or 'http'
+        password = con.get_password()
+        port = con.port
+        host = con.host
+        if user:
+            index_url = f"{schema}://{user}:{password}@{host}:{port}/repository/python/simple"

Review comment:
       There are few problems here:
   * port is Optional [int] - it might be None - we need to provide sensible default if we want to hard-code : (or  handle None) case.
   * "/repository/python/simple should not be hard-coded. I imagine registries can have different path. If anything the "/repository/python/simple" might be a default, but it should be overridable via extra in connection
   * `https://` should be default for private registries. There are good reasons for private registry to be private and I cannot imagine a case where it would be http-based, leaving it essentially unprotected and with user/passwords sent in plain text.
   

##########
File path: setup.cfg
##########
@@ -160,6 +160,7 @@ install_requires =
     typing-extensions>=3.7.4;python_version<"3.8"
     unicodecsv>=0.14.1
     werkzeug~=1.0, >=1.0.1
+    pypiserver~=1.4

Review comment:
       We only need pypi server for development. There is no need to make whole airflow depend on it in `install_requires`. Please move it to `[devel]` extra in `setup.py`.

##########
File path: tests/operators/test_python.py
##########
@@ -817,6 +822,31 @@ def f():
 
         self._run_as_operator(f, requirements=['funcsigs', 'dill'], system_site_packages=False)
 
+    def test_private_repo_requirements(self):
+
+        with create_session() as session:
+            self._clean_stop_pypi_server()
+            self._initiate_pypi_server()
+            session.execute("delete from connection where conn_id = 'private_repo_test_conn';")
+            cn = Connection("private_repo_test_conn",

Review comment:
       There is no need to create the connection. It's enough to patch the method with `AIRFLOW__CONN__NN` environment variables (https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#storing-a-connection-in-environment-variables  - also there is an example of how you can generate the connection url from the raw values - incluidng percent-encoding etc. https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#generating-a-connection-uri) . Another problem with this implementation is that it creates side-effect - the connection remains in the database after the test is run. If you termporarily patch env variable for the test, it will work faster and will not leave a side effect. 

##########
File path: tests/operators/test_python.py
##########
@@ -856,6 +886,25 @@ def f():
 
         self._run_as_operator(f, python_version=3, use_dill=False, requirements=['dill'])
 
+    @staticmethod
+    def _initiate_pypi_server():
+        try:
+            if not os.path.exists('/root/packages'):
+                os.makedirs('/root/packages')
+            subprocess.Popen(['pypi-server'])
+            time.sleep(1.5)
+        except Exception as e:
+            pass
+
+    @staticmethod
+    def _clean_stop_pypi_server(port: bytes = b"8080"):

Review comment:
       Too complex. running system command `killall  -9 pypi-server` does the job in one line of code.. And with the proper setup/teardown sequence, you should store the pid of the process started and try to kill it with SIGTERM rather than SIGKILL first anyway - this method should only be treated as a "safety net",

##########
File path: tests/operators/test_python.py
##########
@@ -856,6 +886,25 @@ def f():
 
         self._run_as_operator(f, python_version=3, use_dill=False, requirements=['dill'])
 
+    @staticmethod
+    def _initiate_pypi_server():
+        try:
+            if not os.path.exists('/root/packages'):
+                os.makedirs('/root/packages')
+            subprocess.Popen(['pypi-server'])
+            time.sleep(1.5)

Review comment:
       How do we know the magic value of 1.5 is good ? Are you sure it is enough for all cases (running in CI etc?). That's a very bad idea to sleep in unit tests. It slows down the whole test suite. There should be a better way to check if the server is running. One option is to run an http request to it in a loop until it succeeds if there is no better way.

##########
File path: airflow/utils/python_virtualenv.py
##########
@@ -35,12 +37,35 @@ def _generate_virtualenv_cmd(tmp_dir: str, python_bin: str, system_site_packages
     return cmd
 
 
-def _generate_pip_install_cmd(tmp_dir: str, requirements: List[str]) -> Optional[List[str]]:
+def _generate_pip_install_cmd(tmp_dir: str,
+                              requirements: List[str],
+                              connection_id: Optional[str] = None
+                              ) -> Optional[List[str]]:
     if not requirements:
         return None
-    # direct path alleviates need to activate
-    cmd = [f'{tmp_dir}/bin/pip', 'install']
-    return cmd + requirements
+
+    if connection_id:
+        con: Connection = BaseHook.get_connection(connection_id)
+        user = con.login
+        schema = con.schema or 'http'
+        password = con.get_password()
+        port = con.port
+        host = con.host
+        if user:
+            index_url = f"{schema}://{user}:{password}@{host}:{port}/repository/python/simple"
+        else:
+            index_url = f"{schema}://{host}:{port}/repository/python/simple"
+        private_cmd = [f'{tmp_dir}/bin/pip',
+                       'install',
+                       f'--trusted-host', host,

Review comment:
       I will revert to @uranusjr  here. He is our local PyPI expert (and pypi committer) so I trust 100% in his statements here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org