Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/06 12:53:32 UTC

[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #177: Include Python Server Client Example with Certificates

lidavidm commented on code in PR #177:
URL: https://github.com/apache/arrow-cookbook/pull/177#discussion_r843910123


##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+.. note:: This tutorial uses Windows to create a self-signed certificate. For Linux environments, other methods such as OpenSSL can be used.

Review Comment:
   If possible, we should drop this note, and just link to other resources.
   
   Alternatively, we can remove the instructions altogether and point to the testing data repository: "Generate a self-signed certificate by using `dotnet` on Windows (link), or `openssl` on Linux or macOS (link). Or, use the self-signed certificate from the Arrow testing data repository (link)." https://github.com/apache/arrow-testing/tree/master/data/flight



##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.

Review Comment:
   This is describing TLS mutual authentication, right? But that's not what the example shows.
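
For reference, the distinction can be sketched in terms of the keyword arguments a client passes to `pyarrow.flight.FlightClient`. With ordinary TLS the client only supplies the certificate(s) it trusts (`tls_root_certs`) and the server is the only party that proves its identity. With mutual TLS the client additionally presents its own certificate and key (`cert_chain`, `private_key`). The helper name `client_tls_kwargs` and the placeholder byte strings below are made up for illustration and are not part of the PR.

```python
def client_tls_kwargs(root_certs, cert_chain=None, private_key=None):
    """Build keyword arguments for pyarrow.flight.FlightClient.

    With only root_certs, the client verifies the server (ordinary TLS).
    Supplying cert_chain and private_key as well lets the server verify
    the client too (mutual TLS).
    """
    # A client certificate without its key (or vice versa) is unusable.
    if (cert_chain is None) != (private_key is None):
        raise ValueError("cert_chain and private_key must be given together")
    kwargs = {"tls_root_certs": root_certs}
    if cert_chain is not None:
        kwargs["cert_chain"] = cert_chain
        kwargs["private_key"] = private_key
    return kwargs


# Placeholder PEM contents (assumption); in practice these are read from files.
server_only = client_tls_kwargs(b"ROOT")
mutual = client_tls_kwargs(b"ROOT", cert_chain=b"CERT", private_key=b"KEY")
# Either dict could then be passed on, for example:
# pyarrow.flight.FlightClient("grpc+tls://localhost:5005", **server_only)
```

The prose in the diff reads like the second case, while the example code only exercises the first.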



##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+.. note:: This tutorial uses Windows to create a self-signed certificate. For Linux environments, other methods such as OpenSSL can be used.
+
+**Step 1 - Generating the Self Signed Certificate**  
+
+To generate a self-signed certificate, run command prompt as administrator and run the following commands.
+.. testcode::
+    dotnet dev-certs https --trust
+    dotnet dev-certs https -ep "<CertificateName>.pfx" -p <CertificatePassword>
+
+You will receive a prompt asking you confirm that you would like to trust this certificate, select yes. 
+You now have a self-signed certificate that your local environment trusts.
+
+**Step 2 - Converting the .pfx file into public and private keys** 
+
+Since `dotnet dev-certs` does not let you export Public and Private keys directly we need to convert the .pfx file. 
+There are several way to achieve this and this tutorial uses OpenSSL (using Windows Subsystem for Linux) 
+to perform the conversion as per this `IBM article`_.
+
+**Step 3 - Running a server with tls enabled**

Review Comment:
   ```suggestion
   **Step 3 - Running a server with TLS enabled**
   ```



##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+.. note:: This tutorial uses Windows to create a self-signed certificate. For Linux environments, other methods such as OpenSSL can be used.
+
+**Step 1 - Generating the Self Signed Certificate**  
+
+To generate a self-signed certificate, run command prompt as administrator and run the following commands.
+.. testcode::
+    dotnet dev-certs https --trust
+    dotnet dev-certs https -ep "<CertificateName>.pfx" -p <CertificatePassword>
+
+You will receive a prompt asking you confirm that you would like to trust this certificate, select yes. 
+You now have a self-signed certificate that your local environment trusts.
+
+**Step 2 - Converting the .pfx file into public and private keys** 
+
+Since `dotnet dev-certs` does not let you export Public and Private keys directly we need to convert the .pfx file. 
+There are several way to achieve this and this tutorial uses OpenSSL (using Windows Subsystem for Linux) 
+to perform the conversion as per this `IBM article`_.
+
+**Step 3 - Running a server with tls enabled**
+
+We're going to use the pyarrow server example available on the `GitHub repo`_. To run the server with TLS enabled, the python script should be 
+called with the path to the public and private keys.
+.. testcode::
+    python server.py --tls CERTFILE <PathToPublicCertificate> --tls KEYFILE <PathToPrivateKey>
+
+Assuming the path was valid, you should see ``Serving on grpc+tls://localhost:5005``. The server is now being served on a port set in the code (or by you).
+
+**Step 4 - Securely Connecting a client to the Server**
+Suppose we want to connect to the client and push some data to it. The following code securely sends information to the server using TLS encryption. 
+There is also the option to use mutual TLS encryption using both the public and private key, but we will assume the client will likely only have 
+the public certificate.
+.. testcode::
+    import pyarrow
+    import pyarrow.flight
+    import pandas as pd
+    
+    # Assumes incoming data object is a Dataframe
+    def pushToServer(name, data, client):

Review Comment:
   Note that Python generally uses snake_case (ignoring things like the `logging` module :slightly_smiling_face:)



##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+.. note:: This tutorial uses Windows to create a self-signed certificate. For Linux environments, other methods such as OpenSSL can be used.
+
+**Step 1 - Generating the Self Signed Certificate**  
+
+To generate a self-signed certificate, run command prompt as administrator and run the following commands.
+.. testcode::
+    dotnet dev-certs https --trust
+    dotnet dev-certs https -ep "<CertificateName>.pfx" -p <CertificatePassword>
+
+You will receive a prompt asking you confirm that you would like to trust this certificate, select yes. 
+You now have a self-signed certificate that your local environment trusts.
+
+**Step 2 - Converting the .pfx file into public and private keys** 
+
+Since `dotnet dev-certs` does not let you export Public and Private keys directly we need to convert the .pfx file. 
+There are several way to achieve this and this tutorial uses OpenSSL (using Windows Subsystem for Linux) 
+to perform the conversion as per this `IBM article`_.
+
+**Step 3 - Running a server with tls enabled**
+
+We're going to use the pyarrow server example available on the `GitHub repo`_. To run the server with TLS enabled, the python script should be 
+called with the path to the public and private keys.
+.. testcode::
+    python server.py --tls CERTFILE <PathToPublicCertificate> --tls KEYFILE <PathToPrivateKey>

Review Comment:
   We want to keep the examples self-contained, so would it be possible to instead modify the previous server?
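
One possible shape for this, as a rough sketch: keep the server class from the earlier example and pass the certificate pair through `tls_certificates`, which `pyarrow.flight.FlightServerBase` accepts as a list of `(cert_chain, private_key)` tuples. The helper names, the default port, and the assumption that the server subclass forwards `location` and `tls_certificates` to the base constructor are illustrative and not taken from the PR.

```python
import pathlib


def load_tls_certificates(cert_path, key_path):
    """Read a PEM certificate chain and private key as bytes.

    Returns a list with one (cert_chain, private_key) tuple, the shape
    expected by the tls_certificates argument of FlightServerBase.
    """
    cert = pathlib.Path(cert_path).read_bytes()
    key = pathlib.Path(key_path).read_bytes()
    return [(cert, key)]


def make_tls_server(server_cls, cert_path, key_path, port=5005):
    """Instantiate a FlightServerBase subclass that listens with TLS.

    server_cls is assumed to forward location and tls_certificates
    to pyarrow.flight.FlightServerBase.__init__.
    """
    return server_cls(
        location=f"grpc+tls://0.0.0.0:{port}",
        tls_certificates=load_tls_certificates(cert_path, key_path),
    )
```

With something like this, the recipe would not depend on a separate `server.py` script or its command-line flags.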



##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+.. note:: This tutorial uses Windows to create a self-signed certificate. For Linux environments, other methods such as OpenSSL can be used.
+
+**Step 1 - Generating the Self Signed Certificate**  
+
+To generate a self-signed certificate, run command prompt as administrator and run the following commands.
+.. testcode::
+    dotnet dev-certs https --trust
+    dotnet dev-certs https -ep "<CertificateName>.pfx" -p <CertificatePassword>
+
+You will receive a prompt asking you confirm that you would like to trust this certificate, select yes. 
+You now have a self-signed certificate that your local environment trusts.
+
+**Step 2 - Converting the .pfx file into public and private keys** 
+
+Since `dotnet dev-certs` does not let you export Public and Private keys directly we need to convert the .pfx file. 
+There are several way to achieve this and this tutorial uses OpenSSL (using Windows Subsystem for Linux) 
+to perform the conversion as per this `IBM article`_.
+
+**Step 3 - Running a server with tls enabled**
+
+We're going to use the pyarrow server example available on the `GitHub repo`_. To run the server with TLS enabled, the python script should be 
+called with the path to the public and private keys.
+.. testcode::
+    python server.py --tls CERTFILE <PathToPublicCertificate> --tls KEYFILE <PathToPrivateKey>
+
+Assuming the path was valid, you should see ``Serving on grpc+tls://localhost:5005``. The server is now being served on a port set in the code (or by you).
+
+**Step 4 - Securely Connecting a client to the Server**

Review Comment:
   Nit, but these section titles are capitalized inconsistently.



##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+.. note:: This tutorial uses Windows to create a self-signed certificate. For Linux environments, other methods such as OpenSSL can be used.
+
+**Step 1 - Generating the Self Signed Certificate**  
+
+To generate a self-signed certificate, run command prompt as administrator and run the following commands.
+.. testcode::
+    dotnet dev-certs https --trust
+    dotnet dev-certs https -ep "<CertificateName>.pfx" -p <CertificatePassword>
+
+You will receive a prompt asking you confirm that you would like to trust this certificate, select yes. 
+You now have a self-signed certificate that your local environment trusts.
+
+**Step 2 - Converting the .pfx file into public and private keys** 
+
+Since `dotnet dev-certs` does not let you export Public and Private keys directly we need to convert the .pfx file. 
+There are several way to achieve this and this tutorial uses OpenSSL (using Windows Subsystem for Linux) 
+to perform the conversion as per this `IBM article`_.
+
+**Step 3 - Running a server with tls enabled**
+
+We're going to use the pyarrow server example available on the `GitHub repo`_. To run the server with TLS enabled, the python script should be 
+called with the path to the public and private keys.
+.. testcode::
+    python server.py --tls CERTFILE <PathToPublicCertificate> --tls KEYFILE <PathToPrivateKey>
+
+Assuming the path was valid, you should see ``Serving on grpc+tls://localhost:5005``. The server is now being served on a port set in the code (or by you).
+
+**Step 4 - Securely Connecting a client to the Server**
+Suppose we want to connect to the client and push some data to it. The following code securely sends information to the server using TLS encryption. 
+There is also the option to use mutual TLS encryption using both the public and private key, but we will assume the client will likely only have 
+the public certificate.
+.. testcode::
+    import pyarrow
+    import pyarrow.flight
+    import pandas as pd
+    
+    # Assumes incoming data object is a Dataframe
+    def pushToServer(name, data, client):
+        objectToSend = pyarrow.Table.from_pandas(data)
+        writer, _ = client.do_put(pyarrow.flight.FlightDescriptor.for_path(name), objectToSend.schema)
+        writer.write_table(objectToSend)
+        writer.close()
+    
+    def getClient():
+        
+        return pyarrow.flight.FlightClient("grpc+tcp://localhost:5005")
+    
+    def _add_common_arguments(parser):
+        parser.add_argument('--tls', action='store_true',
+                            help='Enable transport-level security')
+        parser.add_argument('--tls-roots', default=None,
+                            help='Path to trusted TLS certificate(s)')
+        parser.add_argument("--mtls", nargs=2, default=None,
+                            metavar=('CERTFILE', 'KEYFILE'),
+                            help="Enable transport-level security")
+                            
+    def main():
+        parser = argparse.ArgumentParser()
+        args = parser.parse_args()
+        connection_args = {}
+        scheme = "grpc+tls"
+        
+        if args.tls:
+            
+            if args.tls_roots:
+                with open(args.tls_roots, "rb") as root_certs:
+                    connection_args["tls_root_certs"] = root_certs.read()
+        if args.mtls:
+            with open(args.mtls[0], "rb") as cert_file:
+                tls_cert_chain = cert_file.read()
+            with open(args.mtls[1], "rb") as key_file:
+                tls_private_key = key_file.read()
+            connection_args["cert_chain"] = tls_cert_chain
+            connection_args["private_key"] = tls_private_key

Review Comment:
   IMO, we can split mutual TLS into a separate example to reduce the number of things going on here.
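
A minimal sketch of what that split could look like, assuming the same `--tls-roots` and `--mtls` options as in the diff: one function builds the arguments for server-authenticated TLS, and a second one layers the client certificate and key on top for mutual TLS. The function names are hypothetical. The returned dictionaries use the `tls_root_certs`, `cert_chain`, and `private_key` keyword names of `pyarrow.flight.FlightClient`.

```python
import argparse


def build_parser():
    """Parser with the TLS-related options from the example under review."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tls-roots", default=None,
                        help="Path to trusted TLS certificate(s)")
    parser.add_argument("--mtls", nargs=2, default=None,
                        metavar=("CERTFILE", "KEYFILE"),
                        help="Client certificate and key for mutual TLS")
    return parser


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


def tls_connection_args(args):
    """Server-authenticated TLS only: trust roots, no client identity."""
    if args.tls_roots is None:
        return {}
    return {"tls_root_certs": read_bytes(args.tls_roots)}


def mtls_connection_args(args):
    """Mutual TLS: the plain TLS arguments plus a client certificate and key."""
    if args.mtls is None:
        raise ValueError("--mtls CERTFILE KEYFILE is required for mutual TLS")
    connection_args = tls_connection_args(args)
    cert_path, key_path = args.mtls
    connection_args["cert_chain"] = read_bytes(cert_path)
    connection_args["private_key"] = read_bytes(key_path)
    return connection_args
```

The first recipe would then only need `tls_connection_args`, and the mutual TLS recipe could introduce the second function on its own.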



##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+.. note:: This tutorial uses Windows to create a self-signed certificate. For Linux environments, other methods such as OpenSSL can be used.
+
+**Step 1 - Generating the Self Signed Certificate**  
+
+To generate a self-signed certificate, run command prompt as administrator and run the following commands.
+.. testcode::
+    dotnet dev-certs https --trust
+    dotnet dev-certs https -ep "<CertificateName>.pfx" -p <CertificatePassword>
+
+You will receive a prompt asking you confirm that you would like to trust this certificate, select yes. 
+You now have a self-signed certificate that your local environment trusts.
+
+**Step 2 - Converting the .pfx file into public and private keys** 
+
+Since `dotnet dev-certs` does not let you export Public and Private keys directly we need to convert the .pfx file. 
+There are several way to achieve this and this tutorial uses OpenSSL (using Windows Subsystem for Linux) 
+to perform the conversion as per this `IBM article`_.
+
+**Step 3 - Running a server with tls enabled**
+
+We're going to use the pyarrow server example available on the `GitHub repo`_. To run the server with TLS enabled, the python script should be 
+called with the path to the public and private keys.
+.. testcode::
+    python server.py --tls CERTFILE <PathToPublicCertificate> --tls KEYFILE <PathToPrivateKey>
+
+Assuming the path was valid, you should see ``Serving on grpc+tls://localhost:5005``. The server is now being served on a port set in the code (or by you).
+
+**Step 4 - Securely Connecting a client to the Server**
+Suppose we want to connect to the client and push some data to it. The following code securely sends information to the server using TLS encryption. 
+There is also the option to use mutual TLS encryption using both the public and private key, but we will assume the client will likely only have 
+the public certificate.
+.. testcode::
+    import pyarrow
+    import pyarrow.flight
+    import pandas as pd
+    
+    # Assumes incoming data object is a Dataframe
+    def pushToServer(name, data, client):
+        objectToSend = pyarrow.Table.from_pandas(data)
+        writer, _ = client.do_put(pyarrow.flight.FlightDescriptor.for_path(name), objectToSend.schema)
+        writer.write_table(objectToSend)
+        writer.close()
+    
+    def getClient():
+        
+        return pyarrow.flight.FlightClient("grpc+tcp://localhost:5005")

Review Comment:
   Is this used?



##########
python/source/flight.rst:
##########
@@ -605,3 +605,102 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Authentication with certificates
+=================================
+
+Following on from the previous scenario where traffic to the server is managed via a username and password, 
+HTTPS (more specifically TLS) communication allows an additional layer of security by encrypting messages
+between the client and server. This is achieved using certificates. During development, the easiest 
+approach is developing with self-signed certificates. At startup, the server loads the public and private 
+key and the client client authenticates itself to the server with a public key.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+.. note:: This tutorial uses Windows to create a self-signed certificate. For Linux environments, other methods such as OpenSSL can be used.
+
+**Step 1 - Generating the Self Signed Certificate**  
+
+To generate a self-signed certificate, run command prompt as administrator and run the following commands.
+.. testcode::
+    dotnet dev-certs https --trust
+    dotnet dev-certs https -ep "<CertificateName>.pfx" -p <CertificatePassword>
+
+You will receive a prompt asking you confirm that you would like to trust this certificate, select yes. 
+You now have a self-signed certificate that your local environment trusts.
+
+**Step 2 - Converting the .pfx file into public and private keys** 
+
+Since `dotnet dev-certs` does not let you export Public and Private keys directly we need to convert the .pfx file. 
+There are several way to achieve this and this tutorial uses OpenSSL (using Windows Subsystem for Linux) 
+to perform the conversion as per this `IBM article`_.
+
+**Step 3 - Running a server with tls enabled**
+
+We're going to use the pyarrow server example available on the `GitHub repo`_. To run the server with TLS enabled, the python script should be 
+called with the path to the public and private keys.
+.. testcode::
+    python server.py --tls CERTFILE <PathToPublicCertificate> --tls KEYFILE <PathToPrivateKey>

Review Comment:
   It would also be nice to follow the general structure laid out above when demoing code.


