Posted to commits@arrow.apache.org by li...@apache.org on 2022/04/11 19:23:46 UTC

[arrow-cookbook] branch main updated: [Flight] Add Python Server Client Example with Certificates (#177)

This is an automated email from the ASF dual-hosted git repository.

lidavidm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new bf258b8  [Flight] Add Python Server Client Example with Certificates (#177)
bf258b8 is described below

commit bf258b83d0470db13ffd90ac3c339ae44b2cbeab
Author: Christopher Dunderdale <47...@users.noreply.github.com>
AuthorDate: Mon Apr 11 21:23:41 2022 +0200

    [Flight] Add Python Server Client Example with Certificates (#177)
    
    * Include Python Server Client Example with Certificates
    
    * Update code output to not be hit by unit tests
    
    * Pull request comments
    
    * Fix build issues
    
    * fix more build issues
    
    * More fixes
    
    * Fix more issues (part 3)
    
    * Another attempt to fix build issues
    
    * spacing after testcode
    
    * Address comments
    
    * Address comments
    
    * address final comments
---
 python/source/create.rst |   2 +-
 python/source/flight.rst | 135 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/python/source/create.rst b/python/source/create.rst
index b9ca220..9571a29 100644
--- a/python/source/create.rst
+++ b/python/source/create.rst
@@ -107,7 +107,7 @@ from numpy and pandas arrays and series, but it's also
 possible to create Arrow Arrays and Tables from 
 plain Python structures.
 
-the :func:`pyarrow.table` function allows creation of Tables
+The :func:`pyarrow.table` function allows creation of Tables
 from a variety of inputs, including plain python objects
 
 .. testcode::
diff --git a/python/source/flight.rst b/python/source/flight.rst
index f41e325..0868cc9 100644
--- a/python/source/flight.rst
+++ b/python/source/flight.rst
@@ -88,8 +88,8 @@ the information regarding a single specific data stream.
 Then we expose :meth:`pyarrow.flight.FlightServerBase.do_get` which is in charge
 of actually fetching the exposed data streams and sending them to the client.
 
-Allowing to list and dowload data streams would be pretty useless if we didn't
-expose a way to create them, this is the responsability of
+Allowing to list and download data streams would be pretty useless if we didn't
+expose a way to create them, this is the responsibility of
 :meth:`pyarrow.flight.FlightServerBase.do_put` which is in charge of receiving
 new data from the client and dealing with it (in this case saving it
 into a parquet file)
@@ -605,3 +605,134 @@ Or if we use the wrong credentials on login, we also get an error:
     server.shutdown()
 
 .. _(HTTP) basic authentication: https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
+
+Securing connections with TLS
+=================================
+
+The previous scenario controlled access to the server with a username and password.
+TLS (the protocol underlying HTTPS) adds a further layer of security by encrypting the messages
+exchanged between the client and the server. This is achieved using certificates. During development,
+the easiest approach is to use self-signed certificates. At startup, the server loads its certificate
+and private key, and the client authenticates the server using the TLS root certificate.
+
+.. note:: In production environments it is recommended to make use of a certificate signed by a certificate authority.
+
+**Step 1 - Generating the Self-Signed Certificate**
+
+Generate a self-signed certificate by using dotnet on `Windows`_, or `openssl`_ on Linux or macOS.
+Alternatively, the self-signed certificate from the `Arrow testing data repository`_ can be used.
+Depending on the file generated, you may need to convert it to a .crt and .key file as required for the Arrow server.
+One way to do this is with openssl; see this `IBM article`_ for more information.
+
+
+**Step 2 - Running a server with TLS enabled**
+
+The code below is a minimal working example of an Arrow Flight server that receives data over TLS.
+
+.. testcode::
+    
+    import argparse
+    import pyarrow
+    import pyarrow.flight
+    
+    
+    class FlightServer(pyarrow.flight.FlightServerBase):
+        def __init__(self, host="localhost", location=None,
+                     tls_certificates=None, verify_client=False,
+                     root_certificates=None, auth_handler=None):
+            super(FlightServer, self).__init__(
+                location, auth_handler, tls_certificates, verify_client,
+                root_certificates)
+            self.flights = {}
+    
+        @classmethod
+        def descriptor_to_key(cls, descriptor):
+            return (descriptor.descriptor_type.value, descriptor.command,
+                    tuple(descriptor.path or tuple()))
+    
+        def do_put(self, context, descriptor, reader, writer):
+            key = FlightServer.descriptor_to_key(descriptor)
+            print(key)
+            self.flights[key] = reader.read_all()
+            print(self.flights[key])
+    
+    
+    def main():
+        parser = argparse.ArgumentParser()
+        parser.add_argument("--tls", nargs=2, required=True, metavar=('CERTFILE', 'KEYFILE'))
+        args = parser.parse_args()                                
+        tls_certificates = []
+    
+        scheme = "grpc+tls"
+        host = "localhost"
+        port = "5005"
+        
+        with open(args.tls[0], "rb") as cert_file:
+            tls_cert_chain = cert_file.read()
+        with open(args.tls[1], "rb") as key_file:
+            tls_private_key = key_file.read()
+    
+        tls_certificates.append((tls_cert_chain, tls_private_key))
+        
+        location = "{}://{}:{}".format(scheme, host, port)
+    
+        server = FlightServer(host, location,
+                              tls_certificates=tls_certificates)
+        print("Serving on", location)
+        server.serve()
+    
+    
+    if __name__ == '__main__':
+        main()
+
+Running the server, you should see ``Serving on grpc+tls://localhost:5005``.
+
+**Step 3 - Securely Connecting to the Server**
+Suppose we want to connect to the server and push some data to it. The following code sends data to the server over a TLS-encrypted connection.
+
+.. testcode::
+    
+    import argparse
+    import pyarrow
+    import pyarrow.flight
+    import pandas as pd
+    
+    # Assumes the incoming data object is a pandas DataFrame
+    def push_to_server(name, data, client):
+        object_to_send = pyarrow.Table.from_pandas(data)
+        writer, _ = client.do_put(pyarrow.flight.FlightDescriptor.for_path(name), object_to_send.schema)
+        writer.write_table(object_to_send)
+        writer.close()
+    
+    def main():
+        parser = argparse.ArgumentParser()
+    
+        parser.add_argument('--tls-roots', required=True,
+                            help='Path to trusted TLS certificate(s)')
+        parser.add_argument('--host', default="localhost",
+                            help='Host endpoint')
+        parser.add_argument('--port', type=int, default=5005,
+                            help='Host port')
+        args = parser.parse_args()
+        kwargs = {}
+    
+        with open(args.tls_roots, "rb") as root_certs:
+            kwargs["tls_root_certs"] = root_certs.read()
+    
+        client = pyarrow.flight.FlightClient(f"grpc+tls://{args.host}:{args.port}", **kwargs)
+        data = {'Animal': ['Dog', 'Cat', 'Mouse'], 'Size': ['Big', 'Small', 'Tiny']}
+        df = pd.DataFrame(data, columns=['Animal', 'Size'])
+        push_to_server("AnimalData", df, client)
+    
+    if __name__ == '__main__':
+        try:
+            main()
+        except Exception as e:
+            print(e) 
+            
+Running the client script, you should see the server printing out information about the data it just received.
+
+.. _IBM article: https://www.ibm.com/docs/en/arl/9.7?topic=certification-extracting-certificate-keys-from-pfx-file
+.. _Windows: https://docs.microsoft.com/en-us/dotnet/core/additional-tools/self-signed-certificates-guide
+.. _Arrow testing data repository: https://github.com/apache/arrow-testing/tree/master/data/flight
+.. _openssl: https://www.ibm.com/docs/en/api-connect/2018.x?topic=overview-generating-self-signed-certificate-using-openssl
\ No newline at end of file
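
Editor's note on Step 1 of the flight.rst change above: the recipe leaves certificate generation to external guides. The following is a minimal sketch, not part of the commit, of how the openssl route might be scripted from Python. The helper names (openssl_self_signed_cmd, generate_self_signed, load_tls_pair) and the file names server.crt and server.key are hypothetical, and the sketch assumes an openssl binary on PATH that supports -addext (OpenSSL 1.1.1 or later).

```python
import subprocess


def openssl_self_signed_cmd(cert_path, key_path, common_name="localhost", days=365):
    """Build an `openssl req` command line for a self-signed certificate.

    Hypothetical helper: it only assembles the argument list, it does not run it.
    """
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:4096",          # create a new RSA key alongside the cert
        "-nodes",                       # leave the private key unencrypted
        "-sha256",
        "-days", str(days),
        "-subj", f"/CN={common_name}",
        # Assumption: OpenSSL 1.1.1+ for -addext; adds the host as a SAN entry.
        "-addext", f"subjectAltName=DNS:{common_name}",
        "-keyout", key_path,
        "-out", cert_path,
    ]


def generate_self_signed(cert_path, key_path):
    """Run openssl to write the certificate and key files (needs openssl on PATH)."""
    subprocess.run(openssl_self_signed_cmd(cert_path, key_path), check=True)


def load_tls_pair(cert_path, key_path):
    """Read a PEM certificate and key and return them as a (cert, key) bytes tuple.

    The tuple has the shape the server example appends to `tls_certificates`.
    The PEM marker checks are a light sanity check, not real validation.
    """
    with open(cert_path, "rb") as cert_file:
        cert = cert_file.read()
    with open(key_path, "rb") as key_file:
        key = key_file.read()
    if b"-----BEGIN CERTIFICATE-----" not in cert:
        raise ValueError(f"{cert_path} does not look like a PEM certificate")
    if b"PRIVATE KEY-----" not in key:
        raise ValueError(f"{key_path} does not look like a PEM private key")
    return cert, key


# Example usage (commented out because it invokes the external openssl binary):
#   generate_self_signed("server.crt", "server.key")
#   tls_certificates = [load_tls_pair("server.crt", "server.key")]
```

With files produced this way, the server from Step 2 could be started with --tls server.crt server.key and the client from Step 3 with --tls-roots server.crt, since a self-signed certificate acts as its own root.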