Posted to jira@arrow.apache.org by "Chris Dunderdale (Jira)" <ji...@apache.org> on 2022/04/01 10:49:00 UTC

[jira] [Updated] (ARROW-16090) Unable to connect to flight server in container using self-signed certificate

     [ https://issues.apache.org/jira/browse/ARROW-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Dunderdale updated ARROW-16090:
-------------------------------------
    Description: 
Hi,

I'm trying to run a Python Arrow Flight server in a Docker container. The rationale for moving it into a container is to isolate components of my program: if an exception or performance issue causes something to consume all available memory, I can quickly kill the container without bringing down the entire program.

The problem I'm having is connecting a client on my local machine to the server in the container. I'm not sure whether it's a certificate issue, a gRPC issue, or an Arrow server config issue. I'll break down what I've done below. Any help would be appreciated :)
 # Grabbed the [Arrow Python server|https://github.com/apache/arrow/tree/master/python/examples/flight] from the GitHub repo.
 # Since I want secure communication, I need a certificate; a self-signed one should be fine for development. I generated a development certificate using dotnet dev-certs and, after trusting the certificate, exported it from cmd on Windows.

{code:java}
dotnet dev-certs https --trust
dotnet dev-certs https -ep "test.pfx" -p testpassword{code}
1. My understanding is that the Arrow server only accepts .crt and .key files for the public certificate and private key. I used WSL and OpenSSL to convert the .pfx file, following this article from [IBM|https://www.ibm.com/docs/en/arl/9.7?topic=certification-extracting-certificate-keys-from-pfx-file].

2. I placed the public certificate and private key in the same folder as my server script and adjusted the code as follows so that nothing needs to be passed in via args.
{code:java}
scheme = "grpc+tls"

with open("testPublicKey.crt", "rb") as cert_file:
    tls_cert_chain = cert_file.read()
with open("testPrivateKey.key", "rb") as key_file:
    tls_private_key = key_file.read()

tls_certificates.append((tls_cert_chain, tls_private_key)){code}
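Before handing the pair to the server, it can help to confirm that the converted files are valid PEM and actually belong together. A minimal sketch using only the standard library ssl module; the helper name cert_and_key_match is hypothetical, and the file names are the ones used above:

```python
import ssl


def cert_and_key_match(cert_path, key_path):
    """Return True if the PEM certificate and private key load as a pair."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # load_cert_chain raises if a file is missing, is not valid PEM,
        # or if the private key does not match the certificate.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except (ssl.SSLError, OSError):
        return False
    return True


if __name__ == "__main__":
    print(cert_and_key_match("testPublicKey.crt", "testPrivateKey.key"))
```

This only checks the files themselves; it says nothing about whether a client will trust the certificate.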
My client code is a slimmed-down version of the one in the repo. As a test, I want to push some dummy data to the server.
{code:java}
import pyarrow
import pyarrow.flight
import pandas as pd

# Assumes that data is a DataFrame
def pushToServer(name, data, client):
    objectToSend = pyarrow.Table.from_pandas(data)
    writer, _ = client.do_put(pyarrow.flight.FlightDescriptor.for_path(name), objectToSend.schema)
    writer.write_table(objectToSend)
    writer.close()

def getClient():
    return pyarrow.flight.FlightClient("grpc+tcp://localhost:5005")

def main():
    client = getClient()
    data = {'Country': ['Belgium', 'India', 'Brazil'],
            'Capital': ['Brussels', 'New Delhi', 'Brasilia'],
            'Population': [11190846, 1303171035, 207847528]}
    df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
    pushToServer("PredictedValues", df, client)

if __name__ == '__main__':
    try:
        main()
    except Exception as e:
        print(e){code}
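One thing that may be worth checking: the server is configured with the grpc+tls scheme, while this client connects with grpc+tcp. A hedged sketch of a client whose scheme matches the server and which trusts the self-signed certificate follows. tls_root_certs is a keyword argument of pyarrow.flight.FlightClient; the helper names flight_uri and get_tls_client are hypothetical, and whether this resolves the error above is an assumption:

```python
def flight_uri(host, port, tls=True):
    """Build a Flight location URI whose scheme matches the server's."""
    scheme = "grpc+tls" if tls else "grpc+tcp"
    return f"{scheme}://{host}:{port}"


def get_tls_client(host="localhost", port=5005, root_cert_path="testPublicKey.crt"):
    """Create a client that trusts the self-signed certificate (assumed path)."""
    # Imported lazily so flight_uri can be used without pyarrow installed.
    import pyarrow.flight

    with open(root_cert_path, "rb") as cert_file:
        root_certs = cert_file.read()
    return pyarrow.flight.FlightClient(
        flight_uri(host, port, tls=True), tls_root_certs=root_certs
    )
```

The certificate's subject name has to match the host the client connects to, which is expected to be localhost for a dotnet dev-certs certificate.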
3. Running this on my local machine works fine. Now I want to move the server into the container. I set up the Dockerfile in the same folder as the server script; see below (I know it's not ideal, but it does the job).
{code:java}
FROM mcr.microsoft.com/dotnet/sdk
EXPOSE 5005
COPY server.py /home{code}
Build the image and run the container as below:
{code:java}
docker build -t test .
docker run -it -p 5005:5005 test{code}
4. In the container, I install Python and pyarrow and then start the server.
{code:java}
apt-get update
apt-get install python3.10 python3-pip
pip install pyarrow
# start the server
cd home
python3 server.py
# responds with "Serving on grpc+tls://localhost:5005"{code}
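The startup message shows the server bound to localhost. Inside a container, localhost is the container's own loopback interface, so connections forwarded by -p 5005:5005 may never reach a server that listens only there. A minimal sketch, assuming the server location is built from a scheme, host, and port; the helper name server_location and its in_container flag are hypothetical:

```python
def server_location(port, scheme="grpc+tls", in_container=False):
    """Return the location string the server should listen on."""
    # 0.0.0.0 listens on every interface, including the one Docker's
    # published ports forward to; localhost only accepts connections
    # that originate inside the same network namespace.
    host = "0.0.0.0" if in_container else "localhost"
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    print(server_location(5005, in_container=True))
```

If the server script exposes a host option, passing 0.0.0.0 there would have the same effect as this helper.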
5. Since the ports were mapped when the container was started, I rerun the client on my local machine and am greeted with this error on the client side.
{code:java}
gRPC returned unavailable error, with message: failed to connect to all addresses. Client context: IOError: Could not write record batch to stream. Detail: Internal. gRPC client debug context: {"created":"@1648805430.279000000","description":"Failed to pick subchannel","file":"C:\vcpkg\buildtrees\grpc\src\2180080eb4-87c05d756b.clean\src\core\ext\filters\client_channel\client_channel.cc","file_line":3159,"referenced_errors":[{"created":"@1648805430.279000000","description":"failed to connect to all addresses","file":"C:\vcpkg\buildtrees\grpc\src\2180080eb4-87c05d756b.clean\src\core\lib\transport\error_utils.cc","file_line":147,"grpc_status":14}]}. Additionally, could not finish writing record batches before closing {code}
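The error reports grpc_status 14 (unavailable) with "failed to connect to all addresses", which points at the connection rather than at the data being written. A small sketch for separating plain TCP reachability from TLS or Flight problems, using only the standard library; the helper name port_is_reachable is hypothetical:

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(port_is_reachable("localhost", 5005))
```

A successful connect only shows that something accepts connections on the host port. With Docker port publishing that may be Docker's own forwarder rather than the server, so the result is a hint and not proof.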
Putting a try-catch on the server side doesn't provide any more info, unfortunately.

I've already ruled out a dodgy certificate: I've used the same certificate to set up a basic C# Kestrel server in a container using HTTPS. I've also tried the above using a C# server and hit the same issue.

Is there anything obvious I'm missing in the config? I haven't found any examples where people use certificates with pyarrow, so I'm a bit at a loss.

 

 


> Unable to connect to flight server in container using self-signed certificate
> -----------------------------------------------------------------------------
>
>                 Key: ARROW-16090
>                 URL: https://issues.apache.org/jira/browse/ARROW-16090
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: FlightRPC, Python
>    Affects Versions: 7.0.0
>            Reporter: Chris Dunderdale
>            Priority: Blocker
>


