Posted to issues@beam.apache.org by "Vasu Gupta (Jira)" <ji...@apache.org> on 2020/08/28 16:21:00 UTC

[jira] [Created] (BEAM-10832) ClickhouseIO's getTableSchema method is called before Pipeline Starts

Vasu Gupta created BEAM-10832:
---------------------------------

             Summary: ClickhouseIO's getTableSchema method is called before Pipeline Starts
                 Key: BEAM-10832
                 URL: https://issues.apache.org/jira/browse/BEAM-10832
             Project: Beam
          Issue Type: Improvement
          Components: beam-model
    Affects Versions: 2.23.0
            Reporter: Vasu Gupta
             Fix For: Not applicable


ClickhouseIO's getTableSchema() method is called from the write PTransform's expand method, which runs while the pipeline is being constructed, before the pipeline starts. The main issue is that getTableSchema() opens a connection to ClickHouse, so if I can't connect to a clickhouse-server at launch time, the pipeline won't even start. Suppose a clickhouse-server is deployed in production and I want to launch a Dataflow pipeline from my local machine: I shouldn't need a working connection to clickhouse-server from my local environment (only Dataflow itself should need to reach it).

 

What I suggest:

getTableSchema() should be invoked only once (singleton-style), and from the DoFn's setup() method instead of the PTransform's expand method, since setup() is called after the pipeline has started (in my case on Dataflow, not locally).
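
To illustrate the suggestion, here is a minimal, Beam-free sketch of the pattern. It is not the actual ClickhouseIO code: LazySchemaSketch, TableSchema, Memoized, and WriterSketch are hypothetical stand-ins, and the setup() method only plays the role that a DoFn method annotated with @Setup would play in Beam. The point it shows is that constructing the writer does not touch the schema source, and that the lookup happens once, on the first setup() call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are part of Apache Beam.
class LazySchemaSketch {

    /** Stand-in for the schema that getTableSchema() would return. */
    static final class TableSchema {
        final String columns;
        TableSchema(String columns) { this.columns = columns; }
    }

    /** Calls the delegate at most once and caches its non-null result. */
    static final class Memoized<T> implements Supplier<T> {
        private final Supplier<T> delegate;
        private volatile T value;

        Memoized(Supplier<T> delegate) { this.delegate = delegate; }

        @Override
        public T get() {
            T result = value;
            if (result == null) {
                synchronized (this) {
                    result = value;
                    if (result == null) {
                        // First caller performs the (possibly remote) lookup.
                        result = delegate.get();
                        value = result;
                    }
                }
            }
            return result;
        }
    }

    /** Stand-in for the writer DoFn; setup() mimics a Beam @Setup method. */
    static final class WriterSketch {
        private final Supplier<TableSchema> schemaSupplier;
        private TableSchema schema;

        // Construction (the analogue of expand) only stores the supplier.
        WriterSketch(Supplier<TableSchema> schemaSupplier) {
            this.schemaSupplier = schemaSupplier;
        }

        // Runs on the worker, after the pipeline has started.
        void setup() { schema = schemaSupplier.get(); }

        TableSchema schema() { return schema; }
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        Supplier<TableSchema> fakeLookup = () -> {
            lookups.incrementAndGet(); // a real lookup would connect to the server here
            return new TableSchema("id UInt64, name String");
        };
        WriterSketch writer = new WriterSketch(new Memoized<TableSchema>(fakeLookup));
        System.out.println("lookups after construction: " + lookups.get());
        writer.setup();
        writer.setup();
        System.out.println("lookups after two setup() calls: " + lookups.get());
    }
}
```

With this shape, launching from a machine that cannot reach clickhouse-server would no longer fail at construction time. The trade-off is that a wrong table name or an unreachable server would only surface once the workers start.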

 

I would be more than happy to work on this improvement in Apache Beam (Java).

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)