Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/10 18:18:30 UTC

[GitHub] [pulsar-client-go] tuky191 commented on issue #546: go client producer + schema breaks Pulsar SQL

tuky191 commented on issue #546:
URL: https://github.com/apache/pulsar-client-go/issues/546#issuecomment-1242780920

   I also ran into this problem while working on my project. I compiled Pulsar (2.7.2) from source so I could debug the sql-worker.
   
   With the JSON schema, the Go client marshals the Pulsar message's Value into JSON ([]byte) and sends it to Pulsar. The column names are decided at this point: they come from the json tags if present, or from the names of the struct's fields if not.
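   
   A minimal sketch of that naming rule, assuming the client relies on standard `encoding/json` marshalling as described above. The `untagged` type and the `marshalString` helper are hypothetical and exist only for this illustration:
   
   ```go
   package main
   
   import (
   	"encoding/json"
   	"fmt"
   )
   
   // testJSON mirrors the struct from this comment: json tags set the key names.
   type testJSON struct {
   	ID   int    `json:"id"`
   	Name string `json:"name"`
   }
   
   // untagged is a hypothetical variant without tags, so the Go field names
   // are used as JSON keys.
   type untagged struct {
   	ID   int
   	Name string
   }
   
   // marshalString is a hypothetical helper that returns the marshalled JSON
   // as a string and panics on error, to keep the example short.
   func marshalString(v interface{}) string {
   	b, err := json.Marshal(v)
   	if err != nil {
   		panic(err)
   	}
   	return string(b)
   }
   
   func main() {
   	fmt.Println(marshalString(testJSON{ID: 120, Name: "pulsar"})) // {"id":120,"name":"pulsar"}
   	fmt.Println(marshalString(untagged{ID: 120, Name: "pulsar"})) // {"ID":120,"Name":"pulsar"}
   }
   ```
   
   The keys in the first line are the names the schema needs to use for this struct.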
   
   Your schema has to reflect this. In other words, if a column is named 'id' after marshalling, it must also be named 'id' in the Avro schema. If it is 'ID' instead, as in [schema_test](https://github.com/apache/pulsar-client-go/blob/master/pulsar/schema_test.go), the mismatch breaks Presto SQL queries.
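   
   One way to catch such a mismatch before producing is to compare the schema's field names with the keys of a marshalled sample value. This is only a sketch using the standard library: `unmatchedSchemaFields`, `avroRecord`, `badSchema` and `goodSchema` are hypothetical names and not part of pulsar-client-go, and it only handles a flat record schema like the one in this issue.
   
   ```go
   package main
   
   import (
   	"encoding/json"
   	"fmt"
   )
   
   type testJSON struct {
   	ID   int    `json:"id"`
   	Name string `json:"name"`
   }
   
   // The two schema definitions from this comment, written as raw strings.
   const (
   	badSchema  = `{"type":"record","name":"Example","namespace":"test","fields":[{"name":"ID","type":"int"},{"name":"Name","type":"string"}]}`
   	goodSchema = `{"type":"record","name":"Example","namespace":"test","fields":[{"name":"id","type":"int"},{"name":"name","type":"string"}]}`
   )
   
   // avroRecord captures only the field names of a flat Avro record schema.
   type avroRecord struct {
   	Fields []struct {
   		Name string `json:"name"`
   	} `json:"fields"`
   }
   
   // unmatchedSchemaFields returns the schema field names that do not appear,
   // with exactly the same case, as keys in the JSON produced by marshalling v.
   func unmatchedSchemaFields(schemaDef string, v interface{}) ([]string, error) {
   	var rec avroRecord
   	if err := json.Unmarshal([]byte(schemaDef), &rec); err != nil {
   		return nil, err
   	}
   	payload, err := json.Marshal(v)
   	if err != nil {
   		return nil, err
   	}
   	var keys map[string]json.RawMessage
   	if err := json.Unmarshal(payload, &keys); err != nil {
   		return nil, err
   	}
   	var missing []string
   	for _, f := range rec.Fields {
   		if _, ok := keys[f.Name]; !ok {
   			missing = append(missing, f.Name)
   		}
   	}
   	return missing, nil
   }
   
   func main() {
   	for _, def := range []string{badSchema, goodSchema} {
   		missing, err := unmatchedSchemaFields(def, testJSON{ID: 120, Name: "pulsar"})
   		if err != nil {
   			panic(err)
   		}
   		fmt.Println(missing) // first [ID Name], then []
   	}
   }
   ```
   
   An empty result means every schema field has a matching key in the marshalled message.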
   
   
   ### Not working
   ```go
   type testJSON struct {
   	ID   int    `json:"id"`
   	Name string `json:"name"`
   }

   // Schema field names "ID" and "Name" do not match the JSON keys "id" and "name".
   var exampleSchemaDef = "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\"," +
   	"\"fields\":[{\"name\":\"ID\",\"type\":\"int\"},{\"name\":\"Name\",\"type\":\"string\"}]}"
   ```
   
   ```sh
   presto> select * from pulsar."public/default".goJson;
     id  | name | __partition__ |     __event_time__      |    __publish_time__     | __message_id__ | __sequence_id__ | __producer_name__ | __key_>
   ------+------+---------------+-------------------------+-------------------------+----------------+-----------------+-------------------+------->
    NULL | NULL |            -1 | 2339-03-21 22:18:14.838 | 2022-09-10 17:42:06.748 | (10,0,0)       |               0 | standalone-0-0    | NULL  >
   (1 row)
   
   Query 20220910_174314_00000_zqxmh, FINISHED, 1 node
   Splits: 18 total, 18 done (100.00%)
   0:02 [1 rows, 90B] [0 rows/s, 46B/s]
   ```
   ### Working 
   
   ```go
   type testJSON struct {
   	ID   int    `json:"id"`
   	Name string `json:"name"`
   }

   // Schema field names "id" and "name" match the JSON keys produced by the json tags.
   var exampleSchemaDef = "{\"type\":\"record\",\"name\":\"Example\",\"namespace\":\"test\"," +
   	"\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}"
   ```
   
   ```sh
   presto> select * from pulsar."public/default".go_json;
    id  |  name  | __partition__ |     __event_time__      |    __publish_time__     | __message_id__ | __sequence_id__ | __producer_name__ | __key>
   -----+--------+---------------+-------------------------+-------------------------+----------------+-----------------+-------------------+------>
    120 | pulsar |            -1 | 2339-03-21 22:18:14.838 | 2022-09-10 17:45:37.752 | (13,0,0)       |               0 | standalone-0-3    | NULL >
    120 | pulsar |            -1 | 2339-03-21 22:18:14.838 | 2022-09-10 17:45:40.231 | (13,1,0)       |               0 | standalone-0-4    | NULL >
   (2 rows)
   
   Query 20220910_174546_00002_zqxmh, FINISHED, 1 node
   Splits: 18 total, 18 done (100.00%)
   0:00 [2 rows, 270B] [5 rows/s, 707B/s]
   ```
   
   I verified that this also works on pulsar:latest (2.10.1).

