Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/08/27 09:02:37 UTC

[GitHub] [pulsar] hnail commented on issue #4747: [pulsar-sql] Support arrays and maps

hnail commented on issue #4747:
URL: https://github.com/apache/pulsar/issues/4747#issuecomment-681822007


   The cause is the same as in [issue #7652](https://github.com/apache/pulsar/issues/7652):
   
   1.  In [PulsarMetadata.getColumns()](https://github.com/apache/pulsar/blob/master/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarMetadata.java#L468), nested fields are not associated with Presto's parameterized types in the `TypeManager`. A nested field should be a `ROW` type in Presto (compare Hive's struct type support: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes).
   2.  `SchemaHandler` is hard to combine with [RecordCursor.getObject()](https://github.com/apache/pulsar/blob/master/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarRecordCursor.java#L557) to support `ROW`, `MAP`, `ARRAY`, etc.
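   
   To illustrate point 1, here is a minimal sketch of how nested schema fields could be mapped recursively to Presto type names. `FieldSchema` and `PrestoTypeNames` are hypothetical, simplified stand-ins invented for this example; they are not the Avro, Pulsar, or Presto API, and the real connector would resolve types through the Presto `TypeManager` rather than build strings.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.StringJoiner;

   // Hypothetical, simplified schema model (assumption; not the Avro API).
   final class FieldSchema {
       enum Kind { STRING, INT, ARRAY, MAP, RECORD }

       final Kind kind;
       final FieldSchema element;              // ARRAY items or MAP values, else null
       final Map<String, FieldSchema> fields;  // RECORD fields in declaration order, else null

       private FieldSchema(Kind kind, FieldSchema element, Map<String, FieldSchema> fields) {
           this.kind = kind;
           this.element = element;
           this.fields = fields;
       }

       static FieldSchema string() { return new FieldSchema(Kind.STRING, null, null); }
       static FieldSchema integer() { return new FieldSchema(Kind.INT, null, null); }
       static FieldSchema array(FieldSchema items) { return new FieldSchema(Kind.ARRAY, items, null); }
       static FieldSchema map(FieldSchema values) { return new FieldSchema(Kind.MAP, values, null); }
       static FieldSchema record(Map<String, FieldSchema> fields) {
           return new FieldSchema(Kind.RECORD, null, new LinkedHashMap<>(fields));
       }
   }

   // Hypothetical helper: renders the Presto type name for a schema, recursing into nested types.
   final class PrestoTypeNames {
       static String toPrestoType(FieldSchema schema) {
           switch (schema.kind) {
               case STRING:
                   return "varchar";
               case INT:
                   return "integer";
               case ARRAY:
                   return "array(" + toPrestoType(schema.element) + ")";
               case MAP:
                   // Avro map keys are always strings, so the key type is fixed.
                   return "map(varchar, " + toPrestoType(schema.element) + ")";
               case RECORD: {
                   // A nested record becomes a ROW type with one named member per field.
                   StringJoiner row = new StringJoiner(", ", "ROW(", ")");
                   for (Map.Entry<String, FieldSchema> field : schema.fields.entrySet()) {
                       row.add(field.getKey() + " " + toPrestoType(field.getValue()));
                   }
                   return row.toString();
               }
               default:
                   throw new IllegalArgumentException("Unsupported kind: " + schema.kind);
           }
       }
   }
   ```
   
   With this sketch, a record with string fields `father` and `mother` renders as `ROW(father varchar, mother varchar)`, and an array of strings renders as `array(varchar)`.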
   
   So I have done a big refactoring on a local branch. The main changes are:
   
   - `PulsarMetadata` is now associated with the Presto `TypeManager`.
   - `SchemaHandler` is deprecated in favor of `presto-record-decoder`, with a small extension.
   - The main pulsar-presto module (`RecordSet`, `ConnectorMetadata`, etc.) is decoupled from `org.apache.avro.Schema` and coupled to `org.apache.pulsar.common.schema.SchemaInfo` instead, to make it friendlier to other schema types (`PB`, `thrift`, etc.).
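   
   As a rough sketch of the decoupling idea in the last bullet, decoders could be looked up by schema type so that the core module never references a format-specific schema class. Every name below (`SchemaKind`, `SchemaDescriptor`, `RowDecoder`, `DecoderRegistry`) is hypothetical and invented for this example; `SchemaDescriptor` is only a stand-in for `SchemaInfo`, and this is not the `presto-record-decoder` API or the code on the branch.
   
   ```java
   import java.util.EnumMap;
   import java.util.Map;
   import java.util.function.Function;

   // Hypothetical schema type tag (assumption; stands in for the schema type carried by SchemaInfo).
   enum SchemaKind { AVRO, JSON, PROTOBUF }

   // Hypothetical stand-in for SchemaInfo: a type tag plus the raw, format-specific definition.
   final class SchemaDescriptor {
       final SchemaKind kind;
       final byte[] definition;

       SchemaDescriptor(SchemaKind kind, byte[] definition) {
           this.kind = kind;
           this.definition = definition;
       }
   }

   // Hypothetical decoder contract: turns one message payload into column values by name.
   interface RowDecoder {
       Map<String, Object> decode(byte[] payload);
   }

   // The core module depends only on this registry; each format registers its own factory.
   final class DecoderRegistry {
       private final Map<SchemaKind, Function<SchemaDescriptor, RowDecoder>> factories =
               new EnumMap<>(SchemaKind.class);

       void register(SchemaKind kind, Function<SchemaDescriptor, RowDecoder> factory) {
           factories.put(kind, factory);
       }

       RowDecoder decoderFor(SchemaDescriptor schema) {
           Function<SchemaDescriptor, RowDecoder> factory = factories.get(schema.kind);
           if (factory == null) {
               throw new UnsupportedOperationException("No decoder registered for " + schema.kind);
           }
           return factory.apply(schema);
       }
   }
   ```
   
   Under this design, supporting a new format means registering one more factory, while the record set and metadata code stay unchanged.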
   
   I have finished this code and tested it in my local environment. @sijie, is anyone else working on the same thing?
   
   
   ```
    presto> show create table pulsar."test-tenant/test-namespace".avroata;
   
    CREATE TABLE pulsar."test-tenant/test-namespace".avroata (
       name varchar COMMENT '["null","string"]',
       age integer COMMENT '"int"',
       childrens array(varchar) COMMENT '["null",{"type":"array","items":"string","java-class":"java.util.List"}]',
       teachers map(varchar, varchar) COMMENT '["null",{"type":"map","values":"string"}]',
       parent ROW(father varchar, mother varchar) COMMENT '["null",{"type":"record","name":"Parent","namespace":"com.hnail.pulsar.AvroGen"
       __partition__ integer COMMENT 'The partition number which the message belongs to',
       __event_time__ timestamp(3) COMMENT 'Application defined timestamp in milliseconds of when the event occurred',
       __publish_time__ timestamp(3) COMMENT 'The timestamp in milliseconds of when event as published',
       __message_id__ varchar COMMENT 'The message ID of the message used to generate this row',
       __sequence_id__ bigint COMMENT 'The sequence ID of the message used to generate this row',
       __producer_name__ varchar COMMENT 'The name of the producer that publish the message used to generate this row',
       __key__ varchar COMMENT 'The partition key for the topic',
       __properties__ varchar COMMENT 'User defined properties'
    )
   (1 row)
   
   Query 20200826_083759_00000_neuwa, FINISHED, 1 node
   Splits: 1 total, 1 done (100.00%)
   9.18 [0 rows, 0B] [0 rows/s, 0B/s]
   
   presto> select * from pulsar."test-tenant/test-namespace".avroata limit 3;
      name   | age |     childrens     |                   teachers                   |              parent              | __partition__ |
   ----------+-----+-------------------+----------------------------------------------+----------------------------------+---------------+
    Student1 |  23 | [zhangsan, lisi]  | {yuwen=yuwen_value, shuxue=shuxue_value}     | {father=father1, mother=mother1} |             2 |
    Student2 |  55 | [wangwu, fengliu] | {shuxue2=shuxue2_value, yuwen2=yuwen2_value} | {father=father2, mother=mother2} |             2 |
    Student1 |  23 | [zhangsan, lisi]  | {yuwen=yuwen_value, shuxue=shuxue_value}     | {father=father1, mother=mother1} |             0 |
   (3 rows)
   
   presto> select childrens[1],teachers['yuwen'] from pulsar."test-tenant/test-namespace".avroata limit 1;
     _col0   |    _col1
   ----------+-------------
    zhangsan | yuwen_value
   (1 row)
   
   Query 20200826_114004_00004_kz734, FINISHED, 1 node
   
   ```

