You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/08/05 06:53:38 UTC
[GitHub] [incubator-doris] Youngwb edited a comment on issue #3930: [Proposal] Doris support version column for REPLACE aggregate type

Youngwb edited a comment on issue #3930:
URL: https://github.com/apache/incubator-doris/issues/3930#issuecomment-654853902


   According to @morningman @yangzhg 's suggest, I made some corrections
   ## name
   use `sequence` column instead of `version` column  for user understand easy.
   
   ## Create table
   Use UNIQUE_KEYS instead of AGG_KEYS. Because sequence column is a hidden column,  there is no need to create a `version` column with MAX AGG_TYPE.
   
   ```
   CREATE TABLE `test_1` (
   `pin_id` bigint(20) NOT NULL COMMENT "",
   `date` date NOT NULL COMMENT "",
   `group_id` bigint(20) NOT NULL COMMENT "",
   `keyword` varchar(128)  NOT NULL
   ) ENGINE=OLAP
   UNIQUE KEY(`pin_id`, `date`, `group_id`)
   PROPERTIES (
     "function_column.sequence_type" = "int"
   );
   ```
   like such example, user need to add `sequence_type` to Identify the sequence column type. It only support the Integer types (int, bigint, largeint) and  time types(date, datetime). User can't query the `sequence_column` hidden in table ,  but can add one column which value is equal to `sequence_column`.  like this
   ```
   CREATE TABLE `test_2` (
   `pin_id` bigint(20) NOT NULL COMMENT "",
   `date` date NOT NULL COMMENT "",
   `group_id` bigint(20) NOT NULL COMMENT "",
   `sequence_visiable` int NOT NULL,
   `keyword` varchar(128)  NOT NULL
   ) ENGINE=OLAP
   UNIQUE KEY(`pin_id`, `date`, `group_id`)
   PROPERTIES (
   "function_column.sequence_type" = "int"
   );
   ```
   Column names are not necessarily "sequence_visiable", this is just an example. The user ensures that the values are same by specifying parameters at LOAD time
   
   ## LOAD
   ### Stream Load
   `curl --location-trusted -u root -H "columns: pin_id,date,group_id,source_sequence,keyword"  -H "function_column.sequence_col: source_sequence" -T test_load  http://127.0.01:8030/api/test/test_1/_stream_load`
   
   ### Broker Load
   ```
   LOAD LABEL test.test11
   (
       DATA INFILE("hdfs://path/to/load_file")
       INTO TABLE `test_1`
       FORMAT AS "parquet"
       (pin_id,date,group_id,source_sequence,keyword)
       ORDER BY source_sequence
   ) with BROKER broker_name (...)
   ```
   
   ### Routine Load
   ```
   CREATE ROUTINE LOAD test_1_job ON test_1
   COLUMNS TERMINATED BY ",",
   (pin_id,date,group_id,source_sequence,keyword)
   ORDER BY source_sequence
   PROPERTIES
   (
       "desired_concurrent_number"="3",
       "max_batch_interval" = "30",
       "max_batch_rows" = "1000000",
       "max_batch_size" = "509715200"
   ) FROM KAFKA
   (
       "kafka_broker_list" = "...",
       "kafka_topic" = "...",
       "property.client.id" = "...",
       "property.group.id" = "..."
   );
   ```
   
   I added a parameter `sequence_col` to identify the source data for the sequence column at load,  because it's hidden column, user need to identify the source column in `columns_mapping`. 
   
   For table `test_2` which has column `sequence_visiable`, user can set "sequence_col" = "sequence_visiable" at properties,  which means the hidden column "sequence_col" is same with the "sequence_visiable" in table, user can query the the column "sequence_visiable" instead of "sequence_col"


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org