Posted to issues@kudu.apache.org by "Yingchun Lai (Jira)" <ji...@apache.org> on 2022/07/20 11:28:00 UTC
[jira] [Comment Edited] (KUDU-3353) Support setnx semantic on column

    [ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17568976#comment-17568976 ] 

Yingchun Lai edited comment on KUDU-3353 at 7/20/22 11:27 AM:
--------------------------------------------------------------

Let me clarify some use cases:

A user profile table in Kudu has a column "first_login_ts", which represents the user's first login time to the website. The table is populated by upserting user event logs; each log contains the user's ID, some attributes, and "first_login_ts". The first_login_ts value is filled in with the time the log was produced, so for a given user, successive event logs carry different (ever-increasing) "first_login_ts" values. Only the first one should be set; subsequent logs should not update it.

The same applies to columns such as sex, birthday, and birthplace.

 

If a table column supports an "immutable" attribute, new values for it in update/upsert ops will not be applied to the change list, which gives us the benefit of faster reads.

Without the immutable attribute, in some cases we have to read the old value, compare it with the new value, and then decide which value wins, which is costly.

 

The updated design:

1. Add a column attribute that defines a column as IMMUTABLE, meaning the column's cell value cannot be updated after it has been written when the row is inserted.

2. Use UPDATE_IGNORE and add UPSERT_IGNORE, which behave like UPDATE and UPSERT ops but ignore update errors on IMMUTABLE columns.
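
To make the intended behavior concrete, here is a minimal in-memory sketch of the two design points above. It is a toy model only, not the Kudu client API: the `Table` class, its `upsert` and `upsert_ignore` methods, and `ImmutableColumnError` are hypothetical names chosen for illustration. It also assumes that a plain UPSERT touching an IMMUTABLE column of an existing row is rejected as a whole, which the design above does not spell out.

```python
# Toy in-memory model of the proposed semantics; this is NOT the Kudu client API.
from dataclasses import dataclass, field


class ImmutableColumnError(Exception):
    """Raised when a plain UPSERT tries to change an IMMUTABLE cell (hypothetical)."""


@dataclass
class Table:
    key_col: str
    immutable_cols: frozenset
    rows: dict = field(default_factory=dict)

    def _write(self, row, ignore_immutable):
        key = row[self.key_col]
        current = self.rows.get(key)
        if current is None:
            # Insert path: every cell, including IMMUTABLE ones, is written.
            self.rows[key] = dict(row)
            return
        touched = [c for c in row if c in self.immutable_cols]
        if touched and not ignore_immutable:
            # Assumption: the whole op is rejected, nothing is applied.
            raise ImmutableColumnError(f"cannot update immutable columns: {touched}")
        # Update path: apply only the mutable cells, with no read-and-compare.
        for col, value in row.items():
            if col not in self.immutable_cols:
                current[col] = value

    def upsert(self, row):
        self._write(row, ignore_immutable=False)

    def upsert_ignore(self, row):
        self._write(row, ignore_immutable=True)


profiles = Table(key_col="user_id", immutable_cols=frozenset({"first_login_ts"}))
profiles.upsert_ignore({"user_id": 1, "city": "Beijing", "first_login_ts": 100})
profiles.upsert_ignore({"user_id": 1, "city": "Shanghai", "first_login_ts": 250})
print(profiles.rows[1])
# prints {'user_id': 1, 'city': 'Shanghai', 'first_login_ts': 100}
```

The second event log updates the mutable "city" cell while the first "first_login_ts" value survives, and the writer never has to read the old row first.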


> Support setnx semantic on column
> --------------------------------
>
>                 Key: KUDU-3353
>                 URL: https://issues.apache.org/jira/browse/KUDU-3353
>             Project: Kudu
>          Issue Type: New Feature
>          Components: api, server
>            Reporter: Yingchun Lai
>            Assignee: Yingchun Lai
>            Priority: Major
>
> h1. Motivation
> In some usage scenarios, a Kudu table has a column with "create time" semantics, meaning it represents the creation timestamp of the row. The other columns keep their usual semantics, for example user properties such as age and address.
> The upstream system and the Kudu user don't know whether a row exists, and each cell holds the latest value ingested from, for example, an event stream.
> Without the "create time" column, the Kudu user can use UPSERT operations to write data to the table, and every column with data overwrites the old data. But with the "create time" column, its cell data would be overwritten by subsequent UPSERT ops, which is not what we expect.
> To achieve the goal, we have to read the column to check whether it is NULL. If it is NULL, we can fill in the cell; if it is not NULL, we drop the cell from the data before the UPSERT to avoid overwriting "create time".
> This is expensive. Is there a way to avoid the read from Kudu?
> h1. Resolution
> We can implement a column schema with "update if null" semantics. That means cell data in the changelist will update the base data if the latter is NULL, and updates will be ignored if it is not NULL.
> So we can use Kudu just as before, and only need to define the column as "update if null" when creating the table or adding the column.
>  
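
As an illustration of the original issue description, the sketch below contrasts the client-side read-before-write workaround with the proposed "update if null" rule. It uses a plain Python dict as a stand-in for a table; the function names `upsert_with_read` and `upsert_update_if_null` are hypothetical and do not correspond to any Kudu API. In a real deployment the NULL check in the second function would happen on the server while applying the changelist, so the client would not need the extra round trip.

```python
# Toy model using a dict as the "table"; none of these names are Kudu APIs.

def upsert_with_read(store, key, row, guarded_col):
    """Client-side workaround: read first, drop the guarded cell if already set."""
    existing = store.get(key)  # the extra read the issue wants to avoid
    if existing is not None and existing.get(guarded_col) is not None:
        row = {c: v for c, v in row.items() if c != guarded_col}
    store.setdefault(key, {}).update(row)


def upsert_update_if_null(store, key, row, update_if_null_cols):
    """Proposed rule: a guarded cell is written only while the base value is NULL."""
    base = store.setdefault(key, {})
    for col, value in row.items():
        if col in update_if_null_cols and base.get(col) is not None:
            continue  # base cell already set: ignore this update
        base[col] = value


events = [
    {"age": 30, "create_time": 1000},
    {"age": 31, "create_time": 2000},
]

with_read, if_null = {}, {}
for event in events:
    upsert_with_read(with_read, "user-1", dict(event), "create_time")
    upsert_update_if_null(if_null, "user-1", dict(event), {"create_time"})

print(with_read == if_null)
# prints True
```

Both approaches end with "age" set to the latest value and "create_time" kept from the first event; the difference is where the check runs.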



--
This message was sent by Atlassian Jira
(v8.20.10#820010)