Posted to dev@devlake.apache.org by Jinglei Ren <ji...@merico.dev.INVALID> on 2022/06/15 02:18:45 UTC

Re: [discuss] team entity design => table name

I am changing the email title to branch out and avoid distracting your main thread. Right, this is not a big deal, so let’s conclude quickly.

You know, ambiguity can only be resolved by defining the concepts. Otherwise, `persons` does not help either. What I proposed was simply to define `accounts` as your previous concept of persons or unified users. The example in your last email misused the concept: in “we introduce `people` or `persons` or `unified users` to link those `accounts` together”, you still used `account` to refer to Git emails or duplicate Git users.

Now let’s switch to the new definition of account. Then there are two ways to handle a new commit email: (1) directly create a new account for it, and later merge it into another account if it turns out to be a duplicate; (2) model commit emails just as `emails`, not linked to any account at first, and link them to accounts whenever possible.
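The two strategies above can be contrasted in a small sketch. This is a minimal in-memory illustration under stated assumptions, not DevLake code: the `Account` and `IdentityStore` classes and the methods `add_email_as_account`, `merge`, `record_unlinked_email`, and `link_email` are all hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical unified identity: the new meaning of `account`."""

    account_id: int
    emails: set[str] = field(default_factory=set)


class IdentityStore:
    """Toy in-memory store contrasting the two strategies (hypothetical)."""

    def __init__(self) -> None:
        self.accounts: dict[int, Account] = {}
        self.unlinked_emails: set[str] = set()
        self._next_id = 1

    # Strategy (1): every new commit email immediately gets its own account...
    def add_email_as_account(self, email: str) -> Account:
        account = Account(self._next_id, {email})
        self.accounts[account.account_id] = account
        self._next_id += 1
        return account

    # ...and accounts found to be duplicates are merged afterwards.
    def merge(self, keep_id: int, duplicate_id: int) -> None:
        duplicate = self.accounts.pop(duplicate_id)
        self.accounts[keep_id].emails |= duplicate.emails

    # Strategy (2): commit emails start out unlinked...
    def record_unlinked_email(self, email: str) -> None:
        self.unlinked_emails.add(email)

    # ...and are attached to an account once the link can be established.
    def link_email(self, email: str, account_id: int) -> None:
        self.unlinked_emails.discard(email)
        self.accounts[account_id].emails.add(email)
```

The trade-off, in this sketch at least: strategy (1) gives every commit an owner immediately but relies on a later merge step, while strategy (2) leaves some commits unattributed until a link is found.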

Thanks,
Jinglei

From: Klesh Wong <kl...@apache.org>
Date: Tuesday, June 14, 2022 at 11:52 PM
To: dev@devlake.apache.org <de...@devlake.apache.org>
Subject: Re: [discuss] team entity design
I'm ok with any name as long as @Julien @Keon @Hezheng are ok with it.

As for `table.accounts`, I don't understand: how can it represent
`unified users` while it also represents multiple accounts?

For example, we collect `commits` data with `gitextractor`. To
associate a specific `commit` with a specific account, we can create
an `account` with `commit.author_email` as the PK. But one might
create commits with different email addresses, so we introduce
`people` or `persons` or `unified users` to link those `accounts` together.
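A minimal sketch of the approach described above, assuming commits are plain records with an `author_email` field: one account per distinct email (the email acting as the PK), plus a hypothetical `person_of_account` mapping that links several accounts to one person so commits can be counted per person. The `Commit` type and both function names are illustrative assumptions, not part of `gitextractor` or the DevLake schema.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Minimal commit record; only the fields this sketch needs."""

    sha: str
    author_email: str


def accounts_from_commits(commits: list[Commit]) -> set[str]:
    """One account per distinct author email, with the email acting as the PK."""
    return {c.author_email for c in commits}


def commits_per_person(
    commits: list[Commit], person_of_account: dict[str, str]
) -> Counter[str]:
    """Count commits per person.

    `person_of_account` links account PKs (emails) to a person id; an email
    with no link is counted under its own account PK.
    """
    return Counter(
        person_of_account.get(c.author_email, c.author_email) for c in commits
    )
```

For example, two commits from `alice@work.example` and `alice@home.example` count as two commits for one person once both emails are mapped to the same person id.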

Thanks,

Klesh Wong

On 6/14/22 21:27, Jinglei Ren wrote:
> Just a comment: `people` should be `persons` to make it consistent with other plural names and with `person_teams`, etc.
>
> I see the reasons for this name, but I am still against `people` or `persons` because our system should not model natural persons at all. In some sense, it cannot, because you never know whether it is a person or a dog :p The key point is that we should consider the concept itself, not just convenience of use.
>
> So, why not keep all types of user names as they are from different data sources and just add `table.accounts` to represent the standard/unified users?
>
> Thanks,
> Jinglei
>
> From: Klesh Wong <kl...@apache.org>
> Date: Monday, June 13, 2022 at 10:24 PM
> To: dev@devlake.apache.org <de...@devlake.apache.org>
> Subject: [discuss] team entity design
>   I meant to post the Team Entity Design proposals to this mailing
> list, but they involve too many graphics, tables, and code, so I
> posted them at
> https://github.com/apache/incubator-devlake/issues/1680#issuecomment-1153588720
> instead.
>
>    I suggest that everyone take a look and either vote for whichever
> option you like or propose your own solution.
>
>
> Notice we have 2 TOPICS to decide:
>
>   1. How to aggregate commits by Natural Person (options prefixed with
>      `proposal 1.x`)
>   2. What should be the Primary Key of the `people` table (options
>      prefixed with `proposal 2.x`)
>
> Please reply to this email with your preferred proposal options, like:
>
>
> +1 proposal 1.1
>
> +1 proposal 2.1
>
>
> PICK ONE OPTION FOR EACH TOPIC
>
> or, post your thoughts.
>
>
> Thanks
>
>
> Klesh Wong
>

Re: [discuss] team entity design => table name

Posted by Kaiyun Zhang <ka...@merico.dev.INVALID>.
We had a discussion about the naming of the tables. There were 2 proposals:
1. identities, unique_identities
2. accounts, users

I personally prefer the 2nd one.

If there are no strong objections, we will use this naming for now in subsequent design and discussion.
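To make proposal 2 concrete, here is a minimal relational sketch using Python's built-in `sqlite3` module. Only the table names `accounts` and `users` come from this thread; the `user_accounts` link table, all column names, and the sample rows are assumptions for illustration, not the actual DevLake schema.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE users (            -- unified identity (hypothetical columns)
    id   TEXT PRIMARY KEY,
    name TEXT
);
CREATE TABLE accounts (         -- one registration on one platform
    id       TEXT PRIMARY KEY,  -- e.g. 'git:alice@work.example'
    platform TEXT NOT NULL,
    login    TEXT NOT NULL
);
CREATE TABLE user_accounts (    -- hypothetical many-to-one link
    account_id TEXT PRIMARY KEY REFERENCES accounts(id),
    user_id    TEXT NOT NULL REFERENCES users(id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema in memory and load a tiny example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users VALUES ('u1', 'Alice')")
    rows = [
        ("git:alice@work.example", "git", "alice@work.example"),
        ("git:alice@home.example", "git", "alice@home.example"),
        ("github:alice", "github", "alice"),
    ]
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    # Link all three platform accounts to the single unified user 'u1'.
    conn.executemany(
        "INSERT INTO user_accounts VALUES (?, 'u1')", [(r[0],) for r in rows]
    )
    return conn


def accounts_of_user(conn: sqlite3.Connection, user_id: str) -> list[tuple[str, str]]:
    """List (platform, login) pairs linked to one unified user."""
    return conn.execute(
        """SELECT a.platform, a.login
           FROM accounts a JOIN user_accounts ua ON ua.account_id = a.id
           WHERE ua.user_id = ?
           ORDER BY a.platform, a.login""",
        (user_id,),
    ).fetchall()
```

Making `account_id` the primary key of the link table means each platform account belongs to at most one unified user, while one user can own any number of accounts, which matches the "a user might have multiple accounts" reading discussed in this thread.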


> On June 16, 2022, at 9:05 AM, Klesh Wong <kl...@apache.org> wrote:
> 
> I see, makes sense to me.
> 
> I would like to use `user` as the `unified identity` as well. As I said, we picked `person` over `user` because `user` is ambiguous: depending on the context, it can refer to the `unified identity`, an `account` on some platform, or whoever is using Apache DevLake.
> 
> I will talk to others and post the result here. Thanks for your input, very helpful.
> 
> Best
> 
> Klesh Wong


Re: [discuss] team entity design => table name

Posted by Klesh Wong <kl...@apache.org>.
I see, makes sense to me.

I would like to use `user` as the `unified identity` as well. As I said, we
picked `person` over `user` because `user` is ambiguous: depending on the
context, it can refer to the `unified identity`, an `account` on some
platform, or whoever is using Apache DevLake.

I will talk to others and post the result here. Thanks for your input,
very helpful.

Best

Klesh Wong

On 6/16/22 00:39, Jinglei Ren wrote:
> OK, if `user` and `account` have been exchanged, there is surely no reason to go back. But why did you need `person`? Now `user` is the “person” you referred to, right? If so, that’s totally fine.
>
>
> With that said, let’s clarify the root cause of the mess. Otherwise, more mess will show up in the future. The root cause is less about naming than about the model, and a high-quality model is critical for a data product.
>
> We should first define the model well and then layer names on it. They are not necessarily two separate steps, but it is important to keep this principle in mind. Essentially, any names are fine. But you are right: to help everyone express themselves, we should choose intuitive names.
>
> Take what you said, for example: “an `account` might have multiple `users` on one or multiple `platforms`” vs. “a `user` might have multiple `accounts` on one or multiple `platforms`.” They are not opposites. They are the same model: A has multiple Bs on one or more platforms. The only difference is whether A/B is called user or account. I agree the naming can follow your convenience.
>
> So, let me finally confirm with you that the following is the current model, and that there is no so-called `person` of the kind you referred to.
>> 1. `A`: the unified identity on Apache DevLake.
>> 2. `B`: a website (github.com/gitlab.com/etc...), or abstract domain (git repository, … and the only reliable identity for a git user is email)
>> 3. `C`: a registration record to represent a user on a B, but an A may or may not map to multiple Cs on a specific B.
>> Now, what we try to do here is to group those Cs by A… (take git author_email as an example, different emails can belong to one A).
> If it is confirmed, I have no issue with the current design.
>
>
> (A side note: With all the above said, if we could replay the history, I would vote for reserving the word user for platforms but using the word account for the unified identity of Apache DevLake. The mess in your expression was mainly due to unclear definitions and should be resolved in the right way. But I don’t argue for this now as you’ve already made the decision. As long as the model is good, naming per se should not cost more of our time.)
>
> From: Klesh Wong <kl...@apache.org>
> Date: Wednesday, June 15, 2022 at 10:53 PM
> To: dev@devlake.apache.org <de...@devlake.apache.org>
> Subject: Re: [discuss] team entity design => table name
> I see. Yes, a couple of days back we all agreed that it was better to
> keep `users` as it was and add another entity to represent the
> `unified identity`.
>
> But it caused a mess during multiple discussions; many of us, including
> myself, couldn't even express ourselves. So we gave up and agreed that
> it is better to rename the existing `users` to `accounts` for the
> greater good.
>
> I think the terms you defined would cause a much, much bigger mess when
> we try to express our thoughts, especially for me... -_-!!!
>
> Correct me if I'm wrong: by your definition, an `account` might have
> multiple `users` on one or multiple `platforms`.
>
> This is the opposite of my understanding: a `user` might have multiple
> `accounts` on one or multiple `platforms`.
>
> Another reason why we wanted to avoid using `user` is that it sometimes
> refers to the people using Apache DevLake.
>
> Does it make sense?
>
>
> Thanks
>
> Klesh Wong
>
> On 6/15/22 21:14, Jinglei Ren wrote:
>> The bad smell comes from “a living thing” which the system should not model.
>>
>> We can follow most of your model but (1) merge `person` and `user` in your model and name it `account`; (2) rename the `account` in your model to `user`.
>>
>> The reason for (2) is that, as mentioned in https://github.com/apache/incubator-devlake/issues/1680, “we thought of changing the existing table.users to table.accounts and adding a table.users to represent … natural people, but that will cause many changes in the code.” So, it is good to keep the word `user` for various platforms rather than introduce the `account` in your model.
>>
>> All in all, we can use the new `account` concept and rephrase your model.
>>
>> 1. `account`: the unified identity on Apache DevLake for collecting and analyzing data from different platforms.
>> 2. `platform`: a website (github.com/gitlab.com/etc...), or abstract domain (git repository, … and the only reliable identity for a git user is email)
>> 3. `user`: a registration record to represent a user on a `platform`, but an `account` may or may not map to multiple `users` on a specific platform.
>>     (1) every `account` is always associated with a single user on a single platform (we don't need an `account` table)
>>     (2) some `accounts` are associated with one user on each of multiple platforms (we need an `account` table)
>>     (3) some `accounts` are associated with multiple users on multiple platforms (we need an `account` table badly)
>> Now, what we try to do here is to group those `users` by `account`… (take git author_email as
>> an example, different emails can belong to one `account`).
>>
>> You can see the refined model is simpler than your original one. So, to quickly form consensus, the decision point can be like this: (1) If the above refined model meets the requirements, my understanding should be correct and my irritation with `person` actually leads to better definitions. Then let’s go with it and we won’t spend more time on the word choice of `account`, for example. (2) If the above refined model doesn’t work or misses something, my understanding should be flawed so please just keep to your original model and `person` and ignore this thread.
>>
>> Thanks,
>> Jinglei
>>
>> From: Klesh Wong <kl...@apache.org>
>> Date: Wednesday, June 15, 2022 at 2:30 PM
>> To: dev@devlake.apache.org <de...@devlake.apache.org>
>> Subject: Re: [discuss] team entity design => table name
>> Let's bear with the existing terms a little longer; I don't buy your
>> definition of `account` just yet. Here is why:
>>
>>    1. `person`: a Living Thing (Human, Dog, or Alien)
>>    2. `user`: a `person` who is using Apache DevLake to collect and
>>       analyze DevOps data
>>    3. `platform`: a website (github.com/gitlab.com/etc...), or an abstract
>>       domain (a git repository can be cloned to different
>>       machines/websites, but we treat the clones as the same git repo, and
>>       the only reliable identity for a `person` is email)
>>    4. `account`: a registration record to represent a `person` on a
>>       `platform`, but a `person` may or may not have multiple `accounts`
>>       on a specific platform.
>>        1. one `person` registers once on one platform and uses that
>>           account forever (we don't need a `person` table)
>>        2. one `person` registers once on each of multiple platforms and
>>           uses them forever (we need a `person` table)
>>        3. one `person` registers multiple times on each of multiple
>>           platforms and uses some of them (we need a `person` table badly)
>>
>> Now, what we are trying to do here is to group those `accounts` by
>> `person`, hence "introduced `person`". We don't have enough clues to
>> figure out who is who across multiple platforms; even worse, we can't
>> even figure out who is who on a specific platform (take git
>> author_email as an example: different emails can belong to one `person`).
>>
>> So, most of us agreed that the best way to solve the problem is to
>> aggregate all those accounts from different platforms into one table
>> named `accounts`, and then let the `user` connect them to `persons`.
>>
>> Hope that explains the situation here.
>>
>>
>> Ok, would you mind explaining your idea of how to address the problem by
>> using only a single table?
>>
>>
>> Thanks
>>
>> Klesh Wong

Re: [discuss] team entity design => table name

Posted by Jinglei Ren <ji...@merico.dev.INVALID>.
OK, if `user` and `account` have been exchanged, there is surely no reason to go back. But why did you need `person`? Now `user` is the “person” you referred to, right? If so, that’s totally fine.


With that said, let’s clarify the root cause of the mess. Otherwise, more mess will show up in the future. The root cause is less about naming than about the model, and a high-quality model is critical for a data product.

We should first define the model well and then layer names on it. They are not necessarily two separate steps, but it is important to keep this principle in mind. Essentially, any names are fine. But you are right: to help everyone express themselves, we should choose intuitive names.

Take what you said, for example: “an `account` might have multiple `users` on one or multiple `platforms`” vs. “a `user` might have multiple `accounts` on one or multiple `platforms`.” They are not opposites. They are the same model: A has multiple Bs on one or more platforms. The only difference is whether A/B is called user or account. I agree the naming can follow your convenience.

So, let me finally confirm with you that the following is the current model, and that there is no so-called `person` of the kind you referred to.
> 1. `A`: the unified identity on Apache DevLake.
> 2. `B`: a website (github.com/gitlab.com/etc...), or abstract domain (git repository, … and the only reliable identity for a git user is email)
> 3. `C`: a registration record to represent a user on a B, but an A may or may not map to multiple Cs on a specific B.
> Now, what we try to do here is to group those Cs by A… (take git author_email as an example, different emails can belong to one A).

If it is confirmed, I have no issue with the current design.
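The A/B/C model reduces to a mapping from each unified identity `A` to the registrations `C` it owns, each on some platform `B`. The sketch below, written under that assumption with hypothetical function names, classifies one identity into the three cases listed earlier in this thread, which is what decides whether a separate table for `A` is needed. Reading case 3 as "more than one registration on at least one platform" (e.g. several git emails) is an interpretation, not a definition from the thread.

```python
from __future__ import annotations

from collections import Counter


def identity_case(registrations: list[tuple[str, str]]) -> int:
    """Classify one unified identity (A) by its registrations (B, C).

    Each registration is a (platform, registration_id) pair. Interpretation
    (an assumption, not a definition from the thread):
      1 -> exactly one registration on one platform
      2 -> one registration on each of several platforms
      3 -> several registrations on at least one platform
    """
    if not registrations:
        raise ValueError("an identity needs at least one registration")
    # Deduplicate identical pairs, then count registrations per platform.
    per_platform = Counter(platform for platform, _ in set(registrations))
    if max(per_platform.values()) > 1:
        return 3
    return 1 if len(per_platform) == 1 else 2


def needs_identity_table(identities: dict[str, list[tuple[str, str]]]) -> bool:
    """A separate table for A is only unnecessary if every A is case 1."""
    return any(identity_case(regs) != 1 for regs in identities.values())
```

Because the classification only looks at the shape of the mapping, it gives the same answer whether `A` is called `user` or `account`, which is the point about names being layered on top of the model.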


(A side note: With all the above said, if we could replay the history, I would vote for reserving the word user for platforms but using the word account for the unified identity of Apache DevLake. The mess in your expression was mainly due to unclear definitions and should be resolved in the right way. But I don’t argue for this now as you’ve already made the decision. As long as the model is good, naming per se should not cost more of our time.)

From: Klesh Wong <kl...@apache.org>
Date: Wednesday, June 15, 2022 at 10:53 PM
To: dev@devlake.apache.org <de...@devlake.apache.org>
Subject: Re: [discuss] team entity design => table name
I see, yeah, we all agreed that it was better to keep the `users` as it
was, and add another entity to represent `unified identity` couple days
back.

But it have caused mess during multiple discussions, many of us can't
even express himself including myself. so we gave up and agreed that it
is better to rename existing `users` to `accounts` for greater good.

The terms you defined, I think it would cause a much much bigger mess
for us to express our thoughts, especially myself... -_-!!!

Correct me if I'm wrong, By your definition, a `account` might have
multiple `users` on one or multiple `platforms`.

This is the opposite of my cognition: a `user` might have multiple
`accounts` on one or multiple `platforms`.

Another reason why we wanted to avoid using `user` is sometimes it
refers to the ones using Apache DevLake.

Does it make sense?


Thanks

Klesh Wong

On 6/15/22 21:14, Jinglei Ren wrote:
> The bad smell comes from “a living thing” which the system should not model.
>
> We can follow most of your model but (1) merge `person` and `user` in your model and name it `account`; (2) rename the `account` in your model to `user`.
>
> The reason for (2) is that, as mentioned in https://github.com/apache/incubator-devlake/issues/1680, “we thought of changing the existing table.users to table.accounts and adding a table.users to represent … natural people, but that will cause many changes in the code.” So, it is good to keep the word `user` for various platforms rather than introduce the `account` in your model.
>
> All in all, we can use the new `account` concept and rephrase your model.
>
> 1. `account`: the unified identity on Apache DevLake for collecting and analyzing data from different platforms.
> 2. `platform`: a website (github.com/gitlab.com/etc...), or abstract domain (git repository, … and the only reliable identity for a git user is email)
> 3. `user`: a registration record to represent a user on a `platform`, but an `account` may or may not map to multiple `users` on a specific platform.
>    (1) every `account` is associated with exactly one user on one platform (we don't need an `account` table)
>    (2) some `accounts` are associated with one user on each of multiple platforms (we need an `account` table)
>    (3) some `accounts` are associated with multiple users on multiple platforms (we need an `account` table badly)
> Now, what we try to do here is to group those `users` by `account`… (take git author_email as
> an example, different emails can belong to one `account`).
>
> You can see the refined model is simpler than your original one. So, to quickly form consensus, the decision point can be like this: (1) If the above refined model meets the requirements, my understanding should be correct and my irritation with `person` actually leads to better definitions. Then let’s go with it and we won’t spend more time on the word choice of `account`, for example. (2) If the above refined model doesn’t work or misses something, my understanding should be flawed so please just keep to your original model and `person` and ignore this thread.
>
> Thanks,
> Jinglei
>
> From: Klesh Wong <kl...@apache.org>
> Date: Wednesday, June 15, 2022 at 2:30 PM
> To: dev@devlake.apache.org <de...@devlake.apache.org>
> Subject: Re: [discuss] team entity design => table name
> Let's bear with the existing terms a little bit longer; I don't buy your
> definition of `account` just yet. Here is why:
>
>   1. `person`: a Living Thing (Human, Dog, or Alien)
>   2. `user`: a `person` who is using Apache DevLake to collect and
>      analyze DevOps data
>   3. `platform`: a website (github.com/gitlab.com/etc...), or an abstract
>      domain (a git repository: it can be cloned to different
>      machines/websites, but we treat all the clones as the same git repo,
>      and the only reliable identity for a `person` is email)
>   4. `account`: a registration record that represents a `person` on a
>      `platform`; a `person` may or may not have multiple `accounts`
>      on a specific platform.
>       1. one `person` registers on one platform once and uses it
>          forever (we don't need a `person` table)
>       2. one `person` registers on multiple platforms once each and
>          uses them forever (we need a `person` table)
>       3. one `person` registers on multiple platforms multiple times each
>          and uses some of them (we need a `person` table badly)
>
> Now, what we try to do here is to group those `accounts` by `person`,
> which is why we "introduced `person`". We don't have enough clues to figure
> out who is who across multiple platforms; even worse, we can't even
> figure out who is who on a specific platform (take git author_email as
> an example: different emails can belong to one `person`).
>
> So, most of us agreed that the best way to solve the problem is to aggregate
> all those accounts from different platforms into one table named
> `accounts`, and then let the `user` link them to `persons`.
>
> Hope that explains the situation here.
>
>
> OK, would you mind explaining how you would address the problem
> using only a single table?
>
>
> Thanks
>
> Klesh Wong
>
> On 6/15/22 10:18, Jinglei Ren wrote:
>> I am changing the email title to branch out and avoid distracting your main thread. Right, this is not a big deal, so let’s conclude quickly.
>>
>> You know, ambiguity can only be resolved by defining the concepts. Otherwise, `persons` does not help either. What I proposed was simply to define `accounts` as your previous concept of persons or unified users. The example in your last email misused the concept (as in “we introduce `people` or `persons` or `unified users` to link those `accounts` together”, where you still used `account` to refer to Git emails or duplicate Git users).
>>
>> Now let’s switch to the new definition of account. Then there are two ways to handle a new commit email: (1) we directly create a new account for it and later merge it into another account if it turns out to be a duplicate; (2) the commit emails are modeled simply as `emails`, not linked to any account at first, and are linked to accounts whenever possible.
>>
>> Thanks,
>> Jinglei
>>
>> From: Klesh Wong <kl...@apache.org>
>> Date: Tuesday, June 14, 2022 at 11:52 PM
>> To: dev@devlake.apache.org <de...@devlake.apache.org>
>> Subject: Re: [discuss] team entity design
>> I'm ok with any name as long as @Julien @Keon @Hezheng are ok with it.
>>
>> As for `table.accounts`, I don't understand: how can it represent
>> `unified users` when it represents multiple accounts?
>>
>> For example, we collect `commits` data with `gitextractor`. In
>> order to associate a specific `commit` with a specific account, what we
>> can do is create an `account` with `commit.author_email` as the PK. But
>> one might create commits with different email addresses, so we introduce
>> `people` or `persons` or `unified users` to link those `accounts` together.
>>
>> Thanks,
>>
>> Klesh Wong
>>
>> On 6/14/22 21:27, Jinglei Ren wrote:
>>> Just a comment: `people` had better be `persons`, to be consistent with the other plural names as well as with `person_teams`, etc.
>>>
>>> I see the reasons for this name, but I am still against `people` or `persons` because our system should not model natural persons at all. In some sense, it cannot because you never know if it is a person or a dog :p The key point is that we should consider the concept itself, not just convenience of use.
>>>
>>> So, why not keep all types of user names as they are from different data sources and just add `table.accounts` to represent the standard/unified users?
>>>
>>> Thanks,
>>> Jinglei
>>>
>>> From: Klesh Wong <kl...@apache.org>
>>> Date: Monday, June 13, 2022 at 10:24 PM
>>> To: dev@devlake.apache.org <de...@devlake.apache.org>
>>> Subject: [discuss] team entity design
>>>     I meant to post the Team Entity Design proposals to this mailing
>>> list, but too many graphics, tables, and code are involved, so I posted them
>>> on
>>> https://github.com/apache/incubator-devlake/issues/1680#issuecomment-1153588720
>>> instead.
>>>
>>>      I suggest that everyone take a look, and either vote for whichever
>>> proposal they like or propose their own solution.
>>>
>>>
>>> Notice we have 2 TOPICS to decide:
>>>
>>>     1. How to aggregate commits by Natural Person, which is prefixed by
>>>        `proposal 1.x`
>>>     2. What should be the Primary Key of the `people` table, which is
>>>        prefixed by `proposal 2.x`
>>>
>>> Please reply to this email with your favorite proposal options, like:
>>>
>>>
>>> +1 proposal 1.1
>>>
>>> +1 proposal 2.1
>>>
>>>
>>> PICK ONE OPTION FOR EACH TOPIC
>>>
>>> or, post your thoughts.
>>>
>>>
>>> Thanks
>>>
>>>
>>> Klesh Wong
>>>

Re: [discuss] team entity design => table name

Posted by Klesh Wong <kl...@apache.org>.
I see. Yes, a couple of days back we all agreed that it was better to keep
`users` as it was, and to add another entity to represent the
`unified identity`.

But that caused a mess during multiple discussions: many of us, including
myself, couldn't even express ourselves clearly. So we gave up and agreed
that it is better to rename the existing `users` to `accounts` for the
greater good.

I think the terms you defined would cause a much, much bigger mess
when we try to express our thoughts, especially for me... -_-!!!

Correct me if I'm wrong: by your definition, an `account` might have
multiple `users` on one or multiple `platforms`.

This is the opposite of my understanding: a `user` might have multiple
`accounts` on one or multiple `platforms`.

Another reason we wanted to avoid `user` is that it sometimes
refers to the people using Apache DevLake.

Does it make sense?


Thanks

Klesh Wong


Re: [discuss] team entity design => table name

Posted by Jinglei Ren <ji...@merico.dev.INVALID>.
The bad smell comes from “a living thing”, which the system should not model.

We can follow most of your model but (1) merge `person` and `user` in your model and name it `account`; (2) rename the `account` in your model to `user`.

The reason for (2) is that, as mentioned in https://github.com/apache/incubator-devlake/issues/1680, “we thought of changing the existing table.users to table.accounts and adding a table.users to represent … natural people, but that will cause many changes in the code.” So, it is good to keep the word `user` for various platforms rather than introduce the `account` in your model.

All in all, we can use the new `account` concept and rephrase your model.

1. `account`: the unified identity on Apache DevLake for collecting and analyzing data from different platforms.
2. `platform`: a website (github.com/gitlab.com/etc...), or abstract domain (git repository, … and the only reliable identity for a git user is email)
3. `user`: a registration record to represent a user on a `platform`, but an `account` may or may not map to multiple `users` on a specific platform.
  (1) every `account` is associated with exactly one user on one platform (we don't need an `account` table)
  (2) some `accounts` are associated with one user on each of multiple platforms (we need an `account` table)
  (3) some `accounts` are associated with multiple users on multiple platforms (we need an `account` table badly)
Now, what we try to do here is to group those `users` by `account`… (take git author_email as
an example, different emails can belong to one `account`).

You can see the refined model is simpler than your original one. So, to quickly form consensus, the decision point can be like this: (1) If the above refined model meets the requirements, my understanding should be correct and my irritation with `person` actually leads to better definitions. Then let’s go with it and we won’t spend more time on the word choice of `account`, for example. (2) If the above refined model doesn’t work or misses something, my understanding should be flawed so please just keep to your original model and `person` and ignore this thread.

Thanks,
Jinglei
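Jinglei's refined model can be sketched as two record types. Each platform `user` optionally points at one unified `account`, and grouping users by account then takes a single pass. This is a hypothetical Go sketch: the `Account`/`User` fields and `GroupUsersByAccount` are illustrative assumptions, not DevLake's actual domain-layer tables.

```go
package main

import (
	"fmt"
	"sort"
)

// Account is the unified identity inside DevLake, in the sense Jinglei
// proposes. Hypothetical fields for illustration only.
type Account struct {
	ID          string
	DisplayName string
}

// User is one registration record on one platform. AccountID stays empty
// until the user has been linked to a unified Account.
type User struct {
	Platform  string // e.g. "github", "gitlab", or "git"
	Key       string // platform-specific identity: a login, or a git author email
	AccountID string
}

// GroupUsersByAccount maps each account ID to the users linked to it.
// Users not yet linked to any account end up under the empty key "".
func GroupUsersByAccount(users []User) map[string][]User {
	groups := make(map[string][]User)
	for _, u := range users {
		groups[u.AccountID] = append(groups[u.AccountID], u)
	}
	return groups
}

func main() {
	alice := Account{ID: "acc-1", DisplayName: "Alice"}
	users := []User{
		{Platform: "github", Key: "alice-gh", AccountID: alice.ID},
		{Platform: "git", Key: "alice@home.example", AccountID: alice.ID},
		{Platform: "git", Key: "alice@work.example", AccountID: alice.ID},
		{Platform: "git", Key: "unknown@example.com"},
	}
	groups := GroupUsersByAccount(users)

	// Print groups in a stable order.
	ids := make([]string, 0, len(groups))
	for id := range groups {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Printf("%q: %d user(s)\n", id, len(groups[id]))
	}
}
```

In this sketch, case (3) of the list shows up as several `git` users sharing one `AccountID`. In case (1), every account would have exactly one user, and the link would add nothing.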


Re: [discuss] team entity design => table name

Posted by Klesh Wong <kl...@apache.org>.
Let's bear with the existing terms a little bit longer; I don't buy your
definition of `account` just yet. Here is why:

 1. `person`: a Living Thing (Human, Dog, or Alien)
 2. `user`: a `person` who is using Apache DevLake to collect and
    analyze DevOps data
 3. `platform`: a website (github.com/gitlab.com/etc...), or an abstract
    domain (a git repository: it can be cloned to different
    machines/websites, but we treat all the clones as the same git repo,
    and the only reliable identity for a `person` is email)
 4. `account`: a registration record that represents a `person` on a
    `platform`; a `person` may or may not have multiple `accounts`
    on a specific platform.
     1. one `person` registers on one platform once and uses it
        forever (we don't need a `person` table)
     2. one `person` registers on multiple platforms once each and
        uses them forever (we need a `person` table)
     3. one `person` registers on multiple platforms multiple times each
        and uses some of them (we need a `person` table badly)

Now, what we try to do here is to group those `accounts` by `person`,
which is why we "introduced `person`". We don't have enough clues to figure
out who is who across multiple platforms; even worse, we can't even
figure out who is who on a specific platform (take git author_email as
an example: different emails can belong to one `person`).

So, most of us agreed that the best way to solve the problem is to aggregate
all those accounts from different platforms into one table named
`accounts`, and then let the `user` link them to `persons`.

Hope that explains the situation here.


OK, would you mind explaining how you would address the problem
using only a single table?


Thanks

Klesh Wong
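Under Klesh's model, the `gitextractor` case raised earlier in the thread becomes a two-step join: commit -> git `account` (keyed by author email) -> `person`. The following is a minimal Go sketch, assuming hypothetical `Commit`, `Account`, and `CommitsPerPerson` names rather than DevLake's real models. Accounts that nobody has linked to a person yet are counted separately instead of being guessed.

```go
package main

import "fmt"

// Commit keeps only the fields needed here.
type Commit struct {
	Sha         string
	AuthorEmail string
}

// Account is one registration on one platform, in Klesh's sense. For the
// "git" platform the key is the author email. PersonID stays empty until
// a DevLake user links the account to a person.
type Account struct {
	Platform string
	Key      string
	PersonID string
}

// CommitsPerPerson joins commit -> git account (by author email) -> person
// and counts commits per person. Commits whose email has no linked account
// are counted under "unlinked:<email>" rather than guessed.
func CommitsPerPerson(commits []Commit, accounts []Account) map[string]int {
	personByEmail := make(map[string]string)
	for _, a := range accounts {
		if a.Platform == "git" && a.PersonID != "" {
			personByEmail[a.Key] = a.PersonID
		}
	}
	counts := make(map[string]int)
	for _, c := range commits {
		if p, ok := personByEmail[c.AuthorEmail]; ok {
			counts[p]++
		} else {
			counts["unlinked:"+c.AuthorEmail]++
		}
	}
	return counts
}

func main() {
	accounts := []Account{
		{Platform: "git", Key: "alice@home.example", PersonID: "p-1"},
		{Platform: "git", Key: "alice@work.example", PersonID: "p-1"},
		{Platform: "github", Key: "alice-gh", PersonID: "p-1"},
		{Platform: "git", Key: "bob@example.com"}, // not linked yet
	}
	commits := []Commit{
		{Sha: "c1", AuthorEmail: "alice@home.example"},
		{Sha: "c2", AuthorEmail: "alice@work.example"},
		{Sha: "c3", AuthorEmail: "bob@example.com"},
	}
	// fmt prints map keys in sorted order.
	fmt.Println(CommitsPerPerson(commits, accounts)) // map[p-1:2 unlinked:bob@example.com:1]
}
```

In this example, Alice's two git emails plus a GitHub login are the multi-account situation the list calls case 3. Without the `person` link, her two commits would be split across two separate git accounts.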
