Posted to commits@devlake.apache.org by GitBox <gi...@apache.org> on 2022/06/08 09:08:34 UTC

[GitHub] [incubator-devlake] lukasgomez opened a new issue, #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

lukasgomez opened a new issue, #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   While exploring the database structure and table contents after scanning a few repositories, we noticed that the `author_id` column in the `commits` table contains the same value as the `author_email` column. By comparison, the `pull_requests` table also has a column called `author_id`, but it holds the unique GitHub id of the user who created the pull request.
   
   
   
   ### What you expected to happen
   
   We expected `author_id` in `commits` to hold the same kind of value as `author_id` in `pull_requests` (e.g. `github:GithubUser:11111111`). As it stands, the `commits` table has two columns with identical values, which is redundant and not an ideal design. Having this unique identifier in the `commits` table would help us build Grafana dashboards with more accurate metrics.
   
   Without a unique `author_id` in the `commits` table, it is not possible to obtain all the commits of an individual, because each combination of email and display name currently counts as a different committer.
   
   Actual behavior:
   | author_name | author_email               | author_id                     |
   | ------------- | ------------- | ------------- |
   | Jon Doe         | jon.doe@gmail.com    | jon.doe@gmail.com    |
   | Jon Doe         | jon.doe@hotmail.com | jon.doe@hotmail.com |
   
   (Both emails belong to the same person)
   
   Expected behavior:
   | author_name | author_email               | author_id |
   | ------------- | ------------- | ------------- |
   | Jon Doe         | jon.doe@gmail.com    | github:GithubUser:11111111 |
   | Jon Doe         | jon.doe@hotmail.com | github:GithubUser:11111111 |
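
The effect of the two layouts on a per-author query can be sketched with an in-memory SQLite database. This is only an illustration: the toy `commits` table below has just the three columns shown in the tables above and is not DevLake's real schema, and `distinct_authors` is a hypothetical helper name.

```python
import sqlite3


def distinct_authors(rows):
    """Count distinct author_id values in a toy three-column commits table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commits (author_name TEXT, author_email TEXT, author_id TEXT)"
    )
    conn.executemany("INSERT INTO commits VALUES (?, ?, ?)", rows)
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT author_id) FROM commits"
    ).fetchone()
    conn.close()
    return count


# Rows copied from the "Actual behavior" table: author_id repeats the email.
actual = [
    ("Jon Doe", "jon.doe@gmail.com", "jon.doe@gmail.com"),
    ("Jon Doe", "jon.doe@hotmail.com", "jon.doe@hotmail.com"),
]

# Rows copied from the "Expected behavior" table: one id for both emails.
expected = [
    ("Jon Doe", "jon.doe@gmail.com", "github:GithubUser:11111111"),
    ("Jon Doe", "jon.doe@hotmail.com", "github:GithubUser:11111111"),
]

print(distinct_authors(actual))    # one person counted as two authors
print(distinct_authors(expected))  # one person counted once
```

With the actual data, a dashboard grouping by `author_id` would report two authors for the same person; with the expected data it would report one.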
   
   ### How to reproduce
   
   1. Have a GitHub connection configured with a token
   2. Go to **Pipelines > Create Pipeline Run**.
   3. Click on **Create Pipeline Run**.
   4. Scroll down until '**Github**' is shown in the Data Providers list.
   5. **Toggle on** the GitHub Data provider.
   6. Enter the repository owner and name for a repository that contains a few commits and pull requests created by different users.
   7. Click on '**Run Pipeline**'
   8. Once the repository has finished registering, go to **Pipelines > Create Pipeline Run**.
   9. Click on **Create Pipeline Run**
   10. Scroll down until the '**Advanced Mode**' option appears at the bottom.
   11. Click on '**Advanced Mode**'.
   12. Create a task in the task editor to launch a GitHub Extractor Task.
   13. Use the following JSON: `[
     [
       {
         "Plugin": "gitextractor",
         "Options": {
           "url": "URL of the repository registered in the previous step, ending with .git",
           "repoId": "GitHub repository id. It looks like -> github:GithubRepo:384111310",
           "user": "Name of the user who owns the GitHub token",
           "password": "GitHub token"
         }
       }
     ]
   ]`
   14. Click on '**Run Pipeline**'
   15. Once the scan has finished, connect to the database
   16. Use the table `commits`
   17. Execute the query: `SELECT author_email, author_id FROM lake.commits;`
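
The JSON for step 13 can also be generated rather than typed by hand. The following is a minimal sketch, assuming only the keys shown in that step (`Plugin`, `Options`, `url`, `repoId`, `user`, `password`); `build_gitextractor_plan` is a hypothetical helper, and the URL, user and token values are placeholders.

```python
import json


def build_gitextractor_plan(url, repo_id, user, password):
    """Return Advanced Mode JSON containing a single gitextractor task.

    Hypothetical helper: it only mirrors the structure shown in step 13.
    """
    plan = [
        [
            {
                "Plugin": "gitextractor",
                "Options": {
                    "url": url,
                    "repoId": repo_id,
                    "user": user,
                    "password": password,
                },
            }
        ]
    ]
    return json.dumps(plan, indent=2)


# Placeholder values; replace them with your own repository and token.
plan_json = build_gitextractor_plan(
    url="https://github.com/example-owner/example-repo.git",
    repo_id="github:GithubRepo:384111310",
    user="example-user",
    password="example-token",
)
print(plan_json)
```

Generating the text with `json.dumps` guarantees balanced brackets and valid quoting before it is pasted into the task editor.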
   
    
   
   
   
   
   ### Anything else
   
   Not sure about the exact version of lake we are using, because we are working with the fork of [MericoDev](https://hub.docker.com/layers/lake/mericodev/lake/20220523/images/sha256-b2210658d04[%E2%80%A6]ea0ae55f865c8ad520b628e9a94a76d492045097524bc?context=explore). 
    
   
   ### Version
   
   0.10.0
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] lukasgomez commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
lukasgomez commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1205252877

   Sure! We will discuss with our managers which topics we can raise and then we can try to find time to schedule a meeting. We will be in touch  🙂 




Re: [I] [Bug] [Database] author_id column in commits table of database contains an email instead of an id [incubator-devlake]

Posted by "Startrekzky (via GitHub)" <gi...@apache.org>.
Startrekzky commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1752754613

   @klesh I think the `gitextractor` plugin already populated the `accounts` table with the email value rather than the platform id? Do you mean that we should populate with the platform's account id?




[GitHub] [incubator-devlake] hezyin commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
hezyin commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1230586978

   @lukasgomez Hi Lukas, I'm going to close the issue for now, feel free to reach out if you need more assistance.




[GitHub] [incubator-devlake] github-actions[bot] commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1200540268

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.




Re: [I] [Bug] [Database] author_id column in commits table of database contains an email instead of an id [incubator-devlake]

Posted by "klesh (via GitHub)" <gi...@apache.org>.
klesh commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1752634941

   @richard-fletcher I think you are right, the `gitextractor` should fill the `accounts` table.
   @Startrekzky what do you think?
   




[GitHub] [incubator-devlake] klesh commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
klesh commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1150869061

   Hi, Thanks for your input.
   There is a catch here: `gitextractor` collects `author_id` from the git repo itself, so it doesn't know about the **GitHub user id** at all.
   And because `git` is a distributed system, one can fork a repo multiple times to different platforms (e.g. from GitHub to GitLab to Bitbucket). If we collect these forked repos from different platforms, it is hard to determine which **platform user id** a commit should belong to.
   We are currently working on a People entity to try to solve this kind of problem.




Re: [I] [Bug] [Database] author_id column in commits table of database contains an email instead of an id [incubator-devlake]

Posted by "richard-fletcher (via GitHub)" <gi...@apache.org>.
richard-fletcher commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1753252879

   @Startrekzky I think you are correct. Let me test this out and confirm back if I'm still having issues.




[GitHub] [incubator-devlake] hezyin commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
hezyin commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1204663344

   @lukasgomez Hi Lukas, we'd love to support you guys in implementing DevLake for your team and get your feedback. Would you be open to a quick Zoom conversation to get properly connected? Connecting with users is incredibly helpful for us to keep improving DevLake.




[GitHub] [incubator-devlake] lukasgomez commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
lukasgomez commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1202308538

   Hi,
   We will try this new feature to try to filter commits by GitHub user and test the release candidate images. Hope this solution helps us to achieve the desired behavior. Thanks for your response and your time 😄 .




[GitHub] [incubator-devlake] hezyin commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
hezyin commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1201717245

   @lukasgomez @marcemv90 Thanks for putting together such a detailed bug report, really appreciate it!
   
   Like @klesh mentioned, the `commits` table is created by the `gitextractor` plugin, which extracts data directly from a git repository, and it's impossible to tell a git author's GitHub id from the git repo alone.
   
   If you would like to filter/group commits by a specific GitHub user, I'd also recommend looking into the [team configuration](https://devlake.apache.org/docs/v0.12/UserManuals/TeamConfiguration/) feature that's going to be shipped in v0.12.0. The release candidate images are already available on docker hub, we're just going through Apache's formal voting process for releases. If you'd like to get a taste now, feel free to try the images below :-)
   
   * [apache/devlake:v0.12.0-rc2](https://hub.docker.com/layers/devlake/apache/devlake/v0.12.0-rc2/images/sha256-4372688000837272a60b167593a5f19a5d66926626fe50cb619ead571c23286f?context=explore)
   * [apache/devlake-dashboard:v0.12.0-rc2](https://hub.docker.com/layers/devlake-dashboard/apache/devlake-dashboard/v0.12.0-rc2/images/sha256-e4f26f63f154445ba12f4046389daa6584465228e6cbd7b69d2d7b56176d198a?context=explore)
   * [apache/devlake-config-ui:v0.12.0-rc2](https://hub.docker.com/layers/devlake-config-ui/apache/devlake-config-ui/v0.12.0-rc2/images/sha256-d62a154e485188ead083fae80e3662806b239779ad327d0e075a46b6ee2d1659?context=explore)
   




Re: [I] [Bug] [Database] author_id column in commits table of database contains an email instead of an id [incubator-devlake]

Posted by "synergy-commvault (via GitHub)" <gi...@apache.org>.
synergy-commvault commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1753249747

   @Startrekzky I think you are correct. Let me test this out and confirm back if I'm still having issues.




[GitHub] [incubator-devlake] marcemv90 commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
marcemv90 commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1201211703

   Bump, I don't think this issue should be closed.




[GitHub] [incubator-devlake] klesh commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
klesh commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1201270048

   Hi, @marcemv90 , but the behavior you expected is not viable.
   As I said, in reality, a `commit` may or may not have its `author_id` pointing to github user id.
   Relying on GithubUserID is very limited and won't be supported.
   
   I suggest that you take a look at the [Team Feature](https://devlake.apache.org/docs/v0.12/UserManuals/TeamConfiguration), it allows you to connect multiple email addresses to a Unified Identity.
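
The idea of connecting several email addresses to one identity can be sketched in a few lines. This is not how the Team Feature is implemented; the mapping, the identity label `jon-doe`, the commit SHAs and the function name are all hypothetical and only illustrate the grouping.

```python
from collections import defaultdict

# Hypothetical mapping from commit email to a unified identity label.
EMAIL_TO_IDENTITY = {
    "jon.doe@gmail.com": "jon-doe",
    "jon.doe@hotmail.com": "jon-doe",
}


def commits_per_identity(commits, email_to_identity):
    """Group commit SHAs by unified identity, falling back to the raw email."""
    grouped = defaultdict(list)
    for sha, author_email in commits:
        identity = email_to_identity.get(author_email, author_email)
        grouped[identity].append(sha)
    return dict(grouped)


# Toy (sha, author_email) pairs; the SHAs are made up for the example.
commits = [
    ("a1", "jon.doe@gmail.com"),
    ("b2", "jon.doe@hotmail.com"),
    ("c3", "someone.else@example.com"),
]

result = commits_per_identity(commits, EMAIL_TO_IDENTITY)
print(result)
```

Because the mapping is keyed by email, it works for commits from any platform and does not depend on a GitHub user id being present.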




[GitHub] [incubator-devlake] Startrekzky commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
Startrekzky commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1201278517

   The [team feature](https://devlake.apache.org/docs/v0.12/UserManuals/TeamConfiguration) @klesh mentioned is supported in v0.12.0, which will be released soon. Stay tuned.




[GitHub] [incubator-devlake] hezyin closed issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
hezyin closed issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id
URL: https://github.com/apache/incubator-devlake/issues/2121




[GitHub] [incubator-devlake] hezyin commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by GitBox <gi...@apache.org>.
hezyin commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1225050619

   @lukasgomez Hi Lukas, just checking in on this issue and see if you guys need any further assistance. Also, did you get the chance to discuss with your managers? I'm still up for a chat and any feature request or feedback for DevLake is greatly appreciated.




[GitHub] [incubator-devlake] richard-fletcher commented on issue #2121: [Bug] [Database] author_id column in commits table of database contains an email instead of an id

Posted by "richard-fletcher (via GitHub)" <gi...@apache.org>.
richard-fletcher commented on issue #2121:
URL: https://github.com/apache/incubator-devlake/issues/2121#issuecomment-1738277373

   @hezyin @klesh Sorry to pull up an old issue, but can you provide some details on how the Teams functionality can help with this problem? When implementing Teams, it seems that having a record in the `accounts` table is key to the process, so that it can then be mapped through the mapping process. `gitextractor` does not create any accounts in this table, so mapping seems problematic.

