Posted to commits@devlake.apache.org by GitBox <gi...@apache.org> on 2022/08/01 06:38:43 UTC

[GitHub] [incubator-devlake] dealexce opened a new issue, #2650: [Bug][GitExtractor] Some commits are missing during collectGitCommits

dealexce opened a new issue, #2650:
URL: https://github.com/apache/incubator-devlake/issues/2650

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   Hello everyone. I am using the latest `main` branch of DevLake to collect GitLab commit data. However, I found that some commits are sometimes not collected into the database after the pipeline runs. I tested on 3 projects, with these outcomes:
   
   1. All commits on all 28 branches/tags are stored successfully.
   2. Only the 7th most recent commit (sha: 7d7600..) on the `dev` branch is missing; all commits on the `master` branch are stored.
   3. Same as the second project: only the 7th most recent commit on the `dev` branch is missing.
   
   I'm not sure whether this is related to it being the 7th most recent commit, to branches named `dev`, or to something else. Here is what I tried and observed:
   
   1. The first project also has a branch named `dev`, and all of its commits seem to be collected correctly.
   2. I pushed a new commit to the second project, and the 7d7600 commit (now the 8th most recent) is still missing, while all other commits are collected.
   
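   This is how I query the commits on a specific branch:
   ```sql
   -- Walk the parent chain starting from the head commit of origin/dev.
   -- commit_parents h is cross-joined only to supply enough rows for the walk.
   with res as (
     SELECT
       @r AS _sha,
       -- LIMIT 1 picks a single parent row (for merge commits, which one is not guaranteed)
       (SELECT @r := parent_commit_sha FROM commit_parents WHERE commit_sha = _sha LIMIT 1) AS parent_sha,
       @l := @l + 1 AS lvl
     FROM (SELECT @r := (SELECT commit_sha FROM refs WHERE id = 'gitlab:GitlabProject:1:175:origin/dev'), @l := 0) vars,
       commit_parents h
   )
   SELECT _sha, parent_sha FROM res WHERE parent_sha IS NOT NULL;
   ```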
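   For readers less familiar with the MySQL user-variable trick, the walk above can be sketched in Go (DevLake's implementation language) as a minimal in-memory analogue. It assumes a hypothetical `parents` map from a commit SHA to its ordered parent SHAs, standing in for the `commit_parents` table; `walkFirstParents` is an illustrative name, not a DevLake function. Like the query, it stops at the first commit that has no parent entry, which is why a single unstored commit truncates the visible branch history.

   ```go
   package main

   import "fmt"

   // walkFirstParents follows the first-parent chain starting at head.
   // parents is a hypothetical in-memory copy of commit_parents:
   // commit SHA -> ordered parent SHAs. The walk ends at the first commit
   // with no parent entry (as the SQL walk does) or when a SHA repeats.
   func walkFirstParents(head string, parents map[string][]string) []string {
   	chain := []string{}
   	seen := map[string]bool{}
   	for sha := head; sha != "" && !seen[sha]; {
   		seen[sha] = true
   		chain = append(chain, sha)
   		ps := parents[sha]
   		if len(ps) == 0 {
   			break // no parent row: history appears to end here
   		}
   		sha = ps[0] // follow the first listed parent
   	}
   	return chain
   }

   func main() {
   	parents := map[string][]string{
   		"c3": {"c2"},
   		"c2": {"c1"},
   		// "c1" has no entry, e.g. because it was never stored,
   		// so its real ancestors are unreachable from here.
   	}
   	fmt.Println(walkFirstParents("c3", parents)) // prints: [c3 c2 c1]
   }
   ```

   Note that `LIMIT 1` without `ORDER BY` may return either parent of a merge commit, whereas this sketch always follows the first listed parent.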
   
   This is what I got:
   ![image](https://user-images.githubusercontent.com/50829041/182084026-e5665cdc-bc38-4cce-9122-b466af20cef5.png)
   
   The result shows only 7 commits on this branch: commit `7d7600..` has no row in the `commit_parents` table, so the walk ends there. Furthermore, `7d7600` is not found in the `commits` table either:
   ![image](https://user-images.githubusercontent.com/50829041/182084773-a4817f08-1e39-4bb9-ae83-edf89fc5679b.png)
   
   and GitLab shows that there are more commits before `7d7600`:
   ![image](https://user-images.githubusercontent.com/50829041/182084360-d175b32d-8804-4809-9606-7b8867fb2c1a.png)
   
   Moreover, I looked up the parent of `7d7600` in GitLab (`acc33d..`) and found that it and the older commits behind it are all stored correctly in the tables:
   ![image](https://user-images.githubusercontent.com/50829041/182085493-69b4734a-421e-4685-9248-cc037c62eaae.png)
   ...
   ![image](https://user-images.githubusercontent.com/50829041/182085511-1cf78e4e-59c7-4ad7-bebd-81241c5473e7.png)
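   The manual check above (a SHA that a stored commit lists as its parent, but that is absent from `commits`) can be generalized. Below is a minimal Go sketch, assuming both tables have already been loaded into memory; the `CommitParent` type and `danglingParents` function are hypothetical names, not DevLake APIs. Every SHA it returns is a commit that some stored commit points to but that was never stored itself, as with `7d7600..` here.

   ```go
   package main

   import (
   	"fmt"
   	"sort"
   )

   // CommitParent is a hypothetical in-memory stand-in for one row of
   // the commit_parents table.
   type CommitParent struct {
   	CommitSha       string
   	ParentCommitSha string
   }

   // danglingParents returns parent SHAs referenced in commit_parents that
   // have no entry in the commits set, sorted for stable output.
   func danglingParents(commits map[string]bool, edges []CommitParent) []string {
   	missing := map[string]bool{}
   	for _, e := range edges {
   		if !commits[e.ParentCommitSha] {
   			missing[e.ParentCommitSha] = true
   		}
   	}
   	out := make([]string, 0, len(missing))
   	for sha := range missing {
   		out = append(out, sha)
   	}
   	sort.Strings(out)
   	return out
   }

   func main() {
   	// c0 plays the role of an older commit that was stored (like acc33d..).
   	commits := map[string]bool{"c3": true, "c2": true, "c0": true}
   	edges := []CommitParent{
   		{"c3", "c2"},
   		{"c2", "c1"}, // c1 is referenced as a parent but was never stored
   	}
   	fmt.Println(danglingParents(commits, edges)) // prints: [c1]
   }
   ```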
   
   So I suspect this commit is somehow dropped during collection. Here are the logs of the pipeline run for the second project:
   [[task #18] [gitextractor].log](https://github.com/apache/incubator-devlake/files/9231509/task.18.gitextractor.log)
   
   I may have done something wrong here; thanks a lot for any ideas.
   
   ### What you expected to happen
   
   All commits on all branches of the project should be collected and stored.
   
   ### How to reproduce
   
   Use the latest DevLake, create a GitLab blueprint, and run it.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   main
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@devlake.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-devlake] dealexce commented on issue #2650: [Bug][GitExtractor] Some commits are missing during collectGitCommits

Posted by GitBox <gi...@apache.org>.
dealexce commented on issue #2650:
URL: https://github.com/apache/incubator-devlake/issues/2650#issuecomment-1201947768

   > Hi, @dealexce , the bug was fixed by #2654 on the `main` branch, please take a look, and feel free to re-open if the problem still exists
   
   Everything looks fine now. Thanks, you guys are awesome!


[GitHub] [incubator-devlake] klesh commented on issue #2650: [Bug][GitExtractor] Some commits are missing during collectGitCommits

Posted by GitBox <gi...@apache.org>.
klesh commented on issue #2650:
URL: https://github.com/apache/incubator-devlake/issues/2650#issuecomment-1200838381

   I would like to know: was the `dev` branch newer than `main` in the 1st repo (the one where all commits were collected correctly)?


[GitHub] [incubator-devlake] klesh commented on issue #2650: [Bug][GitExtractor] Some commits are missing during collectGitCommits

Posted by GitBox <gi...@apache.org>.
klesh commented on issue #2650:
URL: https://github.com/apache/incubator-devlake/issues/2650#issuecomment-1200815460

   This is the most thorough report I have ever seen in my life. Thanks, man, I will look into it.


[GitHub] [incubator-devlake] klesh closed issue #2650: [Bug][GitExtractor] Some commits are missing during collectGitCommits

Posted by GitBox <gi...@apache.org>.
klesh closed issue #2650: [Bug][GitExtractor] Some commits are missing during collectGitCommits
URL: https://github.com/apache/incubator-devlake/issues/2650


[GitHub] [incubator-devlake] dealexce commented on issue #2650: [Bug][GitExtractor] Some commits are missing during collectGitCommits

Posted by GitBox <gi...@apache.org>.
dealexce commented on issue #2650:
URL: https://github.com/apache/incubator-devlake/issues/2650#issuecomment-1200864140

   > I would like to know was the `dev` branch newer than `main` for the 1st repo (the one with all commits collected correctly)?
   
   Yes, the `dev` branch is newer than the `master` branch. Actually, I just checked again and found that the most recent commit on the `master` branch of the 1st repo is also missing (sorry for the mistake).
   
   I also checked the logs again and found that **it might actually be the first processed commit that is not stored.**
   In the first repo, the logs are:
   ![image](https://user-images.githubusercontent.com/50829041/182102787-22cb1145-0f64-458a-977b-07e9f578ae45.png)
   and `ebce1d..` is the missing one.
   In the second repo, the logs are:
   ![image](https://user-images.githubusercontent.com/50829041/182102929-98b98368-7960-4d41-b66d-f968145c01df.png)
   and `7d7600..` is the missing one.
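   The pattern described here (the commit processed first is the one that goes missing) can be checked mechanically. Below is a minimal Go sketch, assuming the processed SHAs have been copied from the extractor log in processing order and the stored SHAs exported from the `commits` table; `findUnstored` and `Missing` are hypothetical helpers, not DevLake code. An index of 0 in every affected repository would support the first-processed-commit hypothesis.

   ```go
   package main

   import "fmt"

   // Missing records a processed commit that did not end up in storage,
   // together with its 0-based position in processing order.
   type Missing struct {
   	Sha   string
   	Index int
   }

   // findUnstored compares the commits reported as processed (in order)
   // with the SHAs actually stored and returns the ones that were lost.
   func findUnstored(processed []string, stored map[string]bool) []Missing {
   	var out []Missing
   	for i, sha := range processed {
   		if !stored[sha] {
   			out = append(out, Missing{Sha: sha, Index: i})
   		}
   	}
   	return out
   }

   func main() {
   	// Hypothetical data shaped like the report: only the first commit
   	// in processing order is absent from storage.
   	processed := []string{"c1", "c2", "c3"}
   	stored := map[string]bool{"c2": true, "c3": true}
   	for _, m := range findUnstored(processed, stored) {
   		fmt.Printf("%s missing (processed at index %d)\n", m.Sha, m.Index)
   	}
   	// prints: c1 missing (processed at index 0)
   }
   ```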


[GitHub] [incubator-devlake] klesh commented on issue #2650: [Bug][GitExtractor] Some commits are missing during collectGitCommits

Posted by GitBox <gi...@apache.org>.
klesh commented on issue #2650:
URL: https://github.com/apache/incubator-devlake/issues/2650#issuecomment-1201912903

   Hi @dealexce, the bug was fixed by #2654 on the `main` branch. Please take a look, and feel free to re-open if the problem still exists.

