You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "tejaswini-imply (via GitHub)" <gi...@apache.org> on 2023/02/08 06:41:03 UTC

[GitHub] [druid] tejaswini-imply opened a new pull request, #13772: Fix GHA tests cache hit miss scenario

tejaswini-imply opened a new pull request, #13772:
URL: https://github.com/apache/druid/pull/13772

   Unit & IT tests in GHA workflows currently cache maven dependencies and docker images during build phase and if the cache retrieval fails in unit or IT test jobs, they'll fail. This PR instead does maven build again in the same job and creates docker image.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a diff in pull request #13772: Fix GHA tests cache hit miss scenario

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on code in PR #13772:
URL: https://github.com/apache/druid/pull/13772#discussion_r1102999793


##########
.github/workflows/reusable-standard-its.yml:
##########
@@ -74,12 +74,15 @@ jobs:
         with:
           path: ~/.m2/repository
           key: maven-${{ runner.os }}-${{ inputs.build_jdk }}-${{ github.sha }}
-          fail-on-cache-miss: true
+
+      - name: Maven build
+        if: steps.maven-restore.outputs.cache-hit != 'true'
+        run: |
+          ./it.sh ci

Review Comment:
   Thanks for the explanation. It is surprising that caching isn't reliable. But, if that is indeed true, then the workaround is fine. Please add a comment that says basically what you explained above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] tejaswini-imply commented on pull request #13772: Fix GHA tests cache hit miss scenario

Posted by "tejaswini-imply (via GitHub)" <gi...@apache.org>.
tejaswini-imply commented on PR #13772:
URL: https://github.com/apache/druid/pull/13772#issuecomment-1422992653

   @paul-rogers @imply-elliott Could you please review this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers merged pull request #13772: Fix GHA tests cache hit miss scenario

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers merged PR #13772:
URL: https://github.com/apache/druid/pull/13772


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] imply-elliott commented on a diff in pull request #13772: Fix GHA tests cache hit miss scenario

Posted by "imply-elliott (via GitHub)" <gi...@apache.org>.
imply-elliott commented on code in PR #13772:
URL: https://github.com/apache/druid/pull/13772#discussion_r1108928170


##########
.github/workflows/reusable-standard-its.yml:
##########
@@ -74,12 +74,15 @@ jobs:
         with:
           path: ~/.m2/repository
           key: maven-${{ runner.os }}-${{ inputs.build_jdk }}-${{ github.sha }}
-          fail-on-cache-miss: true
+
+      - name: Maven build
+        if: steps.maven-restore.outputs.cache-hit != 'true'
+        run: |
+          ./it.sh ci

Review Comment:
   That bug is unfortunate.  
   This workaround will do great to unblock; but I'm going to do some tests with artifacts as a replacement for Caches that i was prototyping a bit ago.  The fact that if we cache miss means all the steps need to build again just sucks.  Caches are a problem for us, as Github only allows 10G before pruning, and that's basically one SHA's worth, so this may bite us hard when the project gets a lot of activity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a diff in pull request #13772: Fix GHA tests cache hit miss scenario

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on code in PR #13772:
URL: https://github.com/apache/druid/pull/13772#discussion_r1102221995


##########
.github/workflows/reusable-revised-its.yml:
##########
@@ -61,16 +61,38 @@ jobs:
         with:
           path: ~/.m2/repository
           key: maven-${{ runner.os }}-${{ inputs.build_jdk }}-${{ github.sha }}
-          fail-on-cache-miss: true
+
+      - name: Restore targets
+        id: targets-restore
+        uses: actions/cache/restore@v3
+        with:
+          path: ./**/target
+          key: maven-${{ runner.os }}-${{ inputs.build_jdk }}-targets-${{ github.sha }}
 
       - name: Retrieve cached docker image
+        id: docker-restore
         uses: actions/cache/restore@v3
         with:
-          key: druid-container-jdk${{ inputs.build_jdk }}.tar.gz
+          key: druid-container-jdk${{ inputs.build_jdk }}.tar.gz-${{ github.sha }}
           path: |
             ./druid-container-jdk${{ inputs.build_jdk }}.tar.gz
             ./integration-tests-ex/image/target/env.sh
-          fail-on-cache-miss: true
+
+      - name: Maven build
+        if: steps.maven-restore.outputs.cache-hit != 'true' || ( steps.docker-restore.outputs.cache-hit != 'true' && steps.targets-restore.outputs.cache-hit != 'true' )
+        run: |
+          ./it.sh ci
+
+      - name: Create docker image
+        if: steps.docker-restore.outputs.cache-hit != 'true'
+        env:
+          docker-restore: ${{ toJson(steps.docker-restore.outputs) }}
+        run: |
+          ./it.sh image
+          source ./integration-tests-ex/image/target/env.sh

Review Comment:
   I still think that, in GHA, the image name should be passed into `.it.sh`, not retrieved from it. That is, set the image name so something like `drill-test-{{$ github.sha }}` or some such, where the suffix indicates the current commit or overall build. This way, we don't have to parse the env file.



##########
.github/workflows/reusable-standard-its.yml:
##########
@@ -74,12 +74,15 @@ jobs:
         with:
           path: ~/.m2/repository
           key: maven-${{ runner.os }}-${{ inputs.build_jdk }}-${{ github.sha }}
-          fail-on-cache-miss: true
+
+      - name: Maven build
+        if: steps.maven-restore.outputs.cache-hit != 'true'
+        run: |
+          ./it.sh ci

Review Comment:
   I wonder, why would the cache sometimes hit and sometimes not? Is the behavior non-deterministic? If so, is the cache atomic or might we sometimes get a partially-populated cache? Are we trying to populate the cache concurrently from reading from it?
   
   This makes me wonder? What problem are we trying to solve? Do we know the root cause, or are we just adding "here's what to do it we get an unexpected result" workaround?
   
   One really does hope that builds are deterministic!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] tejaswini-imply commented on a diff in pull request #13772: Fix GHA tests cache hit miss scenario

Posted by "tejaswini-imply (via GitHub)" <gi...@apache.org>.
tejaswini-imply commented on code in PR #13772:
URL: https://github.com/apache/druid/pull/13772#discussion_r1102236606


##########
.github/workflows/reusable-standard-its.yml:
##########
@@ -74,12 +74,15 @@ jobs:
         with:
           path: ~/.m2/repository
           key: maven-${{ runner.os }}-${{ inputs.build_jdk }}-${{ github.sha }}
-          fail-on-cache-miss: true
+
+      - name: Maven build
+        if: steps.maven-restore.outputs.cache-hit != 'true'
+        run: |
+          ./it.sh ci

Review Comment:
   This is a known bug from GHA (https://github.com/actions/cache/issues/810). No, it's not atomic, so we're doing a clean build and running the tests as usual. As this is sporadic, failing the entire suite of tests due to a stuck cache didn't seem wise. Another workaround is to let that particular test with stuck cache download fail and rerun it after the workflow ends, hoping the problem doesn't occur again. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org