You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@fluo.apache.org by GitBox <gi...@apache.org> on 2022/01/06 06:18:19 UTC

[GitHub] [fluo-muchos] arvindshmicrosoft opened a new pull request #427: Streamline tarball upload and checksum code

arvindshmicrosoft opened a new pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427


   * Remove special case handling for SNAPSHOT tarballs - we now compute
     checksums for all relevant tarballs found in conf/upload.
     Those checksums are available to the Ansible download tasks, thereby
     further validating uploaded files.
   
   * Compute the expected local tarball name based on software versions
     specified in muchos.props. Then, if a matching local tarball is found,
     copy the tarball to the Ansible host ("proxy"). Locally found tarballs
     which do not match the configured version, are skipped (previously,
     all .tar.gz files from conf/upload would be blindly copied to proxy,
     regardless of whether they are actually needed).
   
   * Add simple tests to validate expected local tarball name for Accumulo
     and Zookeeper (the latter is especially important as it exercises the
     recursive expansion of Jinja2 templated variables in the Ansible
     host variables).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] ctubbsii commented on a change in pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#discussion_r779362079



##########
File path: lib/muchos/config/base.py
##########
@@ -480,12 +505,26 @@ def checksum_ver(self, software, version):
                         )
 
         key = "{0}:{1}".format(software, version)
+
         if key not in self.checksums_d:
-            exit(
-                "ERROR - Failed to find checksums for {} {} in {}".format(
-                    software, version, self.checksums_path
+            local_tarball_path = self.get_local_tarball_path(software)
+            if local_tarball_path is None or not exists(local_tarball_path):
+                exit(
+                    "ERROR - Failed to find either a valid checksum in {}, "
+                    "or a local tarball to upload for {} {}.".format(
+                        self.checksums_path, software, version
+                    )
                 )
-            )
+            else:
+                # compute and use the checksum for local tarball
+                local_tarball_sha512 = sha512()
+                with open(local_tarball_path, "rb") as tarball_contents:
+                    file_buffer = tarball_contents.read(65536)
+                    while len(file_buffer) > 0:
+                        local_tarball_sha512.update(file_buffer)
+                        file_buffer = tarball_contents.read(65536)
+                return f"sha512:{local_tarball_sha512.hexdigest()}"
+

Review comment:
       I like the idea of a local tarball that is uploaded, but we still want to verify the local tarball isn't corrupt or malicious, so we shouldn't just recompute the checksum, but validate it locally before shipping it off to the cluster.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] arvindshmicrosoft commented on a change in pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#discussion_r779657399



##########
File path: lib/muchos/config/base.py
##########
@@ -480,12 +505,26 @@ def checksum_ver(self, software, version):
                         )
 
         key = "{0}:{1}".format(software, version)
+
         if key not in self.checksums_d:
-            exit(
-                "ERROR - Failed to find checksums for {} {} in {}".format(
-                    software, version, self.checksums_path
+            local_tarball_path = self.get_local_tarball_path(software)
+            if local_tarball_path is None or not exists(local_tarball_path):
+                exit(
+                    "ERROR - Failed to find either a valid checksum in {}, "
+                    "or a local tarball to upload for {} {}.".format(
+                        self.checksums_path, software, version
+                    )
                 )
-            )
+            else:
+                # compute and use the checksum for local tarball
+                local_tarball_sha512 = sha512()
+                with open(local_tarball_path, "rb") as tarball_contents:
+                    file_buffer = tarball_contents.read(65536)
+                    while len(file_buffer) > 0:
+                        local_tarball_sha512.update(file_buffer)
+                        file_buffer = tarball_contents.read(65536)
+                return f"sha512:{local_tarball_sha512.hexdigest()}"
+

Review comment:
       Thinking through your comments in #428 I think we may be able to use checksums from conf/checksums (if available) to do this. I'll work through some changes and circle back on these (somewhat related) PRs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] ctubbsii commented on pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#issuecomment-1007644778


   > Given the valid observations / concerns raised by @ctubbsii I am abandoning the PR. As he pointed out, the code bloat and possible reduced functionality do not justify proceeding.
   
   Sorry. I didn't mean to be a "party pooper". I thought maybe I was missing something. No harm done by the attempt to streamline, though!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] ctubbsii commented on pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#issuecomment-1007644778


   > Given the valid observations / concerns raised by @ctubbsii I am abandoning the PR. As he pointed out, the code bloat and possible reduced functionality do not justify proceeding.
   
   Sorry. I didn't mean to be a "party pooper". I thought maybe I was missing something. No harm done by the attempt to streamline, though!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] arvindshmicrosoft commented on a change in pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#discussion_r779645999



##########
File path: lib/muchos/config/base.py
##########
@@ -480,12 +505,26 @@ def checksum_ver(self, software, version):
                         )
 
         key = "{0}:{1}".format(software, version)
+
         if key not in self.checksums_d:
-            exit(
-                "ERROR - Failed to find checksums for {} {} in {}".format(
-                    software, version, self.checksums_path
+            local_tarball_path = self.get_local_tarball_path(software)
+            if local_tarball_path is None or not exists(local_tarball_path):
+                exit(
+                    "ERROR - Failed to find either a valid checksum in {}, "
+                    "or a local tarball to upload for {} {}.".format(
+                        self.checksums_path, software, version
+                    )
                 )
-            )
+            else:
+                # compute and use the checksum for local tarball
+                local_tarball_sha512 = sha512()
+                with open(local_tarball_path, "rb") as tarball_contents:
+                    file_buffer = tarball_contents.read(65536)
+                    while len(file_buffer) > 0:
+                        local_tarball_sha512.update(file_buffer)
+                        file_buffer = tarball_contents.read(65536)
+                return f"sha512:{local_tarball_sha512.hexdigest()}"
+

Review comment:
       @ctubbsii - this code path is only used when there is no checksum available (in conf/checksums) to compare against. Do you have any suggestions on how to accomplish what you suggested? I guess we could also require a .sha512 file to be placed in the conf/upload folder (for non-SNAPSHOT tarballs) and use that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] arvindshmicrosoft commented on a change in pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#discussion_r779975046



##########
File path: lib/muchos/config/base.py
##########
@@ -480,12 +505,26 @@ def checksum_ver(self, software, version):
                         )
 
         key = "{0}:{1}".format(software, version)
+
         if key not in self.checksums_d:
-            exit(
-                "ERROR - Failed to find checksums for {} {} in {}".format(
-                    software, version, self.checksums_path
+            local_tarball_path = self.get_local_tarball_path(software)
+            if local_tarball_path is None or not exists(local_tarball_path):
+                exit(
+                    "ERROR - Failed to find either a valid checksum in {}, "
+                    "or a local tarball to upload for {} {}.".format(
+                        self.checksums_path, software, version
+                    )
                 )
-            )
+            else:
+                # compute and use the checksum for local tarball
+                local_tarball_sha512 = sha512()
+                with open(local_tarball_path, "rb") as tarball_contents:
+                    file_buffer = tarball_contents.read(65536)
+                    while len(file_buffer) > 0:
+                        local_tarball_sha512.update(file_buffer)
+                        file_buffer = tarball_contents.read(65536)
+                return f"sha512:{local_tarball_sha512.hexdigest()}"
+

Review comment:
       Please review my subsequent commit. I believe it addresses the previous comments. To summarize, these changes now offer added flexibility, without sacrificing security:
   * For archive releases, we can now use local tarballs from conf/upload, **only** if they have either a checksum in conf/checksums, or a matching .sha512 file present in conf/upload.
     * I tested this with samples of archived Hadoop, Zookeeper and Spark releases which do not have entries in conf/checksums - they are all good.
     * For good measure, I also tested a case where I placed the released Hadoop 3.1.4 tarball alone in the conf/upload folder (without a .sha512 file) and validated that Mucho correctly used the matching checksum from conf/checksums.
   * For SNAPSHOT tarballs, we will compute a checksum locally and use that on the Ansible side to ensure that the tarball was not corrupted in transit. This is an enhancement over what we have currently in main.
   
   By the way, in neither my original commit, nor in this one, will the code download a tarball if it is already present and the checksum matches. This behavior is also status quo from what is in main.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] arvindshmicrosoft commented on pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#issuecomment-1007512050


   Given the valid observations / concerns raised by @ctubbsii I am abandoning the PR. As he pointed out, the code bloat and possible reduced functionality do not justify proceeding.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] ctubbsii commented on a change in pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#discussion_r780048461



##########
File path: lib/muchos/config/base.py
##########
@@ -480,13 +506,51 @@ def checksum_ver(self, software, version):
                         )
 
         key = "{0}:{1}".format(software, version)
-        if key not in self.checksums_d:
-            exit(
-                "ERROR - Failed to find checksums for {} {} in {}".format(
-                    software, version, self.checksums_path
-                )
+        # we first check if there is a tarball in conf/upload for this software
+        local_tarball_path = self.get_local_tarball_path(software)
+        if local_tarball_path is not None and exists(local_tarball_path):
+            if "-SNAPSHOT" in local_tarball_path:
+                # compute checksum for SNAPSHOT tarballs and use that
+                local_tarball_sha512 = sha512()
+                with open(local_tarball_path, "rb") as tarball_contents:
+                    file_buffer = tarball_contents.read(65536)
+                    while len(file_buffer) > 0:
+                        local_tarball_sha512.update(file_buffer)
+                        file_buffer = tarball_contents.read(65536)
+                return f"sha512:{local_tarball_sha512.hexdigest()}"
+
+            # if a local tarball exists, we need either an entry in
+            # conf/checksums, or a .sha512 file
+            if key not in self.checksums_d:
+                # if a local tarball exists but no checksum in conf/checksums
+                # see if we have a .sha512 file to use
+                sha512_path = local_tarball_path + ".sha512"
+                if exists(sha512_path):
+                    with open(sha512_path, "r") as sha512_file:
+                        # replace all whitespace in the sha512 file
+                        sha512_contents = re.sub(
+                            r"\s+", "", sha512_file.read()
+                        )
+
+                        # since the sha512 files have varied structures
+                        # we first check that the exact tarball file name is
+                        # found, and if so, we will extract the sha512 hash
+                        if basename(local_tarball_path) in sha512_contents:
+                            match_result = re.search(
+                                "(?P<hash>[a-fA-F0-9]{128})", sha512_contents

Review comment:
       Searching the contents for a 128 character hash anywhere with this regex cleverly avoids parsing the different file formats, but my brain is thinking of ways to break this logic with some fun release tarball versioning schemes :smiley_cat: 
   
   I think this is probably fine, but I'm not sure we actually needed all this logic to read the arbitrary file formats, when users could just put the hash for anything they put in the uploads directory in the checksums file instead... leaving the job of parsing the crazy formats that some projects use to the user to deal with in their checksum config file. It'd be far simpler to omit this feature.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] arvindshmicrosoft closed pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft closed pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] ctubbsii commented on a change in pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#discussion_r780048461



##########
File path: lib/muchos/config/base.py
##########
@@ -480,13 +506,51 @@ def checksum_ver(self, software, version):
                         )
 
         key = "{0}:{1}".format(software, version)
-        if key not in self.checksums_d:
-            exit(
-                "ERROR - Failed to find checksums for {} {} in {}".format(
-                    software, version, self.checksums_path
-                )
+        # we first check if there is a tarball in conf/upload for this software
+        local_tarball_path = self.get_local_tarball_path(software)
+        if local_tarball_path is not None and exists(local_tarball_path):
+            if "-SNAPSHOT" in local_tarball_path:
+                # compute checksum for SNAPSHOT tarballs and use that
+                local_tarball_sha512 = sha512()
+                with open(local_tarball_path, "rb") as tarball_contents:
+                    file_buffer = tarball_contents.read(65536)
+                    while len(file_buffer) > 0:
+                        local_tarball_sha512.update(file_buffer)
+                        file_buffer = tarball_contents.read(65536)
+                return f"sha512:{local_tarball_sha512.hexdigest()}"
+
+            # if a local tarball exists, we need either an entry in
+            # conf/checksums, or a .sha512 file
+            if key not in self.checksums_d:
+                # if a local tarball exists but no checksum in conf/checksums
+                # see if we have a .sha512 file to use
+                sha512_path = local_tarball_path + ".sha512"
+                if exists(sha512_path):
+                    with open(sha512_path, "r") as sha512_file:
+                        # replace all whitespace in the sha512 file
+                        sha512_contents = re.sub(
+                            r"\s+", "", sha512_file.read()
+                        )
+
+                        # since the sha512 files have varied structures
+                        # we first check that the exact tarball file name is
+                        # found, and if so, we will extract the sha512 hash
+                        if basename(local_tarball_path) in sha512_contents:
+                            match_result = re.search(
+                                "(?P<hash>[a-fA-F0-9]{128})", sha512_contents

Review comment:
       Searching the contents for a 128 character hash anywhere with this regex cleverly avoids parsing the different file formats, but my brain is thinking of ways to break this logic with some fun release tarball versioning schemes :smiley_cat: 
   
   I think this is probably fine, but I'm not sure we actually needed all this logic to read the arbitrary file formats, when users could just put the hash for anything they put in the uploads directory in the checksums file instead... leaving the job of parsing the crazy formats that some projects use to the user to deal with in their checksum config file. It'd be far simpler to omit this feature.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] arvindshmicrosoft closed pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft closed pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] arvindshmicrosoft commented on pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#issuecomment-1007512050


   Given the valid observations / concerns raised by @ctubbsii I am abandoning the PR. As he pointed out, the code bloat and possible reduced functionality do not justify proceeding.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [fluo-muchos] ctubbsii commented on a change in pull request #427: Streamline tarball upload and checksum code

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #427:
URL: https://github.com/apache/fluo-muchos/pull/427#discussion_r779865502



##########
File path: lib/muchos/config/base.py
##########
@@ -480,12 +505,26 @@ def checksum_ver(self, software, version):
                         )
 
         key = "{0}:{1}".format(software, version)
+
         if key not in self.checksums_d:
-            exit(
-                "ERROR - Failed to find checksums for {} {} in {}".format(
-                    software, version, self.checksums_path
+            local_tarball_path = self.get_local_tarball_path(software)
+            if local_tarball_path is None or not exists(local_tarball_path):
+                exit(
+                    "ERROR - Failed to find either a valid checksum in {}, "
+                    "or a local tarball to upload for {} {}.".format(
+                        self.checksums_path, software, version
+                    )
                 )
-            )
+            else:
+                # compute and use the checksum for local tarball
+                local_tarball_sha512 = sha512()
+                with open(local_tarball_path, "rb") as tarball_contents:
+                    file_buffer = tarball_contents.read(65536)
+                    while len(file_buffer) > 0:
+                        local_tarball_sha512.update(file_buffer)
+                        file_buffer = tarball_contents.read(65536)
+                return f"sha512:{local_tarball_sha512.hexdigest()}"
+

Review comment:
       I'm just thinking about how fluo-uno works: if the file is already downloaded, it won't download it again. However, it does always verify the local file's checksum before using it.
   
   If we merely verify a checksum we've computed ourselves, we're not doing anything to ensure the artifacts themselves are legitimate.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@fluo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org