You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/22 14:03:55 UTC

[GitHub] [iceberg] waterlx opened a new pull request #1227: Improve release script to detect archive list automatically

waterlx opened a new pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r483925144



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}

Review comment:
       Can you please use quotes with `echo`? I'm not sure which shell you're using, but I suspect that this isn't going to be portable.
   
   i.e. `echo "WARNING! The following files/directories are to be excluded from git archive: ${excludes}"`. 
   
   Additionally, and this is more of a nit, but I would use `NOTE` over `WARNING`. To me, this is more of an info (or even debug) level log of sorts, and the use of `WARNING` might be confusing for someone simply glancing over it. Again, more of a nit really, but the quotes for `echo` are a best practice (if not typically required in many shells for correctness reasons.).

##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}
+archives=$(ls | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo git archive list: ${archives}

Review comment:
       The same comment of using quotes with echo applies here.

##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       Ahh I see. It's a good thing we checked on another shell then. It's probably safer to normalize to ensure the script is more portable.

##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}
+archives=$(ls | grep -vE ${excludes} | tr '\n' ' ')${adds}

Review comment:
       Small nit / fyi: When I ran this, I wound up with an additional space between everything generated from the ls / grep / tr and the $adds.
   
   This is not a huge deal, but it might be cleaner to either split this into two lines for readability or prepend the ${adds} and move the leading space from adds above on line 69 to the end, e.g. `adds=".baseline "`.
   
   I would personally suggest the two line approach (or an equivalent but arguably less readable one line approach):
   Two lines:
   ```bash
   adds=".baseline" # notice there's no whitespace included
   excludes="build|examples|jitpack.yml|python|site"
   echo "NOTICE! The following files/directories are to be excluded from git archive: ${excludes}"
   includes=$(ls | grep -vE $excludes)  # Where we're splitting the generation of archives into two lines
   archives=$(echo "${adds} ${includes}" | tr '\n' ' ') # Second line applies the space between `adds` and `includes` and then normalizes both.
   ```
   
   The equivalent one line solution would be
   ```bash
   adds=".baseline"
   ....
   archives=$(echo "${adds} $(ls | grep -vE ${excludes})" | tr '\n' ' ')
   ```
   I'm personally more of a fan of the two line solution as it removes the need for both a padded space (and the associated entire sentence commnent) in `$adds` and I personally find it more readable.
   
   However, this is just a nit / FYI and the current solution is fine (though I'm a big fan of removing the comment from `$adds` and letting the variable names be our comments as in the two line solution and also not requiring special padding in the $adds variable). 🙂 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487662462



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Oh, I also found that since we already created a tag and pushed it, we could possibly just use that tag on tag^{tree}. I've yet to find a way to make this work with our excludes list yet though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r463297612



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       So perhaps I'm running this wrong, but my `$archives` comes out formatted kind of funny based on my expectations. Everything is separated by a newline character except for ` .baseline` which is separated by a space from the last entry. I believe you're correct that `git-archive` excpects the output of archive files / directories to be separated by spaces. Is it possible that we need to munge the output of `$archives` to be space delimited instead of the mixture of newline delimited and space delimited that I got? It's also possible that I ran the script wrong.
   
   Below is my $archives list after running this portion of the script using `apache-iceberg-0.9.0` release (I couldn't find any releases tagged with as an `rc` candidate though I didn't spend a ton of time looking as it likely wouldn't have changed the output of this portion of the script.
   
   ```bash
   $ echo ${archives}
   LICENSE
   NOTICE
   README.md
   api
   arrow
   baseline.gradle
   bundled-guava
   common
   core
   data
   deploy.gradle
   dev
   flink
   gradle
   gradle.properties
   gradlew
   hive
   jmh.gradle
   mr
   orc
   parquet
   pig
   settings.gradle
   spark
   spark-runtime
   spark2
   spark3
   spark3-runtime
   tasks.gradle
   versions.props .baseline
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487610701



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       When I run it as is on my local, I get a fatalpathspec. However, when I substitute the variable of `$archives` with `$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')` directly, the tarball is correctly generated.
   
   ```
   $ git archive $release_hash --prefix $tag/ -o $tarball ${archives}
   fatal: pathspec '.asf.yaml .baseline .github .gitignore .travis.yml LICENSE NOTICE README.md api arrow baseline.gradle bundled-guava common core data deploy.gradle dev flink gradle.properties gradle gradlew hive-metastore hive-runtime jmh.gradle mr orc parquet pig settings.gradle spark-runtime spark spark2 spark3-runtime spark3 tasks.gradle version.txt versions.props ' did not match any files
   
   $ git archive $release_hash --prefix $tag/ -o $tarball $(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')
   # Proper tarball generated, which I've been able to inspect the cotents of via tar -tf $tarball
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487573087



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       Marking this as closed as this has been fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487611405



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Also of note, is that the tarball is declared as `.tar.gz`, however, my output does not come gzipped.
   
   https://github.com/apache/iceberg/blob/7da4aea3fb5a2d976305bbfbea23007fb385dacd/dev/source-release.sh#L65




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487611405



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Also of note, is that the tarball is declared as `.tar.gz`, however, my output does not come gzipped.
   
   `tarball=$tag.tar.gz`
   https://github.com/apache/iceberg/blob/7da4aea3fb5a2d976305bbfbea23007fb385dacd/dev/source-release.sh#L65

##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Also of note, is that the tarball is declared as `.tar.gz`, however, my output does not come gzipped.
   
   https://github.com/apache/iceberg/blob/7da4aea3fb5a2d976305bbfbea23007fb385dacd/dev/source-release.sh#L65




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r483925144



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}

Review comment:
       Can you please use quotes with `echo`? I'm not sure which shell you're using, but I suspect that this isn't going to be portable.
   
   i.e. `echo "NOTE: The following files/directories are to be excluded from git archive: ${excludes}"`. 
   
   Additionally, and this is more of a nit, but I would use `NOTE` over `WARNING`. To me, this is more of an info (or even debug) level log of sorts, and the use of `WARNING` might be confusing for someone simply glancing over it. Again, more of a nit really, but the quotes for `echo` are a best practice (if not always required in some non-posix shells for correctness reasons.).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-687572568


   Thank you so much for returning to this @waterlx! I have concerns about not using quotes when calling `echo`. It's not technically required in all shells, but it's fairly standard and it gives me some slight concerns for portability (especially if people are running in containers and using stricter shells). Other than that, it looks greatl though I have a small nit / suggestion for making the script slightly more readable (but I'll leave that to your call).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487610701



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       When I run it as is on my local, I get a fatalpathspec. However, when I substitute the variable of `$archives` with `$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')` directly, the tarball is correctly generated.
   
   ```
   $ git archive $release_hash --prefix $tag/ -o $tarball ${archives}
   fatal: pathspec '.asf.yaml .baseline .github .gitignore .travis.yml LICENSE NOTICE README.md api arrow baseline.gradle bundled-guava common core data deploy.gradle dev flink gradle.properties gradle gradlew hive-metastore hive-runtime jmh.gradle mr orc parquet pig settings.gradle spark-runtime spark spark2 spark3-runtime spark3 tasks.gradle version.txt versions.props ' did not match any files
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-691749089


   I just tried running this locally, but it failed with this error: `fatal: pathspec 'bdp-iceberg' did not match any files`
   
   It looks like the list operation can introduce files or directories that aren't tracked by git and will cause the command to fail.
   
   The idea behind using `git archive` with a list was to ensure that the local copy doesn't leak into the archive so you can have extra files and folders that are automatically ignored. If we committed this, then a release manager would need to fix their local copy before running the script.
   
   Can we fix this by getting the listing from `git` instead?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487597227



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}

Review comment:
       I agree with this, and I would make the message even simpler since everything is basically at the INFO level:
   
   ```
   echo "Archive exclude list: ${excludes}"
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487671520



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Scratch that, we can tag certain files or folders with `export-ignore` in the .gitattributes file. This is the proper way to exclude them from a git archive command as it's built into git itself. This would also allows us to skip the admittedly large amount of shell commands that are going into this. And while some might not be familiar with `.gitattributes`, I would suspect that a release engineer should hopefully be.
   
   Here's the documentation on the git archive command: https://www.linux.org/docs/man1/git-archive.html
   And here's a post discussing the usage of `export-ignore` in the `.gitattributes` file to properly ignore files. https://feeding.cloud.geek.nz/posts/excluding-files-from-git-archive/
   
   I think this is the best approach as it uses the in built tools of git for generating this (so hopefully it should be better understood by release engineers), and it involves way less shell munging than currently.
   
   An example .gitattributes file for the excludes we want would be something like
   ```
   /build export-ignore
   /examples export-ignore
   .jitpack.yml export-ignore
   /python export-ignore
   /site export-ignore
   ```
   
   The above complicated set of `excludes`, `archives, etc could instead be used off of the tag that is already pushed earlier in the script.
   
   ```bash
   # Assuming the .gitattributes file is set up with the appropriate export-ignore
   git archive --prefix $tag/ -o $tarball --worktree-attributes  $tag^{tree}
   ```
   
   I am going to open a PR for adding a `.gitattributes` file so we can test with that. You can feel free to add it to this branch if you prefer, but I can't push to this branch so it's easier for me to create a separate PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waterlx commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
waterlx commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r480179970



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       @kbendick Really sorry for my late response~~ Thanks for pointing that out. I never noticed that.
   But it seems a happy accident that it has the expected output...
   I made the following experiments just now:
   1. `echo ${archives}` directly (not in a shell script) -> the output is new line delimited.
   2. `echo "${archives}"` directly (not in a shell script) -> the output is new line delimited. 
   3. `echo ${archives}` in a shell sript -> the output is **space** delimited   <-- my lucky
   4. `echo "${archives}"` in a shell sript -> the output is new line delimited
   
   Yes, it is more safe to normalize the input to be all space delimited




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waterlx edited a comment on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
waterlx edited a comment on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-662477808


   @rdblue would you please review this change at your most convenience?
   
   In our internal release within Tencent, we always forgot to update the release script when adding/removing modules. That could be only detected when dev/source-release.sh is executed when the release is being performed.
   If I remembered correctly, 0.9.0 release also met the same condition 8-)
   Currently, versions.lock is also a candidate to remove. We might need an automatc way to detect the archive list and I feel that the exclusion list might be more stable than the inclusion list used currently.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r463297612



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       So perhaps I'm running this wrong, but my `$archives` comes out formatted kind of funny based on my expectations. Everything is separated by a newline character except for ` .baseline` which is separated by a space from the last entry. I believe you're correct that `git-archive` excpects the output of archive files / directories to be separated by spaces. Is it possible that we need to munge the output of `$archives` to be space delimited instead of the mixture of newline delimited and space delimited that I got? It's also possible that I ran the script wrong.
   
   Below is my $archives list after running this portion of the script using `apache-iceberg-0.9.0` release (I couldn't find any releases tagged with as an `rc` candidate though I didn't spend a ton of time looking as it likely wouldn't have changed the output of this portion of the script.
   
   ```bash
   $ echo ${archives}
   LICENSE
   NOTICE
   README.md
   api
   arrow
   baseline.gradle
   bundled-guava
   common
   core
   data
   deploy.gradle
   dev
   flink
   gradle
   gradle.properties
   gradlew
   hive
   jmh.gradle
   mr
   orc
   parquet
   pig
   settings.gradle
   spark
   spark-runtime
   spark2
   spark3
   spark3-runtime
   tasks.gradle
   versions.props .baseline
   ```
   
   I would think at the very least we would need to add `\` to the end of every line for the bash shell to understand the remaining lines are part of the same statement.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487611802



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       If we use the flag `--format .tar.gz` to `git archive`, I believe that should generate a truly gzipped tarball.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick removed a comment on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick removed a comment on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-687568968


   Hi @waterlx. I will take a look at this this weekend (likely tomorrow). Thanks for coming back to it!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-692393823


   I merged #1457 and #1459 to address this using .gitattributes, since that seems like a better solution than trying to do too much on the command line. Thanks for working on this, @waterlx! I think the current master should fix the problem for you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r488078063



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Thanks for looking into it more, @kbendick!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waterlx commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
waterlx commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-662477808


   @rdblue would you please review this change at your most convenience?
   
   In our internal release within Tencent, we always forgot to update the release script when adding/removing modules. That could be only detected when dev/source-release.sh is executed when the release is being performed.
   If I remembered correctly, 0.9.0 release also met the same condition 8-)
   Currently, versions.lock is also a candidate to remove. We might need an automatic way to detect the archive list.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick edited a comment on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-687572568


   Thank you so much for returning to this @waterlx! My apologies for the delay, my work has been rather busy this past week.
   
   I have concerns about not using quotes when calling `echo`. It's not technically required in all shells, but it's a fairly standard practice (`echo` uses quotes elsewhere in the script). It gives me some slight concerns for portability of the shell script (especially if people are running in containers and using stricter shells). Other than that, it looks great! I have a small nit / suggestion for making the script slightly more readable and to allow for removing hat entire sentence comment on `$adds` (but I'll leave that to your call).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r483925144



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}

Review comment:
       Can you please use quotes with `echo`? I'm not sure which shell you're using, but I suspect that this isn't going to be portable.
   
   i.e. `echo "NOTE: The following files/directories are to be excluded from git archive: ${excludes}"`. 
   
   Additionally, and this is more of a nit, but I would use `NOTE` over `WARNING`. To me, this is more of an info (or even debug) level log of sorts, and the use of `WARNING` might be confusing for someone simply glancing over it. Again, more of a nit really, but the quotes for `echo` are a best practice (if not typically required in some shells for correctness reasons.).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487659408



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Ok. So I've I'm come up wth another solution that generates the proper tarball, akin to my second command there, but still broken up into chunks (vs doing WAY too much shell scripting and commands on a single line).
   
   Instead of doing `git archive $release_hash --prefix $tag/ -o $tarball $(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')`, where we inline what was previously the $archives variable, we can either write the $archives to a file and then cat that file out as part of the git archive or we can just echo the $archives variable in line (which seems to handle the weird string issues in my zsh shell, which is a relatively common shell).
   
   If we wrote it to a file, that file wouldn't get included as it wouldn't be in the `git ls-tree`. We'd just want to `rm` it at the end to do some clean up. Alternatively, we could create the `/tmp` filder somewhat higher up in the script so that we can place the archives file there and then it wouldn't be in danger of accidentally making it into the git tree as `/tmp` is part of the gitignore.
   
   So here's my proposed solution. It's a little more complex than using $archives inline, but my zsh shell seems to be interpreting it as a single string when we substitute it in and either of these approaches gets around that.
   
   ```bash
   # Note I am removing the adds line as it is no longer needed because the `git ls-tree` will include .baseline
   excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
   # Alternativey, we might add the `.github` directory to the excludes list as it's not needed for releasing
   echo "Excluded files and directories: ${excludes}"
   # Generate the files to archive
   # Note that we no longer need to append the $adds as the `.baseline` file comes from git ls-tree
   archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')
   echo "Included files and directories: ${archives}"
   
   # Two possible ways of achieving this.
   # Way 1: cat the variable so as to make it no longer a string. This worked for me.
   git archive $release_hash --prefix $tag/ -o $tarball $(cat $archives) 
   
   # Way two: Write the archives to a file and then echo that file where we are currently placing $archives
   echo $archives >> to_include.txt
   git archive $release_hash --prefix $tag/ -o $tarball $(cat to_include.txt) # Also generates the tarball
   rm to_include.txt
   ```
   
   I'm still looking to see if there's a cleaner way to do this, but either of the above solutions will work across multiple shells. I think excluding zsh shell portability would be short sighted as it's now the default on the latest versions of macOS if I'm not mistaken.
   
   Both of my solutions however generated a tarball that have the exact same results when inspecting via `tar -tf $tarball`.
   
   Let me know what you think @rdblue. I'm personally fond of writing it out to a file.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r463297612



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       So perhaps I'm running this wrong, but my `$archives` comes out formatted kind of funny based on my expectations. Everything is separated by a newline character except for ` .baseline` which is separated by a space from the last entry. I believe you're correct that `git-archive` excpects the output of archive files / directories to be separated by spaces. Is it possible that we need to munge the output of `$archives` to be space delimited instead of the mixture of newline delimited and space delimited that I got? It's also possible that I ran the script wrong.
   
   Below is my $archives list after running this portion of the script using `apache-iceberg-0.9.0` release (I couldn't find any releases tagged with as an `rc` candidate though I didn't spend a ton of time looking as it likely wouldn't have changed the output of this portion of the script.
   
   ```bash
   $ echo ${archives}
   LICENSE
   NOTICE
   README.md
   api
   arrow
   baseline.gradle
   bundled-guava
   common
   core
   data
   deploy.gradle
   dev
   flink
   gradle
   gradle.properties
   gradlew
   hive
   jmh.gradle
   mr
   orc
   parquet
   pig
   settings.gradle
   spark
   spark-runtime
   spark2
   spark3
   spark3-runtime
   tasks.gradle
   versions.props .baseline
   ```
   
   I would think at the very least we would need to add `\` to the end of every line for the bash shell to understand the remaining lines are part of the same statement.

##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       If I add `archives=$(echo $archives | tr '\n' ' ')`, then my `$archives` output is space delimited as I believe is the expected behavior for input to `git archive`.
   
   ```bash
   $ echo ${archives}
   LICENSE
   NOTICE
   README.md
   api
   arrow
   baseline.gradle
   bundled-guava
   common
   core
   data
   deploy.gradle
   dev
   flink
   gradle
   gradle.properties
   gradlew
   hive
   jmh.gradle
   mr
   orc
   parquet
   pig
   settings.gradle
   spark
   spark-runtime
   spark2
   spark3
   spark3-runtime
   tasks.gradle
   versions.props .baseline
   
   $ archives=$(echo $archives | tr '\n' ' ')
   $ echo $archives
   LICENSE NOTICE README.md api arrow baseline.gradle bundled-guava common core data deploy.gradle dev flink gradle gradle.properties gradlew hive jmh.gradle mr orc parquet pig settings.gradle spark spark-runtime spark2 spark3 spark3-runtime tasks.gradle versions.props .baseline 
   ```

##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       Ahh I see. It's a good thing we checked on another shell then. It's probably safer to normalize to ensure the script is more portable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r483927362



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}
+archives=$(ls | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo git archive list: ${archives}

Review comment:
       The same comment of using quotes with echo applies here.
   
   I believe this should be `echo "git archive list: ${archives}"` for shell portability reasons and for consistent style reasons. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r483925144



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}

Review comment:
       Can you please use quotes with `echo`? I'm not sure which shell you're using, but I suspect that this isn't going to be portable.
   
   i.e. `echo "WARNING! The following files/directories are to be excluded from git archive: ${excludes}"`. 
   
   Additionally, and this is more of a nit, but I would use `NOTE` over `WARNING`. To me, this is more of an info (or even debug) level log of sorts, and the use of `WARNING` might be confusing for someone simply glancing over it. Again, more of a nit really, but the quotes for `echo` are a best practice (if not typically required in some shells for correctness reasons.).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waterlx commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
waterlx commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r480193587



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"

Review comment:
       Addressed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waterlx commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
waterlx commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r480179970



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       @kbendick Really sorry for my late response~~ Thanks for pointing that out. I never noticed that.
   But it seems a happy accident that it has the expected output...
   I made the following experiments just now:
   1. `echo ${archives}` directly (not in a shell script) -> the output is new line delimited.
   2. `echo "${archives}"` directly (not in a shell script) -> the output is new line delimited. 
   3. `echo ${archives}` in a shell sript -> the output is **space** delimited   <-- my lucky
   4. `echo "${archives}"` in a shell sript -> the output is new line delimited
   
   Yes, it is more safe to normalize the input to be all space delimited




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r463306415



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"

Review comment:
       Should we consider printing a warning to the console of what are the `$excludes` in case a file is introduced that needs to be added? A comment might also serve here about adding to `$excludes` if need be.

##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       If I add `archives=$(echo $archives | tr '\n' ' ')`, then my `$archives` output is space delimited as I believe is the expected behavior for input to `git archive`.
   
   ```bash
   $ echo ${archives}
   LICENSE
   NOTICE
   README.md
   api
   arrow
   baseline.gradle
   bundled-guava
   common
   core
   data
   deploy.gradle
   dev
   flink
   gradle
   gradle.properties
   gradlew
   hive
   jmh.gradle
   mr
   orc
   parquet
   pig
   settings.gradle
   spark
   spark-runtime
   spark2
   spark3
   spark3-runtime
   tasks.gradle
   versions.props .baseline
   
   $ archives=$(echo $archives | tr '\n' ' ')
   $ echo $archives
   LICENSE NOTICE README.md api arrow baseline.gradle bundled-guava common core data deploy.gradle dev flink gradle gradle.properties gradlew hive jmh.gradle mr orc parquet pig settings.gradle spark spark-runtime spark2 spark3 spark3-runtime tasks.gradle versions.props .baseline 
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487612567



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Actualy ignore this comment related to gzipping, the output is correct and that flag just mucks things up. I'll delete these comments as to not clutter the PR.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-683355338


   @rdblue if @waterlx doesn't respond, I'd be happy to make these changes. I think this is overall a good addition to the automation!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-687568968


   Hi @waterlx. I will take a look at this this weekend (likely tomorrow). Thanks for coming back to it!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487659408



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Ok. So I've I'm come up wth another solution that generates the proper tarball, akin to my second command there, but still broken up into chunks (vs doing WAY too much shell scripting and commands on a single line).
   
   Instead of doing `git archive $release_hash --prefix $tag/ -o $tarball $(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')`, where we inline what was previously the $archives variable, we can either write the $archives to a file and then cat that file out as part of the git archive or we can just echo the $archives variable in line (which seems to handle the weird string issues in my zsh shell, which is a relatively common shell).
   
   If we wrote it to a file, that file wouldn't get included as it wouldn't be in the `git ls-tree`. We'd just want to `rm` it at the end to do some clean up. Alternatively, we could create the `/tmp` filder somewhat higher up in the script so that we can place the archives file there and then it wouldn't be in danger of accidentally making it into the git tree as `/tmp` is part of the gitignore.
   
   So here's my proposed solution. It's a little more complex than using $archives inline, but my zsh shell seems to be interpreting it as a single string when we substitute it in and either of these approaches gets around that.
   
   ```bash
   # Note I am removing the adds line as it is no longer needed because the `git ls-tree` will include .baseline
   excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
   # Alternativey, we might add the `.github` directory to the excludes list as it's not needed for releasing
   echo "Excluded files and directories: ${excludes}"
   # Generate the files to archive
   # Note that we no longer need to append the $adds as the `.baseline` file comes from git ls-tree
   archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')
   echo "Included files and directories: ${archives}"
   
   # Two possible ways of achieving this.
   # Way 1: cat the variable so as to make it no longer a string. This worked for me.
   git archive $release_hash --prefix $tag/ -o $tarball $(cat $archives) 
   
   # Way two: Write the archives to a file and then echo that file where we are currently placing $archives
   echo $archives >> to_include.txt
   git archive $release_hash --prefix $tag/ -o $tarball $(cat to_include.txt) # Also generates the tarball
   rm to_include.txt
   ```
   
   I'm still looking to see if there's a cleaner way to do this, but either of the above solutions will work across multiple shells. I think excluding zsh shell portability would be short sighted as it's now the default on the latest versions of macOS if I'm not mistaken.
   
   Let me know what you think @rdblue. I'm personally fond of writing it out to a file.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waterlx commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
waterlx commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-683839747


   @kbendick The PR is updated according to your comments:
   (1) Replace line break with blank to make it more safe
   (2) Add a warning as well as a comment for `excludes`
   Would you please take a look at it at your most convenience?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-682256787


   @waterlx, did you want to take a look at the suggestions from @kbendick? Those seem reasonable to me.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r483927425



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       Ahh I see. It's a good thing we checked on another shell then. It's probably safer to normalize to ensure the script is more portable.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-691811328


   > I just tried running this locally, but it failed with this error: `fatal: pathspec 'bdp-iceberg' did not match any files`
   > 
   > It looks like the list operation can introduce files or directories that aren't tracked by git and will cause the command to fail.
   > 
   > The idea behind using `git archive` with a list was to ensure that the local copy doesn't leak into the archive so you can have extra files and folders that are automatically ignored. If we committed this, then a release manager would need to fix their local copy before running the script.
   > 
   > Can we fix this by getting the listing from `git` instead?
   
   Yeah we can definitely fix this by getting a list from `git ls-tree`. We've already worked on some changes, but I think I might have a fix for the last problem (some weird issue with using the assigned `archives` variable as a shell variable).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r463297612



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,11 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"
+archives=$(ls | grep -vE ${excludes})${adds}
+echo git archive list: ${archives}

Review comment:
       So perhaps I'm running this wrong, but my `$archives` comes out formatted kind of funny based on my expectations. Everything is separated by a newline character except for ` .baseline` which is separated by a space from the last entry. I believe you're correct that `git-archive` excpects the output of archive files / directories to be separated by spaces. Is it possible that we need to munge the output of `$archives` to be space delimited instead of the mixture of newline delimited and space delimited that I got? It's also possible that I ran the script wrong.
   
   Below is my $archives list after running this portion of the script using `apache-iceberg-0.9.0` release (I couldn't find any releases tagged with as an `rc` candidate though I didn't spend a ton of time looking.
   
   ```bash
   $ echo ${archives}
   LICENSE
   NOTICE
   README.md
   api
   arrow
   baseline.gradle
   bundled-guava
   common
   core
   data
   deploy.gradle
   dev
   flink
   gradle
   gradle.properties
   gradlew
   hive
   jmh.gradle
   mr
   orc
   parquet
   pig
   settings.gradle
   spark
   spark-runtime
   spark2
   spark3
   spark3-runtime
   tasks.gradle
   versions.props .baseline
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487659408



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Ok. So I've I'm come up wth another solution that generates the proper tarball, akin to my second command there, but still broken up into chunks (vs doing WAY too much shell scripting and commands on a single line).
   
   Instead of doing `git archive $release_hash --prefix $tag/ -o $tarball $(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')`, where we inline what was previously the $archives variable, we can either write the $archives to a file and then cat that file out as part of the git archive or we can just echo the $archives variable in line (which seems to handle the weird string issues in my zsh shell, which is a relatively common shell).
   
   If we wrote it to a file, that file wouldn't get included as it wouldn't be in the `git ls-tree`. We'd just want to `rm` it at the end to do some clean up.
   
   So here's my proposed solution. It's a little more complex than using $archives inline, but my zsh shell seems to be interpreting it as a single string when we substitute it in and either of these approaches gets around that.
   
   ```bash
   # Note I am removing the adds line as it is no longer needed because the `git ls-tree` will include .baseline
   excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
   # Alternativey, we might add the `.github` directory to the excludes list as it's not needed for releasing
   echo "Excluded files and directories: ${excludes}"
   # Generate the files to archive
   # Note that we no longer need to append the $adds as the `.baseline` file comes from git ls-tree
   archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')
   echo "Included files and directories: ${archives}"
   
   # Two possible ways of achieving this.
   # Way 1: cat the variable so as to make it no longer a string. This worked for me.
   git archive $release_hash --prefix $tag/ -o $tarball $(cat $archives) 
   
   # Way two: Write the archives to a file and then echo that file where we are currently placing $archives
   echo $archives >> to_include.txt
   git archive $release_hash --prefix $tag/ -o $tarball $(cat to_include.txt) # Also generates the tarball
   rm to_include.txt
   ```
   
   I'm still looking to see if there's a cleaner way to do this, but either of the above solutions will work across multiple shells. I think excluding zsh shell portability would be short sighted as it's now the default on the latest versions of macOS if I'm not mistaken.
   
   Let me know what you think @rdblue. I'm personally fond of writing it out to a file.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r483927060



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo WARNING! The following files/directories are to be excluded from git archive: ${excludes}
+archives=$(ls | grep -vE ${excludes} | tr '\n' ' ')${adds}

Review comment:
       Small nit / fyi: When I ran this, I wound up with an additional space between everything generated from the ls / grep / tr and the $adds.
   
   This is not a huge deal, but it might be cleaner to either split this into two lines for readability or prepend the ${adds} and move the leading space from adds above on line 69 to the end, e.g. `adds=".baseline "`.
   
   I would personally suggest the two line approach (or an equivalent but arguably less readable one line approach):
   Two lines:
   ```bash
   adds=".baseline" # notice there's no whitespace included
   excludes="build|examples|jitpack.yml|python|site"
   echo "NOTICE! The following files/directories are to be excluded from git archive: ${excludes}"
   includes=$(ls | grep -vE $excludes)  # Where we're splitting the generation of archives into two lines
   archives=$(echo "${adds} ${includes}" | tr '\n' ' ') # Second line applies the space between `adds` and `includes` and then normalizes both.
   ```
   
   The equivalent one line solution would be
   ```bash
   adds=".baseline"
   ....
   archives=$(echo "${adds} $(ls | grep -vE ${excludes})" | tr '\n' ' ')
   ```
   I'm personally more of a fan of the two line solution as it removes the need for both a padded space (and the associated entire sentence commnent) in `$adds` and I personally find it more readable.
   
   However, this is just a nit / FYI and the current solution is fine (though I'm a big fan of removing the comment from `$adds` and letting the variable names be our comments as in the two line solution and also not requiring special padding in the $adds variable). 🙂 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waterlx edited a comment on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
waterlx edited a comment on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-662477808


   @rdblue would you please review this change at your most convenience?
   
   In our internal release within Tencent, we always forgot to update the release script when adding/removing modules. That could be only detected when dev/source-release.sh is executed when the release is being performed.
   If I remembered correctly, 0.9.0 release also met the same condition 8-)
   Currently, versions.lock is also a candidate to remove. We might need an automatic way to detect the archive list and I feel tht the exclusion list might be more stable than the inclusion list.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-662555422


   Thanks, @waterlx. I did have a few RCs fail because of this issue -- and my own mistake removing `parquet` instead of `project`.
   
   Can you confirm that if there is an extra directory in the local copy that this fails to build the tarball?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] waterlx commented on pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
waterlx commented on pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#issuecomment-683368106


   @rdblue @kbendick Sorry for my late response. I am working on it during the weekend


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1227: Improve release script to detect archive list automatically

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1227:
URL: https://github.com/apache/iceberg/pull/1227#discussion_r487671520



##########
File path: dev/source-release.sh
##########
@@ -66,7 +66,12 @@ tarball=$tag.tar.gz
 
 # be conservative and use the release hash, even though git produces the same
 # archive (identical hashes) using the scm tag
-git archive $release_hash --prefix $tag/ -o $tarball .baseline api arrow bundled-guava common core data dev flink gradle gradlew hive mr orc parquet pig spark spark2 spark-runtime spark3 spark3-runtime LICENSE NOTICE README.md build.gradle baseline.gradle deploy.gradle tasks.gradle jmh.gradle gradle.properties settings.gradle versions.lock versions.props version.txt
+adds=" .baseline"  # prefixed with a blank space for each file name including the first one
+excludes="build|examples|jitpack.yml|python|site"  # excluded as they are not of use for releasing jars
+echo "Excluded files and directories: ${excludes}"
+archives=$(git ls-tree --name-only -r HEAD | cut -d"/" -f1 | uniq | grep -vE ${excludes} | tr '\n' ' ')${adds}
+echo "Included files and directories: ${archives}"
+git archive $release_hash --prefix $tag/ -o $tarball ${archives}

Review comment:
       Scratch that, we can tag certain files or folders with `export-ignore` in the .gitattributes file. This is the proper way to exclude them from a git archive command as it's built into git itself. This would also allows us to skip the admittedly large amount of shell commands that are going into this. And while some might not be familiar with `.gitattributes`, I would suspect that a release engineer should hopefully be.
   
   Here's the documentation on the git archive command: https://www.linux.org/docs/man1/git-archive.html
   And here's a post discussing the usage of `export-ignore` in the `.gitattributes` file to properly ignore files. https://feeding.cloud.geek.nz/posts/excluding-files-from-git-archive/
   
   I think this is the best approach as it uses the in built tools of git for generating this (so hopefully it should be better understood by release engineers), and it involves way less shell munging than currently.
   
   An example .gitattributes file for the excludes we want would be something like
   ```
   /build export-ignore
   /examples export-ignore
   .jitpack.yml export-ignore
   /python export-ignore  # though do we really want to continue to exclude the python directory from the release?
   /site export-ignore
   ```
   
   The above complicated set of `excludes`, `archives, etc could instead be used off of the tag that is already pushed earlier in the script.
   
   ```bash
   # Assuming the .gitattributes file is set up with the appropriate export-ignore
   git archive --prefix $tag/ -o $tarball --worktree-attributes  $tag^{tree}
   ```
   
   I am going to open a PR for adding a `.gitattributes` file so we can test with that. You can feel free to add it to this branch if you prefer, but I can't push to this branch so it's easier for me to create a separate PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org