Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 22:28:07 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request #1807: Update CHANGELOG.md, update release scripts

alamb opened a new pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807


   # Which issue does this PR close?
   
   Re https://github.com/apache/arrow-datafusion/issues/1587
   
   
   # Rationale for this change
   Trying to tell the 🌎  🌍  🌏   about DataFusion
   
   # What changes are included in this PR?
   The updated `CHANGELOG.md` was created using
   
   ```shell
   $ ./dev/release/update_change_log-datafusion.sh
   ```
   
   🤯  what a lot of stuff happened!!!
   
   Since this file is automatically generated: to make changes, please edit the ticket subjects / labels directly (or tag me `@alamb` if you don't have the permissions to do so).
   
   I am hoping to create a release candidate over the weekend (as I will be largely offline starting next Thursday Feb 17 for a week or so) so I can do the release next week before I head out.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on a change in pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807#discussion_r804359144



##########
File path: dev/release/update_change_log.sh
##########
@@ -50,14 +50,14 @@ OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
 pushd ${SOURCE_TOP_DIR}
 
 # reset content in changelog
-git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+git checkout "${SINCE_TAG}" "${OUTPUT_PATH}"

Review comment:
       haha, sorry that's an alias i made up in my gitconfig, muscle memory...
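
The alias problem above can be avoided by spelling out the full subcommand in scripts. A minimal sketch, assuming a hypothetical helper named `restore_from_tag` (not part of this PR), that resets a single file to its content at a given tag:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the PR: reset one file to its content at a tag.
restore_from_tag() {
  local tag="$1" path="$2"
  # Spell out `checkout`: aliases such as `co` live in a user's own gitconfig
  # and do not exist on other machines. `--` separates the revision from the path.
  git checkout "${tag}" -- "${path}"
}
```

In the release script this would correspond to something like `restore_from_tag "${SINCE_TAG}" "${OUTPUT_PATH}"`.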







[GitHub] [arrow-datafusion] alamb edited a comment on pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807#issuecomment-1036531280


   > I'm interested in learning a bit more about the lifecycle of issues / tags / releases.
   > Once I understand better, maybe I could help to get things in order for the next release.
   
   🎉  that would be wonderful @matthewmturner
   
   > @houqp @alamb could you provide a little more info? I didn't see anything mentioned in the developer's guide.
   
   The basic release instructions (https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#update-changelogmd) talk about:
   ```shell
   # create the changelog
   CHANGELOG_GITHUB_TOKEN=<TOKEN> ./dev/release/update_change_log-all.sh
   # review change log / edit issues and labels if needed, rerun until you are happy with the result
   git commit -a -m 'Create changelog for release'
   ```
   
   But the "until you are happy with the result" leaves a lot to the imagination 😆 
   
   Basically what I did was run that script (it updates the `CHANGELOG.md` file locally) and then looked at the output and tried to make something that looked coherent.
   
   Examples of things that I did:
   1. Found issues that did not have the `datafusion` label but were related to DataFusion and added the label (otherwise they don't show up in the release notes)
   2. Changed titles of PRs / issues so they were more specific (e.g. from `Fixed bug in select` to `Fixed bug when there are order by number`)
   3. Applied a liberal dose of judgement to which issues / PRs should have `enhancement`, `bug`, or `api-change` on them.
   
   The most questionable things related to labels were:
   1. API changes get listed under the "Breaking Changes" heading, but some tickets only added new APIs, which I didn't feel were breaking changes, and in other cases several PRs together made a "single" breaking change from the user's point of view (e.g. breaking LogicalPlan into enums)
   2. Changes with multiple labels got put under a single heading, so I removed some labels that were accurate but were obscuring what I felt was the "most important" part. For example, a ticket with an `enhancement` and a `documentation` label ended up under the Documentation heading, even when it also had code changes. I removed the `documentation` label for that one
   
   It would be great to have some more help here. Some thoughts:
   1. We could probably automate adding the `datafusion` label to issues that are closed via a PR that also has the `datafusion` label (the label is applied to PRs automatically based on the paths that were changed)
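
One possible starting point for that automation is sketched below. It assumes a hypothetical helper, `closed_issue_numbers`, that scans a PR description for GitHub's closing keywords and prints the referenced issue numbers; it only handles same-repository `#N` references, not full issue URLs. The labelling step with the GitHub CLI is shown as a comment because it needs credentials.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: list issue numbers that a PR body
# closes via GitHub's closing keywords (close/fix/resolve and their variants).
closed_issue_numbers() {
  local body="$1"
  printf '%s\n' "${body}" \
    | grep -oiE '(close[sd]?|fix(es|ed)?|resolve[sd]?) +#[0-9]+' \
    | grep -oE '[0-9]+'
}

# Sketch of applying the label with the GitHub CLI (needs `gh` and write access):
#   for n in $(closed_issue_numbers "${pr_body}"); do
#     gh issue edit "${n}" --add-label datafusion
#   done
```

A PR that only says `Re <issue URL>`, like this one, would not match, so some manual review of labels would still be needed.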
   
   Something else I haven't even looked into is releasing the Python bindings and Ballista lol, so any help you want to lend there would be awesome





[GitHub] [arrow-datafusion] houqp commented on a change in pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807#discussion_r804359286



##########
File path: dev/release/update_change_log.sh
##########
@@ -50,14 +50,14 @@ OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
 pushd ${SOURCE_TOP_DIR}
 
 # reset content in changelog
-git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+git checkout "${SINCE_TAG}" "${OUTPUT_PATH}"
 # remove license header so github-changelog-generator has a clean base to append
-sed -i '1,18d' "${OUTPUT_PATH}"
+sed -i.bak '1,18d' "${OUTPUT_PATH}"
 
 docker run -it --rm \
     -e CHANGELOG_GITHUB_TOKEN=$CHANGELOG_GITHUB_TOKEN \
     -v "$(pwd)":/usr/local/src/your-app \
-    githubchangeloggenerator/github-changelog-generator:1.16.2 \
+    githubchangeloggenerator/github-changelog-generator \

Review comment:
       might want to pin to the latest version number instead to avoid breaking changes in the future.
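
One way to keep a pinned default while still allowing a newer image is to make the tag overridable. This is only a sketch: the function `changelog_image` and the variable `CHANGELOG_GENERATOR_TAG` are hypothetical names, and only the image name and the `1.16.2` tag come from the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical names; only the image name and the 1.16.2 tag come from the diff.
changelog_image() {
  local repo="githubchangeloggenerator/github-changelog-generator"
  # Default to the pinned tag; export CHANGELOG_GENERATOR_TAG=latest to opt in
  # to whatever the registry currently publishes as `latest`.
  echo "${repo}:${CHANGELOG_GENERATOR_TAG:-1.16.2}"
}
```

The `docker run` invocation would then take `"$(changelog_image)"` in place of the hard-coded image reference.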







[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807#discussion_r804206075



##########
File path: datafusion/CHANGELOG.md
##########
@@ -19,6 +19,293 @@
 
 # Changelog
 
+## [7.0.0](https://github.com/apache/arrow-datafusion/tree/7.0.0) (2022-02-10)
+
+[Full Changelog](https://github.com/apache/arrow-datafusion/compare/6.0.0...7.0.0)
+
+**Breaking changes:**
+
+- Consolidate various configurations options, remove unrelated `batch_size` [\#1565](https://github.com/apache/arrow-datafusion/issues/1565)
+- Extract logical plans in  LogicalPlan as independent struct [\#1228](https://github.com/apache/arrow-datafusion/issues/1228)
+- Update `ExecutionPlan` to know about sortedness and repartitioning optimizer pass respect the invariants [\#1776](https://github.com/apache/arrow-datafusion/pull/1776) ([alamb](https://github.com/alamb))
+- Update to `arrow 8.0.0` [\#1673](https://github.com/apache/arrow-datafusion/pull/1673) ([alamb](https://github.com/alamb))
+- Remove non idiomatic `DataFusionError::into_arrow_external_error` in favor of From conversion [\#1645](https://github.com/apache/arrow-datafusion/pull/1645) ([alamb](https://github.com/alamb))
+- Remove `Accumulator::update` and `Accumulator::merge` [\#1582](https://github.com/apache/arrow-datafusion/pull/1582) ([Jimexist](https://github.com/Jimexist))
+- implement `Hash` for various types and replace `PartialOrd` [\#1580](https://github.com/apache/arrow-datafusion/pull/1580) ([Jimexist](https://github.com/Jimexist))
+- Replace `DatafusionError` with `GenericError` in `ObjectStore` interface [\#1541](https://github.com/apache/arrow-datafusion/pull/1541) ([matthewmturner](https://github.com/matthewmturner))
+- Make `FLOAT` SQL type map to `Float32` rather than `Float64` [\#1423](https://github.com/apache/arrow-datafusion/pull/1423) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([liukun4515](https://github.com/liukun4515))
+- Map `REAL` SQL type to `Float32` rather than `Float64` to be consistent with pg  [\#1390](https://github.com/apache/arrow-datafusion/pull/1390) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([hntd187](https://github.com/hntd187))
+
+**Implemented enhancements:**
+
+- Create new `datafusion_expr` crate [\#1753](https://github.com/apache/arrow-datafusion/issues/1753)
+- Create new `datafusion_common` crate [\#1752](https://github.com/apache/arrow-datafusion/issues/1752)
+- API to get Expr's type and nullability without a `DFSchema` [\#1725](https://github.com/apache/arrow-datafusion/issues/1725)
+- Cleaner API to create `Expr::ScalarFunction` programatically [\#1718](https://github.com/apache/arrow-datafusion/issues/1718)
+- Introduce a `Vec<u8>` based row-wise representation for DataFusion [\#1708](https://github.com/apache/arrow-datafusion/issues/1708)
+- Simplify creating new `ListingTable`  [\#1705](https://github.com/apache/arrow-datafusion/issues/1705)
+- Implement TableProvider for DataFrameImpl to allow registration of logical plans [\#1698](https://github.com/apache/arrow-datafusion/issues/1698)
+- Public Expr simplification API [\#1694](https://github.com/apache/arrow-datafusion/issues/1694)
+- Query Optimizer: Add OUTER --\> INNER join conversion [\#1670](https://github.com/apache/arrow-datafusion/issues/1670)

Review comment:
       In fact, this is not implemented. BTW, I am worried that there are other errors like this one: we closed some issues as duplicates, but the features themselves are not implemented.

##########
File path: datafusion/CHANGELOG.md
##########
@@ -19,6 +19,293 @@
 
 # Changelog
 
+## [7.0.0](https://github.com/apache/arrow-datafusion/tree/7.0.0) (2022-02-10)
+
+[Full Changelog](https://github.com/apache/arrow-datafusion/compare/6.0.0...7.0.0)
+
+**Breaking changes:**
+
+- Consolidate various configurations options, remove unrelated `batch_size` [\#1565](https://github.com/apache/arrow-datafusion/issues/1565)
+- Extract logical plans in  LogicalPlan as independent struct [\#1228](https://github.com/apache/arrow-datafusion/issues/1228)
+- Update `ExecutionPlan` to know about sortedness and repartitioning optimizer pass respect the invariants [\#1776](https://github.com/apache/arrow-datafusion/pull/1776) ([alamb](https://github.com/alamb))
+- Update to `arrow 8.0.0` [\#1673](https://github.com/apache/arrow-datafusion/pull/1673) ([alamb](https://github.com/alamb))
+- Remove non idiomatic `DataFusionError::into_arrow_external_error` in favor of From conversion [\#1645](https://github.com/apache/arrow-datafusion/pull/1645) ([alamb](https://github.com/alamb))
+- Remove `Accumulator::update` and `Accumulator::merge` [\#1582](https://github.com/apache/arrow-datafusion/pull/1582) ([Jimexist](https://github.com/Jimexist))
+- implement `Hash` for various types and replace `PartialOrd` [\#1580](https://github.com/apache/arrow-datafusion/pull/1580) ([Jimexist](https://github.com/Jimexist))
+- Replace `DatafusionError` with `GenericError` in `ObjectStore` interface [\#1541](https://github.com/apache/arrow-datafusion/pull/1541) ([matthewmturner](https://github.com/matthewmturner))
+- Make `FLOAT` SQL type map to `Float32` rather than `Float64` [\#1423](https://github.com/apache/arrow-datafusion/pull/1423) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([liukun4515](https://github.com/liukun4515))
+- Map `REAL` SQL type to `Float32` rather than `Float64` to be consistent with pg  [\#1390](https://github.com/apache/arrow-datafusion/pull/1390) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([hntd187](https://github.com/hntd187))
+
+**Implemented enhancements:**
+
+- Create new `datafusion_expr` crate [\#1753](https://github.com/apache/arrow-datafusion/issues/1753)
+- Create new `datafusion_common` crate [\#1752](https://github.com/apache/arrow-datafusion/issues/1752)
+- API to get Expr's type and nullability without a `DFSchema` [\#1725](https://github.com/apache/arrow-datafusion/issues/1725)
+- Cleaner API to create `Expr::ScalarFunction` programatically [\#1718](https://github.com/apache/arrow-datafusion/issues/1718)
+- Introduce a `Vec<u8>` based row-wise representation for DataFusion [\#1708](https://github.com/apache/arrow-datafusion/issues/1708)
+- Simplify creating new `ListingTable`  [\#1705](https://github.com/apache/arrow-datafusion/issues/1705)
+- Implement TableProvider for DataFrameImpl to allow registration of logical plans [\#1698](https://github.com/apache/arrow-datafusion/issues/1698)
+- Public Expr simplification API [\#1694](https://github.com/apache/arrow-datafusion/issues/1694)
+- Query Optimizer: Add OUTER --\> INNER join conversion [\#1670](https://github.com/apache/arrow-datafusion/issues/1670)
+- Support reading from CSV, Avro and Json files that have mergeable/compatible,  but not identical schemas [\#1669](https://github.com/apache/arrow-datafusion/issues/1669)
+- Remove `DataFusionError::into_arrow_external_error` in favor of `From` conversion [\#1644](https://github.com/apache/arrow-datafusion/issues/1644)
+- Include join type in display implementation for logical plan [\#1620](https://github.com/apache/arrow-datafusion/issues/1620)
+- Switch datafusion to using `eq_dyn_scalar`, etc kernels [\#1610](https://github.com/apache/arrow-datafusion/issues/1610)
+- Proposal: Remove `Accumulator::update` and `Accumulator::merge` [\#1549](https://github.com/apache/arrow-datafusion/issues/1549)
+- Replace DataFusionError/Result with impl Error for ObjectStore and Reader  [\#1540](https://github.com/apache/arrow-datafusion/issues/1540)
+- Add `approx_quantile`  support [\#1538](https://github.com/apache/arrow-datafusion/issues/1538)
+- support sorting decimal data type [\#1522](https://github.com/apache/arrow-datafusion/issues/1522)
+- Keep all datafusion's packages up to date with Dependabot [\#1472](https://github.com/apache/arrow-datafusion/issues/1472)
+- ExecutionContext support init ExecutionContextState with `new(state: Arc<Mutex<ExecutionContextState>>)` method [\#1439](https://github.com/apache/arrow-datafusion/issues/1439)
+- support the decimal scalar value [\#1393](https://github.com/apache/arrow-datafusion/issues/1393)
+- Documentation for using scalar functions with the the DataFrame API [\#1364](https://github.com/apache/arrow-datafusion/issues/1364)
+- Support `boolean == boolean` and `boolean != boolean` operators  [\#1159](https://github.com/apache/arrow-datafusion/issues/1159)
+-  Support DataType::Decimal\(15, 2\) in TPC-H benchmark [\#174](https://github.com/apache/arrow-datafusion/issues/174)
+-   Make `MemoryStream` public [\#150](https://github.com/apache/arrow-datafusion/issues/150)
+-   Add support for Parquet schema merging  [\#132](https://github.com/apache/arrow-datafusion/issues/132)
+-   Add SQL support for IN expression [\#118](https://github.com/apache/arrow-datafusion/issues/118)

Review comment:
       ditto

##########
File path: datafusion/CHANGELOG.md
##########
@@ -19,6 +19,293 @@
 
 # Changelog
 
+## [7.0.0](https://github.com/apache/arrow-datafusion/tree/7.0.0) (2022-02-10)
+
+[Full Changelog](https://github.com/apache/arrow-datafusion/compare/6.0.0...7.0.0)
+
+**Breaking changes:**
+
+- Consolidate various configurations options, remove unrelated `batch_size` [\#1565](https://github.com/apache/arrow-datafusion/issues/1565)
+- Extract logical plans in  LogicalPlan as independent struct [\#1228](https://github.com/apache/arrow-datafusion/issues/1228)
+- Update `ExecutionPlan` to know about sortedness and repartitioning optimizer pass respect the invariants [\#1776](https://github.com/apache/arrow-datafusion/pull/1776) ([alamb](https://github.com/alamb))
+- Update to `arrow 8.0.0` [\#1673](https://github.com/apache/arrow-datafusion/pull/1673) ([alamb](https://github.com/alamb))
+- Remove non idiomatic `DataFusionError::into_arrow_external_error` in favor of From conversion [\#1645](https://github.com/apache/arrow-datafusion/pull/1645) ([alamb](https://github.com/alamb))
+- Remove `Accumulator::update` and `Accumulator::merge` [\#1582](https://github.com/apache/arrow-datafusion/pull/1582) ([Jimexist](https://github.com/Jimexist))
+- implement `Hash` for various types and replace `PartialOrd` [\#1580](https://github.com/apache/arrow-datafusion/pull/1580) ([Jimexist](https://github.com/Jimexist))
+- Replace `DatafusionError` with `GenericError` in `ObjectStore` interface [\#1541](https://github.com/apache/arrow-datafusion/pull/1541) ([matthewmturner](https://github.com/matthewmturner))
+- Make `FLOAT` SQL type map to `Float32` rather than `Float64` [\#1423](https://github.com/apache/arrow-datafusion/pull/1423) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([liukun4515](https://github.com/liukun4515))
+- Map `REAL` SQL type to `Float32` rather than `Float64` to be consistent with pg  [\#1390](https://github.com/apache/arrow-datafusion/pull/1390) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([hntd187](https://github.com/hntd187))
+
+**Implemented enhancements:**
+
+- Create new `datafusion_expr` crate [\#1753](https://github.com/apache/arrow-datafusion/issues/1753)
+- Create new `datafusion_common` crate [\#1752](https://github.com/apache/arrow-datafusion/issues/1752)
+- API to get Expr's type and nullability without a `DFSchema` [\#1725](https://github.com/apache/arrow-datafusion/issues/1725)
+- Cleaner API to create `Expr::ScalarFunction` programatically [\#1718](https://github.com/apache/arrow-datafusion/issues/1718)
+- Introduce a `Vec<u8>` based row-wise representation for DataFusion [\#1708](https://github.com/apache/arrow-datafusion/issues/1708)
+- Simplify creating new `ListingTable`  [\#1705](https://github.com/apache/arrow-datafusion/issues/1705)
+- Implement TableProvider for DataFrameImpl to allow registration of logical plans [\#1698](https://github.com/apache/arrow-datafusion/issues/1698)
+- Public Expr simplification API [\#1694](https://github.com/apache/arrow-datafusion/issues/1694)
+- Query Optimizer: Add OUTER --\> INNER join conversion [\#1670](https://github.com/apache/arrow-datafusion/issues/1670)

Review comment:
       I have not looked into this script. Can it generate the changelog based on PRs that have already been merged?







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807#discussion_r804182134



##########
File path: CHANGELOG.md
##########
@@ -21,7 +21,6 @@ Changelogs are maintained separately for each subproject. Please check out the
 changelog file within each subproject folder for more details:
 
 * [Datafusion CHANGELOG](./datafusion/CHANGELOG.md)
-* [Datafusion Python Binding CHANGELOG](./python/CHANGELOG.md)

Review comment:
       python has been moved to its own crate, I believe. 

##########
File path: dev/release/update_change_log.sh
##########
@@ -50,14 +50,14 @@ OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
 pushd ${SOURCE_TOP_DIR}
 
 # reset content in changelog
-git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+git checkout "${SINCE_TAG}" "${OUTPUT_PATH}"

Review comment:
       my mac didn't like `git co` 😢 

##########
File path: dev/release/update_change_log.sh
##########
@@ -50,14 +50,14 @@ OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
 pushd ${SOURCE_TOP_DIR}
 
 # reset content in changelog
-git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+git checkout "${SINCE_TAG}" "${OUTPUT_PATH}"
 # remove license header so github-changelog-generator has a clean base to append
-sed -i '1,18d' "${OUTPUT_PATH}"
+sed -i.bak '1,18d' "${OUTPUT_PATH}"

Review comment:
       Likewise, apparently `sed` on macOS is slightly different from `sed` on Linux
   
   https://stackoverflow.com/questions/5694228/sed-in-place-flag-that-works-both-on-mac-bsd-and-linux
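
The portable form used in the diff can be sketched as a small function. The name `strip_header` is hypothetical; the point is that attaching a backup suffix directly to `-i` is accepted by both GNU and BSD `sed`, at the cost of a leftover backup file that has to be removed.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the PR: delete the first N lines of a file in place.
strip_header() {
  local file="$1" count="$2"
  # -i.bak (suffix attached, no space) works with both GNU sed and BSD/macOS sed.
  sed -i.bak "1,${count}d" "${file}"
  # Remove the backup that the portable form leaves behind.
  rm -f "${file}.bak"
}
```

For the changelog script this would be called as `strip_header "${OUTPUT_PATH}" 18` to drop the 18-line license header.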

##########
File path: dev/release/update_change_log.sh
##########
@@ -50,14 +50,14 @@ OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
 pushd ${SOURCE_TOP_DIR}
 
 # reset content in changelog
-git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+git checkout "${SINCE_TAG}" "${OUTPUT_PATH}"
 # remove license header so github-changelog-generator has a clean base to append
-sed -i '1,18d' "${OUTPUT_PATH}"
+sed -i.bak '1,18d' "${OUTPUT_PATH}"
 
 docker run -it --rm \
     -e CHANGELOG_GITHUB_TOKEN=$CHANGELOG_GITHUB_TOKEN \
     -v "$(pwd)":/usr/local/src/your-app \
-    githubchangeloggenerator/github-changelog-generator:1.16.2 \
+    githubchangeloggenerator/github-changelog-generator \

Review comment:
       use latest







[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807#issuecomment-1036289253


   I'm interested in learning a bit more about the lifecycle of issues / tags / releases.
   
   @houqp @alamb could you provide a little more info? I didn't see anything mentioned in the developer's guide.
   
   Once I understand better, maybe I could help to get things in order for the next release.










[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807#discussion_r804548747



##########
File path: dev/release/update_change_log.sh
##########
@@ -50,14 +50,14 @@ OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
 pushd ${SOURCE_TOP_DIR}
 
 # reset content in changelog
-git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+git checkout "${SINCE_TAG}" "${OUTPUT_PATH}"
 # remove license header so github-changelog-generator has a clean base to append
-sed -i '1,18d' "${OUTPUT_PATH}"
+sed -i.bak '1,18d' "${OUTPUT_PATH}"
 
 docker run -it --rm \
     -e CHANGELOG_GITHUB_TOKEN=$CHANGELOG_GITHUB_TOKEN \
     -v "$(pwd)":/usr/local/src/your-app \
-    githubchangeloggenerator/github-changelog-generator:1.16.2 \
+    githubchangeloggenerator/github-changelog-generator \

Review comment:
       🤔  https://hub.docker.com/r/githubchangeloggenerator/github-changelog-generator/tags
   
   Seems like `latest` has no other tag (as in `latest` is newer than `1.16.2` but there are no other numbered versions newer than `1.16.2`) 😞 
   
   ![Screen Shot 2022-02-11 at 6 00 42 AM](https://user-images.githubusercontent.com/490673/153580452-54ccec0a-47bb-48b2-b5e9-ec7967d58920.png)
   
   







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807#discussion_r804546071



##########
File path: dev/release/update_change_log.sh
##########
@@ -50,14 +50,14 @@ OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
 pushd ${SOURCE_TOP_DIR}
 
 # reset content in changelog
-git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+git checkout "${SINCE_TAG}" "${OUTPUT_PATH}"

Review comment:
       No problem. I tip my hat to you for these scripts in general (automating the RAT in particular is genius, I normally just fix it up manually in arrow-rs)







[GitHub] [arrow-datafusion] alamb merged pull request #1807: Update CHANGELOG.md, update release scripts

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1807:
URL: https://github.com/apache/arrow-datafusion/pull/1807


   

