Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/19 16:06:06 UTC

[GitHub] [arrow-datafusion] matthewmturner opened a new pull request #1616: Add roadmap to readme

matthewmturner opened a new pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616


   # Which issue does this PR close?
   
   Closes #1515 
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r787945413



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered
+
+### Performance Improvements
+
+- Predicate evaluation
+- Multi-column comparisons that can't be vectorized
+- Null constant support
+
+### New Features
+
+- Read JSON as table
+- Simplify DDL with Datafusion-Cli
+- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support
+- Add new experimental e-graph based optimizer
+
+### Ballista
+
+- Begin work on design documents and plan / priorities for development
+
+### Extensions
+
+- Stable S3 support
+- Begin design discussions and prototyping of a stream provider
+
+## Beyond 2022 Q1
+
+There is no clear timeline for the below, but community members have expressed interest in working on these topics.
+
+### DataFusion Core
+
+- Custom SQL support
+- Split DataFusion into multiple crates
+- Push based query execution and code gen

Review comment:
       both are ok







[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#issuecomment-1017955984


   @alamb I have set a reminder for myself to refresh it every 3 months.





[GitHub] [arrow-datafusion] liukun4515 commented on pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#issuecomment-1017047766


   > Thank you, @alamb @houqp @xudong963 @yjshen @liukun4515 @hntd187 @realno @pjmore for your contributions to the roadmap.
   > 
   > I've created a PR here adding the roadmap to the DataFusion README.
   > 
   > Let me know your thoughts :)
   
   LGTM





[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r832149847



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch

Review comment:
       Yes, that's what it refers to, and it has been completed. However, it's not completely up to date with master.







[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#issuecomment-1016636637


   > What is the difference between a new feature and extension? They both sound like kinda the same thing. Semantics aside this sounds good to me.
   
   "Extension" refers specifically to topics that would live in `datafusion-contrib`, as opposed to a new feature, which lives in the `datafusion` crate.





[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r788116417



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered
+
+### Performance Improvements
+
+- Predicate evaluation
+- Multi-column comparisons that can't be vectorized

Review comment:
       ```suggestion
   - Improve multi-column comparisons (that can't be vectorized at the moment)
   ```







[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r832149847



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch

Review comment:
       Yes, that's what it refers to.







[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r787932816



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered

Review comment:
       I'll check it after getting up







[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r787930891



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered

Review comment:
       To my knowledge, no. I thought they were two independent benchmarks that we wanted to cover. However, I don't have much experience on the TPCH side; I've only been working on the db-benchmark solution.
   
   I don't see TPCH mentioned on db-benchmark. Would you be able to expand on why you think they are duplicates?







[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r788318868



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered

Review comment:
       Sorry, my mistake. I believe the Db-Benchmark you mentioned is `datafusion/benches/`?







[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#issuecomment-1016622084


   Thank you, @alamb @houqp @xudong963 @yjshen @liukun4515 @hntd187 @realno @pjmore for your contributions to the roadmap.
   
   I've created a PR here adding the roadmap to the DataFusion README.
   
   Let me know your thoughts :)





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#issuecomment-1016662703


   @yahoNanJing FYI: if you want to add anything Ballista-related, just let me know.





[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r787911983



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered
+
+### Performance Improvements
+
+- Predicate evaluation
+- Multi-column comparisons that can't be vectorized

Review comment:
       `can't be` ?

##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered
+
+### Performance Improvements
+
+- Predicate evaluation
+- Multi-column comparisons that can't be vectorized
+- Null constant support
+
+### New Features
+
+- Read JSON as table
+- Simplify DDL with Datafusion-Cli
+- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support
+- Add new experimental e-graph based optimizer
+
+### Ballista
+
+- Begin work on design documents and plan / priorities for development
+
+### Extensions
+
+- Stable S3 support
+- Begin design discussions and prototyping of a stream provider
+
+## Beyond 2022 Q1
+
+There is no clear timeline for the below, but community members have expressed interest in working on these topics.
+
+### DataFusion Core
+
+- Custom SQL support
+- Split DataFusion into multiple crates
+- Push based query execution and code gen

Review comment:
       ```suggestions
   Push based query execution and codegen
   ```

##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered

Review comment:
       Are these two duplicates?







[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r832150242



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch

Review comment:
       Yes, that is exactly what arrow2 means.
   
   There is a branch https://github.com/apache/arrow-datafusion/tree/arrow2 and a discussion ticket https://github.com/apache/arrow-datafusion/issues/1532 that has more information, if you are interested.







[GitHub] [arrow-datafusion] hntd187 commented on pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
hntd187 commented on pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#issuecomment-1016632999


   What is the difference between a new feature and extension? They both sound like kinda the same thing. Semantics aside this sounds good to me.





[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r787939283



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered
+
+### Performance Improvements
+
+- Predicate evaluation
+- Multi-column comparisons that can't be vectorized
+- Null constant support
+
+### New Features
+
+- Read JSON as table
+- Simplify DDL with Datafusion-Cli
+- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support
+- Add new experimental e-graph based optimizer
+
+### Ballista
+
+- Begin work on design documents and plan / priorities for development
+
+### Extensions
+
+- Stable S3 support
+- Begin design discussions and prototyping of a stream provider
+
+## Beyond 2022 Q1
+
+There is no clear timeline for the below, but community members have expressed interest in working on these topics.
+
+### DataFusion Core
+
+- Custom SQL support
+- Split DataFusion into multiple crates
+- Push based query execution and code gen

Review comment:
       I think I will actually just write out the whole word.







[GitHub] [arrow-datafusion] tisonkun commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
tisonkun commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r832186729



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch

Review comment:
       Thank you!







[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r787915749



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered
+
+### Performance Improvements
+
+- Predicate evaluation
+- Multi-column comparisons that can't be vectorized
+- Null constant support
+
+### New Features
+
+- Read JSON as table
+- Simplify DDL with Datafusion-Cli
+- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support
+- Add new experimental e-graph based optimizer
+
+### Ballista
+
+- Begin work on design documents and plan / priorities for development
+
+### Extensions
+
+- Stable S3 support
+- Begin design discussions and prototyping of a stream provider
+
+## Beyond 2022 Q1
+
+There is no clear timeline for the below, but community members have expressed interest in working on these topics.
+
+### DataFusion Core
+
+- Custom SQL support
+- Split DataFusion into multiple crates
+- Push based query execution and code gen

Review comment:
       ```suggestion
   Push based query execution and codegen
   ```







[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r787915749



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered
+
+### Performance Improvements
+
+- Predicate evaluation
+- Multi-column comparisons that can't be vectorized
+- Null constant support
+
+### New Features
+
+- Read JSON as table
+- Simplify DDL with Datafusion-Cli
+- Add Decimal128 data type and the attendant features such as Arrow Kernel and UDF support
+- Add new experimental e-graph based optimizer
+
+### Ballista
+
+- Begin work on design documents and plan / priorities for development
+
+### Extensions
+
+- Stable S3 support
+- Begin design discussions and prototyping of a stream provider
+
+## Beyond 2022 Q1
+
+There is no clear timeline for the below, but community members have expressed interest in working on these topics.
+
+### DataFusion Core
+
+- Custom SQL support
+- Split DataFusion into multiple crates
+- Push based query execution and code gen

Review comment:
       ```suggestion
   - Push based query execution and codegen
   ```







[GitHub] [arrow-datafusion] matthewmturner commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r788329491



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch
+- Implementation of memory manager (i.e. to enable spilling to disk as needed)
+
+### Benchmarking
+
+- Inclusion in Db-Benchmark with all quries covered
+- All TPCH queries covered

Review comment:
       The db-benchmark item I'm referring to is about getting DataFusion included here: https://h2oai.github.io/db-benchmark/
   
   I've opened a PR there to get it added.







[GitHub] [arrow-datafusion] alamb commented on pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#issuecomment-1017930373


   Thanks to all who contributed to the roadmap. I would love to keep it as a living document.





[GitHub] [arrow-datafusion] alamb merged pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616


   





[GitHub] [arrow-datafusion] matthewmturner edited a comment on pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
matthewmturner edited a comment on pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#issuecomment-1016636637


   > What is the difference between a new feature and extension? They both sound like kinda the same thing. Semantics aside this sounds good to me.
   
   "Extension" refers specifically to topics that would live in `datafusion-contrib`, as opposed to a new feature, which lives in the `datafusion` crate. I can make that clearer.





[GitHub] [arrow-datafusion] tisonkun commented on a change in pull request #1616: Add roadmap to readme

Posted by GitBox <gi...@apache.org>.
tisonkun commented on a change in pull request #1616:
URL: https://github.com/apache/arrow-datafusion/pull/1616#discussion_r832112431



##########
File path: README.md
##########
@@ -141,6 +141,60 @@ datafusion = "6.0.0"
 
 DataFusion also includes a simple command-line interactive SQL utility. See the [CLI reference](https://arrow.apache.org/datafusion/cli/index.html) for more information.
 
+# Roadmap
+
+A quarterly roadmap will be published to give the DataFusion community visibility into the priorities of the projects contributors. This roadmap is not binding.
+
+## 2022 Q1
+
+### DataFusion Core
+
+- Publish official Arrow2 branch

Review comment:
       Hi @alamb! I'm curious what "arrow2" means here. Is it related to https://github.com/jorgecarleitao/arrow2?



