Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 07:27:18 UTC

[GitHub] [arrow-datafusion] yahoNanJing opened a new pull request #2065: split datafusion-object-store module

yahoNanJing opened a new pull request #2065:
URL: https://github.com/apache/arrow-datafusion/pull/2065


   # Which issue does this PR close?
   
   
   Closes #1772.
   
   # Rationale for this change
   By splitting the object store into a separate module, the object stores in datafusion-contrib no longer need to depend on the whole datafusion crate; they only need to depend on this separate module. Moreover, if we later want to expose the object stores in datafusion-contrib as optional features of the datafusion crate, this split avoids a cyclic dependency.
   
   # What changes are included in this PR?
   - Introduces a new crate, `datafusion-storage`, for the object store interfaces.
   - Moves the object store code from the previous `datasource` module into the new `datafusion-storage` crate.
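The dependency direction described above can be sketched in a single file, with modules standing in for crates. This is only an illustration under stated assumptions: the `storage`, `contrib`, and `core_engine` modules, the synchronous `ObjectStore` trait, and its `list_file` and `read` methods are simplified, hypothetical stand-ins, not the actual `datafusion-storage` API (which, per the review suggestion in this thread, is `async`).

```rust
/// Stand-in for the lightweight storage crate (hypothetical, simplified API).
mod storage {
    use std::io;

    /// Metadata for one stored object.
    #[derive(Debug, Clone, PartialEq)]
    pub struct FileMeta {
        pub path: String,
        pub size: u64,
    }

    /// Simplified, synchronous object store interface (an assumption for
    /// illustration; not the real trait).
    pub trait ObjectStore {
        /// List all objects whose path starts with `prefix`.
        fn list_file(&self, prefix: &str) -> io::Result<Vec<FileMeta>>;
        /// Read the full contents of the object at `path`.
        fn read(&self, path: &str) -> io::Result<Vec<u8>>;
    }
}

/// Stand-in for a datafusion-contrib store: it imports `storage` only,
/// never the core engine.
mod contrib {
    use super::storage::{FileMeta, ObjectStore};
    use std::collections::BTreeMap;
    use std::io;

    /// Hypothetical in-memory store used in place of a remote backend.
    #[derive(Default)]
    pub struct InMemoryStore {
        objects: BTreeMap<String, Vec<u8>>,
    }

    impl InMemoryStore {
        pub fn put(&mut self, path: &str, data: &[u8]) {
            self.objects.insert(path.to_string(), data.to_vec());
        }
    }

    impl ObjectStore for InMemoryStore {
        fn list_file(&self, prefix: &str) -> io::Result<Vec<FileMeta>> {
            Ok(self
                .objects
                .iter()
                .filter(|(path, _)| path.starts_with(prefix))
                .map(|(path, data)| FileMeta {
                    path: path.clone(),
                    size: data.len() as u64,
                })
                .collect())
        }

        fn read(&self, path: &str) -> io::Result<Vec<u8>> {
            self.objects
                .get(path)
                .cloned()
                .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, path.to_string()))
        }
    }
}

/// Stand-in for the core crate: it also imports `storage` only, so it could
/// later pull in `contrib` behind a feature flag without creating a cycle.
mod core_engine {
    use super::storage::ObjectStore;
    use std::io;

    /// Sum the sizes of all objects under `prefix`, using only the trait.
    pub fn total_size(store: &dyn ObjectStore, prefix: &str) -> io::Result<u64> {
        Ok(store.list_file(prefix)?.iter().map(|meta| meta.size).sum())
    }
}
```

Because both `contrib` and `core_engine` name only `storage`, neither needs the other to compile, which is the property the split is meant to provide.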
   
   # Are there any user-facing changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] yjshen merged pull request #2065: split datafusion-object-store module

Posted by GitBox <gi...@apache.org>.
yjshen merged pull request #2065:
URL: https://github.com/apache/arrow-datafusion/pull/2065


   





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #2065: split datafusion-object-store module

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #2065:
URL: https://github.com/apache/arrow-datafusion/pull/2065#issuecomment-1076803178


   Overall looks good.  I think my only point, similar to what we discussed before, is about some of the `ObjectStore` functionality that is still hanging around in datafusion core, i.e. `ListingTable` and functionality in `datasource`.  I know the `ObjectStore` doesn't need them and this keeps `datafusion-storage` light, which of course is nice.  To me they just seem logically coupled so at first was odd seeing the functionality split like that.
   
   That being said, I have given this more thought, and given how the above-mentioned functionality leverages the physical plan, etc., I do think it makes sense to have datafusion wrap these pieces together.  Just walking through my thought process out loud :)
   
   thanks for this.





[GitHub] [arrow-datafusion] alamb commented on a change in pull request #2065: split datafusion-object-store module

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #2065:
URL: https://github.com/apache/arrow-datafusion/pull/2065#discussion_r833623347



##########
File path: datafusion-storage/README.md
##########
@@ -0,0 +1,24 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# DataFusion Common

Review comment:
       ```suggestion
   # DataFusion Storage
   ```

##########
File path: datafusion-storage/README.md
##########
@@ -0,0 +1,24 @@
+# DataFusion Common
+
+This is an internal module for the most fundamental datasource storage of [DataFusion][df].

Review comment:
       ```suggestion
   This module contains an `async` API for [DataFusion][df] to access data, either remotely or locally.
   ```







[GitHub] [arrow-datafusion] yahoNanJing commented on pull request #2065: split datafusion-object-store module

Posted by GitBox <gi...@apache.org>.
yahoNanJing commented on pull request #2065:
URL: https://github.com/apache/arrow-datafusion/pull/2065#issuecomment-1076018420


   Hi @Jimexist, @matthewmturner, @yjshen, @alamb, could you help review this patch?





[GitHub] [arrow-datafusion] alamb commented on pull request #2065: split datafusion-object-store module

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #2065:
URL: https://github.com/apache/arrow-datafusion/pull/2065#issuecomment-1077745028


   🎉 





[GitHub] [arrow-datafusion] alamb commented on pull request #2065: split datafusion-object-store module

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #2065:
URL: https://github.com/apache/arrow-datafusion/pull/2065#issuecomment-1076804460


   > To me they just seem logically coupled so at first was odd seeing the functionality split like that.
   
   🤔  maybe it is time for a `datafusion-listing-table` crate

