Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/01 15:31:40 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #5824: Move content from README.md to docs site

alamb opened a new pull request, #5824:
URL: https://github.com/apache/arrow-datafusion/pull/5824

   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/5755
   
   
   # Rationale for this change
   I would like the DataFusion documentation to get better. One problem, as described in https://github.com/apache/arrow-datafusion/issues/5755, is that there are effectively two doc sites: the README in the repository and https://arrow.apache.org/datafusion/.
   
   While https://arrow.apache.org/datafusion/ still needs some significant love, getting more eyes on it will make that more obvious and having a single place to focus documentation effort will improve our efficiency. 
   
   I plan several more PRs to update the website content.
   
   # What changes are included in this PR?
   
   1. Move all content from README to https://github.com/apache/arrow-datafusion/tree/main/docs
   2. Leave README with a pointer to the main docs
   
   # Are these changes tested?
   N/A
   
   # Are there any user-facing changes?
   Hopefully better docs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5824: Move content from README.md to docs site

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5824:
URL: https://github.com/apache/arrow-datafusion/pull/5824#discussion_r1155126098


##########
docs/source/contributor-guide/index.md:
##########
@@ -50,11 +50,11 @@ A "major" PR means there is a substantial change in design or a change in the AP
 3. Non-controversial build-related changes (clippy, version upgrades etc.)
 4. Smaller non-controversial feature additions
 
-# Developer's guide
+## Getting Started

Review Comment:
   I changed some of these heading levels so they didn't all appear in the main table of contents on https://arrow.apache.org/datafusion/
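
   For anyone making a similar change across several pages, the heading demotion could be scripted. The following is only a sketch, not what this PR did (the PR edited headings by hand and also renamed some of them). The helper name `demote_headings` is hypothetical, and it assumes the site's table of contents is built from top-level `#` headings, which depends on the docs configuration.

```python
import re

# Matches an ATX heading of level 1-5: one to five '#' followed by a space.
# Level-6 headings are left alone because Markdown has no level 7.
_HEADING = re.compile(r"^(#{1,5}) ")


def demote_headings(markdown: str) -> str:
    """Return `markdown` with every ATX heading pushed one level deeper.

    Lines inside fenced code blocks are skipped so that shell or Python
    comments starting with '#' are not mistaken for headings.
    """
    out = []
    in_fence = False
    for line in markdown.splitlines():
        # A line starting with ``` opens or closes a fenced code block.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if not in_fence and _HEADING.match(line):
            line = "#" + line
        out.append(line)
    return "\n".join(out)
```

   Running it on a page that starts with `# Developer's guide` would yield `## Developer's guide`, which keeps that section out of a table of contents that only lists top-level headings.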



##########
docs/source/user-guide/comparison.md:
##########
@@ -0,0 +1,33 @@
+# Comparisons to Other Projects
+
+When compared to similar systems, DataFusion typically is:
+
+1. Targeted at developers, rather than end users / data scientists.
+2. Designed to be embedded, rather than a complete file based SQL system.
+3. Governed by the [Apache Software Foundation](https://www.apache.org/) process, rather than a single company or individual.
+4. Implemented in `Rust`, rather than `C/C++`
+
+Here is a comparison with similar projects that may help you understand
+when DataFusion might be suitable or unsuitable for your needs:
+
+- [DuckDB](http://www.duckdb.org) is an open source, in process analytic database.
+  Like DataFusion, it supports very fast execution, both from its custom file format
+  and directly from parquet files. Unlike DataFusion, it is written in C/C++ and it
+  is primarily used directly by users as a serverless database and query system rather
+  than as a library for building such database systems.
+
+- [Polars](http://pola.rs): Polars is one of the fastest DataFrame
+  libraries at the time of writing. Like DataFusion, it is also
+  written in Rust and uses the Apache Arrow memory model, but unlike
+  DataFusion it does not provide SQL nor as many extension points.
+
+- [Facebook Velox](https://engineering.fb.com/2022/08/31/open-source/velox/)
+  is an execution engine. Like DataFusion, Velox aims to
+  provide a reusable foundation for building database-like systems. Unlike DataFusion,
+  it is written in C/C++ and does not include a SQL frontend or planning/optimization
+  framework.
+
+- [Databend](https://github.com/datafuselabs/databend) is a complete
+  database system. Like DataFusion it is also written in Rust and
+  utilizes the Apache Arrow memory model, but unlike DataFusion it
+  targets end-users rather than developers of other database systems.

Review Comment:
   If someone who knew more about Apache Calcite wanted to add a note here that would also be awesome



##########
README.md:
##########
@@ -27,176 +29,8 @@ in-memory format.
 
 DataFusion offers SQL and Dataframe APIs, excellent [performance](https://benchmark.clickhouse.com/), built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
 
-[![Coverage Status](https://codecov.io/gh/apache/arrow-datafusion/rust/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/arrow-datafusion?branch=master)

Review Comment:
   The point of this PR is to move all this content into the docs directory. Some of it was already partially replicated there.





[GitHub] [arrow-datafusion] alamb merged pull request #5824: Move content from README.md to docs site

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5824:
URL: https://github.com/apache/arrow-datafusion/pull/5824




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5824: Move content from README.md to docs site

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5824:
URL: https://github.com/apache/arrow-datafusion/pull/5824#discussion_r1155126523


##########
docs/source/contributor-guide/architecture.md:
##########
@@ -0,0 +1,26 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Architecture

Review Comment:
   I am working on an update to this content -- #4990 


