Posted to commits@arrow.apache.org by al...@apache.org on 2021/08/01 10:36:36 UTC
[arrow-datafusion] branch master updated: expand file glob within prettier (#803)
This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/master by this push:
new 3eac2e6 expand file glob within prettier (#803)
3eac2e6 is described below
commit 3eac2e65437de52a26d2380a7d49fbcea9eb2c15
Author: QP Hou <qp...@scribd.com>
AuthorDate: Sun Aug 1 03:36:32 2021 -0700
expand file glob within prettier (#803)
The '**' pattern is not supported by some shells, including the one we
use in CI.
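As an illustration of the problem described above, here is a minimal Python sketch. It is an analogy only, not how prettier or the CI shell is implemented: Python's `glob` with `recursive=False` treats `**` like a single `*`, which is roughly how a shell without recursive-glob support behaves, while `recursive=True` stands in for a tool that expands the quoted pattern itself. The directory layout and the `expand` and `make_tree` helpers are hypothetical.

```python
import glob
import os
import tempfile


def expand(pattern, root, recursive):
    """Return sorted matches for `pattern`, as paths relative to `root`."""
    matches = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(os.path.relpath(m, root).replace(os.sep, "/") for m in matches)


def make_tree(root):
    """Create a small, hypothetical docs tree with files at three depths."""
    for rel in ("docs/a.md", "docs/guide/b.md", "docs/guide/deep/c.md"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


with tempfile.TemporaryDirectory() as root:
    make_tree(root)
    # '**' handled as a plain '*': matches exactly one directory level.
    literal_star = expand("docs/**/*.md", root, recursive=False)
    # '**' handled recursively: matches zero or more directory levels.
    recursive_glob = expand("docs/**/*.md", root, recursive=True)

print("non-recursive:", literal_star)
print("recursive:    ", recursive_glob)
```

In this sketch the non-recursive expansion finds only the file one directory below `docs`, so a check run over that list would silently skip the other two files. Quoting the pattern on the command line passes it through unexpanded, leaving the tool to do the recursive match.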
---
.github/workflows/dev.yml | 4 +--
ballista/README.md | 20 +++++++--------
docs/user-guide/src/distributed/docker-compose.md | 2 +-
docs/user-guide/src/distributed/kubernetes.md | 30 +++++++++++------------
4 files changed, 28 insertions(+), 28 deletions(-)
diff --git a/.github/workflows/dev.yml b/.github/workflows/dev.yml
index 8bb35f1..39c449c 100644
--- a/.github/workflows/dev.yml
+++ b/.github/workflows/dev.yml
@@ -64,7 +64,7 @@ jobs:
# if you encounter error, try rerun the command below with --write instead of --check
# and commit the changes
npx prettier@2.3.2 --check \
- {ballista,datafusion,datafusion-examples,docs,python}/**/*.md \
+ '{ballista,datafusion,datafusion-examples,docs,python}/**/*.md' \
README.md \
DEVELOPERS.md \
- ballista/**/*.{ts,tsx}
+ 'ballista/**/*.{ts,tsx}'
diff --git a/ballista/README.md b/ballista/README.md
index 0a8db63..eeb4273 100644
--- a/ballista/README.md
+++ b/ballista/README.md
@@ -19,8 +19,8 @@
# Ballista: Distributed Compute with Apache Arrow and DataFusion
-Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow and
-DataFusion. It is built on an architecture that allows other programming languages (such as Python, C++, and
+Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow and
+DataFusion. It is built on an architecture that allows other programming languages (such as Python, C++, and
Java) to be supported as first-class citizens without paying a penalty for serialization costs.
The foundational technologies in Ballista are:
@@ -37,23 +37,23 @@ redundancy in the case of a scheduler failing.
# Getting Started
-Fully working examples are available. Refer to the [Ballista Examples README](../ballista-examples/README.md) for
+Fully working examples are available. Refer to the [Ballista Examples README](../ballista-examples/README.md) for
more information.
## Distributed Scheduler Overview
-Ballista uses the DataFusion query execution framework to create a physical plan and then transforms it into a
+Ballista uses the DataFusion query execution framework to create a physical plan and then transforms it into a
distributed physical plan by breaking the query down into stages whenever the partitioning scheme changes.
-Specifically, any `RepartitionExec` operator is replaced with an `UnresolvedShuffleExec` and the child operator
+Specifically, any `RepartitionExec` operator is replaced with an `UnresolvedShuffleExec` and the child operator
of the repartition operator is wrapped in a `ShuffleWriterExec` operator and scheduled for execution.
-Each executor polls the scheduler for the next task to run. Tasks are currently always `ShuffleWriterExec` operators
-and each task represents one *input* partition that will be executed. The resulting batches are repartitioned
-according to the shuffle partitioning scheme and each *output* partition is streamed to disk in Arrow IPC format.
+Each executor polls the scheduler for the next task to run. Tasks are currently always `ShuffleWriterExec` operators
+and each task represents one _input_ partition that will be executed. The resulting batches are repartitioned
+according to the shuffle partitioning scheme and each _output_ partition is streamed to disk in Arrow IPC format.
-The scheduler will replace `UnresolvedShuffleExec` operators with `ShuffleReaderExec` operators once all shuffle
-tasks have completed. The `ShuffleReaderExec` operator connects to other executors as required using the Flight
+The scheduler will replace `UnresolvedShuffleExec` operators with `ShuffleReaderExec` operators once all shuffle
+tasks have completed. The `ShuffleReaderExec` operator connects to other executors as required using the Flight
interface, and streams the shuffle IPC files.
# How does this compare to Apache Spark?
diff --git a/docs/user-guide/src/distributed/docker-compose.md b/docs/user-guide/src/distributed/docker-compose.md
index 14989e5..9ada1ba 100644
--- a/docs/user-guide/src/distributed/docker-compose.md
+++ b/docs/user-guide/src/distributed/docker-compose.md
@@ -24,7 +24,7 @@ demonstrates how to start a cluster using a single process that acts as both a s
volume mounted into the container so that Ballista can access the host file system.
```yaml
-version: '2.2'
+version: "2.2"
services:
etcd:
image: quay.io/coreos/etcd:v3.4.9
diff --git a/docs/user-guide/src/distributed/kubernetes.md b/docs/user-guide/src/distributed/kubernetes.md
index 4b80d17..ef4acca 100644
--- a/docs/user-guide/src/distributed/kubernetes.md
+++ b/docs/user-guide/src/distributed/kubernetes.md
@@ -129,16 +129,16 @@ spec:
ballista-cluster: ballista
spec:
containers:
- - name: ballista-scheduler
- image: <your-image>
- command: ["/scheduler"]
- args: ["--bind-port=50050"]
- ports:
- - containerPort: 50050
- name: flight
- volumeMounts:
- - mountPath: /mnt
- name: data
+ - name: ballista-scheduler
+ image: <your-image>
+ command: ["/scheduler"]
+ args: ["--bind-port=50050"]
+ ports:
+ - containerPort: 50050
+ name: flight
+ volumeMounts:
+ - mountPath: /mnt
+ name: data
volumes:
- name: data
persistentVolumeClaim:
@@ -245,10 +245,10 @@ spec:
minReplicaCount: 0
maxReplicaCount: 5
triggers:
- - type: external
- metadata:
- # Change this DNS if the scheduler isn't deployed in the "default" namespace
- scalerAddress: ballista-scheduler.default.svc.cluster.local:50050
+ - type: external
+ metadata:
+ # Change this DNS if the scheduler isn't deployed in the "default" namespace
+ scalerAddress: ballista-scheduler.default.svc.cluster.local:50050
```
And then deploy it into the cluster:
@@ -261,4 +261,4 @@ If the cluster is inactive, Keda will now scale the number of executors down to
you launch a query. Please note that Keda will perform a scan once every 30 seconds, so it might take a bit to
scale the executors.
-Please visit Keda's [documentation page](https://keda.sh/docs/2.3/concepts/scaling-deployments/) for more information.
\ No newline at end of file
+Please visit Keda's [documentation page](https://keda.sh/docs/2.3/concepts/scaling-deployments/) for more information.