Posted to commits@pulsar.apache.org by ni...@apache.org on 2022/09/23 13:20:05 UTC

[pulsar] branch master updated: docs: Add ceveat to Pulsar SQL overview and update Presto to Trino (#17798)

This is an automated email from the ASF dual-hosted git repository.

nicoloboschi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new d14a3bd822e docs: Add ceveat to Pulsar SQL overview and update Presto to Trino (#17798)
d14a3bd822e is described below

commit d14a3bd822ef1132aa4c41f46b56a61020c0fc5f
Author: tison <wa...@gmail.com>
AuthorDate: Fri Sep 23 21:19:56 2022 +0800

    docs: Add ceveat to Pulsar SQL overview and update Presto to Trino (#17798)
    
    * docs: Trino -> PrestoSQL
    
    Signed-off-by: tison <wa...@gmail.com>
    
    * Trino -> PrestoSQL for sql-rest-api
    
    Signed-off-by: tison <wa...@gmail.com>
    
    * Trino -> PrestoSQL for sql-deployment-configurations
    
    Signed-off-by: tison <wa...@gmail.com>
    
    * Trino -> PrestoSQL for sql-getting-started
    
    Signed-off-by: tison <wa...@gmail.com>
    
    * add caveat
    
    Signed-off-by: tison <wa...@gmail.com>
    
    Signed-off-by: tison <wa...@gmail.com>
---
 site2/docs/sql-deployment-configurations.md | 78 +++++++++-----------------
 site2/docs/sql-getting-started.md           | 85 +++++------------------------
 site2/docs/sql-overview.md                  |  8 ++-
 site2/docs/sql-rest-api.md                  | 45 ++++++++++-----
 4 files changed, 77 insertions(+), 139 deletions(-)
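
The caveat this commit adds to `site2/docs/sql-overview.md` amounts to copying config files between `conf/presto` and `trino/conf`. Below is a minimal sketch of that step, assuming a standard Pulsar binary layout rooted at a directory passed in as `pulsar_home`. The function name `migrate_sql_conf` and its `downgrade` flag are hypothetical and not part of Pulsar. The sketch only copies files; whether any property inside them also needs changing is not stated in the commit.

```python
import shutil
from pathlib import Path


def migrate_sql_conf(pulsar_home, downgrade=False):
    """Copy Pulsar SQL config files between the old and new layouts.

    Upgrade:   conf/presto -> trino/conf
    Downgrade: trino/conf  -> conf/presto
    Returns the destination directory as a Path.
    """
    home = Path(pulsar_home)
    old_dir = home / "conf" / "presto"   # layout before this change
    new_dir = home / "trino" / "conf"    # layout after this change
    src, dst = (new_dir, old_dir) if downgrade else (old_dir, new_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"source config directory not found: {src}")
    # dirs_exist_ok (Python 3.8+) merges into an existing tree;
    # files with the same relative path in dst are overwritten.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

For example, `migrate_sql_conf("/opt/pulsar")` would copy `/opt/pulsar/conf/presto` into `/opt/pulsar/trino/conf`, where `/opt/pulsar` is a placeholder path. The source directory is left in place so that a rollback stays possible.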

diff --git a/site2/docs/sql-deployment-configurations.md b/site2/docs/sql-deployment-configurations.md
index 7cc8e024dfd..8c1362e5637 100644
--- a/site2/docs/sql-deployment-configurations.md
+++ b/site2/docs/sql-deployment-configurations.md
@@ -4,10 +4,11 @@ title: Pulsar SQL configuration and deployment
 sidebar_label: "Configuration and deployment"
 ---
 
-You can configure the Presto Pulsar connector and deploy a cluster with the following instruction.
+You can configure the Pulsar Trino plugin and deploy a cluster with the following instructions.
 
-## Configure Presto Pulsar Connector
-You can configure Presto Pulsar Connector in the `${project.root}/conf/presto/catalog/pulsar.properties` properties file. The configuration for the connector and the default values are as follows.
+## Configure Pulsar Trino plugin
+
+You can configure the Pulsar Trino plugin in the `${project.root}/trino/conf/catalog/pulsar.properties` properties file. The configuration for the connector and the default values are as follows.
 
 ```properties
 # name of the connector to be displayed in the catalog
@@ -114,16 +115,17 @@ pulsar.nar-extraction-directory=System.getProperty("java.io.tmpdir")
 
 By default, the authentication and authorization between Pulsar and Pulsar SQL are disabled.
 
-To enable it, set the following configurations in the `${project.root}/conf/presto/catalog/pulsar.properties` properties file:
+To enable it, set the following configurations in the `${project.root}/trino/conf/catalog/pulsar.properties` properties file:
 
 ```properties
 pulsar.authorization-enabled=true
 pulsar.broker-binary-service-url=pulsar://localhost:6650
 ```
 
-### Connect Presto to Pulsar with multiple hosts
+### Connect Trino to Pulsar with multiple hosts
+
+You can connect Trino to a Pulsar cluster with multiple hosts.
 
-You can connect Presto to a Pulsar cluster with multiple hosts. 
 * To configure multiple hosts for brokers, add multiple URLs to `pulsar.web-service-url`. 
 * To configure multiple hosts for ZooKeeper, add multiple URIs to `pulsar.zookeeper-uri`. 
 
@@ -146,13 +148,13 @@ If you want to get the last message in a topic, set the following configurations
 
 1. For the broker configuration, set `bookkeeperExplicitLacIntervalInMills` > 0 in `broker.conf` or `standalone.conf`.
    
-2. For the Presto configuration, set `pulsar.bookkeeper-explicit-interval` > 0 and `pulsar.bookkeeper-use-v2-protocol=false`.
+2. For the Trino configuration, set `pulsar.bookkeeper-explicit-interval` > 0 and `pulsar.bookkeeper-use-v2-protocol=false`.
 
 However, using BookKeeper V3 protocol introduces additional GC overhead to BK as it uses Protobuf.
 
-## Query data from existing Presto clusters
+## Query data from existing Trino clusters
 
-If you already have a Presto cluster, you can copy the Presto Pulsar connector plugin to your existing cluster. Download the archived plugin package with the following command.
+If you already have a Trino cluster compatible with version 363, you can copy the Pulsar Trino plugin to your existing cluster. Download the archived plugin package with the following command.
 
 ```bash
 wget pulsar:binary_release_url
@@ -160,7 +162,7 @@ wget pulsar:binary_release_url
 
 ## Deploy a new cluster
 
-Since Pulsar SQL is powered by [Trino (formerly Presto SQL)](https://trino.io), the configuration for deployment is the same for the Pulsar SQL worker. 
+Since Pulsar SQL is powered by Trino, the configuration for deployment is the same for the Pulsar SQL worker. 
 
 :::note
 
@@ -168,42 +170,14 @@ For how to set up a standalone single node environment, refer to [Query data](sq
 
 :::
 
-You can use the same CLI args as the Presto launcher.
-
-```bash
-./bin/pulsar sql-worker --help
-Usage: launcher [options] command
-
-Commands: run, start, stop, restart, kill, status
-
-Options:
-  -h, --help            show this help message and exit
-  -v, --verbose         Run verbosely
-  --etc-dir=DIR         Defaults to INSTALL_PATH/etc
-  --launcher-config=FILE
-                        Defaults to INSTALL_PATH/bin/launcher.properties
-  --node-config=FILE    Defaults to ETC_DIR/node.properties
-  --jvm-config=FILE     Defaults to ETC_DIR/jvm.config
-  --config=FILE         Defaults to ETC_DIR/config.properties
-  --log-levels-file=FILE
-                        Defaults to ETC_DIR/log.properties
-  --data-dir=DIR        Defaults to INSTALL_PATH
-  --pid-file=FILE       Defaults to DATA_DIR/var/run/launcher.pid
-  --launcher-log-file=FILE
-                        Defaults to DATA_DIR/var/log/launcher.log (only in
-                        daemon mode)
-  --server-log-file=FILE
-                        Defaults to DATA_DIR/var/log/server.log (only in
-                        daemon mode)
-  -D NAME=VALUE         Set a Java system property
-```
+You can use the same CLI args as the Trino launcher.
 
-The default configuration for the cluster is located in `${project.root}/conf/presto`. You can customize your deployment by modifying the default configuration.
+The default configuration for the cluster is located in `${project.root}/trino/conf`. You can customize your deployment by modifying the default configuration.
 
 You can set the worker to read from a different configuration directory, or set a different directory to write data. 
 
 ```bash
-./bin/pulsar sql-worker run --etc-dir /tmp/incubator-pulsar/conf/presto --data-dir /tmp/presto-1
+./bin/pulsar sql-worker run --etc-dir /tmp/pulsar/trino/conf --data-dir /tmp/trino-1
 ```
 
 You can start the worker as daemon process.
@@ -214,11 +188,11 @@ You can start the worker as daemon process.
 
 ### Deploy a cluster on multiple nodes 
 
-You can deploy a Pulsar SQL cluster or Presto cluster on multiple nodes. The following example shows how to deploy a cluster on three-node cluster. 
+You can deploy a Pulsar SQL cluster or Trino cluster on multiple nodes. The following example shows how to deploy a cluster on three nodes. 
 
 1. Copy the Pulsar binary distribution to three nodes.
 
-The first node runs as Presto coordinator. The minimal configuration required in the `${project.root}/conf/presto/config.properties` file is as follows. 
+The first node runs as the Trino coordinator. The minimal configuration required in the `${project.root}/trino/conf/config.properties` file is as follows. 
 
 ```properties
 coordinator=true
@@ -240,30 +214,30 @@ query.max-memory-per-node=1GB
 discovery.uri=<coordinator-url>
 ```
 
-2. Modify `pulsar.web-service-url` and  `pulsar.zookeeper-uri` configuration in the `${project.root}/conf/presto/catalog/pulsar.properties` file accordingly for the three nodes.
+2. Modify `pulsar.web-service-url` and  `pulsar.zookeeper-uri` configuration in the `${project.root}/trino/conf/catalog/pulsar.properties` file accordingly for the three nodes.
 
-3. Start the coordinator node.
+3. Start the coordinator node:
 
-```
+```bash
 ./bin/pulsar sql-worker run
 ```
 
-4. Start worker nodes.
+4. Start worker nodes:
 
-```
+```bash
 ./bin/pulsar sql-worker run
 ```
 
-5. Start the SQL CLI and check the status of your cluster.
+5. Start the SQL CLI and check the status of your cluster:
 
 ```bash
 ./bin/pulsar sql --server <coordinate_url>
 ```
 
-6. Check the status of your nodes.
+6. Check the status of your nodes:
 
 ```bash
-presto> SELECT * FROM system.runtime.nodes;
+trino> SELECT * FROM system.runtime.nodes;
  node_id |        http_uri         | node_version | coordinator | state  
 ---------+-------------------------+--------------+-------------+--------
  1       | http://192.168.2.1:8081 | testversion  | true        | active 
@@ -271,7 +245,7 @@ presto> SELECT * FROM system.runtime.nodes;
  2       | http://192.168.2.3:8081 | testversion  | false       | active
 ```
 
-For more information about the deployment in Presto, refer to [Presto deployment](https://trino.io/docs/current/installation/deployment.html).
+For more information about the deployment in Trino, refer to [Trino deployment](https://trino.io/docs/363/installation/deployment.html).
 
 :::note
 
diff --git a/site2/docs/sql-getting-started.md b/site2/docs/sql-getting-started.md
index 0732682df32..a49d43c48ff 100644
--- a/site2/docs/sql-getting-started.md
+++ b/site2/docs/sql-getting-started.md
@@ -15,7 +15,7 @@ Before querying data in Pulsar, you need to install Pulsar and built-in connecto
 
 To query data in Pulsar with Pulsar SQL, complete the following steps.
 
-1. Start a Pulsar standalone cluster.
+1. Start a Pulsar standalone cluster:
 
 ```bash
 PULSAR_STANDALONE_USE_ZOOKEEPER=1 ./bin/pulsar standalone
@@ -23,27 +23,26 @@ PULSAR_STANDALONE_USE_ZOOKEEPER=1 ./bin/pulsar standalone
 
 :::note
 
-Starting the Pulsar standalone cluster from scratch doesn't enable ZooKeeper by default.
-However, the Pulsar SQL depends on ZooKeeper. Therefore, you need to set `PULSAR_STANDALONE_USE_ZOOKEEPER=1` to enable ZooKeeper.
+Starting the Pulsar standalone cluster from scratch doesn't enable ZooKeeper by default. However, Pulsar SQL depends on ZooKeeper. Therefore, you need to set `PULSAR_STANDALONE_USE_ZOOKEEPER=1` to enable ZooKeeper.
 
 :::
 
-2. Start a Pulsar SQL worker.
+2. Start a Pulsar SQL worker:
 
 ```bash
 ./bin/pulsar sql-worker run
 ```
 
-3. After initializing Pulsar standalone cluster and the SQL worker, run SQL CLI.
+3. After initializing the Pulsar standalone cluster and the SQL worker, run the SQL CLI:
 
 ```bash
 ./bin/pulsar sql
 ```
 
-4. Test with SQL commands.
+4. Test with SQL commands:
 
 ```bash
-presto> show catalogs;
+trino> show catalogs;
  Catalog
 ---------
  pulsar
@@ -55,7 +54,7 @@ Splits: 19 total, 19 done (100.00%)
 0:00 [0 rows, 0B] [0 rows/s, 0B/s]
 
 
-presto> show schemas in pulsar;
+trino> show schemas in pulsar;
         Schema
 -----------------------
  information_schema
@@ -68,7 +67,7 @@ Splits: 19 total, 19 done (100.00%)
 0:00 [4 rows, 89B] [21 rows/s, 471B/s]
 
 
-presto> show tables in pulsar."public/default";
+trino> show tables in pulsar."public/default";
  Table
 -------
 (0 rows)
@@ -80,16 +79,16 @@ Splits: 19 total, 19 done (100.00%)
 
 Since there is no data in Pulsar, no records are returned.
 
-5. Start the built-in connector _DataGeneratorSource_ and ingest some mock data.
+5. Start the built-in connector `DataGeneratorSource` and ingest some mock data:
 
 ```bash
 ./bin/pulsar-admin sources create --name generator --destinationTopicName generator_test --source-type data-generator
 ```
 
-And then you can query a topic in the namespace "public/default".
+And then you can query a topic in the namespace "public/default":
 
 ```bash
-presto> show tables in pulsar."public/default";
+trino> show tables in pulsar."public/default";
      Table
 ----------------
  generator_test
@@ -100,10 +99,10 @@ Splits: 19 total, 19 done (100.00%)
 0:02 [1 rows, 38B] [0 rows/s, 17B/s]
 ```
 
-You can now query the data within the topic "generator_test".
+You can now query the data within the topic "generator_test":
 
 ```bash
-presto> select * from pulsar."public/default".generator_test;
+trino> select * from pulsar."public/default".generator_test;
 
   firstname  | middlename  |  lastname   |              email               |   username   | password | telephonenumber | age |                 companyemail                  | nationalidentitycardnumber |
 -------------+-------------+-------------+----------------------------------+--------------+----------+-----------------+-----+-----------------------------------------------+----------------------------+
@@ -117,61 +116,3 @@ presto> select * from pulsar."public/default".generator_test;
 ```
 
 You can query the mock data.
-
-## Query your own data
-
-If you want to query your own data, you need to ingest your own data first. You can write a simple producer and write custom defined data to Pulsar. The following is an example.
-
-```java
-public class TestProducer {
-
-    public static class Foo {
-        private int field1 = 1;
-        private String field2;
-        private long field3;
-
-        public Foo() {
-        }
-
-        public int getField1() {
-            return field1;
-        }
-
-        public void setField1(int field1) {
-            this.field1 = field1;
-        }
-
-        public String getField2() {
-            return field2;
-        }
-
-        public void setField2(String field2) {
-            this.field2 = field2;
-        }
-
-        public long getField3() {
-            return field3;
-        }
-
-        public void setField3(long field3) {
-            this.field3 = field3;
-        }
-    }
-
-    public static void main(String[] args) throws Exception {
-        PulsarClient pulsarClient = PulsarClient.builder().serviceUrl("pulsar://localhost:6650").build();
-        Producer<Foo> producer = pulsarClient.newProducer(AvroSchema.of(Foo.class)).topic("test_topic").create();
-
-        for (int i = 0; i < 1000; i++) {
-            Foo foo = new Foo();
-            foo.setField1(i);
-            foo.setField2("foo" + i);
-            foo.setField3(System.currentTimeMillis());
-            producer.newMessage().value(foo).send();
-        }
-        producer.close();
-        pulsarClient.close();
-    }
-}
-```
-
diff --git a/site2/docs/sql-overview.md b/site2/docs/sql-overview.md
index 5c860a7e28e..f320e6a7291 100644
--- a/site2/docs/sql-overview.md
+++ b/site2/docs/sql-overview.md
@@ -6,12 +6,16 @@ sidebar_label: "Overview"
 
 Apache Pulsar is used to store streams of event data, and the event data is structured with predefined fields. With the implementation of the [Schema Registry](schema-get-started.md), you can store structured data in Pulsar and query the data by using [Trino (formerly Presto SQL)](https://trino.io/).
 
-As the core of Pulsar SQL, the Presto Pulsar connector enables Presto workers within a Presto cluster to query data from Pulsar.
+As the core of Pulsar SQL, the Pulsar Trino plugin enables Trino workers within a Trino cluster to query data from Pulsar.
 
 ![The Pulsar consumer and reader interfaces](/assets/pulsar-sql-arch-2.png)
 
 The query performance is efficient and highly scalable, because Pulsar adopts [two-level-segment-based architecture](concepts-architecture-overview.md#apache-bookkeeper). 
 
-Topics in Pulsar are stored as segments in [Apache BookKeeper](https://bookkeeper.apache.org/). Each topic segment is replicated to some BookKeeper nodes, which enables concurrent reads and high read throughput. You can configure the number of BookKeeper nodes, and the default number is `3`. In Presto Pulsar connector, data is read directly from BookKeeper, so Presto workers can read concurrently from a horizontally scalable number of BookKeeper nodes.
+Topics in Pulsar are stored as segments in [Apache BookKeeper](https://bookkeeper.apache.org/). Each topic segment is replicated to some BookKeeper nodes, which enables concurrent reads and high read throughput. In the Pulsar Trino connector, data is read directly from BookKeeper, so Trino workers can read concurrently from a horizontally scalable number of BookKeeper nodes.
 
 ![The Pulsar consumer and reader interfaces](/assets/pulsar-sql-arch-1.png)
+
+# Caveat
+
+If you're upgrading Pulsar SQL from 2.11 or earlier, you should copy config files from `conf/presto` to `trino/conf`. If you're downgrading Pulsar SQL to 2.11 or earlier from a newer version, do the reverse.
diff --git a/site2/docs/sql-rest-api.md b/site2/docs/sql-rest-api.md
index 606966552f7..39e23a8d003 100644
--- a/site2/docs/sql-rest-api.md
+++ b/site2/docs/sql-rest-api.md
@@ -4,21 +4,21 @@ title: Pulsar SQL REST APIs
 sidebar_label: "REST APIs"
 ---
 
-This section lists resources that make up the Presto REST API v1. 
+This section lists resources that make up the Trino REST API v1. 
 
-## Request for Presto services
+## Request for Trino services
 
-All requests for Presto services should use Presto REST API v1 version. 
+All requests for Trino services should use version 1 of the Trino REST API. 
 
-To request services, use the explicit URL `http://presto.service:8081/v1``. You need to update `presto.service:8081` with your real Presto address before sending requests.
+To request services, use the explicit URL `http://trino.service:8081/v1`. You need to replace `trino.service:8081` with your real Trino address before sending requests.
 
-`POST` requests require the `X-Presto-User` header. If you use authentication, you must use the same `username` that is specified in the authentication configuration. If you do not use authentication, you can specify anything for `username`.
+`POST` requests require the `X-Trino-User` header. If you use authentication, you must use the same `username` that is specified in the authentication configuration. If you do not use authentication, you can specify anything for `username`.
 
-```properties
-X-Presto-User: username
+```http
+X-Trino-User: username
 ```
 
-For more information about headers, refer to [PrestoHeaders](https://github.com/trinodb/trino).
+For more information about headers, refer to [client request headers](https://trino.io/docs/363/develop/client-protocol.html#client-request-headers).
 
 ## Schema
 
@@ -26,8 +26,13 @@ You can use statement in the HTTP body. All data is received as JSON document th
 
 The following is an example of `show catalogs`. The query continues until the received JSON document does not contain a `nextUri` link. Since no `error` is displayed in `stats`, it means that the query completes successfully.
 
-```powershell
-➜  ~ curl --header "X-Presto-User: test-user" --request POST --data 'show catalogs' http://localhost:8081/v1/statement
+```bash
+curl --header "X-Trino-User: test-user" --request POST --data 'show catalogs' http://localhost:8081/v1/statement
+```
+
+Output:
+
+```json
 {
    "infoUri" : "http://localhost:8081/ui/query.html?20191113_033653_00006_dg6hb",
    "stats" : {
@@ -51,8 +56,15 @@ The following is an example of `show catalogs`. The query continues until the re
    "id" : "20191113_033653_00006_dg6hb",
    "nextUri" : "http://localhost:8081/v1/statement/20191113_033653_00006_dg6hb/1"
 }
+```
+
+```bash
+curl http://localhost:8081/v1/statement/20191113_033653_00006_dg6hb/1
+```
 
-➜  ~ curl http://localhost:8081/v1/statement/20191113_033653_00006_dg6hb/1
+Output:
+
+```json
 {
    "infoUri" : "http://localhost:8081/ui/query.html?20191113_033653_00006_dg6hb",
    "nextUri" : "http://localhost:8081/v1/statement/20191113_033653_00006_dg6hb/2",
@@ -76,8 +88,15 @@ The following is an example of `show catalogs`. The query continues until the re
       "peakMemoryBytes" : 0
    }
 }
+```
+
+```bash
+curl http://localhost:8081/v1/statement/20191113_033653_00006_dg6hb/2
+```
+
+Output:
 
-➜  ~ curl http://localhost:8081/v1/statement/20191113_033653_00006_dg6hb/2
+```json
 {
    "id" : "20191113_033653_00006_dg6hb",
    "data" : [
@@ -184,4 +203,4 @@ Since the response data is not in sync with the query state from the perspective
 
 :::
 
-For more information about Presto REST API, refer to [Presto HTTP Protocol](https://github.com/prestosql/presto/wiki/HTTP-Protocol).
+For more information about the Trino REST API, refer to [Trino client REST API](https://trino.io/docs/363/develop/client-protocol.html).
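
The `curl` sequence in `sql-rest-api.md` (POST the statement, then follow `nextUri` until it disappears) can be sketched as a small polling loop. This is an illustrative sketch, not an official client: the helper names `http_fetch` and `run_statement` are hypothetical, the default address `http://localhost:8081` and user `test-user` are taken from the doc's example, and the check for a top-level `error` field is an assumption based on the Trino client protocol.

```python
import json
import urllib.request


def http_fetch(url, user, body=None):
    """POST `body` to `url` (or GET when body is None) and decode the JSON reply."""
    data = body.encode("utf-8") if body is not None else None
    # urllib sends a POST when data is given and a GET otherwise.
    req = urllib.request.Request(url, data=data, headers={"X-Trino-User": user})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_statement(sql, base_url="http://localhost:8081", user="test-user",
                  fetch=http_fetch):
    """Submit `sql` and follow `nextUri` links, collecting all `data` rows."""
    rows = []
    doc = fetch(f"{base_url}/v1/statement", user, sql)
    while True:
        # Assumption: a failed query reports a top-level `error` object.
        if "error" in doc:
            raise RuntimeError(f"query failed: {doc['error']}")
        rows.extend(doc.get("data", []))
        next_uri = doc.get("nextUri")
        if next_uri is None:
            # No further link: the query is finished.
            return rows
        doc = fetch(next_uri, user)
```

Calling `run_statement("show catalogs")` against a running SQL worker would return the catalog rows as a list of lists. The `fetch` parameter exists only so the loop can be exercised without a live server.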