You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by si...@apache.org on 2018/03/07 06:51:57 UTC

[incubator-pulsar] branch master updated: Pulsar Functions documentation (#1350)

This is an automated email from the ASF dual-hosted git repository.

sijie pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new a0623d5  Pulsar Functions documentation (#1350)
a0623d5 is described below

commit a0623d57f0ac680c589197d3a973c4760d3493ae
Author: Luc Perkins <lu...@gmail.com>
AuthorDate: Tue Mar 6 22:51:55 2018 -0800

    Pulsar Functions documentation (#1350)
    
    The first cut of pulsar-functions documentation.
---
 site/_data/popovers.yaml                           |  3 +
 site/_data/pulsar-functions.yaml                   | 85 ++++++++++++++++++++
 site/docs/latest/functions/api.md                  | 76 ++++++++++++++++++
 site/docs/latest/functions/deployment.md           |  9 +++
 site/docs/latest/functions/guarantees.md           | 28 +++++++
 site/docs/latest/functions/metrics-and-stats.md    | 19 +++++
 site/docs/latest/functions/overview.md             | 91 ++++++++++++++++++++++
 site/docs/latest/functions/quickstart.md           | 29 +++++++
 .../getting-started/ConceptsAndArchitecture.md     |  4 +
 9 files changed, 344 insertions(+)

diff --git a/site/_data/popovers.yaml b/site/_data/popovers.yaml
index 3db11d2..f001970 100644
--- a/site/_data/popovers.yaml
+++ b/site/_data/popovers.yaml
@@ -88,6 +88,9 @@ pub-sub:
 pulsar:
   q: What is Pulsar?
   def: Pulsar is a distributed messaging system originally created by Yahoo but now under the stewardship of the Apache Software Foundation.
+pulsar-functions:
+  q: What are Pulsar Functions?
+  def: Pulsar Functions are lightweight functions that can consume messages from Pulsar topics, apply custom processing logic, and, if desired, publish results to topics.
 retention-policy:
   q: What is a retention policy?
   def: Size and/or time limits that you can set on a namespace to configure retention of messages that have already been acknowledged.
diff --git a/site/_data/pulsar-functions.yaml b/site/_data/pulsar-functions.yaml
new file mode 100644
index 0000000..bea3166
--- /dev/null
+++ b/site/_data/pulsar-functions.yaml
@@ -0,0 +1,85 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+description: |
+  A tool for deploying and managing Pulsar Functions.
+example: |
+  pulsar-functions localrun \
+    --function-config my-function.yaml
+commands:
+- name: localrun
+  description: Runs a Pulsar Function
+- name: create
+  description: Creates a new Pulsar Function
+- name: delete
+  description: Deletes an existing Pulsar Function
+- name: update
+  description: Updates an existing Pulsar Function
+- name: get
+  description: Returns information about an existing Pulsar Function
+- name: list
+  description: Lists all currently existing Pulsar Functions
+  options:
+  - flags: --namespace
+    description: The namespace of the Pulsar Functions you'd like to list
+  - flags: --tenant
+    description: The tenant of the Pulsar Functions you'd like to list (you must also specify a namespace using the `--namespace` flag)
+- name: getstatus
+  description: Checks on the status of the specified Pulsar Function
+  options:
+  - flags: --namespace
+    description: The name of the Pulsar Function whose status you'd like to check
+  - flags: --tenant
+    description: The tenant of the Pulsar Function whose status you'd like to check
+  - flags: --tenant
+- name: querystate
+  description: Displays the current state of the specified Pulsar Function, by key
+  options:
+  - flags: -k, --key
+    description: The key for the desired value
+  - flags: --name
+    description: The name of the Pulsar Function whose current state you'd like to query
+  - flags: --namespace
+    description: The namespace of the Pulsar Function whose current state you'd like to query
+  - flags: -u, --storage-service-url
+    description: The URL of the storage service
+  - flags: --tenant
+    description: The tenant of the Pulsar Function whose current state you'd like to query
+  - flags: -w, --watch
+    description: If set, watch for changes in the current state of the specified Pulsar Function (by the key set using `-k`/`--key`)
+    default: 'false'
+options:
+  - flags: --name
+    description: The name of the Pulsar Function
+  - flags: --function-classname
+    description: The Java class name of the Pulsar Function
+  - flags: --function-classpath
+    description: The Java classpath of the Pulsar Function
+  - flags: --source-topic
+    description: The topic from which the Pulsar Function consumes its input
+  - flags: --sink-topic
+    description: The topic to which the Pulsar Function publishes its output (if any)
+  - flags: --input-serde-classname
+    description: Input SerDe
+    default: org.apache.pulsar.functions.runtime.serde.Utf8StringSerDe
+  - flags: --output-serde-classname
+    description: Output SerDe
+    default: org.apache.pulsar.functions.runtime.serde.Utf8StringSerDe
+  - flags: --function-config
+    description: The path for the Pulsar Function's YAML configuration file
\ No newline at end of file
diff --git a/site/docs/latest/functions/api.md b/site/docs/latest/functions/api.md
new file mode 100644
index 0000000..ba71c86
--- /dev/null
+++ b/site/docs/latest/functions/api.md
@@ -0,0 +1,76 @@
+---
+title: The Pulsar Functions API
+---
+
+## Java
+
+Java API example:
+
+```java
+import java.util.Function;
+public class ExclamationFunction implements Function<String, String> {
+    @Override
+    public String apply(String input) { return String.format("%s!", input); }
+}
+```
+
+### With context
+
+```java
+public interface PulsarFunction<I, O> {
+    O process(I input, Context context) throws Exception;
+}
+```
+
+Context interface:
+
+```java
+public interface Context {
+    byte[] getMessageId();
+    String getTopicName();
+    Collection<String> getSourceTopics();
+    String getSinkTopic();
+    String getOutputSerdeClassName();
+    String getTenant();
+    String getNamespace();
+    String getFunctionName();
+    String getFunctionId();
+    String getInstanceId();
+    String getFunctionVersion();
+    Logger getLogger();
+    void incrCounter(String key, long amount);
+    String getUserConfigValue(String key);
+    void recordMetric(String metricName, double value);
+    <O> CompletableFuture<Void> publish(String topicName, O object, String serDeClassName);
+    <O> CompletableFuture<Void> publish(String topicName, O object);
+    CompletableFuture<Void> ack(byte[] messageId, String topic);
+}
+```
+
+### SerDe
+
+> Serde stands for **Ser**ialization and **De**serialization.
+
+Built-in vs. custom. For custom, you need to implement this interface:
+
+```java
+public interface SerDe<T> {
+    T deserialize(byte[] input);
+    byte[] serialize(T input);
+}
+```
+
+The built-in is the `org.apache.pulsar.functions.api.DefaultSerDe` class:
+
+```java
+
+```
+
+The built-in should work fine for basic Java types. For more advanced types,
+
+
+## Python
+
+```python
+def process(input):
+```
\ No newline at end of file
diff --git a/site/docs/latest/functions/deployment.md b/site/docs/latest/functions/deployment.md
new file mode 100644
index 0000000..ea2d79a
--- /dev/null
+++ b/site/docs/latest/functions/deployment.md
@@ -0,0 +1,9 @@
+---
+title: Deploying Pulsar Functions
+---
+
+At the moment, Pulsar Functions are deployed
+
+## State storage
+
+By default, Pulsar uses [Apache BookKeeper](https://bookkeeper.apache.org).
\ No newline at end of file
diff --git a/site/docs/latest/functions/guarantees.md b/site/docs/latest/functions/guarantees.md
new file mode 100644
index 0000000..3efd549
--- /dev/null
+++ b/site/docs/latest/functions/guarantees.md
@@ -0,0 +1,28 @@
+---
+title: Processing guarantees
+lead: Apply at-most-once, at-least-once, or effectively-once delivery semantics to Pulsar Functions
+---
+
+Pulsar Functions provides three different messaging semantics that you can apply to any Function:
+
+* **At-most-once** delivery
+* **At-least-once** delivery
+* **Effectively-once** delivery
+
+## How it works
+
+You can set the processing guarantees for a Pulsar Function when you create the Function. This [`pulsar-function create`](../../reference/CliTools#pulsar-functions-create) command, for example, would apply effectively-once guarantees to the Function:
+
+```bash
+$ bin/pulsar-functions \
+  # TODO
+  --processingGuarantees EFFECTIVELY_ONCE
+```
+
+The available options are:
+
+* `ATMOST_ONCE`
+* `ATLEAST_ONCE`
+* `EFFECTIVELY_ONCE`
+
+{% include admonition.html type='info' content='By default, Pulsar Functions provide at-most-once delivery guarantees. If you create a function without supplying a value for the `--processingGuarantees`flag, then the Function will provide only at-most-once guarantees.' %}
\ No newline at end of file
diff --git a/site/docs/latest/functions/metrics-and-stats.md b/site/docs/latest/functions/metrics-and-stats.md
new file mode 100644
index 0000000..90f3306
--- /dev/null
+++ b/site/docs/latest/functions/metrics-and-stats.md
@@ -0,0 +1,19 @@
+---
+title: Metrics and stats for Pulsar Functions
+---
+
+Pulsar Functions can publish arbitrary metrics to the metrics interface (which can then be queried).
+
+## Java API
+
+To publish a metric to the metrics interface:
+
+```java
+void recordMetric(String metricName, double value);
+```
+
+Here's an example:
+
+```java
+Context.recordMetric("my-custom-metrics", 475);
+```
\ No newline at end of file
diff --git a/site/docs/latest/functions/overview.md b/site/docs/latest/functions/overview.md
new file mode 100644
index 0000000..6dff8d4
--- /dev/null
+++ b/site/docs/latest/functions/overview.md
@@ -0,0 +1,91 @@
+---
+title: Pulsar Functions overview
+lead: A bird's-eye look at Pulsar's lightweight, developer-friendly compute platform
+---
+
+
+**Pulsar Functions** are lightweight compute processes that
+
+* consume {% popover messages %} from one or more Pulsar {% popover topics %},
+* apply a user-supplied processing logic to each message,
+* publish the results of the computation to another topic
+
+Here's an example Pulsar Function for Java:
+
+```java
+import java.util.Function;
+
+public class ExclamationFunction implements Function<String, String> {
+    @Override
+    public String apply(String input) { return String.format("%s!", input); }
+}
+```
+
+Functions are executed each time a message is published to the input topic. If a function is listening on the topic `tweet-stream`, for example, then the function would be run each time a message.
+
+> Pulsar features automatic message deduplication
+
+### Goals
+
+Core goal: make Pulsar do real heavy lifting without needing to deploy a neighboring system (Storm, Heron, Flink, etc.). Ready-made compute infrastructure at your disposal.
+
+* Developer productivity (easy troubleshooting and deployment)
+  * "Serverless" philosophy
+* No need for a separate SPE
+
+### Inspirations
+
+* AWS Lambda, Google Cloud Functions, etc.
+* FaaS
+* Serverless/NoOps philosophy
+
+### Command-line interface
+
+You can manage Pulsar Functions using the [`pulsar-functions`](../../reference/CliTools#pulsar-functions) CLI tool. Here's an example command that would
+
+```bash
+$ bin/pulsar-functions localrun \
+  --inputs persistent://sample/standalone/ns1/test_src \
+  --output persistent://sample/standalone/ns1/test_result \
+  --jar examples/api-examples.jar \
+  --className org.apache.pulsar.functions.api.examples.ExclamationFunction
+```
+
+### Supported languages
+
+Pulsar Functions can currently be written in [Java](../../functions/api#java) and [Python](../../functions/api#python). Support for additional languages is coming soon.
+
+### Runtime
+
+### Deployment modes
+
+* Local run
+* Cluster run
+
+### Delivery semantics
+
+* At most once
+* At least once
+* Effectively once
+
+### State storage
+
+### Metrics
+
+Here's an example function that publishes a value of 1 to the `my-metric` metric.
+
+```java
+public class MetricsFunction implements PulsarFunction<String, Void> {
+    @Override
+    public Void process(String input, Context context) {
+        context.recordMetric("my-metric", 1);
+        return null;
+    }
+}
+```
+
+### Logging
+
+### Data types
+
+* Strongly typed
diff --git a/site/docs/latest/functions/quickstart.md b/site/docs/latest/functions/quickstart.md
new file mode 100644
index 0000000..8c04c6b
--- /dev/null
+++ b/site/docs/latest/functions/quickstart.md
@@ -0,0 +1,29 @@
+---
+title: Getting started with Pulsar Functions
+---
+
+## The `pulsar-functions` CLI tool
+
+[`pulsar-functions`](../../reference/CliTools#pulsar-functions)
+
+```bash
+$ alias pulsar-functions='/path/to/pulsar/bin/pulsar-functions'
+```
+
+## Querying state
+
+```bash
+$ bin/pulsar-functions querystate \
+  --tenant sample \
+  --namespace my-functions \
+  --function-name my-function \
+  --key "some-key"
+```
+
+## Running functions locally
+
+[`localrun`](../../reference/CliTools#pulsar-functions-localrun)
+
+```bash
+$ bin/pulsar-functions localrun
+```
\ No newline at end of file
diff --git a/site/docs/latest/getting-started/ConceptsAndArchitecture.md b/site/docs/latest/getting-started/ConceptsAndArchitecture.md
index 045af3f..7989344 100644
--- a/site/docs/latest/getting-started/ConceptsAndArchitecture.md
+++ b/site/docs/latest/getting-started/ConceptsAndArchitecture.md
@@ -288,6 +288,10 @@ With message retention, shown at the top, a <span style="color: #89b557;">retent
 
 With message expiry, shown at the bottom, some messages are <span style="color: #bb3b3e;">deleted</span>, even though they <span style="color: #337db6;">haven't been acknowledged</span>, because they've expired according to the <span style="color: #e39441;">TTL applied to the namespace</span> (for example because a TTL of 5 minutes has been applied and the messages haven't been acknowledged but are 10 minutes old).
 
+## Pulsar Functions
+
+For an in-depth look at Pulsar Functions, see the [Pulsar Functions overview](../../functions/overview).
+
 ## Replication
 
 Pulsar enables messages to be produced and consumed in different geo-locations. For instance, your application may be publishing data in one region or market and you would like to process it for consumption in other regions or markets. [Geo-replication](../../admin/GeoReplication) in Pulsar enables you to do that.

-- 
To stop receiving notification emails like this one, please contact
sijie@apache.org.