Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/06/30 10:19:33 UTC

[GitHub] [pulsar] Anonymitaet opened a new pull request #7393: Update docs for tiered storage

Anonymitaet opened a new pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393


   Currently, the docs for tiered storage are unclear and incomplete.
   
   This PR adds and reorganizes content for tiered storage.
   
   Old structure:
   ![image](https://user-images.githubusercontent.com/50226895/86115080-01a0db00-bafe-11ea-89bf-0f140fd3ce65.png)
   
   New structure:
   ![image](https://user-images.githubusercontent.com/50226895/86115128-14b3ab00-bafe-11ea-8740-9629c27a95f8.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] gaoran10 commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448077327



##########
File path: site2/docs/tiered-storage-aws.md
##########
@@ -0,0 +1,283 @@
+---
+id: tiered-storage-aws
+title: Use AWS S3 offloader with Pulsar
+sidebar_label: AWS S3 offloader
+---
+
+This chapter guides you through every step of installing and configuring the AWS S3 offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the AWS S3 offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions

Review comment:
       Which version of jclouds to use is up to Pulsar, so users don't need to care about it.







[GitHub] [pulsar] Anonymitaet merged pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
Anonymitaet merged pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393


   





[GitHub] [pulsar] gaoran10 commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448139695



##########
File path: site2/docs/tiered-storage-filesystem.md
##########
@@ -0,0 +1,268 @@
+---
+id: tiered-storage-filesystem
+title: Use filesystem offloader with Pulsar
+sidebar_label: Filesystem offloader
+---
+
+This chapter guides you through every step of installing and configuring the filesystem offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the filesystem offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+
+- Hadoop: 3.x.x
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download the Pulsar tarball using one of the following ways:
+
+   * Download from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz)
+
+   * Download from the Pulsar [download page](https://pulsar.apache.org/download)
+
+   * Use [wget](https://www.gnu.org/software/wget)
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package. 
+
+    ```bash
+    wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
+
+    tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    ```
+
+    > #### Note
+    >
+    > * If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` directory is present in every broker's Pulsar directory.
+    > 
+    > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles the tiered storage offloaders.
+
+3. Move the `offloaders` directory into the Pulsar directory.
+
+    ```bash
+    mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
+
+    ls apache-pulsar-2.5.1/offloaders
+    ```
+
+    **Output**
+
+    ```
+    tiered-storage-file-system-2.5.1.nar
+    tiered-storage-jcloud-2.5.1.nar
+    ```
+
+## Configuration
+
+> #### Note
+> 
+> Before offloading data from BookKeeper to filesystem, you need to configure some properties of the filesystem offloader driver. 
+
+You can also configure the filesystem offloader to run automatically or trigger it manually.
+
+### Configure filesystem offloader driver
+
+You can configure the filesystem offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+  
+    Required configuration | Description | Example value
+    |---|---|---
+    `managedLedgerOffloadDriver` | Offloader driver name, which is case-insensitive. | filesystem
+    `fileSystemURI` | Connection address | hdfs://127.0.0.1:9000
+    `offloadersDirectory` | Hadoop profile path | ../conf/filesystem_offload_core_site.xml

Review comment:
       ```suggestion
       `offloadersDirectory` | Offloader directory | offloaders
   ```
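Step 3 of the installation quoted above expects two NAR files in the `offloaders` directory of every broker. A minimal sketch of a per-broker check follows; `check_offloaders` is a hypothetical helper (not part of Pulsar), and it assumes the file names keep the `tiered-storage-<name>-<version>.nar` pattern shown in the 2.5.1 output.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Pulsar): verify that a Pulsar
# directory contains the offloader NAR files listed in the install output.
# Assumes the tiered-storage-<name>-<version>.nar naming seen in 2.5.1.
# Usage: check_offloaders <pulsar-dir> <version>
check_offloaders() {
  pulsar_dir=$1
  version=$2
  missing=0
  for name in tiered-storage-file-system tiered-storage-jcloud; do
    nar="$pulsar_dir/offloaders/$name-$version.nar"
    if [ ! -f "$nar" ]; then
      echo "missing: $nar" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: run on every broker host after step 3.
# check_offloaders apache-pulsar-2.5.1 2.5.1 && echo "offloaders installed"
```

A non-zero exit status makes the check easy to use from deployment scripts that loop over broker hosts.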

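The required filesystem offloader settings in the table above are plain `key=value` lines in `broker.conf` or `standalone.conf`. A sketch for catching a missing or empty key before restarting brokers is shown below; `check_required_keys` is hypothetical, and it assumes that, as in Java properties files, a later assignment of the same key overrides an earlier one.

```shell
# Hypothetical helper: report required offloader keys that are missing or
# empty in a broker.conf / standalone.conf style file (key=value lines).
# Usage: check_required_keys <conf-file> <key>...
check_required_keys() {
  conf=$1
  shift
  rc=0
  for key in "$@"; do
    # Take the last assignment of the key (assumed to win, as in Java properties).
    value=$(grep -E "^[[:space:]]*$key[[:space:]]*=" "$conf" | tail -n 1 | cut -d= -f2-)
    # Trim surrounding whitespace so "key= " counts as empty.
    value=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example for the filesystem offloader:
# check_required_keys conf/broker.conf managedLedgerOffloadDriver fileSystemURI offloadersDirectory
```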






[GitHub] [pulsar] Anonymitaet commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448086929



##########
File path: site2/docs/tiered-storage-gcs.md
##########
@@ -0,0 +1,275 @@
+---
+id: tiered-storage-gcs
+title: Use GCS offloader with Pulsar
+sidebar_label: GCS offloader
+---
+
+This chapter guides you through every step of installing and configuring the GCS offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the GCS offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download the Pulsar tarball using one of the following ways:
+
+   * Download the Pulsar tarball from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz).
+
+   * Download from the Pulsar [download page](https://pulsar.apache.org/download).
+
+   * Use [wget](https://www.gnu.org/software/wget):
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package. 
+
+    ```bash
+    wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
+
+    tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    ```
+
+    > #### Note
+    >
+    > * If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` directory is present in every broker's Pulsar directory.
+    > 
+    > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles the tiered storage offloaders.
+
+3. Move the `offloaders` directory into the Pulsar directory.
+
+    ```bash
+    mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
+
+    ls apache-pulsar-2.5.1/offloaders
+    ```
+
+    **Output**
+
+    As shown in the output, Pulsar uses [Apache jclouds](https://jclouds.apache.org) to support GCS and AWS S3 for long-term storage.
+
+
+    ```
+    tiered-storage-file-system-2.5.1.nar
+    tiered-storage-jcloud-2.5.1.nar
+    ```
+
+## Configuration
+
+> #### Note
+> 
+> Before offloading data from BookKeeper to GCS, you need to configure some properties of the GCS offloader driver. 
+
+You can also configure the GCS offloader to run automatically or trigger it manually.
+
+### Configure GCS offloader driver
+
+You can configure the GCS offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+
+    **Required** configuration | Description | Example value
+    |---|---|---
+    `managedLedgerOffloadDriver`|Offloader driver name, which is case-insensitive.|google-cloud-storage
+    `offloadersDirectory`|Offloader directory|offloaders
+    `gcsManagedLedgerOffloadBucket`|Bucket|pulsar-topic-offload
+    `gcsManagedLedgerOffloadRegion`|Bucket region|europe-west3
+    `gcsManagedLedgerOffloadServiceAccountKeyFile`|Authentication |/Users/user-name/Downloads/project-804d5e6a6f33.json
+
+- **Optional** configurations are as below.
+
+    Optional configuration|Description|Example value
+    |---|---|---
+    `gcsManagedLedgerOffloadReadBufferSizeInBytes`|Size of block read|1 MB
+    `gcsManagedLedgerOffloadMaxBlockSizeInBytes`|Size of block write|64 MB
+    `managedLedgerMinLedgerRolloverTimeMinutes`|Minimum time between ledger rollover for a topic.|2
+    `managedLedgerMaxEntriesPerLedger`|Maximum number of entries to append to a ledger before triggering a rollover.|5000
+
+#### Bucket (required)
+
+A bucket is a basic container that holds your data. Everything you store in GCS **must** be contained in a bucket. You can use a bucket to organize your data and control access to it, but unlike directories and folders, buckets cannot be nested.
+
+##### Example
+
+This example names the bucket as _pulsar-topic-offload_.
+
+```conf
+gcsManagedLedgerOffloadBucket=pulsar-topic-offload
+```
+
+#### Bucket region (required)
+
+Bucket region is the region where a bucket is located. If a bucket region is not specified, the **default** region (`us multi-regional location`) is used.
+
+> #### Tip
+>
+> For more information about bucket location, see [here](https://cloud.google.com/storage/docs/bucket-locations).
+
+##### Example
+
+This example sets the bucket region as _europe-west3_.
+
+```conf
+gcsManagedLedgerOffloadRegion=europe-west3
+```
+
+#### Authentication (required)
+
+To enable a broker to access GCS, you need to configure `gcsManagedLedgerOffloadServiceAccountKeyFile` in the configuration file `broker.conf`. 
+
+`gcsManagedLedgerOffloadServiceAccountKeyFile` is a JSON file containing the GCS credentials of a service account.
+
+##### Example
+
+To generate service account credentials or view the public credentials that you've already generated, follow these steps.
+
+1. Navigate to the [Service accounts page](https://console.developers.google.com/iam-admin/serviceaccounts).
+
+2. Select a project or create a new one.
+
+3. Click **Create service account**.
+
+4. In the **Create service account** window, type a name for the service account and select **Furnish a new private key**. 
+
+    If you want to [grant G Suite domain-wide authority](https://developers.google.com/identity/protocols/OAuth2ServiceAccount#delegatingauthority) to the service account, select **Enable G Suite Domain-wide Delegation**.
+
+5. Click **Create**.
+
+    > #### Note
+    >
+    > Make sure that the service account you create has permission to operate GCS. To do so, assign the **Storage Admin** role to your service account (see [Cloud Storage IAM](https://cloud.google.com/storage/docs/access-control/iam)).
+
+6. Save the generated JSON key file and set its path in `broker.conf`.
+   
+    ```conf
+    gcsManagedLedgerOffloadServiceAccountKeyFile="/Users/user-name/Downloads/project-804d5e6a6f33.json"
+    ```
+
+    > #### Tip
+    >
+    > - For more information about how to create `gcsManagedLedgerOffloadServiceAccountKeyFile`, see [here](https://support.google.com/googleapi/answer/6158849).
+    >
+    > - For more information about Google Cloud IAM, see [here](https://cloud.google.com/storage/docs/access-control/iam).
+
+#### Size of block read/write
+
+You can configure the size of a request sent to or read from GCS in the configuration file `broker.conf`. 
+
+Configuration|Description
+|---|---
+`gcsManagedLedgerOffloadReadBufferSizeInBytes`|Block size for each individual read when reading back data from GCS.<br><br>The **default** value is 1 MB.
+`gcsManagedLedgerOffloadMaxBlockSizeInBytes`|Maximum size of a "part" sent during a multipart upload to GCS. <br><br>It **cannot** be smaller than 5 MB. <br><br>The **default** value is 64 MB.
+
+### Configure GCS offloader to run automatically
+
+Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offload operation is triggered automatically. 
+
+Threshold value|Action
+|---|---
+`> 0` | It triggers the offloading operation if the topic storage reaches its threshold.
+`= 0` | It causes a broker to offload data as soon as possible.
+`< 0` | It disables automatic offloading operation.
+
+Automatic offloading runs when a new segment is added to a topic log. If you set a threshold on a namespace but few messages are produced to the topic, offloading does not occur until the current segment is full.
+
+You can configure the threshold size using CLI tools, such as [pulsarctl](https://streamnative.io/docs/v1.0.0/manage-and-monitor/pulsarctl/overview/) or pulsar-admin.
+
+The offload configurations in `broker.conf` and `standalone.conf` apply to namespaces that do not have namespace-level offload policies. Each namespace can have its own offload policy. To set an offload policy for a namespace, use the [`pulsar-admin namespaces set-offload-policies options`](http://pulsar.apache.org/tools/pulsar-admin/2.6.0-SNAPSHOT/#-em-set-offload-policies-em-) command.
+
+#### Example
+
+This example sets the GCS offloader threshold size to 10 MB using pulsarctl.
+
+```bash
+bin/pulsarctl namespaces set-offload-threshold --size 10M my-tenant/my-namespace
+```
+
+> #### Tip
+>
+> For more information about the `pulsarctl namespaces set-offload-threshold options` command, including flags, descriptions, default values, and shorthands, see [here](https://streamnative.io/docs/pulsarctl/v0.4.0/#-em-set-offload-threshold-em-). 
+
+### Configure GCS offloader to run manually
+
+For individual topics, you can trigger GCS offloader manually using one of the following methods:
+
+- Use REST endpoint 

Review comment:
       This is a phrase, not a complete sentence with a subject and a predicate, so it does not need a period. 
   
   IBM style guide:
   Do not use a period after a phrase and use a period after a sentence.
   ![image](https://user-images.githubusercontent.com/50226895/86197914-ffd02980-bb88-11ea-86db-505303534e7e.png)
   
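The GCS document quoted above requires `gcsManagedLedgerOffloadServiceAccountKeyFile` to point to a JSON service account key that the broker can read. A pre-flight check might look like the sketch below; `check_gcs_key_file` is hypothetical, and the `"type": "service_account"` test assumes the usual layout of a Google service account key file.

```shell
# Hypothetical pre-flight check for the GCS service account key file.
# Assumption: a standard service account JSON key contains
# "type": "service_account".
check_gcs_key_file() {
  key_file=$1
  if [ ! -r "$key_file" ]; then
    echo "cannot read key file: $key_file" >&2
    return 1
  fi
  if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file"; then
    echo "not a service account key: $key_file" >&2
    return 1
  fi
}

# Example: run as the same user that starts the broker.
# check_gcs_key_file /Users/user-name/Downloads/project-804d5e6a6f33.json
```

Running it as the broker's user matters because the readability check depends on file permissions.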

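The `*InBytes` settings in the GCS section quoted above take byte counts, while the tables describe sizes in MB. The sketch below converts MB to bytes and rejects a write block size under the stated 5 MB minimum. It assumes that MB means 1024 × 1024 bytes (so 1 MB and 64 MB become 1048576 and 67108864); `mb_to_bytes` and `check_block_size` are hypothetical names.

```shell
# Hypothetical helpers for the block size settings.
# Assumption: 1 MB = 1024 * 1024 bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Fails if a write block size (in bytes) is below the 5 MB minimum
# stated for gcsManagedLedgerOffloadMaxBlockSizeInBytes.
check_block_size() {
  min=$(mb_to_bytes 5)
  if [ "$1" -lt "$min" ]; then
    echo "block size $1 is below the minimum of $min bytes" >&2
    return 1
  fi
}

# Example: emit broker.conf lines for the documented defaults.
echo "gcsManagedLedgerOffloadReadBufferSizeInBytes=$(mb_to_bytes 1)"
echo "gcsManagedLedgerOffloadMaxBlockSizeInBytes=$(mb_to_bytes 64)"
```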

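The threshold table quoted above (the AWS S3 document uses the same rules) maps a namespace threshold to one of three behaviors. The sketch below encodes that mapping so the rules can be checked at a glance; `offload_mode` is a hypothetical name, and the threshold is taken as a plain integer byte count rather than the `10M` form accepted by the CLI.

```shell
# Hypothetical helper: classify a namespace offload threshold (in bytes)
# according to the threshold table in the tiered storage docs.
offload_mode() {
  threshold=$1
  if [ "$threshold" -gt 0 ]; then
    echo "offload when topic storage reaches $threshold bytes"
  elif [ "$threshold" -eq 0 ]; then
    echo "offload as soon as possible"
  else
    echo "automatic offloading disabled"
  fi
}

offload_mode 10485760   # 10 MB, assuming 10M means 10 * 1024 * 1024 bytes
offload_mode 0
offload_mode -1
```

Even with a positive threshold, the quoted text notes that automatic offloading only runs when a new segment is added to the topic log, so a quiet topic may stay above its threshold for a while.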





[GitHub] [pulsar] Anonymitaet commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448087902



##########
File path: site2/docs/tiered-storage-aws.md
##########
@@ -0,0 +1,283 @@
+---
+id: tiered-storage-aws
+title: Use AWS S3 offloader with Pulsar
+sidebar_label: AWS S3 offloader
+---
+
+This chapter guides you through every step of installing and configuring the AWS S3 offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the AWS S3 offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download the Pulsar tarball using one of the following ways:
+
+   * Download the Pulsar tarball from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz).
+
+   * Download from the Pulsar [downloads page](https://pulsar.apache.org/download).
+
+   * Use [wget](https://www.gnu.org/software/wget):
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package. 
+
+    ```bash
+    wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    ```
+
+3. Move the `offloaders` directory into the Pulsar directory.
+
+    ```bash
+    mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
+
+    ls apache-pulsar-2.5.1/offloaders
+    ```
+
+    **Output**
+
+    As shown in the output, Pulsar uses [Apache jclouds](https://jclouds.apache.org) to support [AWS S3](https://aws.amazon.com/s3/) and [GCS](https://cloud.google.com/storage/) for long-term storage.
+
+
+    ```
+    tiered-storage-file-system-2.5.1.nar
+    tiered-storage-jcloud-2.5.1.nar
+    ```
+
+    > #### Note
+    >
+    > * If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` directory is present in every broker's Pulsar directory.
+    > 
+    > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles the tiered storage offloaders.
+
+## Configuration
+
+> #### Note
+> 
+> Before offloading data from BookKeeper to AWS S3, you need to configure some properties of the AWS S3 offloader driver.
+
+You can also configure the AWS S3 offloader to run automatically or trigger it manually.
+
+### Configure AWS S3 offloader driver
+
+You can configure the AWS S3 offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+  
+    Required configuration | Description | Example value
+    |---|---|---
+    `managedLedgerOffloadDriver` | Offloader driver name, which is case-insensitive. <br><br>**Note**: there is a third driver type, `S3`, which is identical to `aws-s3` except that it requires you to specify an endpoint URL using `s3ManagedLedgerOffloadServiceEndpoint`. Use it with S3-compatible data stores other than AWS S3. | aws-s3
+    `offloadersDirectory` | Offloader directory | offloaders
+    `s3ManagedLedgerOffloadBucket` | Bucket | pulsar-topic-offload
+
+- **Optional** configurations are as below.
+
+    Optional configuration | Description | Example value
+    |---|---|---
+    `s3ManagedLedgerOffloadRegion` | Bucket region | eu-west-3
+    `s3ManagedLedgerOffloadReadBufferSizeInBytes`|Size of block read|1 MB
+    `s3ManagedLedgerOffloadMaxBlockSizeInBytes`|Size of block write|64 MB
+    `managedLedgerMinLedgerRolloverTimeMinutes`|Minimum time between ledger rollover for a topic<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|2
+    `managedLedgerMaxEntriesPerLedger`|Maximum number of entries to append to a ledger before triggering a rollover.<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|5000
+
+#### Bucket (required)
+
+A bucket is a basic container that holds your data. Everything you store in AWS S3 must be contained in a bucket. You can use a bucket to organize your data and control access to it, but unlike directories and folders, buckets cannot be nested.
+
+##### Example
+
+This example names the bucket as _pulsar-topic-offload_.
+
+```conf
+s3ManagedLedgerOffloadBucket=pulsar-topic-offload
+```
+
+#### Bucket region 
+
+A bucket region is a region where a bucket is located. If a bucket region is not specified, the **default** region (`US East (N. Virginia)`) is used.
+
+> #### Tip
+>
+> For more information about AWS regions and endpoints, see [here](https://docs.aws.amazon.com/general/latest/gr/rande.html).
+ 
+##### Example
+
+This example sets the bucket region as _eu-west-3_.
+
+```conf
+s3ManagedLedgerOffloadRegion=eu-west-3
+```
+
+#### Authentication (required)
+
+To be able to access AWS S3, you need to authenticate with AWS S3.
+
+Pulsar does not provide any direct methods of configuring authentication for AWS S3,
+but relies on the mechanisms supported by the
+[DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html).
+
+Once you have created a set of credentials in the AWS IAM console, you can configure credentials using one of the following methods.
+
+* Use EC2 instance metadata credentials.
+
+    If you are on an AWS instance with an instance profile that provides credentials, Pulsar uses these credentials if no other mechanism is provided.
+
+* Set the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in `conf/pulsar_env.sh`.
+
+    The `export` keyword is important because it makes the variables available in the environment of spawned processes.
+
+    ```bash
+    export AWS_ACCESS_KEY_ID=ABC123456789
+    export AWS_SECRET_ACCESS_KEY=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c
+    ```
+
+* Add the Java system properties `aws.accessKeyId` and `aws.secretKey` to `PULSAR_EXTRA_OPTS` in `conf/pulsar_env.sh`.
+
+    ```bash
+    PULSAR_EXTRA_OPTS="${PULSAR_EXTRA_OPTS} ${PULSAR_MEM} ${PULSAR_GC} -Daws.accessKeyId=ABC123456789 -Daws.secretKey=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024"
+    ```
+
+* Set the access credentials in `~/.aws/credentials`.
+
+    ```conf
+    [default]
+    aws_access_key_id=ABC123456789
+    aws_secret_access_key=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c
+    ```
+
+* Assume an IAM role.
+
+    This example uses the `DefaultAWSCredentialsProviderChain` to assume this role.
+
+    ```conf
+    s3ManagedLedgerOffloadRole=<aws role arn>
+    s3ManagedLedgerOffloadRoleSessionName=pulsar-s3-offload
+    ```
+
+The broker must be restarted for credentials specified in `conf/pulsar_env.sh` to take effect.
+
+#### Size of block read/write
+
+You can configure the size of a request sent to or read from AWS S3 in the configuration file `broker.conf` or `standalone.conf`. 
+
+Configuration|Description|Default value
+|---|---|---
+`s3ManagedLedgerOffloadReadBufferSizeInBytes`|Block size for each individual read when reading back data from AWS S3.|1 MB
+`s3ManagedLedgerOffloadMaxBlockSizeInBytes`|Maximum size of a "part" sent during a multipart upload to AWS S3. It **cannot** be smaller than 5 MB. |64 MB
+
+### Configure AWS S3 offloader to run automatically
+
+Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offloading operation is triggered automatically. 
+
+Threshold value|Action
+|---|---
+`> 0` | It triggers the offloading operation if the topic storage reaches its threshold.
+`= 0` | It causes a broker to offload data as soon as possible.
+`< 0` | It disables automatic offloading operation.
+
+Automatic offloading runs when a new segment is added to a topic log. If you set a threshold on a namespace but few messages are produced to the topic, offloading does not occur until the current segment is full.

Review comment:
       Offloader. Thanks for reminding me.
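On the point quoted above that `export` matters in `conf/pulsar_env.sh`: a shell variable reaches child processes, such as the broker JVM launched by the Pulsar scripts, only if it is exported. The sketch below demonstrates this with a child `sh` standing in for the broker; `demo_export` and the `DEMO_*` variable names are illustrative only.

```shell
# Illustration only: non-exported variables stay in the current shell,
# exported ones are inherited by child processes.
demo_export() {
  (
    # Unset first so a value inherited from the caller cannot skew the demo.
    unset DEMO_PLAIN DEMO_EXPORTED
    DEMO_PLAIN=set-but-not-exported
    DEMO_EXPORTED=exported
    export DEMO_EXPORTED
    # The child shell plays the role of the spawned broker process.
    sh -c 'printf "%s|%s\n" "${DEMO_PLAIN:-unset}" "${DEMO_EXPORTED:-unset}"'
  )
}

demo_export   # prints: unset|exported
```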

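The AWS S3 authentication section quoted above relies on the `DefaultAWSCredentialsProviderChain`, so when several methods are configured it helps to know which one is picked up first. The sketch below is a rough, hypothetical diagnostic (`aws_credential_source`). It checks only two sources, the environment variables and the `[default]` profile in `~/.aws/credentials`, in the order we understand the AWS SDK for Java default chain to consult them (environment first). It ignores Java system properties, IAM roles, and instance metadata, so treat its answer as a hint rather than a guarantee.

```shell
# Hypothetical diagnostic: which of two credential sources would likely be
# used first. Only AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY and the [default]
# profile in ~/.aws/credentials are checked; -Daws.* system properties,
# IAM roles and instance metadata are NOT considered.
aws_credential_source() {
  creds_file="$HOME/.aws/credentials"
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "environment"
  elif [ -f "$creds_file" ] && grep -q '^\[default\]' "$creds_file"; then
    echo "credentials-file"
  else
    echo "none-found"
  fi
}

# Example: run in the same environment that starts the broker,
# for example after sourcing conf/pulsar_env.sh.
aws_credential_source
```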






[GitHub] [pulsar] gaoran10 commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448082726



##########
File path: site2/docs/tiered-storage-filesystem.md
##########
@@ -0,0 +1,268 @@
+---
+id: tiered-storage-filesystem
+title: Use filesystem offloader with Pulsar
+sidebar_label: Filesystem offloader
+---
+
+This chapter guides you through every step of installing and configuring the filesystem offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the filesystem offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+
+- Hadoop: 3.x.x

Review comment:
       Same as jclouds: users don't need to care about this.







[GitHub] [pulsar] Anonymitaet commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448088204



##########
File path: site2/docs/tiered-storage-aws.md
##########
@@ -0,0 +1,283 @@
+---
+id: tiered-storage-aws
+title: Use AWS S3 offloader with Pulsar
+sidebar_label: AWS S3 offloader
+---
+
+This chapter guides you through every step of installing and configuring the AWS S3 offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the AWS S3 offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions

Review comment:
       Thanks. Confirmed with @gaoran10 that users do not need to care about the Apache jclouds version, so I've deleted this.







[GitHub] [pulsar] gaoran10 commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448083179



##########
File path: site2/docs/tiered-storage-gcs.md
##########
@@ -0,0 +1,275 @@
+---
+id: tiered-storage-gcs
+title: Use GCS offloader with Pulsar
+sidebar_label: GCS offloader
+---
+
+This chapter guides you through every step of installing and configuring the GCS offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the GCS offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions

Review comment:
       Same as tiered-storage-aws.







[GitHub] [pulsar] Huanli-Meng commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
Huanli-Meng commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448064049



##########
File path: site2/docs/tiered-storage-aws.md
##########
@@ -0,0 +1,283 @@
+---
+id: tiered-storage-aws
+title: Use AWS S3 offloader with Pulsar
+sidebar_label: AWS S3 offloader
+---
+
+This chapter guides you through every step of installing and configuring the AWS S3 offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the AWS S3 offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download Pulsar tarball using one of the following ways:
+
+   * download from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz)

Review comment:
       ```suggestion
      * Download the Pulsar tarball from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz).
   ```
   Same comment applies to the next bullet.

##########
File path: site2/docs/tiered-storage-aws.md
##########
@@ -0,0 +1,283 @@
+---
+id: tiered-storage-aws
+title: Use AWS S3 offloader with Pulsar
+sidebar_label: AWS S3 offloader
+---
+
+This chapter guides you through every step of installing and configuring the AWS S3 offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the AWS S3 offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download Pulsar tarball using one of the following ways:
+
+   * download from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz)
+
+   * download from the Pulsar [downloads page](https://pulsar.apache.org/download)
+
+   * use [wget](https://www.gnu.org/software/wget):
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package. 
+
+    ```bash
+    wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    ```
+
+3. Move the `offloaders` directory into the Pulsar directory.
+
+    ```bash
+    mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
+
+    ls apache-pulsar-2.5.1/offloaders
+    ```
+
+    **Output**
+
+    As shown in the output, Pulsar uses [Apache jclouds](https://jclouds.apache.org) to support [AWS S3](https://aws.amazon.com/s3/) and [GCS](https://cloud.google.com/storage/) for long-term storage.
+
+
+    ```
+    tiered-storage-file-system-2.5.1.nar
+    tiered-storage-jcloud-2.5.1.nar
+    ```
+
+    > #### Note
+    >
+    > * If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` directory is present in every broker's Pulsar directory.
+    > 
+    > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles the tiered storage offloaders.
+
+## Configuration
+
+> #### Note
+> 
+> Before offloading data from BookKeeper to AWS S3, you need to configure some properties of the AWS S3 offload driver.
+
+Besides, you can also configure the AWS S3 offloader to run it automatically or trigger it manually.
+
+### Configure AWS S3 offloader driver
+
+You can configure the AWS S3 offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+  
+    Required configuration | Description | Example value
+    |---|---|---
+    `managedLedgerOffloadDriver` | Offloader driver name, which is case-insensitive. <br><br>**Note**: there is a third driver type, S3, which is identical to AWS S3, though S3 requires that you specify an endpoint URL using `s3ManagedLedgerOffloadServiceEndpoint`. This is useful if using an S3 compatible data store other than AWS S3. | aws-s3
+    `offloadersDirectory` | Offloader directory | offloaders
+    `s3ManagedLedgerOffloadBucket` | Bucket | pulsar-topic-offload
+
+- **Optional** configurations are as below.
+
+    Optional configuration | Description | Example value
+    |---|---|---
+    `s3ManagedLedgerOffloadRegion` | Bucket region | eu-west-3
+    `s3ManagedLedgerOffloadReadBufferSizeInBytes`|Size of block read|1 MB
+    `s3ManagedLedgerOffloadMaxBlockSizeInBytes`|Size of block write|64 MB
+    `managedLedgerMinLedgerRolloverTimeMinutes`|Minimum time between ledger rollover for a topic<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|2
+    `managedLedgerMaxEntriesPerLedger`|Maximum number of entries to append to a ledger before triggering a rollover.<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|5000
+
+#### Bucket (required)
+
+A bucket is a basic container that holds your data. Everything you store in AWS S3 must be contained in a bucket. You can use a bucket to organize your data and control access to your data, but unlike directories and folders, buckets cannot be nested.
+
+##### Example
+
+This example names the bucket as _pulsar-topic-offload_.
+
+```conf
+s3ManagedLedgerOffloadBucket=pulsar-topic-offload
+```
+
+#### Bucket region 
+
+A bucket region is a region where a bucket is located. If a bucket region is not specified, the **default** region (`US East (N. Virginia)`) is used.
+
+> #### Tip
+>
+> For more information about AWS regions and endpoints, see [here](https://docs.aws.amazon.com/general/latest/gr/rande.html).
+ 
+##### Example
+
+This example sets the bucket region as _eu-west-3_.
+
+```
+s3ManagedLedgerOffloadRegion=eu-west-3
+```
+
+#### Authentication (required)
+
+To be able to access AWS S3, you need to authenticate with AWS S3.
+
+Pulsar does not provide any direct methods of configuring authentication for AWS S3,
+but relies on the mechanisms supported by the
+[DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html).
+
+Once you have created a set of credentials in the AWS IAM console, you can configure credentials using one of the following methods.
+
+* Use EC2 instance metadata credentials.
+
+    If you are on an AWS instance with an instance profile that provides credentials, Pulsar uses these credentials if no other mechanism is provided.
+
+* Set the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in `conf/pulsar_env.sh`.
+
+    The `export` keyword is required so that the variables are available in the environment of spawned processes.
+
+    ```bash
+    export AWS_ACCESS_KEY_ID=ABC123456789
+    export AWS_SECRET_ACCESS_KEY=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c
+    ```
+
+* Add the Java system properties `aws.accessKeyId` and `aws.secretKey` to `PULSAR_EXTRA_OPTS` in `conf/pulsar_env.sh`.
+
+    ```bash
+    PULSAR_EXTRA_OPTS="${PULSAR_EXTRA_OPTS} ${PULSAR_MEM} ${PULSAR_GC} -Daws.accessKeyId=ABC123456789 -Daws.secretKey=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024"
+    ```
+
+* Set the access credentials in `~/.aws/credentials`.
+
+    ```conf
+    [default]
+    aws_access_key_id=ABC123456789
+    aws_secret_access_key=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c
+    ```
+
+* Assume an IAM role.
+
+    This example uses the `DefaultAWSCredentialsProviderChain` for assuming this role.
+
+    The broker must be restarted for credentials specified in `conf/pulsar_env.sh` to take effect.
+
+    ```conf
+    s3ManagedLedgerOffloadRole=<aws role arn>
+    s3ManagedLedgerOffloadRoleSessionName=pulsar-s3-offload
+    ```
+
+#### Size of block read/write
+
+You can configure the size of a request sent to or read from AWS S3 in the configuration file `broker.conf` or `standalone.conf`. 
+
+Configuration|Description|Default value
+|---|---|---
+`s3ManagedLedgerOffloadReadBufferSizeInBytes`|Block size for each individual read when reading back data from AWS S3.|1 MB
+`s3ManagedLedgerOffloadMaxBlockSizeInBytes`|Maximum size of a "part" sent during a multipart upload to AWS S3. It **cannot** be smaller than 5 MB. |64 MB
+
+### Configure AWS S3 offloader to run automatically
+
+Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offloading operation is triggered automatically. 
+
+Threshold value|Action
+|---|---
+`> 0` | It triggers the offloading operation if the topic storage reaches its threshold.
+`= 0` | It causes a broker to offload data as soon as possible.
+`< 0` | It disables automatic offloading.
+
+Automatic offloading runs when a new segment is added to a topic log. If you set the threshold on a namespace, but few messages are being produced to the topic, offload does not work until the current segment is full.

Review comment:
       offload or offloader?
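       For reference, the threshold rules in the quoted table can be sketched as a small shell function. This is a minimal illustration, assuming both the threshold and the stored size are given in bytes; `offload_action` is a hypothetical name, not a Pulsar or pulsarctl command. It ignores timing: per the quoted paragraph, the real broker only evaluates the threshold when a new segment is added to the topic log.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a Pulsar or pulsarctl command) that maps an
# offload threshold to the action described in the quoted table.
#   $1 = threshold in bytes (may be negative)
#   $2 = bytes the topic currently stores in BookKeeper
offload_action() {
  local threshold=$1 stored=$2
  if (( threshold < 0 )); then
    echo "disabled"      # < 0: automatic offloading is turned off
  elif (( threshold == 0 )); then
    echo "offload-asap"  # = 0: the broker offloads as soon as possible
  elif (( stored >= threshold )); then
    echo "offload"       # > 0 and the topic storage reached the threshold
  else
    echo "wait"          # > 0 but the threshold is not reached yet
  fi
}
```

       For example, `offload_action -1 999999` prints `disabled`, and `offload_action 100 99` prints `wait`.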

##########
File path: site2/docs/tiered-storage-overview.md
##########
@@ -0,0 +1,49 @@
+---
+id: tiered-storage-overview
+title: Overview of tiered storage
+sidebar_label: Overview
+---
+
+Pulsar's **Tiered Storage** feature allows older backlog data to be moved from BookKeeper to long term and cheaper storage, while still allowing clients to access the backlog as if nothing has changed. 
+
+* Tiered storage uses [Apache jclouds](https://jclouds.apache.org) to support
+[Amazon S3](https://aws.amazon.com/s3/) and [GCS (Google Cloud Storage)](https://cloud.google.com/storage/) for long term storage. 
+
+    With jclouds, it is easy to add support for more
+[cloud storage providers](https://jclouds.apache.org/reference/providers/#blobstore-providers) in the future.
+
+    > #### Tip
+    > 
+    > For more information about how to use the AWS S3 offloader with Pulsar, see [here](tiered-storage-aws.md).
+    > 
+    > For more information about how to use the GCS offloader with Pulsar, see [here](tiered-storage-gcs.md).
+
+* Tiered storage uses [Apache Hadoop](http://hadoop.apache.org/) to support filesystems for long term storage. 
+
+    With Hadoop, it is easy to add support for more filesystems in the future.
+
+    > #### Tip
+    > 
+    > For more information about how to use the filesystem offloader with Pulsar, see [here](tiered-storage-filesystem.md).
+
+## When should I use tiered storage?
+
+Tiered storage should be used when you have a topic for which you want to keep a very long backlog for a long time. 

Review comment:
       How about changing it to "Tiered storage should be used when you want to keep a very long backlog for a topic."

##########
File path: site2/docs/tiered-storage-aws.md
##########
@@ -0,0 +1,283 @@
+---
+id: tiered-storage-aws
+title: Use AWS S3 offloader with Pulsar
+sidebar_label: AWS S3 offloader
+---
+
+This chapter guides you through every step of installing and configuring the AWS S3 offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the AWS S3 offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download Pulsar tarball using one of the following ways:

Review comment:
       ```suggestion
   1. Download the Pulsar tarball using one of the following ways:
   ```
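       The download-and-move steps in the quoted hunk can be sketched as a shell function that works on an already downloaded offloaders tarball, so it needs no network access. This is a minimal sketch: `install_offloaders` and its argument order are hypothetical, and it assumes the tarball and the extracted `apache-pulsar-<version>` directory sit in the same working directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring steps 2 and 3 for a tarball that has
# already been downloaded (for example with the wget command above).
#   $1 = Pulsar version, e.g. 2.5.1
#   $2 = directory holding apache-pulsar-offloaders-<version>-bin.tar.gz
#        and the extracted apache-pulsar-<version>/ directory
install_offloaders() {
  local version=$1 workdir=$2
  local pkg="apache-pulsar-offloaders-${version}"
  local pulsar_home="${workdir}/apache-pulsar-${version}"

  # Step 2: untar the offloaders package into the working directory.
  tar xzf "${workdir}/${pkg}-bin.tar.gz" -C "${workdir}" || return 1

  # Step 3: move the bundled offloaders/ directory into the Pulsar directory.
  mv "${workdir}/${pkg}/offloaders" "${pulsar_home}/offloaders" || return 1

  # List the NAR files now available under the Pulsar directory.
  ls "${pulsar_home}/offloaders"
}
```

       For example, `install_offloaders 2.5.1 .` run from the directory holding both archives would list the NAR files moved into `apache-pulsar-2.5.1/offloaders`.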

##########
File path: site2/docs/tiered-storage-overview.md
##########
@@ -0,0 +1,49 @@
+---
+id: tiered-storage-overview
+title: Overview of tiered storage
+sidebar_label: Overview
+---
+
+Pulsar's **Tiered Storage** feature allows older backlog data to be moved from BookKeeper to long term and cheaper storage, while still allowing clients to access the backlog as if nothing has changed. 
+
+* Tiered storage uses [Apache jclouds](https://jclouds.apache.org) to support
+[Amazon S3](https://aws.amazon.com/s3/) and [GCS (Google Cloud Storage)](https://cloud.google.com/storage/) for long term storage. 
+
+    With jclouds, it is easy to add support for more
+[cloud storage providers](https://jclouds.apache.org/reference/providers/#blobstore-providers) in the future.
+
+    > #### Tip
+    > 
+    > For more information about how to use the AWS S3 offloader with Pulsar, see [here](tiered-storage-aws.md).
+    > 
+    > For more information about how to use the GCS offloader with Pulsar, see [here](tiered-storage-gcs.md).
+
+* Tiered storage uses [Apache Hadoop](http://hadoop.apache.org/) to support filesystems for long term storage. 
+
+    With Hadoop, it is easy to add support for more filesystems in the future.
+
+    > #### Tip
+    > 
+    > For more information about how to use the filesystem offloader with Pulsar, see [here](tiered-storage-filesystem.md).
+
+## When should I use tiered storage?
+
+Tiered storage should be used when you have a topic for which you want to keep a very long backlog for a long time. 
+
+For example, if you have a topic containing user actions which you use to train your recommendation systems, you may want to keep that data for a long time, so that if you change your recommendation algorithm, you can rerun it against your full user history.
+
+## How does tiered storage work?
+
+A topic in Pulsar is backed by a **log**, known as a **managed ledger**. This log is composed of an ordered list of segments. Pulsar only writes to the final segment of the log. All previous segments are sealed. The data within the segment is immutable. This is known as a **segment oriented architecture**.

Review comment:
       ```suggestion
   A topic in Pulsar is backed by a **log**, known as a **managed ledger**. This log consists of an ordered list of segments. Pulsar only writes to the final segment of the log. All previous segments are sealed. The data within the segment is immutable. This is known as a **segment oriented architecture**.
   ```
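       The segment-oriented architecture described in the quoted paragraph can be illustrated with a toy model in shell. This is not Pulsar code: the log is just an array of per-segment entry counts, `MAX_ENTRIES` stands in loosely for a rollover limit such as `managedLedgerMaxEntriesPerLedger`, and `append_entry` / `sealed_segments` are hypothetical names.

```bash
#!/usr/bin/env bash
# Toy model of a segment-oriented log (not Pulsar code).
# segments[i] holds the number of entries in segment i; only the last
# segment accepts writes, every earlier one is sealed.
MAX_ENTRIES=3   # stand-in for a rollover limit; the value is arbitrary
segments=()

# Append one entry: write to the last segment, or roll over to a new
# segment when there is none yet or the last one is full.
append_entry() {
  local last=$(( ${#segments[@]} - 1 ))
  if (( last < 0 )) || (( segments[last] >= MAX_ENTRIES )); then
    segments+=(1)
  else
    segments[last]=$(( segments[last] + 1 ))
  fi
}

# Sealed (immutable) segments are all but the last one; in tiered storage
# these are the ones eligible for offloading.
sealed_segments() {
  local n=${#segments[@]}
  echo $(( n > 0 ? n - 1 : 0 ))
}
```

       After seven calls to `append_entry` with `MAX_ENTRIES=3`, the log holds segments of 3, 3, and 1 entries, so `sealed_segments` prints `2`: only the first two segments are immutable.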

##########
File path: site2/docs/tiered-storage-gcs.md
##########
@@ -0,0 +1,275 @@
+---
+id: tiered-storage-gcs
+title: Use GCS offloader with Pulsar
+sidebar_label: GCS offloader
+---
+
+This chapter guides you through every step of installing and configuring the GCS offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the GCS offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download Pulsar tarball using one of the following ways:
+
+   * download the Pulsar tarball from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz)
+
+   * download from the Pulsar [download page](https://pulsar.apache.org/download)
+
+   * use [wget](https://www.gnu.org/software/wget)
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package. 
+
+    ```bash
+    wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
+
+    tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    ```
+
+    > #### Note
+    >
+    > * If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` tarball is unzipped in every broker's Pulsar directory.
+    > 
+    > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8S and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles the tiered storage offloaders.
+
+3. Copy the Pulsar offloaders as `offloaders` in the Pulsar directory.
+
+    ```
+    mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
+
+    ls apache-pulsar-2.5.1/offloaders
+    ```
+
+    **Output**
+
+    As shown in the output, Pulsar uses [Apache jclouds](https://jclouds.apache.org) to support GCS and AWS S3 for long term storage. 
+
+
+    ```
+    tiered-storage-file-system-2.5.1.nar
+    tiered-storage-jcloud-2.5.1.nar
+    ```
+
+## Configuration
+
+> #### Note
+> 
+> Before offloading data from BookKeeper to GCS, you need to configure some properties of the GCS offloader driver. 
+
+Besides, you can also configure the GCS offloader to run it automatically or trigger it manually.
+
+### Configure GCS offloader driver
+
+You can configure the GCS offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+
+    **Required** configuration | Description | Example value
+    |---|---|---
+    `managedLedgerOffloadDriver`|Offloader driver name, which is case-insensitive.|google-cloud-storage
+    `offloadersDirectory`|Offloader directory|offloaders
+    `gcsManagedLedgerOffloadBucket`|Bucket|pulsar-topic-offload
+    `gcsManagedLedgerOffloadRegion`|Bucket region|europe-west3
+    `gcsManagedLedgerOffloadServiceAccountKeyFile`|Authentication |/Users/user-name/Downloads/project-804d5e6a6f33.json
+
+- **Optional** configurations are as below.
+
+    Optional configuration|Description|Example value
+    |---|---|---
+    `gcsManagedLedgerOffloadReadBufferSizeInBytes`|Size of block read|1 MB
+    `gcsManagedLedgerOffloadMaxBlockSizeInBytes`|Size of block write|64 MB
+    `managedLedgerMinLedgerRolloverTimeMinutes`|Minimum time between ledger rollover for a topic.|2
+    `managedLedgerMaxEntriesPerLedger`|Max number of entries to append to a ledger before triggering a rollover.|5000
+
+#### Bucket (required)
+
+A bucket is a basic container that holds your data. Everything you store in GCS **must** be contained in a bucket. You can use a bucket to organize your data and control access to your data, but unlike directories and folders, buckets cannot be nested.
+
+##### Example
+
+This example names the bucket as _pulsar-topic-offload_.
+
+```conf
+gcsManagedLedgerOffloadBucket=pulsar-topic-offload
+```
+
+#### Bucket region (required)
+
+Bucket region is the region where a bucket is located. If a bucket region is not specified, the **default** region (`us multi-regional location`) is used.
+
+> #### Tip
+>
+> For more information about bucket location, see [here](https://cloud.google.com/storage/docs/bucket-locations).
+
+##### Example
+
+This example sets the bucket region as _europe-west3_.
+
+```
+gcsManagedLedgerOffloadRegion=europe-west3
+```
+
+#### Authentication (required)
+
+To enable a broker to access GCS, you need to configure `gcsManagedLedgerOffloadServiceAccountKeyFile` in the configuration file `broker.conf`. 
+
+`gcsManagedLedgerOffloadServiceAccountKeyFile` is
+a JSON file, containing GCS credentials of a service account.
+
+##### Example
+
+To generate service account credentials or view the public credentials that you've already generated, follow these steps.
+
+1. Navigate to the [Service accounts page](https://console.developers.google.com/iam-admin/serviceaccounts).
+
+2. Select a project or create a new one.
+
+3. Click **Create service account**.
+
+4. In the **Create service account** window, type a name for the service account and select **Furnish a new private key**. 
+
+    If you want to [grant G Suite domain-wide authority](https://developers.google.com/identity/protocols/OAuth2ServiceAccount#delegatingauthority) to the service account, select **Enable G Suite Domain-wide Delegation**.
+
+5. Click **Create**.
+
+    > #### Note
+    >
+    > Make sure that the service account you create has permission to operate GCS. To grant it, assign the **Storage Admin** role to your service account. For details, see [here](https://cloud.google.com/storage/docs/access-control/iam).
+
+6. Set the path of the downloaded JSON key file in `broker.conf`.
+   
+    ```conf
+    gcsManagedLedgerOffloadServiceAccountKeyFile="/Users/user-name/Downloads/project-804d5e6a6f33.json"
+    ```
+
+    > #### Tip
+    >
+    > - For more information about how to create `gcsManagedLedgerOffloadServiceAccountKeyFile`, see [here](https://support.google.com/googleapi/answer/6158849).
+    >
+    > - For more information about Google Cloud IAM, see [here](https://cloud.google.com/storage/docs/access-control/iam).
+
+#### Size of block read/write
+
+You can configure the size of a request sent to or read from GCS in the configuration file `broker.conf`. 
+
+Configuration|Description
+|---|---
+`gcsManagedLedgerOffloadReadBufferSizeInBytes`|Block size for each individual read when reading back data from GCS.<br><br>The **default** value is 1 MB.
+`gcsManagedLedgerOffloadMaxBlockSizeInBytes`|Maximum size of a "part" sent during a multipart upload to GCS. <br><br>It **cannot** be smaller than 5 MB. <br><br>The **default** value is 64 MB.
+
+### Configure GCS offloader to run automatically
+
+Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offload operation is triggered automatically. 
+
+Threshold value|Action
+|---|---
+`> 0` | It triggers the offloading operation if the topic storage reaches its threshold.
+`= 0` | It causes a broker to offload data as soon as possible.
+`< 0` | It disables automatic offloading.
+
+Automatic offloading runs when a new segment is added to a topic log. If you set the threshold on a namespace, but few messages are being produced to the topic, offload does not work until the current segment is full.
+
+You can configure the threshold size using CLI tools, such as [pulsarctl](https://streamnative.io/docs/v1.0.0/manage-and-monitor/pulsarctl/overview/) or pulsar-admin.
+
+The offload configurations in `broker.conf` and `standalone.conf` are used for the namespaces that do not have namespace-level offload policies. Each namespace can have its own offload policy. To set the offload policy for a namespace, use the [`pulsar-admin namespaces set-offload-policies`](http://pulsar.apache.org/tools/pulsar-admin/2.6.0-SNAPSHOT/#-em-set-offload-policies-em-) command.
+
+#### Example
+
+This example sets the GCS offloader threshold size to 10 MB using pulsarctl.
+
+```bash
+bin/pulsarctl namespaces set-offload-threshold --size 10M my-tenant/my-namespace
+```
+
+> #### Tip
+>
+> For more information about the `pulsarctl namespaces set-offload-threshold options` command, including flags, descriptions, default values, and shorthands, see [here](https://streamnative.io/docs/pulsarctl/v0.4.0/#-em-set-offload-threshold-em-). 
+
+### Configure GCS offloader to run manually
+
+For individual topics, you can trigger GCS offloader manually using one of the following methods:
+
+- Use REST endpoint 

Review comment:
       ```suggestion
   - Use REST endpoint.
   ```
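       The 5 MB lower bound on `gcsManagedLedgerOffloadMaxBlockSizeInBytes` (and on `s3ManagedLedgerOffloadMaxBlockSizeInBytes` in the AWS doc) could be checked with a small script before restarting a broker. This is a hypothetical pre-flight check, not a tool shipped with Pulsar; it assumes `KEY=value` lines without spaces around `=` and treats 5 MB as 5 * 1024 * 1024 bytes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with Pulsar). Assumes 5 MB means
# 5 * 1024 * 1024 bytes and that the file uses KEY=value lines.
MIN_BLOCK_BYTES=$(( 5 * 1024 * 1024 ))

#   $1 = path to a broker.conf-style file
#   $2 = key to check, e.g. gcsManagedLedgerOffloadMaxBlockSizeInBytes
check_max_block_size() {
  local conf=$1 key=$2 value
  # Use the last uncommented assignment of the key; empty if the key is absent.
  value=$(grep -E "^${key}=" "$conf" | tail -n 1 | cut -d= -f2-)
  if [[ -z $value ]]; then
    echo "unset"      # the default (64 MB per the quoted table) applies
  elif [[ ! $value =~ ^[0-9]+$ ]]; then
    echo "invalid"    # not a plain byte count
  elif (( 10#$value < MIN_BLOCK_BYTES )); then
    echo "too-small"  # below the 5 MB minimum (10# avoids octal parsing)
  else
    echo "ok"
  fi
}
```

       For example, a file containing `gcsManagedLedgerOffloadMaxBlockSizeInBytes=1048576` (1 MB) yields `too-small`.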

##########
File path: site2/docs/tiered-storage-filesystem.md
##########
@@ -0,0 +1,268 @@
+---
+id: tiered-storage-filesystem
+title: Use filesystem offloader with Pulsar
+sidebar_label: Filesystem offloader
+---
+
+This chapter guides you through every step of installing and configuring the filesystem offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the filesystem offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+
+- Hadoop: 3.x.x
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download Pulsar tarball using one of the following ways:
+
+   * download the Pulsar tarball from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz)

Review comment:
       Same comment as in the AWS doc.

##########
File path: site2/docs/tiered-storage-overview.md
##########
@@ -0,0 +1,49 @@
+---
+id: tiered-storage-overview
+title: Overview of tiered storage
+sidebar_label: Overview
+---
+
+Pulsar's **Tiered Storage** feature allows older backlog data to be moved from BookKeeper to long term and cheaper storage, while still allowing clients to access the backlog as if nothing has changed. 
+
+* Tiered storage uses [Apache jclouds](https://jclouds.apache.org) to support
+[Amazon S3](https://aws.amazon.com/s3/) and [GCS (Google Cloud Storage)](https://cloud.google.com/storage/) for long term storage. 
+
+    With jclouds, it is easy to add support for more
+[cloud storage providers](https://jclouds.apache.org/reference/providers/#blobstore-providers) in the future.
+
+    > #### Tip
+    > 
+    > For more information about how to use the AWS S3 offloader with Pulsar, see [here](tiered-storage-aws.md).
+    > 
+    > For more information about how to use the GCS offloader with Pulsar, see [here](tiered-storage-gcs.md).
+
+* Tiered storage uses [Apache Hadoop](http://hadoop.apache.org/) to support filesystems for long term storage. 
+
+    With Hadoop, it is easy to add support for more filesystems in the future.
+
+    > #### Tip
+    > 
+    > For more information about how to use the filesystem offloader with Pulsar, see [here](tiered-storage-filesystem.md).
+
+## When should I use tiered storage?

Review comment:
       How about changing the title to "When to use tiered storage?"

##########
File path: site2/docs/tiered-storage-gcs.md
##########
@@ -0,0 +1,275 @@
+---
+id: tiered-storage-gcs
+title: Use GCS offloader with Pulsar
+sidebar_label: GCS offloader
+---
+
+This chapter guides you through every step of installing and configuring the GCS offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the GCS offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download Pulsar tarball using one of the following ways:
+
+   * download the Pulsar tarball from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz)

Review comment:
       same comments







[GitHub] [pulsar] gaoran10 commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448139695



##########
File path: site2/docs/tiered-storage-filesystem.md
##########
@@ -0,0 +1,268 @@
+---
+id: tiered-storage-filesystem
+title: Use filesystem offloader with Pulsar
+sidebar_label: Filesystem offloader
+---
+
+This chapter guides you through every step of installing and configuring the filesystem offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the filesystem offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+
+- Hadoop: 3.x.x
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download the Pulsar tarball using one of the following ways:
+
+   * Download from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz)
+
+   * Download from the Pulsar [download page](https://pulsar.apache.org/download)
+
+   * Use [wget](https://www.gnu.org/software/wget)
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package. 
+
+    ```bash
+    wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
+
+    tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    ```
+
+    > #### Note
+    >
+    > * If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` tarball is unzipped in every broker's Pulsar directory.
+    > 
+    > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8S and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles the tiered storage offloaders.
+
+3. Copy the Pulsar offloaders as `offloaders` in the Pulsar directory.
+
+    ```
+    mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
+
+    ls apache-pulsar-2.5.1/offloaders
+    ```
+
+    **Output**
+
+    ```
+    tiered-storage-file-system-2.5.1.nar
+    tiered-storage-jcloud-2.5.1.nar
+    ```
+
+    > #### Note
+    >
+    > * If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` tarball is unzipped in every broker's Pulsar directory.
+    > 
+    > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles the tiered storage offloaders.
+
+## Configuration
+
+> #### Note
+> 
+> Before offloading data from BookKeeper to filesystem, you need to configure some properties of the filesystem offloader driver. 
+
+Besides, you can also configure the filesystem offloader to run it automatically or trigger it manually.
+
+### Configure filesystem offloader driver
+
+You can configure the filesystem offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+  
+    Required configuration | Description | Example value
+    |---|---|---
+    `managedLedgerOffloadDriver` | Offloader driver name, which is case-insensitive. | filesystem
+    `fileSystemURI` | Connection address | hdfs://127.0.0.1:9000
+    `offloadersDirectory` | Hadoop profile path | ../conf/filesystem_offload_core_site.xml

Review comment:
       ```suggestion
       `fileSystemProfilePath` | Hadoop profile path | ../conf/filesystem_offload_core_site.xml
       `offloadersDirectory` | Offloader directory | offloaders
   ```
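       With the suggested rows applied, the filesystem offloader's required settings could be written to a configuration file as below. This is a sketch: `write_filesystem_offload_conf` is a hypothetical helper, the values are the example values from the quoted table and suggestion, and if the keys already exist in `broker.conf` you would edit them in place rather than append duplicates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: appends the filesystem offloader's required settings,
# using the example values from the suggested table, to a conf file.
#   $1 = path to broker.conf or standalone.conf
write_filesystem_offload_conf() {
  local conf=$1
  # Quoted heredoc delimiter: the lines are written literally, no expansion.
  cat >> "$conf" <<'EOF'
managedLedgerOffloadDriver=filesystem
fileSystemURI=hdfs://127.0.0.1:9000
fileSystemProfilePath=../conf/filesystem_offload_core_site.xml
offloadersDirectory=offloaders
EOF
}
```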







[GitHub] [pulsar] Anonymitaet commented on pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#issuecomment-651703355


   @gaoran10 @Huanli-Meng could u pls help review? Thanks





[GitHub] [pulsar] Anonymitaet commented on a change in pull request #7393: Update docs for tiered storage

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on a change in pull request #7393:
URL: https://github.com/apache/pulsar/pull/7393#discussion_r448087902



##########
File path: site2/docs/tiered-storage-aws.md
##########
@@ -0,0 +1,283 @@
+---
+id: tiered-storage-aws
+title: Use AWS S3 offloader with Pulsar
+sidebar_label: AWS S3 offloader
+---
+
+This chapter guides you through every step of installing and configuring the AWS S3 offloader and using it with Pulsar.
+
+## Installation
+
+Follow the steps below to install the AWS S3 offloader.
+
+### Prerequisite
+
+- Pulsar: 2.4.2 or later versions
+  
+- Apache jclouds: 2.2.0 or later versions
+
+### Step
+
+This example uses Pulsar 2.5.1.
+
+1. Download Pulsar tarball using one of the following ways:
+
+   * download from the [Apache mirror](https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz)
+
+   * download from the Pulsar [downloads page](https://pulsar.apache.org/download)
+
+   * use [wget](https://www.gnu.org/software/wget):
+
+     ```shell
+     wget https://archive.apache.org/dist/pulsar/pulsar-2.5.1/apache-pulsar-2.5.1-bin.tar.gz
+     ```
+
+2. Download and untar the Pulsar offloaders package. 
+
+    ```bash
+    wget https://downloads.apache.org/pulsar/pulsar-2.5.1/apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    tar xvfz apache-pulsar-offloaders-2.5.1-bin.tar.gz
+    ```
+
+3. Copy the Pulsar offloaders as `offloaders` in the Pulsar directory.
+
+    ```
+    mv apache-pulsar-offloaders-2.5.1/offloaders apache-pulsar-2.5.1/offloaders
+
+    ls apache-pulsar-2.5.1/offloaders
+    ```
+
+    **Output**
+
+    As shown in the output, Pulsar uses [Apache jclouds](https://jclouds.apache.org) to support [AWS S3](https://aws.amazon.com/s3/) and [GCS](https://cloud.google.com/storage/) for long-term storage. 
+
+
+    ```
+    tiered-storage-file-system-2.5.1.nar
+    tiered-storage-jcloud-2.5.1.nar
+    ```
+
+    > #### Note
+    >
+    > * If you are running Pulsar in a bare metal cluster, make sure that the `offloaders` tarball is unzipped in every broker's Pulsar directory.
+    > 
+    > * If you are running Pulsar in Docker or deploying Pulsar using a Docker image (such as K8s and DCOS), you can use the `apachepulsar/pulsar-all` image instead of the `apachepulsar/pulsar` image. The `apachepulsar/pulsar-all` image already bundles the tiered storage offloaders.
+
+## Configuration
+
+> #### Note
+> 
+> Before offloading data from BookKeeper to AWS S3, you need to configure some properties of the AWS S3 offload driver.
+
+Besides, you can also configure the AWS S3 offloader to run it automatically or trigger it manually.
+
+### Configure AWS S3 offloader driver
+
+You can configure the AWS S3 offloader driver in the configuration file `broker.conf` or `standalone.conf`.
+
+- **Required** configurations are as below.
+  
+    Required configuration | Description | Example value
+    |---|---|---
+    `managedLedgerOffloadDriver` | Offloader driver name, which is case-insensitive. <br><br>**Note**: there is a third driver type, S3, which is identical to AWS S3, though S3 requires that you specify an endpoint URL using `s3ManagedLedgerOffloadServiceEndpoint`. This is useful if using an S3 compatible data store other than AWS S3. | aws-s3
+    `offloadersDirectory` | Offloader directory | offloaders
+    `s3ManagedLedgerOffloadBucket` | Bucket | pulsar-topic-offload
+
+- **Optional** configurations are as below.
+
+    Optional configuration | Description | Example value
+    |---|---|---
+    `s3ManagedLedgerOffloadRegion` | Bucket region | eu-west-3
+    `s3ManagedLedgerOffloadReadBufferSizeInBytes`|Size of block read|1 MB
+    `s3ManagedLedgerOffloadMaxBlockSizeInBytes`|Size of block write|64 MB
+    `managedLedgerMinLedgerRolloverTimeMinutes`|Minimum time (in minutes) between ledger rollovers for a topic<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|2
+    `managedLedgerMaxEntriesPerLedger`|Maximum number of entries to append to a ledger before triggering a rollover.<br><br>**Note**: it is not recommended that you set this configuration in the production environment.|5000
+
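The required settings above can be sanity-checked before starting a broker. The following Bash sketch is hypothetical (neither `get_conf` nor `check_s3_offload_conf` ships with Pulsar); it assumes plain `key=value` lines in `broker.conf` or `standalone.conf` and applies the two rules from the table: all three required keys must have a value, and the generic `S3` driver additionally needs `s3ManagedLedgerOffloadServiceEndpoint`.

```bash
# Hypothetical helper: print the value of key $2 in config file $1.
# The last occurrence wins; prints an empty line if the key is unset.
# Commented-out lines ("#key=...") do not match.
get_conf() {
  awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$1"
}

# Hypothetical helper: check the required AWS S3 offloader settings.
check_s3_offload_conf() {
  local conf="$1" key driver status=0

  for key in managedLedgerOffloadDriver offloadersDirectory s3ManagedLedgerOffloadBucket; do
    if [ -z "$(get_conf "$conf" "$key")" ]; then
      echo "missing required key: ${key}"
      status=1
    fi
  done

  # The driver name is case-insensitive; the generic S3 driver also needs an endpoint.
  driver="$(get_conf "$conf" managedLedgerOffloadDriver | tr '[:upper:]' '[:lower:]')"
  if [ "$driver" = "s3" ] && [ -z "$(get_conf "$conf" s3ManagedLedgerOffloadServiceEndpoint)" ]; then
    echo "driver S3 requires s3ManagedLedgerOffloadServiceEndpoint"
    status=1
  fi

  return "$status"
}
```

Running `check_s3_offload_conf conf/broker.conf` prints nothing and returns 0 when the required keys are set.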
+#### Bucket (required)
+
+A bucket is a basic container that holds your data. Everything you store in AWS S3 must be contained in a bucket. You can use a bucket to organize your data and control access to it, but unlike directories and folders, buckets cannot be nested.
+
+##### Example
+
+This example names the bucket _pulsar-topic-offload_.
+
+```conf
+s3ManagedLedgerOffloadBucket=pulsar-topic-offload
+```
+
+#### Bucket region 
+
+A bucket region is a region where a bucket is located. If a bucket region is not specified, the **default** region (`US East (N. Virginia)`) is used.
+
+> #### Tip
+>
+> For more information about AWS regions and endpoints, see [here](https://docs.aws.amazon.com/general/latest/gr/rande.html).
+ 
+##### Example
+
+This example sets the bucket region to _eu-west-3_.
+
+```conf
+s3ManagedLedgerOffloadRegion=eu-west-3
+```
+
+#### Authentication (required)
+
+To be able to access AWS S3, you need to authenticate with AWS S3.
+
+Pulsar does not provide any direct methods of configuring authentication for AWS S3,
+but relies on the mechanisms supported by the
+[DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html).
+
+Once you have created a set of credentials in the AWS IAM console, you can configure credentials using one of the following methods.
+
+* Use EC2 instance metadata credentials.
+
+    If you are on an AWS instance with an instance profile that provides credentials, Pulsar uses these credentials if no other mechanism is provided.
+
+* Set the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in `conf/pulsar_env.sh`.
+
+    The `export` keyword is important: it makes the variables available in the environment of spawned processes.
+
+    ```bash
+    export AWS_ACCESS_KEY_ID=ABC123456789
+    export AWS_SECRET_ACCESS_KEY=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c
+    ```
+
+* Add the Java system properties `aws.accessKeyId` and `aws.secretKey` to `PULSAR_EXTRA_OPTS` in `conf/pulsar_env.sh`.
+
+    ```bash
+    PULSAR_EXTRA_OPTS="${PULSAR_EXTRA_OPTS} ${PULSAR_MEM} ${PULSAR_GC} -Daws.accessKeyId=ABC123456789 -Daws.secretKey=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024"
+    ```
+
+* Set the access credentials in `~/.aws/credentials`.
+
+    ```conf
+    [default]
+    aws_access_key_id=ABC123456789
+    aws_secret_access_key=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c
+    ```
+
+* Assume an IAM role.
+
+    This example uses the `DefaultAWSCredentialsProviderChain` for assuming this role.
+
+    The broker must be restarted for credentials specified in `conf/pulsar_env.sh` to take effect.
+
+    ```conf
+    s3ManagedLedgerOffloadRole=<aws role arn>
+    s3ManagedLedgerOffloadRoleSessionName=pulsar-s3-offload
+    ```
+
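When several of the methods above are configured at once, the AWS SDK for Java's `DefaultAWSCredentialsProviderChain` uses the first source that yields credentials: environment variables, then Java system properties, then the shared credentials file, with EC2 instance profile credentials near the end of the chain. The Bash sketch below approximates that order for the four sources described in this section. `which_aws_credentials` is a hypothetical helper; it deliberately ignores other providers in the chain (such as web identity tokens and ECS container credentials) as well as the role settings.

```bash
# Hypothetical helper: report which of the credential sources described above
# the DefaultAWSCredentialsProviderChain would most likely use first.
# Simplified: web identity and container (ECS) providers are not checked.
which_aws_credentials() {
  local creds_file="${HOME}/.aws/credentials"

  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # 1. Environment variables (exported from conf/pulsar_env.sh)
    echo "environment variables"
  elif [[ "${PULSAR_EXTRA_OPTS:-}" == *-Daws.accessKeyId=* && "${PULSAR_EXTRA_OPTS:-}" == *-Daws.secretKey=* ]]; then
    # 2. Java system properties passed through PULSAR_EXTRA_OPTS
    echo "java system properties"
  elif [ -f "$creds_file" ] && grep -q '^[[:space:]]*aws_access_key_id' "$creds_file"; then
    # 3. Shared credentials file
    echo "credentials file"
  else
    # 4. Nothing local: the SDK falls back to EC2 instance profile credentials
    echo "instance profile (if running on EC2)"
  fi
}
```

Run it in the same environment the broker starts in (for example, after sourcing `conf/pulsar_env.sh`) so that the exported variables are visible.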
+#### Size of block read/write
+
+You can configure the size of a request sent to or read from AWS S3 in the configuration file `broker.conf` or `standalone.conf`. 
+
+Configuration|Description|Default value
+|---|---|---
+`s3ManagedLedgerOffloadReadBufferSizeInBytes`|Block size for each individual read when reading back data from AWS S3.|1 MB
+`s3ManagedLedgerOffloadMaxBlockSizeInBytes`|Maximum size of a "part" sent during a multipart upload to AWS S3. It **cannot** be smaller than 5 MB. |64 MB
+
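Because S3 rejects multipart upload parts smaller than 5 MB (except the last part), it can be worth checking the block sizes before editing `broker.conf`. In this Bash sketch, `check_block_sizes` is hypothetical; it takes the two values in bytes and assumes that 1 MB means 1,048,576 bytes, so the defaults of 1 MB and 64 MB become `1048576` and `67108864`.

```bash
# S3 minimum multipart part size (5 MB), in bytes.
MIN_PART_SIZE=$((5 * 1024 * 1024))

# Hypothetical helper: validate the read/write block sizes (in bytes).
#   $1  s3ManagedLedgerOffloadReadBufferSizeInBytes (default 1 MB)
#   $2  s3ManagedLedgerOffloadMaxBlockSizeInBytes   (default 64 MB)
check_block_sizes() {
  local read_buf="${1:-$((1 * 1024 * 1024))}"
  local max_block="${2:-$((64 * 1024 * 1024))}"

  if [ "$read_buf" -le 0 ]; then
    echo "read buffer must be positive"
    return 1
  fi
  if [ "$max_block" -lt "$MIN_PART_SIZE" ]; then
    echo "max block size ${max_block} is below ${MIN_PART_SIZE} bytes"
    return 1
  fi
  echo "ok: s3ManagedLedgerOffloadReadBufferSizeInBytes=${read_buf} s3ManagedLedgerOffloadMaxBlockSizeInBytes=${max_block}"
}
```

For example, `check_block_sizes 1048576 4194304` fails, because a 4 MB part is below the S3 minimum.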
+### Configure AWS S3 offloader to run automatically
+
+Namespace policy can be configured to offload data automatically once a threshold is reached. The threshold is based on the size of data that a topic has stored on a Pulsar cluster. Once the topic reaches the threshold, an offloading operation is triggered automatically. 
+
+Threshold value|Action
+|---|---
+`> 0` | It triggers the offloading operation if the topic storage reaches its threshold.
+`= 0` | It causes a broker to offload data as soon as possible.
+`< 0` | It disables automatic offloading.
+
+Automatic offloading runs when a new segment is added to a topic log. If you set the threshold on a namespace but few messages are being produced to the topic, offloading does not run until the current segment is full.
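The threshold table and the rollover rule above can be combined into one decision. This Bash sketch is illustrative only: `offload_action` is a hypothetical helper, sizes are in bytes, "reaches its threshold" is read as "greater than or equal to", and it assumes that even a threshold of `0` is only acted on when a new segment is added to the topic log.

```bash
# Hypothetical helper: what automatic offloading would do for one topic.
#   $1 threshold    namespace offload threshold in bytes (< 0 disables offloading)
#   $2 topic_bytes  data the topic currently stores on the Pulsar cluster
#   $3 rolled_over  "yes" if a new segment was just added to the topic log
offload_action() {
  local threshold="$1" topic_bytes="$2" rolled_over="$3"

  if [ "$threshold" -lt 0 ]; then
    echo "disabled"
  elif [ "$rolled_over" != "yes" ]; then
    # Automatic offloading is only evaluated when a new segment is added.
    echo "wait for rollover"
  elif [ "$threshold" -eq 0 ] || [ "$topic_bytes" -ge "$threshold" ]; then
    echo "offload"
  else
    echo "below threshold"
  fi
}
```

For instance, `offload_action 10485760 20971520 yes` prints `offload` for a 10 MB threshold and a topic storing 20 MB, while the same call with `no` prints `wait for rollover`.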

Review comment:
       Offloader. Thanks for the reminder.



