Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/21 02:29:34 UTC

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1261: [HUDI-403] Adds guidelines on deployment/upgrading

lamber-ken commented on a change in pull request #1261: [HUDI-403] Adds guidelines on deployment/upgrading
URL: https://github.com/apache/incubator-hudi/pull/1261#discussion_r368786695
 
 

 ##########
 File path: docs/_docs/2_6_deployment.md
 ##########
 @@ -1,51 +1,87 @@
 ---
-title: Administering Hudi Pipelines
-keywords: hudi, administration, operation, devops
-permalink: /docs/admin_guide.html
-summary: This section offers an overview of tools available to operate an ecosystem of Hudi datasets
+title: Deployment Guide
+keywords: hudi, administration, operation, devops, deployment
+permalink: /docs/deployment.html
+summary: This section offers an overview of tools available to operate an ecosystem of Hudi tables
 toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
-Admins/ops can gain visibility into Hudi datasets/pipelines in the following ways
+This section provides all the help you need to deploy and operate Hudi tables at scale. 
+Specifically, we will cover the following aspects.
 
- - [Administering via the Admin CLI](#admin-cli)
- - [Graphite metrics](#metrics)
- - [Spark UI of the Hudi Application](#spark-ui)
+ - [Deployment Model](#deploying) : How various Hudi components are deployed and managed.
+ - [Upgrading Versions](#upgrading) : Picking up new releases of Hudi, with guidelines and general best practices.
+ - [Migrating to Hudi](#migrating) : How to migrate your existing tables to Apache Hudi.
+ - [Interacting via CLI](#cli) : Using the CLI to perform maintenance or deeper introspection.
+ - [Monitoring](#monitoring) : Tracking metrics from your Hudi tables using popular tools.
+ - [Troubleshooting](#troubleshooting) : Uncovering, triaging and resolving issues in production.
+ 
+## Deploying
 
-This section provides a glimpse into each of these, with some general guidance on [troubleshooting](#troubleshooting)
+In short, Hudi deploys with no long-running servers or additional infrastructure cost to your data lake. In fact, Hudi pioneered this model of building a transactional distributed storage layer
+using existing infrastructure, and it's heartening to see other systems adopting similar approaches as well. Hudi writing is done via Spark jobs (DeltaStreamer or custom Spark datasource jobs), deployed per standard Apache Spark [recommendations](https://spark.apache.org/docs/latest/cluster-overview.html).
+Querying Hudi tables happens via libraries installed into Apache Hive, Apache Spark or Presto, and hence no additional infrastructure is necessary.
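+
+As an illustration, here is a minimal sketch of such a custom Spark datasource writer (the table name, base path and field names below are placeholders, not part of this guide); it is an ordinary Spark job and deploys like one:
+
+```scala
+import org.apache.spark.sql.SaveMode
+import org.apache.hudi.DataSourceWriteOptions._
+import org.apache.hudi.config.HoodieWriteConfig._
+
+// inputDF is assumed to be an existing DataFrame of incoming records.
+inputDF.write.format("org.apache.hudi")
+  .option(RECORDKEY_FIELD_OPT_KEY, "uuid")              // record key field (placeholder)
+  .option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath") // partitioning field (placeholder)
+  .option(PRECOMBINE_FIELD_OPT_KEY, "ts")               // field used to pick the latest record (placeholder)
+  .option(TABLE_NAME, "my_table")                       // placeholder table name
+  .mode(SaveMode.Append)
+  .save("s3://bucket/path/to/table")                    // placeholder base path
+```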
 
-## Admin CLI
 
-Once hudi has been built, the shell can be fired by via  `cd hudi-cli && ./hudi-cli.sh`.
-A hudi dataset resides on DFS, in a location referred to as the **basePath** and we would need this location in order to connect to a Hudi dataset.
-Hudi library effectively manages this dataset internally, using .hoodie subfolder to track all metadata
+## Upgrading 
+
+New Hudi releases are listed on the [releases page](/releases), with detailed notes that list all the changes and highlights in each release.
+At the end of the day, Hudi is a storage system, and with that comes a lot of responsibility, which we take seriously.
+
+As general guidelines:
+
+ - We strive to keep all changes backwards compatible (i.e., new code can read old data/timeline files), and when we cannot, we will provide upgrade/downgrade tools via the CLI.
+ - We cannot always guarantee forward compatibility (i.e., old code being able to read data/timeline files written by a newer version). This is generally the norm, since no new features can be built otherwise.
+   However, any such large changes will be turned off by default, for a smooth transition to the newer release. After a few releases, once enough users deem the feature stable in production, we will flip the defaults in a subsequent release.
+ - Always upgrade the query bundles (mr-bundle, presto-bundle, spark-bundle) first, and then upgrade the writers (deltastreamer, spark jobs using the datasource). This often provides the best experience, and it's easy to fix
+   any issues by rolling forward/back the writer code (which you typically have more control over); a query-side smoke test is sketched after this list.
+ - With large, feature-rich releases, we recommend migrating slowly: first verify the new release in a staging environment with your own tests. Upgrading Hudi is no different than upgrading any database system.
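+
+For instance, here is a hypothetical smoke test (assuming a copy-on-write table read via the Spark datasource; the path is a placeholder) that validates an upgraded spark-bundle before any writer moves:
+
+```scala
+// Run with only the upgraded spark-bundle on the classpath (e.g. in spark-shell,
+// where `spark` is the provided SparkSession); writers stay on the old version.
+val df = spark.read.format("org.apache.hudi")
+  .load("s3://bucket/path/to/table/*/*") // placeholder path; glob depth depends on partitioning
+println(df.count())                      // a failure here is isolated to the query bundle
+```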
+
+Note that release notes can override this information with specific instructions, applicable on a case-by-case basis.
+
+## Migrating
+
+Currently, migrating to Hudi can be done using two approaches
 
 Review comment:
  Hi, missing `.` at the end of the statement.
