Posted to commits@ozone.apache.org by pi...@apache.org on 2023/02/27 11:41:12 UTC

[ozone] branch HDDS-5447-httpfs updated: HDDS-5966. [HTTPFSGW] Update module doc, and place it in Ozone project docs (#4250)

This is an automated email from the ASF dual-hosted git repository.

pifta pushed a commit to branch HDDS-5447-httpfs
in repository https://gitbox.apache.org/repos/asf/ozone.git


The following commit(s) were added to refs/heads/HDDS-5447-httpfs by this push:
     new 8b2a3bd922 HDDS-5966. [HTTPFSGW] Update module doc, and place it in Ozone project docs (#4250)
8b2a3bd922 is described below

commit 8b2a3bd9220e4f23d14f673d88fdf709692b7896
Author: Zita Dombi <50...@users.noreply.github.com>
AuthorDate: Mon Feb 27 12:41:06 2023 +0100

    HDDS-5966. [HTTPFSGW] Update module doc, and place it in Ozone project docs (#4250)
---
 hadoop-hdds/docs/content/design/httpfs.md          |  31 ++++
 hadoop-hdds/docs/content/interface/HttpFS.md       | 119 +++++++++++++
 hadoop-hdds/docs/content/tools/_index.md           |   1 +
 .../src/site/markdown/ServerSetup.md.vm            | 198 ---------------------
 .../src/site/markdown/UsingHttpTools.md            |  62 -------
 .../httpfsgateway/src/site/markdown/index.md       |  54 ------
 6 files changed, 151 insertions(+), 314 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/httpfs.md b/hadoop-hdds/docs/content/design/httpfs.md
new file mode 100644
index 0000000000..ad174199aa
--- /dev/null
+++ b/hadoop-hdds/docs/content/design/httpfs.md
@@ -0,0 +1,31 @@
+---
+title: HttpFS support for Ozone
+summary: HttpFS is a WebHDFS compatible interface that is added as a separate role to Ozone.
+date: 2023-02-03
+jira: HDDS-5447
+status: implemented
+author: Zita Dombi, Istvan Fajth
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Abstract
+
+Ozone HttpFS provides an HttpFS-compatible REST API interface to enable applications
+that are designed to use [HttpFS](https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/index.html)
+to interact and integrate with Ozone.
+
+# Link
+
+https://issues.apache.org/jira/secure/attachment/13031822/HTTPFS%20interface%20for%20Ozone.pdf
diff --git a/hadoop-hdds/docs/content/interface/HttpFS.md b/hadoop-hdds/docs/content/interface/HttpFS.md
new file mode 100644
index 0000000000..e413faf03c
--- /dev/null
+++ b/hadoop-hdds/docs/content/interface/HttpFS.md
@@ -0,0 +1,119 @@
+---
+title: HttpFS Gateway
+weight: 7
+menu:
+    main:
+        parent: "Client Interfaces"
+summary: Ozone HttpFS is a WebHDFS compatible interface implementation; as a separate role, it provides easy integration with Ozone.
+---
+
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+Ozone HttpFS can be used to integrate Ozone with other tools via REST API.
+
+## Introduction
+
+Ozone HttpFS is forked from the HDFS HttpFS endpoint implementation ([HDDS-5448](https://issues.apache.org/jira/browse/HDDS-5448)) and is intended to be added as an optional role in an Ozone cluster, similar to the [S3 Gateway]({{< ref "design/s3gateway.md" >}}).
+
+HttpFS is a service that provides a REST HTTP gateway supporting File System operations (read and write). It is interoperable with the **webhdfs** REST HTTP API.
+
+HttpFS can be used to access data on an Ozone cluster behind a firewall. In such a deployment the HttpFS service acts as a gateway and is the only system that is allowed to cross the firewall into the cluster.
+
+HttpFS can be used to access data in Ozone with HTTP utilities (such as curl and wget) and with HTTP libraries from languages other than Java (for example, Perl).
+
+The **webhdfs** client FileSystem implementation can be used to access HttpFS using the Ozone filesystem command line tool (`ozone fs`) as well as from Java applications using the Hadoop FileSystem Java API.
+
+HttpFS has built-in security supporting Hadoop pseudo authentication, Kerberos SPNEGO, and other pluggable authentication mechanisms. It also provides Hadoop proxy user support.
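+
+As an illustration of the **webhdfs** client path mentioned above, the sketch below shows how a file system client could point at an HttpFS gateway. The host name `httpfs-host`, the port `14000` and the paths `vol1`/`bucket1` are placeholders, and the example assumes the webhdfs client classes are available on the classpath:
+
+```bash
+# List a bucket through the HttpFS gateway using the webhdfs:// scheme
+# (assumes an HttpFS gateway is reachable at httpfs-host:14000 and that
+# vol1 and bucket1 already exist in the cluster)
+ozone fs -ls webhdfs://httpfs-host:14000/vol1/bucket1/
+
+# The same endpoint can be reached from the Hadoop FileSystem Java API by
+# passing a webhdfs:// URI, e.g.:
+#   FileSystem.get(new URI("webhdfs://httpfs-host:14000"), conf)
+```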
+
+
+## Getting started
+
+The HttpFS service itself is a Jetty-based web application that uses the Hadoop FileSystem API to talk to the cluster. It is a separate service that provides access to Ozone via a REST API, and it should be started in addition to the other regular Ozone components.
+
+To try it out, you can start a Docker Compose dev cluster that has an HttpFS gateway.
+
+Extract the release tarball, go to the `compose/ozone` directory and start the cluster:
+
+```bash
+docker-compose up -d --scale datanode=3
+```
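+
+Before issuing requests it can be useful to confirm that the gateway came up. A minimal check, assuming the compose service is named `httpfs` (the container appears as `ozone_httpfs`) and that `curl` is available inside the container, could look like this:
+
+```bash
+# Show the state of the compose services, including the HttpFS gateway
+docker-compose ps
+
+# Ask the gateway for the status of the file system root; an HTTP 200
+# response with a FileStatus JSON body indicates the gateway can reach
+# the Ozone cluster
+docker-compose exec httpfs curl -s -i \
+  "http://localhost:14000/webhdfs/v1/?op=GETFILESTATUS&user.name=hdfs"
+```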
+
+Once the cluster is up, the HttpFS gateway runs in a Docker container named `ozone_httpfs`.
+HttpFS web-service API calls are HTTP REST calls that map to Ozone file system operations and can be issued with standard HTTP tools such as the `curl` Unix command.
+
+For example, in the Docker cluster you can execute commands like the ones below (a longer write/read walk-through follows the list):
+
+* `curl -i -X PUT "http://httpfs:14000/webhdfs/v1/vol1?op=MKDIRS&user.name=hdfs"` creates a volume called `vol1`.
+
+
+* `curl 'http://httpfs:14000/webhdfs/v1/user/foo/README.txt?op=OPEN&user.name=foo'` returns the content of the key `/user/foo/README.txt`.
+
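+A slightly longer walk-through, creating a bucket-level directory and writing a key, might look like the following sketch. The host name `httpfs`, the user `hdfs` and the paths are placeholders reused from the examples above; the two-step `CREATE` call follows the usual WebHDFS pattern of first obtaining a redirect location and then sending the data:
+
+```bash
+# Create a directory (a bucket under vol1 in this example)
+curl -i -X PUT "http://httpfs:14000/webhdfs/v1/vol1/bucket1?op=MKDIRS&user.name=hdfs"
+
+# Step 1 of CREATE: ask for a write location; the response is a redirect
+# whose Location header points back to the gateway
+LOCATION=$(curl -s -i -X PUT \
+  "http://httpfs:14000/webhdfs/v1/vol1/bucket1/hello.txt?op=CREATE&user.name=hdfs" \
+  | grep -i '^Location:' | awk '{print $2}' | tr -d '\r')
+
+# Step 2 of CREATE: send the file content to the returned location
+curl -i -X PUT -T hello.txt -H "Content-Type: application/octet-stream" "$LOCATION"
+
+# Read the key back and list the directory
+curl "http://httpfs:14000/webhdfs/v1/vol1/bucket1/hello.txt?op=OPEN&user.name=hdfs"
+curl "http://httpfs:14000/webhdfs/v1/vol1/bucket1?op=LISTSTATUS&user.name=hdfs"
+```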
+
+## Supported operations
+
+The following tables list the WebHDFS REST API operations and their state of support in Ozone.
+
+### File and Directory Operations
+
+Operation                       |      Support
+--------------------------------|---------------------
+Create and Write to a File      | supported
+Append to a File                | not implemented in Ozone
+Concat File(s)                  | not implemented in Ozone
+Open and Read a File            | supported
+Make a Directory                | supported
+Create a Symbolic Link          | not implemented in Ozone
+Rename a File/Directory         | supported (with limitations)
+Delete a File/Directory         | supported
+Truncate a File                 | not implemented in Ozone
+Status of a File/Directory      | supported
+List a Directory                | supported
+List a File                     | supported
+Iteratively List a Directory    | supported
+
+
+### Other File System Operations
+
+Operation                             |      Support
+--------------------------------------|---------------------
+Get Content Summary of a Directory    | supported
+Get Quota Usage of a Directory        | supported
+Set Quota                             | not implemented in Ozone FileSystem API
+Set Quota By Storage Type             | not implemented in Ozone
+Get File Checksum                     | unsupported (to be fixed)
+Get Home Directory                    | unsupported (to be fixed)
+Get Trash Root                        | unsupported
+Set Permission                        | not implemented in Ozone FileSystem API
+Set Owner                             | not implemented in Ozone FileSystem API
+Set Replication Factor                | not implemented in Ozone FileSystem API
+Set Access or Modification Time       | not implemented in Ozone FileSystem API
+Modify ACL Entries                    | not implemented in Ozone FileSystem API
+Remove ACL Entries                    | not implemented in Ozone FileSystem API
+Remove Default ACL                    | not implemented in Ozone FileSystem API
+Remove ACL                            | not implemented in Ozone FileSystem API
+Set ACL                               | not implemented in Ozone FileSystem API
+Get ACL Status                        | not implemented in Ozone FileSystem API
+Check access                          | not implemented in Ozone FileSystem API
+
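+For instance, two of the operations marked as supported above, content summary and quota usage, can be queried with plain curl calls like the following sketch (the host, the user and the path are placeholders):
+
+```bash
+# Summarize the contents of a directory (file count, directory count, space consumed)
+curl "http://httpfs:14000/webhdfs/v1/vol1/bucket1?op=GETCONTENTSUMMARY&user.name=hdfs"
+
+# Report the quota usage of the same directory
+curl "http://httpfs:14000/webhdfs/v1/vol1/bucket1?op=GETQUOTAUSAGE&user.name=hdfs"
+```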
+
+
+## Hadoop user and developer documentation about HttpFS
+
+* [HttpFS Server Setup](https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/ServerSetup.html)
+
+* [Using HTTP Tools](https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/UsingHttpTools.html)
\ No newline at end of file
diff --git a/hadoop-hdds/docs/content/tools/_index.md b/hadoop-hdds/docs/content/tools/_index.md
index 12dd7f4faa..133223d9ab 100644
--- a/hadoop-hdds/docs/content/tools/_index.md
+++ b/hadoop-hdds/docs/content/tools/_index.md
@@ -37,6 +37,7 @@ Daemon commands:
    stopped.
    * **s3g** - Start the S3 compatible REST gateway
    * **recon** - The Web UI service of Ozone can be started with this command.
+   * **httpfs** - Start the HttpFS gateway
    
 Client commands:
 
diff --git a/hadoop-ozone/httpfsgateway/src/site/markdown/ServerSetup.md.vm b/hadoop-ozone/httpfsgateway/src/site/markdown/ServerSetup.md.vm
deleted file mode 100644
index 2d0a5b8cd2..0000000000
--- a/hadoop-ozone/httpfsgateway/src/site/markdown/ServerSetup.md.vm
+++ /dev/null
@@ -1,198 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-
-Hadoop HDFS over HTTP - Server Setup
-====================================
-
-This page explains how to quickly set up HttpFS with pseudo authentication against a Hadoop cluster that also uses pseudo authentication.
-
-Install HttpFS
---------------
-
-    ~ $ tar xzf  httpfs-${project.version}.tar.gz
-
-Configure HttpFS
-----------------
-
-By default, HttpFS assumes that Hadoop configuration files (`core-site.xml & hdfs-site.xml`) are in the HttpFS configuration directory.
-
-If this is not the case, add to the `httpfs-site.xml` file the `httpfs.hadoop.config.dir` property set to the location of the Hadoop configuration directory.
-
-Configure Hadoop
-----------------
-
-Edit Hadoop `core-site.xml` and define the Unix user that will run the HttpFS server as a proxyuser. For example:
-
-```xml
-  <property>
-    <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
-    <value>httpfs-host.foo.com</value>
-  </property>
-  <property>
-    <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
-    <value>*</value>
-  </property>
-```
-
-IMPORTANT: Replace `#HTTPFSUSER#` with the Unix user that will start the HttpFS server.
-
-Restart Hadoop
---------------
-
-You need to restart Hadoop for the proxyuser configuration to become active.
-
-Start/Stop HttpFS
------------------
-
-To start/stop HttpFS, use `hdfs --daemon start|stop httpfs`. For example:
-
-    hadoop-${project.version} $ hdfs --daemon start httpfs
-
-NOTE: The script `httpfs.sh` is deprecated. It is now just a wrapper of
-`hdfs httpfs`.
-
-Test HttpFS is working
-----------------------
-
-    $ curl -sS 'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs'
-    {"Path":"\/user\/hdfs"}
-
-HttpFS Configuration
---------------------
-
-HttpFS preconfigures the HTTP port to 14000.
-
-HttpFS supports the following [configuration properties](./httpfs-default.html) in the HttpFS's `etc/hadoop/httpfs-site.xml` configuration file.
-
-HttpFS over HTTPS (SSL)
------------------------
-
-Enable SSL in `etc/hadoop/httpfs-site.xml`:
-
-```xml
-  <property>
-    <name>httpfs.ssl.enabled</name>
-    <value>true</value>
-    <description>
-      Whether SSL is enabled. Default is false, i.e. disabled.
-    </description>
-  </property>
-```
-
-Configure `etc/hadoop/ssl-server.xml` with proper values, for example:
-
-```xml
-  <property>
-    <name>ssl.server.keystore.location</name>
-    <value>${user.home}/.keystore</value>
-    <description>Keystore to be used. Must be specified.
-    </description>
-  </property>
-
-  <property>
-    <name>ssl.server.keystore.password</name>
-    <value></value>
-    <description>Must be specified.</description>
-  </property>
-
-  <property>
-    <name>ssl.server.keystore.keypassword</name>
-    <value></value>
-    <description>Must be specified.</description>
-  </property>
-```
-
-The SSL passwords can be secured by a credential provider. See
-[Credential Provider API](../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
-
-You need to create an SSL certificate for the HttpFS server. As the `httpfs` Unix user, use the Java `keytool` command to create the SSL certificate:
-
-    $ keytool -genkey -alias jetty -keyalg RSA
-
-You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named **.keystore** and located in the `httpfs` user home directory.
-
-The password you enter for "keystore password" must match the value of the
-property `ssl.server.keystore.password` set in the `ssl-server.xml` in the
-configuration directory.
-
-The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine where the HttpFS Server will be running.
-
-Start HttpFS. It should work over HTTPS.
-
-Using the Hadoop `FileSystem` API or the Hadoop FS shell, use the `swebhdfs://` scheme. Make sure the JVM is picking up the truststore containing the public key of the SSL certificate if using a self-signed certificate.
-For more information about the client side settings, see [SSL Configurations for SWebHDFS](../hadoop-project-dist/hadoop-hdfs/WebHDFS.html#SSL_Configurations_for_SWebHDFS).
-
-NOTE: Some old SSL clients may use weak ciphers that are not supported by the HttpFS server. It is recommended to upgrade the SSL client.
-
-Deprecated Environment Variables
---------------------------------
-
-The following environment variables are deprecated. Set the corresponding
-configuration properties instead.
-
-Environment Variable        | Configuration Property       | Configuration File
-----------------------------|------------------------------|--------------------
-HTTPFS_HTTP_HOSTNAME        | httpfs.http.hostname         | httpfs-site.xml
-HTTPFS_HTTP_PORT            | httpfs.http.port             | httpfs-site.xml
-HTTPFS_MAX_HTTP_HEADER_SIZE | hadoop.http.max.request.header.size and hadoop.http.max.response.header.size | httpfs-site.xml
-HTTPFS_MAX_THREADS          | hadoop.http.max.threads      | httpfs-site.xml
-HTTPFS_SSL_ENABLED          | httpfs.ssl.enabled           | httpfs-site.xml
-HTTPFS_SSL_KEYSTORE_FILE    | ssl.server.keystore.location | ssl-server.xml
-HTTPFS_SSL_KEYSTORE_PASS    | ssl.server.keystore.password | ssl-server.xml
-
-HTTP Default Services
----------------------
-
-Name               | Description
--------------------|------------------------------------
-/conf              | Display configuration properties
-/jmx               | Java JMX management interface
-/logLevel          | Get or set log level per class
-/logs              | Display log files
-/stacks            | Display JVM stacks
-/static/index.html | The static home page
-
-To control the access to servlet `/conf`, `/jmx`, `/logLevel`, `/logs`,
-and `/stacks`, configure the following properties in `httpfs-site.xml`:
-
-```xml
-  <property>
-    <name>hadoop.security.authorization</name>
-    <value>true</value>
-    <description>Is service-level authorization enabled?</description>
-  </property>
-
-  <property>
-    <name>hadoop.security.instrumentation.requires.admin</name>
-    <value>true</value>
-    <description>
-      Indicates if administrator ACLs are required to access
-      instrumentation servlets (JMX, METRICS, CONF, STACKS).
-    </description>
-  </property>
-
-  <property>
-    <name>httpfs.http.administrators</name>
-    <value></value>
-    <description>ACL for the admins, this configuration is used to control
-      who can access the default servlets for HttpFS server. The value
-      should be a comma separated list of users and groups. The user list
-      comes first and is separated by a space followed by the group list,
-      e.g. "user1,user2 group1,group2". Both users and groups are optional,
-      so "user1", " group1", "", "user1 group1", "user1,user2 group1,group2"
-      are all valid (note the leading space in " group1"). '*' grants access
-      to all users and groups, e.g. '*', '* ' and ' *' are all valid.
-    </description>
-  </property>
-```
\ No newline at end of file
diff --git a/hadoop-ozone/httpfsgateway/src/site/markdown/UsingHttpTools.md b/hadoop-ozone/httpfsgateway/src/site/markdown/UsingHttpTools.md
deleted file mode 100644
index 3045ad6506..0000000000
--- a/hadoop-ozone/httpfsgateway/src/site/markdown/UsingHttpTools.md
+++ /dev/null
@@ -1,62 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-
-Hadoop HDFS over HTTP - Using HTTP Tools
-========================================
-
-Security
---------
-
-Out of the box HttpFS supports both pseudo authentication and Kerberos HTTP SPNEGO authentication.
-
-### Pseudo Authentication
-
-With pseudo authentication the user name must be specified in the `user.name=<USERNAME>` query string parameter of an HttpFS URL. For example:
-
-    $ curl "http://<HTTFS_HOST>:14000/webhdfs/v1?op=homedir&user.name=babu"
-
-### Kerberos HTTP SPNEGO Authentication
-
-Kerberos HTTP SPNEGO authentication requires a tool or library supporting Kerberos HTTP SPNEGO protocol.
-
-IMPORTANT: If using `curl`, the `curl` version being used must support GSS (`curl -V` prints out 'GSS' if it supports it).
-
-For example:
-
-    $ kinit
-    Please enter the password for user@LOCALHOST:
-    $ curl --negotiate -u foo "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir"
-    Enter host password for user 'foo':
-
-NOTE: the `-u USER` option is required by the `--negotiate` option but it is not used. Use any value as `USER` and when asked for the password press [ENTER], as the password value is ignored.
-
-### Remembering Who I Am (Establishing an Authenticated Session)
-
-As with most authentication mechanisms, Hadoop HTTP authentication authenticates users once and issues a short-lived authentication token to be presented in subsequent requests. This authentication token is a signed HTTP Cookie.
-
-When using tools like `curl`, the authentication token must be stored on the first request doing authentication and submitted in subsequent requests. To do this with curl, use the `-b` and `-c` options to save and send HTTP Cookies.
-
-For example, the first request doing authentication should save the received HTTP Cookies.
-
-Using Pseudo Authentication:
-
-    $ curl -c ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir&user.name=foo"
-
-Using Kerberos HTTP SPNEGO authentication:
-
-    $ curl --negotiate -u foo -c ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=homedir"
-
-Then, subsequent requests forward the previously received HTTP Cookie:
-
-    $ curl -b ~/.httpfsauth "http://<HTTPFS_HOST>:14000/webhdfs/v1?op=liststatus"
diff --git a/hadoop-ozone/httpfsgateway/src/site/markdown/index.md b/hadoop-ozone/httpfsgateway/src/site/markdown/index.md
deleted file mode 100644
index 6eef9e7d30..0000000000
--- a/hadoop-ozone/httpfsgateway/src/site/markdown/index.md
+++ /dev/null
@@ -1,54 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-
-Hadoop HDFS over HTTP - Documentation Sets
-==========================================
-
-HttpFS is a server that provides a REST HTTP gateway supporting all HDFS File System operations (read and write). It is interoperable with the **webhdfs** REST HTTP API.
-
-HttpFS can be used to transfer data between clusters running different versions of Hadoop (overcoming RPC versioning issues), for example using Hadoop DistCP.
-
-HttpFS can be used to access data in HDFS on a cluster behind a firewall (the HttpFS server acts as a gateway and is the only system that is allowed to cross the firewall into the cluster).
-
-HttpFS can be used to access data in HDFS with HTTP utilities (such as curl and wget) and with HTTP libraries from languages other than Java (for example, Perl).
-
-The **webhdfs** client FileSystem implementation can be used to access HttpFS using the Hadoop filesystem command line tool (`hadoop fs`) as well as from Java applications using the Hadoop FileSystem Java API.
-
-HttpFS has built-in security supporting Hadoop pseudo authentication, Kerberos HTTP SPNEGO, and other pluggable authentication mechanisms. It also provides Hadoop proxy user support.
-
-How Does HttpFS Work?
-----------------------
-
-HttpFS is a separate service from the Hadoop NameNode.
-
-HttpFS itself is a Java Jetty web application.
-
-HttpFS HTTP web-service API calls are HTTP REST calls that map to an HDFS file system operation. For example, using the `curl` Unix command:
-
-* `$ curl 'http://httpfs-host:14000/webhdfs/v1/user/foo/README.txt?op=OPEN&user.name=foo'` returns the contents of the HDFS `/user/foo/README.txt` file.
-
-* `$ curl 'http://httpfs-host:14000/webhdfs/v1/user/foo?op=LISTSTATUS&user.name=foo'` returns the contents of the HDFS `/user/foo` directory in JSON format.
-
-* `$ curl 'http://httpfs-host:14000/webhdfs/v1/user/foo?op=GETTRASHROOT&user.name=foo'` returns the path `/user/foo/.Trash`, if `/` is an encrypted zone, returns the path `/.Trash/foo`. See [more details](../hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html#Rename_and_Trash_considerations) about trash path in an encrypted zone.
-
-* `$ curl -X POST 'http://httpfs-host:14000/webhdfs/v1/user/foo/bar?op=MKDIRS&user.name=foo'` creates the HDFS `/user/foo/bar` directory.
-
-User and Developer Documentation
---------------------------------
-
-* [HttpFS Server Setup](./ServerSetup.html)
-
-* [Using HTTP Tools](./UsingHttpTools.html)
-
-


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@ozone.apache.org
For additional commands, e-mail: commits-help@ozone.apache.org