Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2020/09/09 12:50:49 UTC

[GitHub] [hadoop-ozone] elek opened a new pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

elek opened a new pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411


   ## What changes were proposed in this pull request?
   
   A new design doc about S3/HCFS interoperability is included. It was discussed earlier under https://issues.apache.org/jira/browse/HDDS-4097. 
   
   I created this PR because:
   
    1. I promised to do so, to make it easier to add context-specific comments
    2. It makes it easier to follow the document-specific changes / discussions


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: ozone-issues-help@hadoop.apache.org


[GitHub] [hadoop-ozone] codecov-commenter commented on pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#issuecomment-691705833


   # [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1411?src=pr&el=h1) Report
   > Merging [#1411](https://codecov.io/gh/apache/hadoop-ozone/pull/1411?src=pr&el=desc) into [master](https://codecov.io/gh/apache/hadoop-ozone/commit/9a4cb9e385c9fc95331ff7a0d2dd731e0a74a21c?el=desc) will **increase** coverage by `0.07%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/graphs/tree.svg?width=650&height=150&src=pr&token=5YeeptJMby)](https://codecov.io/gh/apache/hadoop-ozone/pull/1411?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1411      +/-   ##
   ============================================
   + Coverage     75.11%   75.19%   +0.07%     
   - Complexity    10488    10497       +9     
   ============================================
     Files           990      990              
     Lines         50885    50885              
     Branches       4960     4960              
   ============================================
   + Hits          38221    38261      +40     
   + Misses        10280    10238      -42     
   - Partials       2384     2386       +2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/hadoop-ozone/pull/1411?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...er/common/transport/server/GrpcXceiverService.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvY29udGFpbmVyLXNlcnZpY2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9jb250YWluZXIvY29tbW9uL3RyYW5zcG9ydC9zZXJ2ZXIvR3JwY1hjZWl2ZXJTZXJ2aWNlLmphdmE=) | `70.00% <0.00%> (-10.00%)` | `3.00% <0.00%> (ø%)` | |
   | [...ache/hadoop/ozone/om/codec/S3SecretValueCodec.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLW96b25lL296b25lLW1hbmFnZXIvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2hhZG9vcC9vem9uZS9vbS9jb2RlYy9TM1NlY3JldFZhbHVlQ29kZWMuamF2YQ==) | `90.90% <0.00%> (-9.10%)` | `3.00% <0.00%> (-1.00%)` | |
   | [...hdds/scm/container/common/helpers/ExcludeList.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvaGRkcy9zY20vY29udGFpbmVyL2NvbW1vbi9oZWxwZXJzL0V4Y2x1ZGVMaXN0LmphdmE=) | `78.26% <0.00%> (-8.70%)` | `17.00% <0.00%> (-2.00%)` | |
   | [...doop/hdds/scm/container/ContainerStateManager.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvc2VydmVyLXNjbS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL2hkZHMvc2NtL2NvbnRhaW5lci9Db250YWluZXJTdGF0ZU1hbmFnZXIuamF2YQ==) | `81.67% <0.00%> (-6.88%)` | `32.00% <0.00%> (-3.00%)` | |
   | [...apache/hadoop/hdds/server/events/EventWatcher.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvZnJhbWV3b3JrL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3AvaGRkcy9zZXJ2ZXIvZXZlbnRzL0V2ZW50V2F0Y2hlci5qYXZh) | `77.77% <0.00%> (-4.17%)` | `14.00% <0.00%> (ø%)` | |
   | [...doop/hdds/scm/pipeline/SimplePipelineProvider.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvc2VydmVyLXNjbS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL2hkZHMvc2NtL3BpcGVsaW5lL1NpbXBsZVBpcGVsaW5lUHJvdmlkZXIuamF2YQ==) | `76.00% <0.00%> (-4.00%)` | `4.00% <0.00%> (-1.00%)` | |
   | [...ent/algorithms/SCMContainerPlacementRackAware.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvc2VydmVyLXNjbS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL2hkZHMvc2NtL2NvbnRhaW5lci9wbGFjZW1lbnQvYWxnb3JpdGhtcy9TQ01Db250YWluZXJQbGFjZW1lbnRSYWNrQXdhcmUuamF2YQ==) | `76.69% <0.00%> (-3.01%)` | `31.00% <0.00%> (-2.00%)` | |
   | [...va/org/apache/hadoop/ozone/lease/LeaseManager.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9oYWRvb3Avb3pvbmUvbGVhc2UvTGVhc2VNYW5hZ2VyLmphdmE=) | `90.80% <0.00%> (-2.30%)` | `15.00% <0.00%> (-1.00%)` | |
   | [...apache/hadoop/ozone/client/io/KeyOutputStream.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLW96b25lL2NsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL296b25lL2NsaWVudC9pby9LZXlPdXRwdXRTdHJlYW0uamF2YQ==) | `78.75% <0.00%> (-2.09%)` | `45.00% <0.00%> (-3.00%)` | |
   | [...hadoop/hdds/scm/container/SCMContainerManager.java](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree#diff-aGFkb29wLWhkZHMvc2VydmVyLXNjbS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvaGFkb29wL2hkZHMvc2NtL2NvbnRhaW5lci9TQ01Db250YWluZXJNYW5hZ2VyLmphdmE=) | `73.68% <0.00%> (-1.92%)` | `39.00% <0.00%> (-1.00%)` | |
   | ... and [20 more](https://codecov.io/gh/apache/hadoop-ozone/pull/1411/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1411?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/hadoop-ozone/pull/1411?src=pr&el=footer). Last update [9a4cb9e...78fbaff](https://codecov.io/gh/apache/hadoop-ozone/pull/1411?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   




[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485794525



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem that can be used through multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns cannot be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory-like path such as `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it was not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation, as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today, *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+The file system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when its intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem was reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key was introduced (`ozone.om.enable.filesystem.paths`). If it is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS cannot be used together out of the box.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can be used together only with normalization enabled.
+
+# Goals
+
+ * Out of the box, Ozone should support both the S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names)
+ * As 100% compatibility cannot be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second cannot be done without compromise.
+
+ 1. We either support all key names (including non-fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs-compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make *normalization* and *intermediate dir creation* configurable independently of each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |

Review comment:
       Question: Is this config still OM config?
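
A minimal sketch of the idea in the quoted section, treating normalization and intermediate dir creation as two independent switches. Everything here is illustrative and not taken from the Ozone code base: the function names, the `normalize` / `create_dirs` parameters, and the use of `posixpath.normpath` as the normalization rule are assumptions.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Assumed normalization rule: collapse '//', resolve '..', drop a trailing '/'."""
    # normpath may keep a leading '//', so strip all leading slashes and re-add one.
    return "/" + posixpath.normpath("/" + key).lstrip("/")


def is_fs_compatible(key: str) -> bool:
    """A key is treated as fs-compatible if normalization would not change it."""
    candidate = key if key.startswith("/") else "/" + key
    return normalize_key(key) == candidate


def intermediate_dirs(key: str) -> list[str]:
    """Return the parent 'directories' of a key, shortest first."""
    parts = [p for p in key.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def plan_s3_put(key: str, normalize: bool, create_dirs: bool) -> tuple[str, list[str]]:
    """Hypothetical helper: decide what a put through the S3 interface would store."""
    stored = normalize_key(key) if normalize else key
    # Without normalization, an incompatible key gets no directory entries,
    # so it would stay invisible from ofs/o3fs.
    if not create_dirs or not is_fs_compatible(stored):
        return stored, []
    return stored, intermediate_dirs(stored)


if __name__ == "__main__":
    print(plan_s3_put("/a/b//c", normalize=True, create_dirs=True))
    print(plan_s3_put("/a/b//c", normalize=False, create_dirs=True))
```

With both switches on, the key `/a/b//c` is rewritten to `/a/b/c` (the AWS S3 incompatibility from the first table). With only `create_dirs` on, the key is stored unchanged and simply has no file system view, which corresponds to the 3rd option in the second table.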






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485798906



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
        These entries can be modified if they are explicit created.
   Can you explain a little more about this statement?
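
One possible reading of the quoted statement, sketched as a toy in-memory key table. This is an assumption about the intent, not the actual Ozone Manager design: the `Entry` and `KeyTable` names, the `implicit` field, and the rule that an explicit `mkdir` clears the flag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    is_dir: bool
    implicit: bool = False  # True only for dirs generated as a side effect of an S3 put


class KeyTable:
    """Toy flat key space; conflict handling (file vs. dir) is left out."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}

    @staticmethod
    def _parents(key: str) -> list[str]:
        parts = [p for p in key.split("/") if p]
        return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

    def s3_put(self, key: str) -> None:
        # Missing parents are created as implicit dirs; existing entries are kept as-is.
        for parent in self._parents(key):
            self.entries.setdefault(parent, Entry(is_dir=True, implicit=True))
        self.entries[key] = Entry(is_dir=False)

    def fs_mkdir(self, path: str) -> None:
        # The file system interface always creates explicit dirs, so an
        # existing implicit entry is "modified" by clearing its flag.
        for directory in self._parents(path) + [path]:
            entry = self.entries.setdefault(directory, Entry(is_dir=True))
            entry.implicit = False

    def s3_list(self) -> list[str]:
        # Implicit directory entries are hidden from the S3 view.
        return sorted(k for k, e in self.entries.items() if not e.implicit)

    def fs_list(self) -> list[str]:
        return sorted(self.entries)


if __name__ == "__main__":
    table = KeyTable()
    table.s3_put("/a/b/c/d")
    print(table.s3_list())
    table.fs_mkdir("/a/b")
    print(table.s3_list())
```

Under this reading, the S3 listing shows only `/a/b/c/d` right after the put, while the file system view sees all four entries. Once `/a/b` is created explicitly, `/a` and `/a/b` lose the flag and become visible from S3 as well; `/a/b/c` stays hidden.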






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485802830



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS S3 behavior (when the `/a/b/c` key is created, `/a/b` won't be visible)
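
A minimal sketch of the implicit-directory flag described above, assuming an in-memory dict in place of the real key table. The function names and the `implicit` field are hypothetical and only illustrate the idea; they are not Ozone Manager APIs.

```python
def put_key_from_s3(table, key):
    """Store `key` and add flagged entries for its missing parent directories."""
    parts = key.strip("/").split("/")
    for depth in range(1, len(parts)):
        parent = "/" + "/".join(parts[:depth])
        # setdefault keeps an existing (possibly explicit) entry untouched.
        table.setdefault(parent, {"is_dir": True, "implicit": True})
    table["/" + "/".join(parts)] = {"is_dir": False, "implicit": False}


def mkdir_from_hcfs(table, path):
    """Directories created through the file system interface are always explicit."""
    table[path] = {"is_dir": True, "implicit": False}


def list_from_s3(table):
    """S3 listing hides the entries that exist only to help HCFS."""
    return sorted(k for k, v in table.items() if not v["implicit"])


def list_from_hcfs(table):
    """HCFS sees every entry, implicit or not."""
    return sorted(table)


table = {}
put_key_from_s3(table, "/a/b/c/d")
print(list_from_s3(table))
print(list_from_hcfs(table))
```

In this sketch an implicit entry becomes visible to S3 only after `mkdir_from_hcfs` overwrites it with an explicit one, which mirrors the statement that implicit entries can be modified when they are explicitly created.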
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
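
The two modes can be sketched as follows, assuming that "fs-compatible" means a key has no empty, `.` or `..` segments. Both function names are hypothetical, and leading and trailing slashes are ignored here for brevity.

```python
def normalize_key(key):
    """Resolve empty, "." and ".." segments the way a file system path would."""
    segments = []
    for part in key.split("/"):
        if part in ("", "."):
            continue  # empty and "." segments are dropped
        if part == "..":
            if not segments:
                raise ValueError("key escapes the bucket root: " + key)
            segments.pop()  # ".." removes the previous segment
        else:
            segments.append(part)
    return "/".join(segments)


def is_fs_compatible(key):
    """Without normalization, only keys that pass this check would be listed from HCFS."""
    parts = key.strip("/").split("/")
    return all(part not in ("", ".", "..") for part in parts)


for key in ["a/b/c", "a/b//y", "a/b/../e", "a/b/./f"]:
    print(key, is_fs_compatible(key), normalize_key(key))
```

With normalization the last three keys would be stored under their normalized names; without it they would be kept as-is for S3 and skipped by the file system listing.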
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations is enabled, the intermediate directories won't be created. But in this case a consistent view of `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (but Ozone can still be used as a pure S3 replacement, without HCFS).
+
+# Problematic cases
+
+As described in the previous section, there are some cases which can't be supported out of the box due to the differences between the flat key space and the file-system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, the `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`
+
+## Key and directory with the same name
+
+It is possible to create a directory and a key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md

Review comment:
       aws s3api put-object --bucket ozonetest --key a/b/ --body README.md
   Need one more put above






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486428009



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation, as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
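
To illustrate challenge 3, here is a minimal sketch of a non-recursive listing over a flat key space, assuming the keys are available as a plain list of strings. `list_direct_children` is a hypothetical helper, not an Ozone API; it has to scan every key under the prefix, which is why the operation can be costly.

```python
def list_direct_children(keys, directory):
    """Return the immediate children of `directory` in a flat key space."""
    prefix = directory.rstrip("/") + "/"
    children = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the directory marker itself
        first, sep, _ = rest.partition("/")
        # Deeper keys are collapsed into a single directory entry.
        children.add(first + "/" if sep else first)
    return sorted(children)


keys = ["/a/x", "/a/b/c", "/a/b/d/e", "/z"]
print(list_direct_children(keys, "/a"))
```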
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together out of the box.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.

Review comment:
       Yes, thanks. I also clarified this paragraph a little:
   
   > But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility).
   






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r487496485



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation, as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together out of the box.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       Good catch, thanks, fixed.






[GitHub] [hadoop-ozone] arp7 commented on pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
arp7 commented on pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#issuecomment-689638762


   I have an alternate proposal, idea left in a comment. cc @bharatviswa504 @elek 




[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486432322



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation, as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together out of the box.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non-fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs-compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |

Review comment:
       Yes. I just tried to show here this out-of-the-box behavior. But agree. 
   
   We have it for free and actually this couldn't be removed: if we use ofs/o3fs, the keys *from* ofs/o3fs should always be normalized.






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486441102



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation, as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together out of the box.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non-fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs-compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key. (The `/a`, `/a/b` and `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3.)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification increases if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations are enabled, the intermediate directories won't be created. But in this case a consistent view from `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (but Ozone can still be used as a pure S3 replacement, without HCFS).
+
+# Problematic cases
+
+As described in the previous section, some cases can't be supported out of the box due to the differences between the flat key space and the file system hierarchy. These cases are collected here, together with how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`
+
+## Key and directory with the same name
+
+It is possible to create directory and key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md

Review comment:
       Why is it required? I think the current example already defines the behavior when only the S3 interface is used.






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485778176



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface) 
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation, as it requires renaming many keys (renaming a first level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.

Review comment:
        Should this read: S3 and HCFS couldn't be used together **without** normalization?






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486439769



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface) 
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation, as it requires renaming many keys (renaming a first level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make normalization and intermediate dir creation configurable independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key. (The `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3.)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification increases if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations are enabled, the intermediate directories won't be created. But in this case a consistent view from `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (but Ozone can still be used as a pure S3 replacement, without HCFS).
+
+# Problematic cases
+
+As described in the previous section, some cases can't be supported out of the box due to the differences between the flat key space and the file system hierarchy. These cases are collected here, together with how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`

Review comment:
       Sure, I added.
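
To make the normalization cases from the quoted section concrete (`a/b//y`, `a/b/../e`, `a/b/./f`), here is a minimal Python sketch. It is only an illustration built on the standard `posixpath` module, not the Ozone Manager implementation; the function names `normalize_key` and `is_fs_compatible` are hypothetical.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Collapse empty, '.' and '..' segments into an absolute, fs-style path."""
    # Force exactly one leading slash first: posixpath.normpath keeps a
    # leading '//' untouched, which is not wanted here.
    return posixpath.normpath("/" + key.lstrip("/"))


def is_fs_compatible(key: str) -> bool:
    """True if every path segment is a plain name (no '', '.' or '..')."""
    body = key[1:] if key.startswith("/") else key
    return all(segment not in ("", ".", "..") for segment in body.split("/"))


for key in ("a/b//y", "a/b/../e", "a/b/./f", "a/b/h"):
    print(f"{key!r}: fs-compatible={is_fs_compatible(key)}, "
          f"normalized={normalize_key(key)!r}")
```

Under the proposal, `ozone.om.enable.filesystem.paths=true` would store the key under something like the normalized name, while `ozone.om.enable.intermediate.dirs=true` would keep the S3 key unchanged and simply hide keys that are not fs-compatible from `ofs/o3fs`.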






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r487495511



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface) 
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation, as it requires renaming many keys (renaming a first level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make normalization and intermediate dir creation configurable independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |

Review comment:
       Good question. There are three useful cases, IMHO:
   
    1. Do nothing (neither normalize nor create dirs) --> Ozone can be used only from S3; ofs/o3fs is inconsistent due to the missing intermediate directories (I suggest throwing an exception in this case for ofs/o3fs)
    2. Create dirs, don't normalize --> 100% AWS compatibility, partial view from ofs (invalid keys are not visible from ofs, e.g. `/a/b/c////d`)
    3. Create dirs, normalize --> reduced AWS S3 compatibility, full ofs view
   
   The remaining combination, normalize but don't create dirs, doesn't make sense: without intermediate dirs ofs is not usable, so normalization isn't needed.
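
The three cases above can be sketched as a small mapping from the two flags to a mode. This is a hypothetical Python illustration only: `Mode`, `resolve_mode` and `ofs_allowed` are made-up names, and `ozone.om.enable.intermediate.dirs` is the key name imagined in the design doc, not an existing setting.

```python
from enum import Enum


class Mode(Enum):
    OBJECT_STORE_ONLY = "neither normalize nor create dirs"
    DIRS_ONLY = "create dirs, don't normalize"
    FILESYSTEM_PATHS = "create dirs and normalize"


def resolve_mode(enable_filesystem_paths: bool,
                 enable_intermediate_dirs: bool) -> Mode:
    """Map the two flags to one of the three useful cases.

    enable_filesystem_paths  ~ ozone.om.enable.filesystem.paths
    enable_intermediate_dirs ~ ozone.om.enable.intermediate.dirs (imagined key)
    """
    if enable_filesystem_paths:
        # This flag implies intermediate dir generation AND normalization,
        # so "normalize without dirs" can never be selected.
        return Mode.FILESYSTEM_PATHS
    if enable_intermediate_dirs:
        return Mode.DIRS_ONLY
    return Mode.OBJECT_STORE_ONLY


def ofs_allowed(mode: Mode) -> bool:
    # Without intermediate directories the ofs/o3fs view is inconsistent,
    # so the suggestion is to reject file system access in that mode.
    return mode is not Mode.OBJECT_STORE_ONLY
```

With this shape, the fourth combination never needs to be represented, which matches the argument that normalization is pointless when ofs is unusable anyway.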






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485817494



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
 
- * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular path)
- * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations in case of incompatible key names
- * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:

Review comment:
       There is also another path, right? An existing bucket created via the CLI can be exposed to S3. What semantics will that bucket get?
   
   >For buckets created via FS interface, the FS semantics will always take precedence
   
   Bucket creation is possible only via OFS. What about O3FS?
   
   >If the global setting is enabled, then the value of the setting at the time of bucket creation is sampled and that takes effect for the lifetime of the bucket.
   
   So a bucket created via the shell while the global flag is on (`ozone.om.enable.filesystem.paths=true`) will follow FS semantics, with slight S3 incompatibility.
   And a bucket created via the shell while the global flag is off (`ozone.om.enable.filesystem.paths=false`) will follow S3 semantics, with FS access either broken or completely disallowed.
   
   This is written from my understanding, as I do not have the complete context of the proposal.
   
   I might be missing something here.
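
To make the reading above concrete, here is a minimal sketch of it, not Ozone code. The names `Semantics` and `sample_bucket_semantics` are hypothetical, and the sketch only covers the two creation paths discussed in this comment (FS interface and shell), on the assumption that the flag is sampled once at bucket creation.

```python
from enum import Enum


class Semantics(Enum):
    FS = "file-system semantics (slight S3 incompatibility)"
    S3 = "S3 semantics (FS access broken or disallowed)"


def sample_bucket_semantics(created_via: str, enable_filesystem_paths: bool) -> Semantics:
    """Pick the semantics a bucket keeps for its lifetime (hypothetical helper).

    created_via: "fs" for the file-system interface, "shell" for the CLI.
    enable_filesystem_paths: value of ozone.om.enable.filesystem.paths
    at the moment the bucket is created.
    """
    if created_via == "fs":
        # Quoted rule: buckets created via the FS interface always get FS semantics.
        return Semantics.FS
    if created_via == "shell":
        # Quoted rule: the global flag is sampled once, at creation time.
        return Semantics.FS if enable_filesystem_paths else Semantics.S3
    # Other creation paths are not covered by the comment, so nothing is guessed here.
    raise ValueError(f"semantics not discussed for buckets created via {created_via!r}")


if __name__ == "__main__":
    for via, flag in [("fs", False), ("shell", True), ("shell", False)]:
        print(via, flag, "->", sample_bucket_semantics(via, flag).name)
```

If this reading is right, changing the global flag later would not affect existing buckets, which is what "for the lifetime of the bucket" suggests.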
   
   






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485804988



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (referred to as *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone must simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes using a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without a compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make normalization and intermediate dir creation configurable independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations is enabled, the intermediate directories won't be created. But in this case a consistent view from `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (but Ozone can still be used as a pure S3 replacement, without HCFS).
+
+# Problematic cases
+
+As described in the previous section, there are some cases which can't be supported out of the box due to the differences between the flat key space and the file-system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`  
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`  
+
+## Key and directory with the same name
+
+It is possible to create directory and key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md
+aws s3api list-objects --bucket ozonetest --prefix=a/b/h/
+```
+
+Behavior:
+ * *S3 web console*: both directory and file are rendered
+ * **aws s3 ls**: prefix (`PRE h/`) and file (`h`) are both displayed
+ * S3A: both entries are visible with the name `/a/b/h`, but the first is a file (with size) and the second is a directory (with directory attributes)
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: show both the file and the directory with the same name (similar to S3A)
+ * `ozone.om.enable.filesystem.paths=true`: throw an exception when the second one is created  
+
+## Directory entry created with file content
+
+In this case we create a directory (key which ends with `/`) but with real file content:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/i/ --body README.md
+```
+
+Behavior:
+ * *S3 web console*: rendered as directory (couldn't be downloaded)
+ * **aws s3 ls**: shown as a prefix (`aws s3 ls s3://ozonetest/a/b`), but when the full path is used, shown as a file without a name (`aws s3 ls s3://ozonetest/a/b/i/`)
+ * S3A: `./bin/hdfs dfs -ls s3a://ozonetest/a/b/` shows a directory `i`, `./bin/hdfs dfs -ls s3a://ozonetest/a/b/i` shows a file `i`
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: possible but `i/` is hidden from o3fs/ofs
+ * `ozone.om.enable.filesystem.paths=true`: key name is normalized to real key name
+
+## Create key and explicitly create parent dir
+
+```
+aws s3api put-object --bucket ozonetest --key e/f/g/
+aws s3api put-object --bucket ozonetest --key e/f/
+```
+
+Behavior:
+
+ * S3 can support it without any problem
+ 
+Proposed behavior:
+
+After the first command, the `/e/f/` and `/e/` entries are created in the key space (as they are required by `ofs`/`o3fs`) but **with a specific flag** (explicit=false). 
+
+AWS S3 list-objects API should exclude those entries from the result (!).
+
+Second command execution should modify the flag of `/e/f/` key to (explicit=true).
+
+## Create parent dir AND key with S3a
+
+This is the problem which is reported by [HDDS-4209](https://issues.apache.org/jira/browse/HDDS-4209)
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 # -> fails with below error
+```
+
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: should work without error

Review comment:
       Will not work, as createDirectory is putObject with 0 byte.
   Refer [link](https://issues.apache.org/jira/browse/HDDS-4209?focusedCommentId=17193033&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17193033) for more info
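
One way to picture the objection above is the toy model below. It is not Ozone code: `ToyKeySpace` and `s3a_mkdir` are hypothetical names, and the model assumes that the 0-byte directory marker written by S3A is stored like any other key (trailing slash dropped), so the later file put finds its parent registered as a file. The actual behaviour is described in the linked HDDS-4209 comment.

```python
class ToyKeySpace:
    """Toy flat key space with generated intermediate directories (illustration only)."""

    def __init__(self):
        self.files = {}    # key -> object size in bytes
        self.dirs = set()  # intermediate directories generated for ofs/o3fs

    def put_object(self, key, body=b""):
        """S3-style put: every object, even a 0-byte one, is stored as a file entry."""
        name = key.rstrip("/")  # assumption: the trailing slash is dropped
        parts = name.split("/")
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in self.files:
                # The parent was written as an object, so it cannot act as a directory.
                raise ValueError(f"{parent!r} already exists as a file")
            self.dirs.add(parent)
        self.files[name] = len(body)


def s3a_mkdir(store, path):
    """S3 has no mkdir call; S3A marks a directory with a 0-byte object named 'path/'."""
    store.put_object(path.rstrip("/") + "/", b"")


if __name__ == "__main__":
    store = ToyKeySpace()
    s3a_mkdir(store, "d11/d12")                          # succeeds
    try:
        store.put_object("d11/d12/file1", b"payload")    # parent is a file entry
    except ValueError as err:
        print("put failed:", err)
```

Under these assumptions, making the second command succeed would require the OM to recognise the 0-byte marker as a directory rather than as a file.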






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485789295



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |

Review comment:
       No special normalization is needed here: because we use a Path object, key names are already normalized before they are sent to OM.
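
   The effect described in this comment can be illustrated roughly as follows. This is only a Python sketch: `posixpath.normpath` is used as a stand-in for the Hadoop `Path` object, and `normalize_key` is a hypothetical helper, not Ozone code. The real `Path` handling may differ in edge cases.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Return a file-system style form of an object key.

    Hypothetical helper: posixpath.normpath stands in for the
    normalization that a Hadoop Path object would apply on the client.
    """
    # Anchor the key at a single leading slash, then collapse empty
    # segments ("//") and resolve "." and ".." segments.
    return posixpath.normpath("/" + key.lstrip("/"))


if __name__ == "__main__":
    # Key names taken from the "Problematic cases" section of the doc.
    for key in ("a/b//y", "a/b/../e", "a/b/./f"):
        print(key, "->", normalize_key(key))
```

   With this kind of client-side handling, a key name arriving through `ofs/o3fs` is already in normalized form, so only keys written through the `s3` interface can still contain segments such as `//` or `..`.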






[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
arp7 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485699608



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
 
- * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular path)
- * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations in case of incompatible key names
- * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:

Review comment:
       I completely disagree with this trade-off. The FS limited view is neither here nor there. You can insert keys via the S3 interface that are not visible via the FS view at all. To me this is the same as a corrupted filesystem. Marton, I liked your offline suggestion much better - disable FS access completely when operating in S3-compatible mode.
   
   Taking this one step further, I have a different approach in mind. Let's make this a per-bucket setting. For buckets created via the S3 interface, by default the S3 semantics will be preserved 100% unless the global setting is enabled and FS access will not be allowed at all. For buckets created via FS interface, the FS semantics will always take precedence. If the global setting is enabled, then the value of the setting at the time of bucket creation is sampled and that takes effect for the lifetime of the bucket. Basically you can't change the behavior for a given bucket.
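
   One possible reading of the per-bucket idea, sketched in Python. All names here (`BucketMode`, `Bucket`, `create_bucket`, `check_fs_access`) are hypothetical and only illustrate the "sample the setting at creation time" behavior; nothing below is existing Ozone code.

```python
from dataclasses import dataclass
from enum import Enum


class BucketMode(Enum):
    S3_STRICT = "s3"       # 100% S3 semantics, FS access is rejected
    FS_COMPATIBLE = "fs"   # FS semantics take precedence


@dataclass(frozen=True)    # frozen: the mode cannot change after creation
class Bucket:
    name: str
    mode: BucketMode


def create_bucket(name: str, via_s3: bool, global_fs_paths_enabled: bool) -> Bucket:
    """Decide the bucket mode once, from the creator and the global setting."""
    if via_s3 and not global_fs_paths_enabled:
        mode = BucketMode.S3_STRICT
    else:
        # Created via the FS interface, or via S3 while the global
        # setting was enabled at creation time.
        mode = BucketMode.FS_COMPATIBLE
    return Bucket(name, mode)


def check_fs_access(bucket: Bucket) -> None:
    """Reject ofs/o3fs access for buckets that keep strict S3 semantics."""
    if bucket.mode is BucketMode.S3_STRICT:
        raise PermissionError(f"FS access is disabled for bucket {bucket.name}")
```

   Because the mode is stored with the bucket, a later change of the global setting would affect only buckets created afterwards.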






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486423208



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
 
- * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular path)
- * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations in case of incompatible key names
- * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:

Review comment:
       Different behavior on bucket level seems to be an interesting idea.
   
   > For buckets created via FS interface, the FS semantics will always take precedence. 
   
   How would you define the behavior if the bucket is created from S3?
   
   I suppose in this case we should support 100% AWS S3 compatibility (without forced normalization).
   
   But how would o3fs/ofs work in case of `s3` buckets:
   
    1. Partial view from ofs (incompatible keys are hidden)
    2. `ofs/o3fs` is disabled (exception), no intermediate directories are created.
    
   
   
   
   






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486535317



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are later created explicitly.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes as we need an entry for each prefixes). This write-amplification can be handled with the current implementation: based on the measurements RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations are enabled, the intermediate directories won't be created. But in this case, the consistent view of `ofs/o3fs` couldn't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (But Ozone can be used as a pure S3 replacement without using as a HCFS).
+
+# Problematic cases
+
+As described in the previous section there are some cases which couldn't be supported out-of-the-box due to the differences between the flat key-space and file-system hierarchy. These cases are collected here together with the information how existing tools (AWS console, AWS cli, AWS S3A Hadoop connector) behaves.
+
+## Empty directory path
+
+With a pure object store keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which has file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`
+
+## Key and directory with the same name
+
+It is possible to create directory and key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md
+aws s3api list-objects --bucket ozonetest --prefix=a/b/h/
+```
+
+Behavior:
+ * *S3 web console*: both directory and file are rendered
+ * **aws s3 ls**: prefix (`PRE h/`) and file (`h`) are both displayed
+ * S3A: both entries are visible with the name `/a/b/h`, but the first is a file (with size) and the second is a directory (with directory attributes)
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: show both the file and the directory with the same name (similar to S3A)
+ * `ozone.om.enable.filesystem.paths=true`: throwing exception when the second one is created  
+
+## Directory entry created with file content
+
+In this case we create a directory (key which ends with `/`) but with real file content:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/i/ --body README.md
+```
+
+Behavior:
+ * *S3 web console*: rendered as directory (couldn't be downloaded)
+ * **aws s3 ls**: showed as a prefix (`aws s3 ls s3://ozonetest/a/b`), but when the full path is used showed as a file without name (`aws s3 ls s3://ozonetest/a/b/i/`)
+ * S3A: `./bin/hdfs dfs -ls s3a://ozonetest/a/b/` shows a directory `h`, `./bin/hdfs dfs -ls s3a://ozonetest/a/b/i` shows a file `i`
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: possible but `i/` is hidden from o3fs/ofs
+ * `ozone.om.enable.filesystem.paths=true`: key name is normalized to real key name
+
+## Create key and explicit create parent dir
+
+```
+aws s3api put-object --bucket ozonetest --key e/f/g/
+aws s3api put-object --bucket ozonetest --key e/f/
+```
+
+Behavior:
+
+ * S3 can support it without any problem
+ 
+Proposed behavior:
+
+After the first command `/e/f/` and `/e/` entries created in the key space (as they are required by `ofs`/`o3fs`) but **with a specific flag** (explicit=false). 
+
+AWS S3 list-objects API should exclude those entries from the result (!).
+
+Second command execution should modify the flag of `/e/f/` key to (explicit=true).
+
+## Create parent dir AND key with S3a
+
+This is the problem which is reported by [HDDS-4209](https://issues.apache.org/jira/browse/HDDS-4209)
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 # -> fails with below error
+```
+
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: should work without error
+ * `ozone.om.enable.filesystem.paths=true`: should work without error.
+
+This is an `ofs`/`o3fs` question not an S3. The directory created in the first step shouldn't block the creation of the file. This can be a **mandatory** normalization for `mkdir` directory creation. As it's an HCFS operation, s3 is not affected. Entries created from S3 can be visible from s3 without any problem.
+
+## Create file and directory with S3
+
+This problem is reported in HDDS-4209, thanks to @Bharat
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 
+```
+
+In this case first a `d11/d12/` key is created. The intermediate key creation logic in the second step should use it as a directory instead of throwing an exception.

Review comment:
       Yes. I posted my comment in HDDS-4209; please refer to it for more info.
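
   The behavior asked for in the quoted paragraph (reuse an existing `d11/d12/` key as a directory instead of throwing) could look roughly like the Python sketch below. It models the key space as an in-memory dict; `put_key` and the trailing-slash convention for directory entries are assumptions made for illustration, not the OM implementation.

```python
def put_key(store: dict, key: str, data: bytes = b"") -> None:
    """Store a key and make sure its parent directory entries exist.

    `store` maps key name -> content. Names ending with "/" are
    directory entries (for example the 0-byte key written by mkdir).
    """
    parts = key.rstrip("/").split("/")
    # Walk the parent prefixes, e.g. "d11" and "d11/d12" for "d11/d12/file1".
    for depth in range(1, len(parts)):
        prefix = "/".join(parts[:depth])
        if prefix in store:
            # A regular file already uses the name of a needed directory.
            raise FileExistsError(f"{prefix} exists as a file")
        # Reuse an existing directory entry, or create an empty one.
        store.setdefault(prefix + "/", b"")
    store[key] = data


if __name__ == "__main__":
    keys = {}
    put_key(keys, "d11/d12/")               # mkdir -p: 0-byte directory key
    put_key(keys, "d11/d12/file1", b"x")    # put: parent entry is reused
    print(sorted(keys))
```

   The second call succeeds because the existing `d11/d12/` entry is treated as the parent directory; only a conflicting regular file named `d11/d12` would raise an error.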






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486441859



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
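
Editor's illustration for challenge 3 (not part of the patch): a minimal Python sketch of a non-recursive listing over a flat key space. The function name `list_direct_children` and the in-memory key list are assumptions made for the example; this is not how Ozone Manager implements listing. It only shows that every key sharing the prefix has to be visited, even though most of them collapse into a single directory entry.

```python
def list_direct_children(keys, directory):
    """Return the immediate children of `directory` in a flat key space."""
    prefix = directory.rstrip("/") + "/"
    children = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue  # the directory marker itself, not a child
        first, sep, _ = rest.partition("/")
        # A remaining "/" means the key lives deeper in the tree:
        # report only its first segment, as a directory entry.
        children.add(first + "/" if sep else first)
    return sorted(children)


keys = ["/a/x", "/a/b/c", "/a/b/c/d", "/e/f"]
print(list_direct_children(keys, "/a"))
```

Both `/a/b/c` and `/a/b/c/d` are scanned but contribute only the single entry `b/`, which is the overhead the listing challenge refers to.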
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create `/a/b` and `/a` entries in the key space).
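
Editor's illustration (not part of the patch): a minimal Python sketch of which intermediate directory entries a `createFile`-style call would need for a given key. The helper name `intermediate_dirs` is hypothetical, and the sketch assumes a regular, fs-compatible key (no `//`, `.` or `..` segments).

```python
def intermediate_dirs(key):
    """List the parent directory entries implied by a flat key name."""
    segments = key.strip("/").split("/")
    # Every proper prefix of the segment list is an intermediate directory.
    return ["/" + "/".join(segments[:i]) for i in range(1, len(segments))]


print(intermediate_dirs("/a/b/c"))
```

For `/a/b/c` this yields `/a` and `/a/b`, matching the example in the paragraph above.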
+
+Today, a key created from the S3 interface can cause exceptions when its intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key was introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together by default.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non-fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make normalization and intermediate dir creation configurable independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS S3 behavior (when the `/a/b/c` key is created, `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
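
Editor's illustration (not part of the patch): a minimal Python sketch contrasting the two modes described above. It assumes POSIX-style path rules (via `posixpath.normpath`) as a stand-in for the normalization; the function names are hypothetical, and the conflict handling mentioned for normalized keys is deliberately left out.

```python
import posixpath


def normalize_key(key):
    """Normalize a key with POSIX path rules: collapse '//', resolve '.' and '..'."""
    return posixpath.normpath("/" + key.lstrip("/"))


def is_fs_compatible(key):
    """A key is treated as fs-compatible if normalization would not change it."""
    absolute = key if key.startswith("/") else "/" + key
    return normalize_key(key) == absolute


def hcfs_visible_keys(s3_keys, normalize):
    """Keys that the file system view would show for each mode (conflicts ignored)."""
    if normalize:
        # filesystem.paths-style mode: every key name is rewritten.
        return sorted(normalize_key(k) for k in s3_keys)
    # intermediate.dirs-style mode: incompatible keys stay untouched but hidden.
    return sorted(normalize_key(k) for k in s3_keys if is_fs_compatible(k))


s3_keys = ["a/b/c", "a/b//y", "a/b/../e"]
print(hcfs_visible_keys(s3_keys, normalize=True))
print(hcfs_visible_keys(s3_keys, normalize=False))
```

With normalization all three keys appear under rewritten names; without it only `a/b/c` is visible from the file system view, while S3 still sees the original three names.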
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations is enabled, the intermediate directories won't be created. But in this case, a consistent view from `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (but Ozone can still be used as a pure S3 replacement without being used as an HCFS).
+
+# Problematic cases
+
+As described in the previous section, there are some cases which can't be supported out of the box due to the differences between the flat key space and the file system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS cli, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate into it
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate into them
+ * **aws s3 ls**: Prefix entries are visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`
+
+## Key and directory with the same name
+
+It is possible to create a directory and a key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md
+aws s3api list-objects --bucket ozonetest --prefix=a/b/h/
+```
+
+Behavior:
+ * *S3 web console*: both directory and file are rendered
+ * **aws s3 ls**: prefix (`PRE h/`) and file (`h`) are both displayed
+ * S3A: both entries are visible with the name `/a/b/h`, but the first is a file (with size) and the second is a directory (with directory attributes)
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: show both the file and the directory with the same name (similar to S3A)
+ * `ozone.om.enable.filesystem.paths=true`: throw an exception when the second one is created
+
+## Directory entry created with file content
+
+In this case we create a directory (key which ends with `/`) but with real file content:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/i/ --body README.md
+```
+
+Behavior:
+ * *S3 web console*: rendered as directory (couldn't be downloaded)
+ * **aws s3 ls**: shown as a prefix (`aws s3 ls s3://ozonetest/a/b`), but when the full path is used, shown as a file without a name (`aws s3 ls s3://ozonetest/a/b/i/`)
+ * S3A: `./bin/hdfs dfs -ls s3a://ozonetest/a/b/` shows a directory `h`, `./bin/hdfs dfs -ls s3a://ozonetest/a/b/i` shows a file `i`
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: possible but `i/` is hidden from o3fs/ofs
+ * `ozone.om.enable.filesystem.paths=true`: key name is normalized to real key name
+
+## Create key and explicitly create parent dir
+
+```
+aws s3api put-object --bucket ozonetest --key e/f/g/
+aws s3api put-object --bucket ozonetest --key e/f/
+```
+
+Behavior:
+
+ * S3 can support it without any problem
+ 
+Proposed behavior:
+
+After the first command, `/e/f/` and `/e/` entries are created in the key space (as they are required by `ofs`/`o3fs`) but **with a specific flag** (explicit=false).
+
+The S3 list-objects API should exclude those entries from the result (!).
+
+Execution of the second command should change the flag of the `/e/f/` key to (explicit=true).
+
+## Create parent dir AND key with S3A
+
+This problem is reported in [HDDS-4209](https://issues.apache.org/jira/browse/HDDS-4209):
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 # -> fails with below error
+```
+
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: should work without error

Review comment:
       I understand that we will have a 0-size file with `/` at the end, but I propose to fix the intermediate directory creation to accept it.






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485813294



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 

Review comment:
       Do we need this **AND key name normalization** here? We already have another config to define this behavior, and it can be handled in code instead of giving additional meaning to this config.
   
   






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r487495900



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       The compatibility story is only important when somebody uses the `s3` interface. The S3 gateway is expected to be compatible with AWS S3.
   
   But there is no rule or limitation about what S3G shows for something created from ofs/o3fs. From this point of view ofs/o3fs are external clients which can do anything, so there can't be any expectations in s3g.
   
    * Directories created by ofs/o3fs can be shown from s3 all the time
    * During S3 key creation, intermediate directories should be created but (by default) hidden from S3
    * Unless somebody creates them with the ofs interface. In that case they can be visible.
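
Editor's illustration of the three rules above: a minimal Python sketch, assuming an in-memory table that maps each key name to an `explicit` flag. The `KeySpace` class and its method names are hypothetical and only model the visibility rules; they are not Ozone Manager APIs.

```python
class KeySpace:
    """Toy key table: key name -> True if explicitly created, False if implicit."""

    def __init__(self):
        self.explicit = {}

    @staticmethod
    def _parents(key):
        parts = key.strip("/").split("/")
        return ["/" + "/".join(parts[:i]) + "/" for i in range(1, len(parts))]

    def s3_put(self, key):
        """S3 key creation: parents are created as implicit (hidden) entries."""
        for parent in self._parents(key):
            # setdefault keeps the flag of an entry that is already explicit.
            self.explicit.setdefault(parent, False)
        name = "/" + key.strip("/") + ("/" if key.endswith("/") else "")
        self.explicit[name] = True

    def fs_mkdir(self, path):
        """ofs/o3fs directory creation: all entries are explicit."""
        for parent in self._parents(path):
            self.explicit[parent] = True
        self.explicit["/" + path.strip("/") + "/"] = True

    def s3_list(self):
        """S3 listing hides the implicit entries."""
        return sorted(k for k, is_explicit in self.explicit.items() if is_explicit)

    def fs_list(self):
        """The file system view sees every entry."""
        return sorted(self.explicit)


ks = KeySpace()
ks.s3_put("e/f/g/")   # /e/ and /e/f/ are created with explicit=False
ks.s3_put("e/f/")     # flips /e/f/ to explicit=True
ks.fs_mkdir("/x/y")   # directories created from ofs/o3fs are visible to S3
print(ks.s3_list())
print(ks.fs_list())
```

In this model `/e/` stays hidden from S3 until something creates it explicitly, while the file system view always contains it.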






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r487496850



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today, *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when its intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If it is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together unless it is turned on.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 together with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |

Review comment:
       In general, AWS S3 can save any key without normalization, and the same content can be retrieved with the same key (and list shows the raw names).
   
   `AWS S3 incompatibility` means partial incompatibility. Any `s3` tool which depends on the original AWS S3 behavior can fail. (For example, if an S3 FUSE file system creates dirs with keys ending with `/` **AND** stores real directory metadata on that specific key --> it will be broken).
   
   I started to create robot tests for these cases, which can be used to check this side of the compatibility:
   
   https://github.com/elek/hadoop-ozone/commit/d21890bb0b0d08d0f8c631e9f24af4745b2f3aca  
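
To make the incompatibility concrete, here is a minimal Python sketch. It assumes that normalization behaves like POSIX path normalization (collapsing `//`, resolving `..`, dropping a trailing `/`) and that stored keys are bucket-relative; the helper name `normalize_key` is hypothetical and this is not the actual Ozone Manager code.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Hypothetical file-system style normalization of an S3 key."""
    # Anchor the key at "/" so that ".." can never climb above the bucket,
    # then drop the leading "/" again to get a bucket-relative key.
    return posixpath.normpath("/" + key.lstrip("/")).lstrip("/")


if __name__ == "__main__":
    # Key patterns mentioned in the design doc.
    for raw in ["/a/b/c", "/a/b//c", "/a/b/../c", "/b/d/"]:
        print(f"{raw!r} -> {normalize_key(raw)!r}")
```

With such a rule, `/a/b/c` and `/a/b//c` end up as the same stored key, so a client that writes `/a/b//c` cannot expect to see that raw name in a listing, while plain AWS S3 would keep both keys apart.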






[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486761072



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (referred to as *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support the Hadoop Compatible File System (and CSI), Ozone must simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key that looks like a directory path, such as `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it was not created explicitly (for example, if the key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface) 
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today, *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when its intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If it is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together unless it is turned on.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       > S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as `/a/b/d`, via the FS interface? Do we still want to show them in this case for interop?
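
One possible reading of the quoted paragraph, as a small Python sketch. Everything here is an assumption for illustration: the `FlatNamespace` class, the `implicit` flag name, and the rule that an FS-side create turns the parent entries into explicit ones are not taken from the Ozone code base.

```python
class FlatNamespace:
    """Toy key table: key name -> {"dir": bool, "implicit": bool}."""

    def __init__(self):
        self.entries = {}

    def _add_parents(self, key, implicit):
        # Walk /a, /a/b, ... for every parent prefix of the key.
        parts = key.strip("/").split("/")[:-1]
        for depth in range(1, len(parts) + 1):
            parent = "/".join(parts[:depth])
            entry = self.entries.get(parent)
            if entry is None:
                self.entries[parent] = {"dir": True, "implicit": implicit}
            elif not implicit:
                # Created explicitly later on: the entry becomes a real one.
                entry["implicit"] = False

    def s3_put(self, key):
        # Parents generated for an S3 key are only there to help HCFS.
        self._add_parents(key, implicit=True)
        self.entries[key.strip("/")] = {"dir": False, "implicit": False}

    def fs_create(self, path):
        # Assumption: the file system interface always creates explicit dirs.
        self._add_parents(path, implicit=False)
        self.entries[path.strip("/")] = {"dir": False, "implicit": False}

    def s3_list(self):
        # S3 hides the entries that exist only as implicit directories.
        return sorted(k for k, e in self.entries.items() if not e["implicit"])

    def fs_list(self):
        # HCFS sees every entry, implicit or not.
        return sorted(self.entries)
```

In this sketch, after `s3_put("/a/b/c/d")` the S3 listing contains only `a/b/c/d`. A later `fs_create("/a/b/d")` makes `a/b/d` visible as well, together with the now explicit `a` and `a/b`, while `a/b/c` stays hidden. Whether S3 should really show the explicit directory entries is part of the open question.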
   






[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486760564



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
Review comment:
       > Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485784568



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
 
-## Goals
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation

Review comment:
       Does `AWS S3 incompatibility` here mean that we are showing normalized keys?






[GitHub] [hadoop-ozone] elek commented on pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#issuecomment-691630554


   > > > Thank You @elek for the design document.
   > > > My understanding from this is the draft is as below. Let me know if I am missing something here.
   > > > <img alt="Screen Shot 2020-09-09 at 10 59 11 AM" width="998" src="https://user-images.githubusercontent.com/8586345/92635994-8856fe80-f28b-11ea-95bf-8864d48e488f.png">
   > > 
   > > 
   > > Correct. But this is not a matrix anymore. You should turn on either first or second of the configs, but not both.
   > 
   > Not sure what is meant here. Because we have 2 configs, we can now have 4 combinations; according to the proposal, 3 are valid and the 4th one is not.
   
   Agreed, but there are two ways to define these 3 options:
   
   1st approach:
   
   KEY1=true,KEY2=true --> option1
   KEY1=false,KEY2=false --> option2
   KEY1=true,KEY2=false --> option3
   KEY1=false,KEY2=true --> invalid 
   
   2nd approach:
   
   KEY1=true --> option 1
   KEY2=true --> option2
   else --> option3
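
The two approaches can be sketched in Python as follows. This is only an illustration under stated assumptions: it reads KEY1 as "create intermediate dirs" and KEY2 as "normalize key names" in the 1st approach, uses the two configuration names from the design doc draft in the 2nd approach, and lets the first matching key win there. The `Behavior` type and both function names are hypothetical.

```python
from typing import NamedTuple


class Behavior(NamedTuple):
    create_intermediate_dirs: bool
    normalize_keys: bool


def resolve_independent(create_dirs: bool, normalize: bool) -> Behavior:
    """1st approach: two independent switches, one combination is rejected."""
    if normalize and not create_dirs:
        # Assumed to be the invalid combination (KEY1=false, KEY2=true).
        raise ValueError("normalization requires intermediate dir creation")
    return Behavior(create_dirs, normalize)


def resolve_presets(conf: dict) -> Behavior:
    """2nd approach: each key selects a whole preset, first match wins."""
    if conf.get("ozone.om.enable.filesystem.paths", False):
        return Behavior(create_intermediate_dirs=True, normalize_keys=True)
    if conf.get("ozone.om.enable.intermediate.dirs", False):
        return Behavior(create_intermediate_dirs=True, normalize_keys=False)
    return Behavior(create_intermediate_dirs=False, normalize_keys=False)
```

The practical difference is that the 1st approach needs an explicit validation step for the fourth combination, while in the 2nd approach an invalid state cannot be expressed at all.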
   




[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485805457



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations is enabled, the intermediate directories won't be created. But in this case a consistent view from `ofs/o3fs` cannot be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (Ozone can still be used as a pure S3 replacement, without HCFS).
+
+# Problematic cases
+
+As described in the previous section, there are some cases which cannot be supported out of the box due to the differences between the flat key space and the file-system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`  
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`  
+
+## Key and directory with the same name
+
+It is possible to create a directory and a key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md
+aws s3api list-objects --bucket ozonetest --prefix=a/b/h/
+```
+
+Behavior:
+ * *S3 web console*: both directory and file are rendered
+ * **aws s3 ls**: prefix (`PRE h/`) and file (`h`) are both displayed
+ * S3A: both entries are visible with the name `/a/b/h`, but the first is a file (with size) and the second is a directory (with directory attributes)
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: show both the file and the directory with the same name (similar to S3A)
+ * `ozone.om.enable.filesystem.paths=true`: throw an exception when the second one is created  
+
+## Directory entry created with file content
+
+In this case we create a directory (key which ends with `/`) but with real file content:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/i/ --body README.md
+```
+
+Behavior:
+ * *S3 web console*: rendered as directory (couldn't be downloaded)
+ * **aws s3 ls**: shown as a prefix (`aws s3 ls s3://ozonetest/a/b`), but when the full path is used it is shown as a file without a name (`aws s3 ls s3://ozonetest/a/b/i/`)
+ * S3A: `./bin/hdfs dfs -ls s3a://ozonetest/a/b/` shows a directory `h`, `./bin/hdfs dfs -ls s3a://ozonetest/a/b/i` shows a file `i`
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: possible but `i/` is hidden from o3fs/ofs
+ * `ozone.om.enable.filesystem.paths=true`: key name is normalized to real key name
+
+## Create key and explicit create parent dir
+
+```
+aws s3api put-object --bucket ozonetest --key e/f/g/
+aws s3api put-object --bucket ozonetest --key e/f/
+```
+
+Behavior:
+
+ * S3 can support it without any problem
+ 
+Proposed behavior:
+
+After the first command, the `/e/f/` and `/e/` entries are created in the key space (as they are required by `ofs`/`o3fs`) but **with a specific flag** (explicit=false). 
+
+AWS S3 list-objects API should exclude those entries from the result (!).
+
+Second command execution should modify the flag of `/e/f/` key to (explicit=true).
+
+## Create parent dir AND key with S3a
+
+This is the problem which is reported by [HDDS-4209](https://issues.apache.org/jira/browse/HDDS-4209)
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 # -> fails with below error
+```
+
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: should work without error
+ * `ozone.om.enable.filesystem.paths=true`: should work without error.
+
+This is an `ofs`/`o3fs` question, not an S3 one. The directory created in the first step shouldn't block the creation of the file. This can be a **mandatory** normalization for `mkdir` directory creation. As it's an HCFS operation, S3 is not affected. Entries created from S3 can be visible from S3 without any problem.

Review comment:
       It will block ofs/o3fs also, as it is not a directory in ozone, it is a file.
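
For readers following the thread, the explicit/implicit flag from the quoted "Create key and explicit create parent dir" section can be sketched roughly as below. This is only an illustration of the idea in the design doc, not Ozone code: the `KeySpace` class and its method names are hypothetical, and keys are written without a leading slash, as in the `aws s3api` commands.

```python
from typing import Dict, List


class KeySpace:
    """Toy model of a flat key space with an 'explicit' flag per entry."""

    def __init__(self) -> None:
        # key name -> True if created explicitly, False if only an implicit parent
        self.entries: Dict[str, bool] = {}

    def put_from_s3(self, key: str) -> None:
        segments = [s for s in key.split("/") if s]
        # Create implicit parent entries; never downgrade an explicit one.
        for i in range(1, len(segments)):
            parent = "/".join(segments[:i]) + "/"
            self.entries.setdefault(parent, False)
        # The key itself is always explicit (this also upgrades an implicit entry).
        self.entries[key] = True

    def list_s3(self) -> List[str]:
        # The S3 listing hides entries that exist only to help HCFS.
        return sorted(k for k, explicit in self.entries.items() if explicit)

    def list_hcfs(self) -> List[str]:
        # The file system view sees every entry.
        return sorted(self.entries)


ks = KeySpace()
ks.put_from_s3("e/f/g/")
print(ks.list_s3())    # ['e/f/g/']
print(ks.list_hcfs())  # ['e/', 'e/f/', 'e/f/g/']
ks.put_from_s3("e/f/")
print(ks.list_s3())    # ['e/f/', 'e/f/g/']
```

The sketch does not model the file-versus-directory conflict raised in the comment above; it only shows how one flag lets the two listings differ.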






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486439055



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+# Problematic cases
+
+As described in the previous section, there are some cases which cannot be supported out of the box due to the differences between the flat key space and the file-system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/c`  

Review comment:
       Yes, exactly.






[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486757610



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone must simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns cannot be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS cannot be used together out of the box.
+
+To solve the performance problems of directory listing and rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 with full AWS compatibility (and reduced HCFS compatibility).
+
+# Goals
+
+ * Out of the box, Ozone should support both the S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility cannot be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       /b/c/c => /a/b/c
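
To make the corrected example concrete, the intermediate directory entries that `ofs/o3fs` needs for a key can be derived as in the sketch below. The function name `intermediate_dirs` is hypothetical, and the sketch assumes an fs-compatible key (empty segments are simply skipped).

```python
from typing import List


def intermediate_dirs(key: str) -> List[str]:
    """Return the parent "directory" prefixes of a flat key, shortest first."""
    segments = [s for s in key.split("/") if s]
    # Every proper prefix of the segment list is an intermediate directory.
    return ["/" + "/".join(segments[:i]) for i in range(1, len(segments))]


print(intermediate_dirs("/a/b/c"))    # ['/a', '/a/b']
print(intermediate_dirs("/a/b/c/d"))  # ['/a', '/a/b', '/a/b/c']
```

So `/a/b` is an intermediate entry for the key `/a/b/c`, but not for `/b/c/c`.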

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second cannot be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |

Review comment:
       Is there any pointer to the S3 compatibility requirements, e.g., path handling, normalization, etc.?
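
   To make the "AWS S3 incompatibility" row concrete, here is a minimal sketch of file-system style key normalization. The function `normalize_key` is hypothetical and is not Ozone's implementation; it only assumes that normalization collapses empty and `.` segments and resolves `..`. AWS S3 treats keys as opaque strings, so two keys that collapse to the same normalized name stay distinct there.

```python
def normalize_key(key: str) -> str:
    """Hypothetical file-system style normalization of an object key."""
    parts = []
    for part in key.split("/"):
        if part in ("", "."):
            # Empty segments come from "//" or a leading/trailing "/".
            continue
        if part == "..":
            # Step back one level if there is one to step back to.
            if parts:
                parts.pop()
            continue
        parts.append(part)
    return "/" + "/".join(parts)


# Two different S3 keys end up with the same file-system name.
keys = ["/a/b//c", "/a/b/c", "/a/b/../c", "/b/d/"]
normalized = {k: normalize_key(k) for k in keys}
```

   With forced normalization, a client that writes `/a/b//c` cannot read it back under that exact name, which is the behaviour that differs from AWS S3.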

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 

Review comment:
       > Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?
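
   For reference, a small sketch that enumerates the combinations of the two behaviours, assuming they are modelled as two independent booleans. The labels are taken from the tables in the doc; the function name `combinations` is made up for illustration.

```python
from itertools import product

# (create_dirs, normalize) -> how the design doc refers to this combination
DESCRIBED = {
    (True, True): "ozone.om.enable.filesystem.paths=true",
    (True, False): "ozone.om.enable.intermediate.dirs=true (proposed)",
    (False, False): "both settings off (current default)",
}


def combinations():
    """Return every (create_dirs, normalize) pair with its label."""
    rows = []
    for create_dirs, normalize in product((True, False), repeat=2):
        label = DESCRIBED.get(
            (create_dirs, normalize), "not covered by the design doc"
        )
        rows.append((create_dirs, normalize, label))
    return rows


for create_dirs, normalize, label in combinations():
    print(f"create_dirs={create_dirs!s:5} normalize={normalize!s:5} {label}")
```

   The only pair without a label in the doc is normalization without intermediate directory creation.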

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**In case of intermediate directory generation is enabled (with either of the configuraiton keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs//` or `o3://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done with persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicit created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `of3fs://`  create explicit directories all the time.

Review comment:
       > S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as `/a/b/d`, via the FS interface? Do we still want to show them in this case for interop?
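
   A minimal in-memory sketch of one possible reading of the flag-based proposal, to make the question concrete. `KeyTable` and its methods are hypothetical names, not Ozone APIs. The assumptions are: S3 writes mark missing parents as implicit, FS writes mark parents as explicit (as the doc says `ofs`/`o3fs` always create explicit directories), and the S3 listing hides only implicit entries. Conflicts between files and directories are ignored here.

```python
def parents(key):
    """Return all intermediate directory names of a key, shortest first."""
    parts = key.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


class KeyTable:
    """Toy key space: key -> 'file' | 'implicit_dir' | 'explicit_dir'."""

    def __init__(self):
        self.entries = {}

    def s3_put(self, key):
        # Parents are created only to help HCFS, so they are flagged implicit.
        for parent in parents(key):
            self.entries.setdefault(parent, "implicit_dir")
        self.entries[key] = "file"

    def fs_create(self, key):
        # Assumption: an FS write turns implicit parents into explicit ones.
        for parent in parents(key):
            self.entries[parent] = "explicit_dir"
        self.entries[key] = "file"

    def s3_list(self):
        # Implicit directory entries stay hidden from S3.
        return sorted(k for k, v in self.entries.items() if v != "implicit_dir")

    def fs_list(self):
        return sorted(self.entries)


table = KeyTable()
table.s3_put("/a/b/c/d")
before = table.s3_list()
table.fs_create("/a/b/d")
after = table.s3_list()
```

   Under these assumptions the new file `/a/b/d` is listed by S3, and `/a` and `/a/b` also become visible because the FS write made them explicit, while `/a/b/c` stays hidden.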
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**In case of intermediate directory generation is enabled (with either of the configuraiton keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs//` or `o3://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done with persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicit created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `of3fs://`  create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is choosen**: (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible name. It may cause a conflict (error) if the normalized key is already exists (or exists as a file instead of directory)
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.

Review comment:
       Illegal characters in an S3 path, except `/`, can be encoded into the FS path.
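
   A minimal sketch of such an encoding, assuming percent-encoding is used and `/` is kept as the separator. The helper names `encode_key` and `decode_key` are hypothetical, and which characters actually count as illegal for HCFS paths is not decided here; this version simply encodes everything that `urllib.parse.quote` does not treat as safe.

```python
from urllib.parse import quote, unquote


def encode_key(key: str) -> str:
    """Percent-encode an S3 key into a file-system friendly path, keeping '/'."""
    return quote(key, safe="/")


def decode_key(path: str) -> str:
    """Reverse of encode_key."""
    return unquote(path)


encoded = encode_key("/a/b:c d")
roundtrip = decode_key(encoded)

# Keys whose problem is the '/' itself are not helped by encoding.
unchanged = encode_key("/a/b//c")
```

   Because `%` is itself encoded, the mapping is reversible. Keys such as `/a/b//c` or `/a/b/../c` are left as they are, so they would still need either normalization or to stay invisible from HCFS.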

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       Typo: `/b/c/c` should be `/a/b/c`.

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |

Review comment:
       Is there a pointer to the S3 compatibility requirements (e.g., path handling, normalization)?
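
To make the question concrete, here is a minimal sketch of why normalization and S3 semantics can conflict. It assumes that S3 treats object keys as opaque strings, and it uses Python's `posixpath.normpath` as a stand-in for file-system style normalization. `normalize_key` and `find_collisions` are hypothetical helpers for illustration, not Ozone code.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Hypothetical helper: collapse a raw key to a file-system style path."""
    return posixpath.normpath(key)


def find_collisions(keys):
    """Group raw keys by normalized form; return groups with more than one raw key."""
    groups = {}
    for key in keys:
        groups.setdefault(normalize_key(key), []).append(key)
    return {path: raw for path, raw in groups.items() if len(raw) > 1}


# Four distinct keys from the S3 point of view.
s3_keys = ["/a/b/c", "/a/b//c", "/a/b/../b/c", "/a/x"]
collisions = find_collisions(s3_keys)
for path, raw in collisions.items():
    print(path, "<-", raw)
```

Under these assumptions three different S3 keys map to the same file-system path, which is the kind of behavior the "AWS S3 incompatibility" cell in the table refers to.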

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support the Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |

Review comment:
       > Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?
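
As a rough sketch of the full matrix, the two behaviors can be modeled as independent boolean flags. The three rows covered by the tables in the design doc are reproduced below. The fourth combination (normalization without intermediate directories) is not described in the doc, so its output here is only an extrapolation. The `behavior` function and the result names are hypothetical.

```python
from itertools import product


def behavior(create_dirs: bool, normalize: bool) -> dict:
    """Toy model of the design doc's tables, driven by two independent flags."""
    return {
        # `/a/b/c` needs its parent directories to be reachable from ofs/o3fs.
        "plain_key_in_fs": create_dirs,
        # `/a/b//c` is reachable from ofs/o3fs only if it is also normalized.
        "odd_key_in_fs": create_dirs and normalize,
        # The S3 view keeps `/a/b//c` verbatim only if nothing rewrites it.
        "odd_key_verbatim_in_s3": not normalize,
    }


matrix = {flags: behavior(*flags) for flags in product([True, False], repeat=2)}
for (create_dirs, normalize), result in matrix.items():
    print(f"CREATE_DIR={create_dirs!s:5} NORMALIZE={normalize!s:5} -> {result}")
```

In this model the NORMALIZE-only combination gives up S3 compatibility without making the keys reachable from the file system, which may be an argument for not exposing it.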

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support the Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       > S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as `/a/b/d`, via the FS interface? Do we still want to show them in that case for interop?
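
One possible reading of the "extra flag" idea, sketched as a toy in-memory model. `ToyBucket` and all of its methods are hypothetical and do not reflect the Ozone Manager key table. The sketch assumes that parents generated for an S3 key are flagged as implicit, that only an explicit `mkdir` clears the flag, and that files are always regular keys.

```python
class ToyBucket:
    """In-memory stand-in for a key table with an `implicit` flag on directories."""

    def __init__(self):
        self.entries = {}  # path -> {"dir": bool, "implicit": bool}

    @staticmethod
    def _parents(key):
        parts = key.strip("/").split("/")
        return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

    def s3_put(self, key):
        # Parents generated on behalf of S3 are marked implicit.
        for parent in self._parents(key):
            self.entries.setdefault(parent, {"dir": True, "implicit": True})
        self.entries[key] = {"dir": False, "implicit": False}

    def fs_mkdir(self, path):
        # Explicit creation clears the flag on an existing implicit entry.
        self.entries[path] = {"dir": True, "implicit": False}

    def fs_create_file(self, path):
        # Missing parents are created as explicit; existing ones are left alone.
        for parent in self._parents(path):
            self.entries.setdefault(parent, {"dir": True, "implicit": False})
        self.entries[path] = {"dir": False, "implicit": False}

    def s3_list(self):
        return sorted(p for p, e in self.entries.items() if not e["implicit"])

    def fs_list(self):
        return sorted(self.entries)


bucket = ToyBucket()
bucket.s3_put("/a/b/c/d")
bucket.fs_create_file("/a/b/d")
print(bucket.s3_list())
print(bucket.fs_list())
```

In this model the file created through the FS interface shows up in the S3 listing as an ordinary key, while the implicit parents stay hidden until someone creates them explicitly.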
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support the Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.

Review comment:
       Illegal characters in an S3 key can be encoded into the FS path, except for the `/`.
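
A minimal sketch of that idea, assuming percent-encoding is an acceptable escape scheme. Which characters actually count as illegal for HCFS paths is not specified here, so the sketch simply encodes everything outside the unreserved set. `encode_key_for_fs` and `decode_fs_path` are hypothetical helpers.

```python
from urllib.parse import quote, unquote


def encode_key_for_fs(key: str) -> str:
    """Percent-encode each segment of a key, leaving `/` as the separator."""
    return "/".join(quote(segment, safe="") for segment in key.split("/"))


def decode_fs_path(path: str) -> str:
    """Inverse of encode_key_for_fs."""
    return "/".join(unquote(segment) for segment in path.split("/"))


key = "/a/b:c/d e"
encoded = encode_key_for_fs(key)
print(encoded)
print(decode_fs_path(encoded) == key)
```

Encoding alone does not help with keys such as `/a/b//c` or `/a/b/../c`: the empty and `..` segments pass through unchanged, so those cases would still need a separate policy.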

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support the Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility cannot be achieved on both sides, we need a configuration to set the expectations for incompatible key names.
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred).
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       /b/c/c => /a/b/c
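
To make the corrected example concrete, here is a minimal Python sketch of how the intermediate directory entries follow from a flat key name. The helper `intermediate_dirs` is hypothetical and only illustrates the rule stated in the doc (`/a/b/c` implies `/a` and `/a/b`); it is not the Ozone Manager implementation.

```python
from __future__ import annotations


def intermediate_dirs(key: str) -> list[str]:
    """Return the parent "directory" entries implied by a flat key name.

    Hypothetical helper for illustration only.
    """
    parts = [p for p in key.split("/") if p]
    # Every proper prefix of the path components is an intermediate directory.
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


print(intermediate_dirs("/a/b/c"))    # ['/a', '/a/b']
print(intermediate_dirs("/a/b/c/d"))  # ['/a', '/a/b', '/a/b/c']
```

With this rule, `/a/b` is an intermediate entry of `/a/b/c`, which is why the example in the doc should name the key `/a/b/c` rather than `/b/c/c`.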

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (referred to as *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrators as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns cannot easily be transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it was not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` also creates the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If it is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS cannot be used together out of the box.
+
+To solve the performance problems of directory listing and rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. This means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 together with full AWS compatibility (and reduced HCFS compatibility).
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility cannot be achieved on both sides, we need a configuration to set the expectations for incompatible key names.
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred).
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second cannot be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |

Review comment:
       Any pointer to the S3 compatibility requirements, e.g., path handling, normalization, etc.?
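
As background for the question, AWS S3 treats object keys as opaque strings, so `/a/b/c` and `/a/b//c` are distinct objects there. The Python sketch below shows why forced normalization breaks that property. `normalize_key` is a hypothetical normalizer written for this illustration (assuming file-system style handling of empty, `.` and `..` components); it is not the normalization code from HDDS-4097.

```python
def normalize_key(key: str) -> str:
    """Resolve empty, '.' and '..' components like a file system path.

    Hypothetical normalizer for illustration only.
    """
    stack = []
    for part in key.split("/"):
        if part in ("", "."):
            continue  # drops duplicate and trailing slashes
        if part == "..":
            if stack:
                stack.pop()  # step back one level, never above the root
            continue
        stack.append(part)
    return "/" + "/".join(stack)


# Three keys that are distinct objects on AWS S3 ...
keys = ["/a/b/c", "/a/b//c", "/a/x/../b/c"]
# ... collapse to a single name once they are normalized.
print({k: normalize_key(k) for k in keys})
```

Under these assumptions a client that writes `/a/b//c` and then reads back `/a/b//c` may still work, but a listing would return `/a/b/c`, and two previously independent objects can overwrite each other.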

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS cannot be used together without normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility cannot be achieved on both sides, we need a configuration to set the expectations for incompatible key names.
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred).
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second cannot be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |

Review comment:
       bq. Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?
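
To see which combinations the two proposed keys can express, here is a small Python sketch. The `Behavior` tuple and `effective_behavior` function are hypothetical, and the way the two keys combine when both are set is an assumption (the doc only describes each key on its own).

```python
from typing import NamedTuple


class Behavior(NamedTuple):
    create_dirs: bool  # CREATE_DIR
    normalize: bool    # NORMALIZE


def effective_behavior(enable_filesystem_paths: bool,
                       enable_intermediate_dirs: bool) -> Behavior:
    # Assumption: ozone.om.enable.filesystem.paths implies both behaviors,
    # while the proposed second key only turns on directory creation.
    return Behavior(
        create_dirs=enable_filesystem_paths or enable_intermediate_dirs,
        normalize=enable_filesystem_paths,
    )


reachable = {effective_behavior(p, d)
             for p in (False, True) for d in (False, True)}
all_combos = {Behavior(c, n) for c in (False, True) for n in (False, True)}

# The only combination the two keys cannot express:
print(sorted(all_combos - reachable))
# [Behavior(create_dirs=False, normalize=True)]
```

Under these assumptions the only unreachable case is normalization without intermediate directory creation, so the open question reduces to whether that single case is useful.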

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       bq. S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as /a/b/d, via the FS interface? Do we still want to show them in this case for interop?
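
To make the scenario concrete, here is a minimal Python sketch of one possible reading of the "implicit" flag: directories generated for an S3 key are hidden from S3 listings until something creates them explicitly. The `KeySpace` class, its method names and the `implicit` field are all hypothetical, and conflicts (for example a file that already exists where a directory is needed) are not handled.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    is_dir: bool
    implicit: bool = False  # hypothetical flag: dir was generated, not requested


def parents(key):
    parts = [p for p in key.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


class KeySpace:
    """Toy in-memory key table, for illustration only."""

    def __init__(self):
        self.entries = {}

    def _ensure_dirs(self, key, implicit):
        for d in parents(key):
            existing = self.entries.get(d)
            if existing is None:
                self.entries[d] = Entry(is_dir=True, implicit=implicit)
            elif not implicit:
                # Explicit creation upgrades a previously implicit directory.
                existing.implicit = False

    def s3_put(self, key):
        self._ensure_dirs(key, implicit=True)
        self.entries[key] = Entry(is_dir=False)

    def fs_create(self, path):
        self._ensure_dirs(path, implicit=False)
        self.entries[path] = Entry(is_dir=False)

    def s3_list(self):
        # Implicit directories exist only to help HCFS and are hidden from S3.
        return sorted(k for k, e in self.entries.items()
                      if not (e.is_dir and e.implicit))

    def fs_list(self, directory):
        prefix = directory.rstrip("/") + "/"
        return sorted(k for k in self.entries
                      if k.startswith(prefix) and "/" not in k[len(prefix):])


ks = KeySpace()
ks.s3_put("/a/b/c/d")
print(ks.s3_list())      # ['/a/b/c/d']
print(ks.fs_list("/a"))  # ['/a/b']

ks.fs_create("/a/b/d")
print(ks.s3_list())      # ['/a', '/a/b', '/a/b/c/d', '/a/b/d']
```

In this reading the file `/a/b/d` is always visible from S3 because it is a regular key. Whether `/a` and `/a/b` should also start to appear in the S3 listing after the FS write is a design decision that the doc would need to state explicitly.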
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS S3 behavior (when the `/a/b/c` key is created, `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose to create only the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during normal file listing.

Review comment:
       Illegal characters in an S3 path can be encoded into the FS path, except for `/`.
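
One way to picture this suggestion is percent-encoding each path component while keeping `/` as the separator. The Python sketch below is a hypothetical illustration: the function names are made up, and it simply encodes everything outside the URL "unreserved" character set rather than a specific list of characters that HCFS rejects.

```python
from urllib.parse import quote, unquote


def encode_key_for_fs(key: str) -> str:
    """Percent-encode each component; '/' stays the separator (illustration only)."""
    return "/".join(quote(part, safe="") for part in key.split("/"))


def decode_fs_path(path: str) -> str:
    """Inverse of encode_key_for_fs."""
    return "/".join(unquote(part) for part in path.split("/"))


key = "/logs/2020-09-09T12:50:49/part 1"
encoded = encode_key_for_fs(key)
print(encoded)                  # /logs/2020-09-09T12%3A50%3A49/part%201
print(decode_fs_path(encoded))  # /logs/2020-09-09T12:50:49/part 1

# Empty components are untouched, so '//' is still a problem:
print(encode_key_for_fs("/a/b//c"))  # /a/b//c
```

The mapping is reversible because `%` itself is encoded, but keys such as `/a/b//c` or `/a/b/../c` are unchanged by it, which matches the "except the /" caveat: those keys still need either normalization or to stay invisible from HCFS.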

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores keys and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       Typo in the example: `/b/c/c` => `/a/b/c`

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores keys and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES

Review comment:
       Is there any pointer to the S3 compatibility requirements (e.g., path handling, normalization, etc.)?
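
To make the question concrete, here is a small sketch of how normalization can collapse S3 keys that AWS S3 treats as distinct, opaque names. It assumes the normalization behaves like POSIX path normalization (`posixpath.normpath`); that is an assumption for illustration only, and the actual Ozone normalization may differ:

```python
import posixpath

# Key patterns taken from the design doc; on AWS S3 each one is a distinct key.
s3_keys = ["/a/b/c", "/a/b//c", "/a/b/../c", "/b/d/"]

# Assumption: fs-style normalization is equivalent to posixpath.normpath.
normalized = {key: posixpath.normpath(key) for key in s3_keys}

for key, fs_name in normalized.items():
    print(f"{key!r} -> {fs_name!r}")

# Two different S3 keys end up with the same fs-compatible name.
collision = normalized["/a/b/c"] == normalized["/a/b//c"]
print("collision:", collision)
```

With forced normalization, a client that writes `/a/b//c` and then reads it back under the same name would observe behavior that differs from AWS S3, which is the incompatibility noted in the last table row.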

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores keys and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS couldn't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 

Review comment:
       > Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores keys and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS couldn't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**In case intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       > S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as `/a/b/d`, via the FS interface? Do we still want to show them in this case for interop?
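
The scenario can be sketched with a toy model of the key table. This is not Ozone code: the `(is_dir, implicit)` tuple stands in for the extra flag described in the doc, and the rule that implicit directories stay hidden from S3 while files created through the FS interface are listed is an assumption about the intended behavior:

```python
# Toy key table: key name -> (is_dir, implicit). `implicit` is the
# hypothetical flag persisted with directories generated for an S3 key.
keys = {}


def parents(key):
    # "/a/b/c/d" -> ["/a", "/a/b", "/a/b/c"]
    parts = key.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def s3_put(key):
    for p in parents(key):
        keys.setdefault(p, (True, True))   # implicit dir, hidden from S3
    keys[key] = (False, False)


def fs_create(path):
    for p in parents(path):
        keys.setdefault(p, (True, False))  # explicit dir if it was missing
    keys[path] = (False, False)


def s3_list():
    # S3 view: everything except the implicit directory entries.
    return sorted(k for k, (_, implicit) in keys.items() if not implicit)


def fs_list(directory):
    # HCFS view: direct children only, implicit directories included.
    prefix = directory.rstrip("/") + "/"
    return sorted(k for k in keys
                  if k.startswith(prefix) and "/" not in k[len(prefix):])


s3_put("/a/b/c/d")
fs_create("/a/b/d")
print(s3_list())
print(fs_list("/a/b"))
```

Under these assumptions the file `/a/b/d` shows up in the S3 listing next to `/a/b/c/d`, while `/a`, `/a/b` and `/a/b/c` remain hidden from S3 and visible as directories from HCFS.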
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores keys and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true` | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.

Review comment:
       The illegal char from S3 path can be encoded into FS path except the /. 
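
A minimal sketch of the encoding idea in this comment, under stated assumptions: the function names and the choice of percent-encoding are hypothetical and are not taken from Ozone. Segments that a file system would resolve specially (`.` and `..`) and an assumed set of illegal characters are escaped. Keys with empty segments (a doubled or trailing `/`) are reported as not representable, since the `/` itself cannot be escaped into a path segment.

```python
from typing import Optional

# Characters assumed (for illustration only) to be illegal in a path segment.
ASSUMED_ILLEGAL_CHARS = ":"


def encode_segment(segment: str) -> str:
    """Percent-encode one path segment so it can be used as a file name."""
    if segment in (".", ".."):
        # Dot segments would be resolved by the file system, so escape each dot.
        return segment.replace(".", "%2E")
    out = []
    for ch in segment:
        if ch == "%" or ch in ASSUMED_ILLEGAL_CHARS:
            # '%' is escaped as well, so that the encoding stays reversible.
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def encode_key_for_fs(key: str) -> Optional[str]:
    """Return an encoded path for an S3 key, or None if it has empty segments."""
    body = key[1:] if key.startswith("/") else key
    segments = body.split("/")
    if any(segment == "" for segment in segments):
        # For example '/a/b//d' or '/b/d/': the '/' cannot be encoded away.
        return None
    return "/" + "/".join(encode_segment(s) for s in segments)
```

With this sketch, a key such as `/a/b/../c` gets an escaped, file-system-safe name, while `/a/b//d` still has no file system representation.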

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrators as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone must simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause errors when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       /b/c/c => /a/b/c
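
To make the corrected example concrete, here is a small sketch of which intermediate directory entries a key implies. The helper name `intermediate_dirs` is hypothetical, and the sketch assumes a regular, fs-compatible key (empty segments are simply skipped).

```python
from typing import List


def intermediate_dirs(key: str) -> List[str]:
    """List the parent directory paths a file system view needs for a key."""
    segments = [s for s in key.split("/") if s]
    # Every proper prefix of the key is a directory; the last segment is the file.
    return ["/" + "/".join(segments[:i]) for i in range(1, len(segments))]
```

For the key `/a/b/c` this yields `/a` and `/a/b`, which matches the correction suggested above.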

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |

Review comment:
       Any pointer to S3 compatibility requirement? e.g., path handling, normalization, etc.
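
This is not a pointer to a specification, but a short illustration of where the "AWS S3 incompatibility" cell comes from. S3 treats key names as opaque strings, so the three keys below are three distinct objects there. The sketch uses Python's `posixpath.normpath` as a stand-in for file-system style normalization; that choice is an assumption, and the rules Ozone applies may differ in detail.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Collapse repeated slashes and resolve '.' and '..' segments."""
    return posixpath.normpath(key)


# Three different key names from the S3 point of view.
s3_keys = ["/a/b/c", "/a/b//c", "/a/b/../b/c"]

# After normalization all of them map to the same file system path.
normalized = {key: normalize_key(key) for key in s3_keys}
```

Once normalization is forced on the S3 interface, writing any one of these keys addresses the same entry, which plain S3 would not do.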

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires creating intermediate directory entries (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be solved without a compromise:
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true` | Enable intermediate dir generation **AND** key name normalization |

Review comment:
       bq. Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? do we need to support other combinations?
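
To look at the other combinations, the two behaviors can be modelled as independent switches. The sketch below is a simplified model of the tables in the design doc, not Ozone code. The function and field names are hypothetical, and the row for "normalize without creating directories" is an extrapolation that the document does not describe.

```python
from itertools import product
from typing import Dict


def behavior(create_dirs: bool, normalize: bool) -> Dict[str, bool]:
    """Simplified model of the doc's tables for two independent switches."""
    return {
        # A regular S3 key needs its parent directories to be visible from HCFS.
        "fs_sees_regular_s3_key": create_dirs,
        # A key like /a/b//c additionally has to be normalized to be visible.
        "fs_sees_irregular_s3_key": create_dirs and normalize,
        # S3 keeps key names untouched only when normalization is off.
        "s3_keeps_key_names": not normalize,
    }


# All four combinations of (create_dirs, normalize).
combinations = {
    (create_dirs, normalize): behavior(create_dirs, normalize)
    for create_dirs, normalize in product([True, False], repeat=2)
}
```

Under this model the fourth combination (normalization only) gives up unchanged S3 key names without making any S3 key visible from HCFS, which may be why only two configurations are listed. That reading is an inference from the model, not a statement from the design doc.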

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       bq. S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   what if user create additional files under intermediate dirs such as /a/b/d via FS interface, we still want to show them in this case for interop?
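
One possible reading of the flag proposal, sketched in memory to make the question concrete. Everything here is hypothetical: the `Entry` type, the `implicit` field and the helper names are illustrations, not the proposed implementation. The sketch assumes that a file created through the FS interface is an ordinary key, and that existing implicit parents are left unchanged, which is exactly the open point.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    is_dir: bool
    implicit: bool = False  # hypothetical flag: directory generated for an S3 key


def _parents(key: str) -> List[str]:
    parts = key.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def put_s3_key(table: Dict[str, Entry], key: str) -> None:
    """Store a key from S3 and add flagged parent directories for HCFS."""
    for parent in _parents(key):
        # Never overwrite an existing (possibly explicit) entry.
        table.setdefault(parent, Entry(is_dir=True, implicit=True))
    table[key] = Entry(is_dir=False)


def create_fs_file(table: Dict[str, Entry], path: str) -> None:
    """Create a file from the FS side; missing parents become explicit dirs."""
    for parent in _parents(path):
        table.setdefault(parent, Entry(is_dir=True, implicit=False))
    table[path] = Entry(is_dir=False)


def s3_list(table: Dict[str, Entry]) -> List[str]:
    """S3 view: hide the directories that exist only to help HCFS."""
    return sorted(k for k, e in table.items() if not e.implicit)


def fs_list(table: Dict[str, Entry]) -> List[str]:
    """HCFS view: every entry, implicit or not."""
    return sorted(table)
```

In this reading, after `put_s3_key(table, "/a/b/c/d")` and `create_fs_file(table, "/a/b/d")`, the S3 listing contains both `/a/b/c/d` and `/a/b/d`, while `/a`, `/a/b` and `/a/b/c` stay hidden from S3 and visible from HCFS.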
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today, *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`):
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*, ...)
+ 2. file system related functions (like *CreateFile*, *LookupFile*, ...)
+
+File system related functions use the same flat namespace under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when its intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together out of the box.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 together with full AWS compatibility (and reduced HCFS compatibility).
+
+# Goals
+
+ * Out of the box, Ozone should support both the S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non-fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently from each other.**
+
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during normal file listing.

Review comment:
       Illegal characters in an S3 path, except for `/`, could be encoded into the FS path.
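
One possible reading of this suggestion is percent-encoding. The sketch below is only an illustration in Python, not Ozone code: the helper names `s3_key_to_fs_path` and `fs_path_to_s3_key` are hypothetical, and the choice of percent-encoding as the scheme is an assumption, since the comment does not name one.

```python
from urllib.parse import quote, unquote


def s3_key_to_fs_path(key: str) -> str:
    # Hypothetical helper: percent-encode every character that is not
    # unreserved, but keep "/" as the path separator.
    return quote(key, safe="/")


def fs_path_to_s3_key(path: str) -> str:
    # Hypothetical inverse: decode the percent-escapes again.
    return unquote(path)


encoded = s3_key_to_fs_path("/a/b c/d:e")
print(encoded)
print(fs_path_to_s3_key(encoded))
```

Because `/` and `.` pass through unchanged, a scheme like this would not help with keys such as `/a/b//c` or `/a/b/../c`; those still need either normalization or to be hidden from HCFS.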

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       Typo: `/b/c/c` should be `/a/b/c`.

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES

Review comment:
       Is there any pointer to the S3 compatibility requirements (e.g., path handling, normalization)?
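
To make the incompatibility in the last table row concrete, here is a small Python sketch. It relies on the fact that S3 treats object keys as opaque strings, and it uses `posixpath.normpath` only as a stand-in for file-system normalization; the actual normalization rules proposed in HDDS-4097 may differ.

```python
import posixpath


def normalize(key: str) -> str:
    # Stand-in for fs-style normalization (an assumption, not the
    # Ozone implementation): collapses "//" and resolves "..".
    return posixpath.normpath(key)


keys = ["/a/b/c", "/a/b//c", "/a/x/../b/c"]
normalized = {key: normalize(key) for key in keys}

for key, value in normalized.items():
    print(f"{key!r} -> {value!r}")
```

The three inputs are distinct keys for an S3 client, but all of them map to the same normalized name, so with forced normalization a later write can collide with an earlier, differently named key.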

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 

Review comment:
       bq. Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?
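
As a way to reason about the combinations, the sketch below maps the two proposed configuration keys to the two underlying behaviors. It is a Python illustration under the assumption that the keys behave exactly as in the table above; `PathBehavior` and `resolve` are hypothetical names, not Ozone APIs.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PathBehavior:
    create_intermediate_dirs: bool
    normalize_keys: bool


def resolve(enable_filesystem_paths: bool,
            enable_intermediate_dirs: bool) -> PathBehavior:
    # ozone.om.enable.filesystem.paths implies both behaviors;
    # ozone.om.enable.intermediate.dirs implies only dir creation.
    return PathBehavior(
        create_intermediate_dirs=(enable_filesystem_paths
                                  or enable_intermediate_dirs),
        normalize_keys=enable_filesystem_paths,
    )


reachable = {resolve(fs, dirs) for fs, dirs in product([False, True], repeat=2)}
for behavior in sorted(reachable, key=repr):
    print(behavior)
```

With these two keys, three of the four behavior combinations are reachable; normalization without intermediate directory creation cannot be expressed.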

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done with persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicit created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `of3fs://`  create explicit directories all the time.

Review comment:
       bq. S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as `/a/b/d`, via the FS interface? Do we still want to show them in this case for interop?
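
To make the question concrete, here is a minimal sketch of the "implicit directory" flag idea. Everything in it is hypothetical (the `Entry` class, the `implicit` field, and the helper names are not Ozone APIs), and it assumes one possible answer: a directory becomes visible to S3 once the FS interface has created it explicitly.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    is_dir: bool
    implicit: bool = False  # hypothetical flag: dir generated only to help HCFS


def parents(key: str) -> list:
    """Return all parent prefixes of a key, e.g. /a/b/c -> ['/a', '/a/b']."""
    parts = key.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def put_s3_key(table: dict, key: str) -> None:
    """Store an S3 key; missing parent dirs are added with the implicit flag."""
    for parent in parents(key):
        table.setdefault(parent, Entry(is_dir=True, implicit=True))
    table[key] = Entry(is_dir=False)


def create_fs_file(table: dict, path: str) -> None:
    """Create a file through the FS interface; its parents become explicit."""
    for parent in parents(path):
        table[parent] = Entry(is_dir=True, implicit=False)
    table[path] = Entry(is_dir=False)


def list_s3(table: dict) -> list:
    """S3 view: hide directory entries that are still implicit."""
    return sorted(k for k, e in table.items() if not e.implicit)


def list_fs(table: dict) -> list:
    """HCFS view: every entry is visible."""
    return sorted(table)
```

Under this assumption, after `put_s3_key(table, "/a/b/c/d")` the S3 listing contains only `/a/b/c/d`, and a later `create_fs_file(table, "/a/b/d")` makes `/a` and `/a/b` explicit (and therefore listed), while `/a/b/c` stays hidden from S3.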
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support the Hadoop Compatible File System (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface) 
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today, *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key was introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that by default S3 and HCFS can't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 together with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box, Ozone should support both the S3 and HCFS interfaces without any settings. (This is possible only for regular, fs compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`; with this setting there are two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure _normalization_ and _intermediate dir creation_ independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose to create only the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during normal file listing.

Review comment:
       Illegal characters in an S3 key can be encoded into the FS path, except for `/`. 
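
A rough sketch of what such an encoding could look like, assuming percent-encoding per path component. The helper names are hypothetical and this is not an existing Ozone API. It also shows why `/` is the exception: empty components (`//`) and `..` survive the encoding unchanged.

```python
from urllib.parse import quote, unquote


def encode_key_for_fs(key: str) -> str:
    """Percent-encode each path component; '/' stays the separator."""
    return "/".join(quote(part, safe="") for part in key.split("/"))


def decode_fs_path(path: str) -> str:
    """Inverse of encode_key_for_fs."""
    return "/".join(unquote(part) for part in path.split("/"))


print(encode_key_for_fs("/a/b:c/d e"))  # /a/b%3Ac/d%20e
print(encode_key_for_fs("/a/b//c"))     # /a/b//c (the empty component is still there)
```

The encoding is reversible, so the original S3 key can be recovered, but keys such as `/a/b//c` or `/a/b/../c` would still need separate handling.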

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       Typo: `/b/c/c` should be `/a/b/c`.
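
For reference, a tiny sketch that derives the intermediate directory entries of a key (the function name is hypothetical). It shows that `/a/b` is a parent of `/a/b/c`, not of `/b/c/c`.

```python
def intermediate_dirs(key: str) -> list:
    """Return the directory entries an FS view needs for the given key."""
    parts = [p for p in key.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


print(intermediate_dirs("/a/b/c"))  # ['/a', '/a/b']
print(intermediate_dirs("/b/c/c"))  # ['/b', '/b/c']
```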

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`; with this setting there are two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |

Review comment:
       Is there a pointer to the S3 compatibility requirements (e.g., path handling, normalization)?
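
As background for the "AWS S3 incompatibility" cell: AWS S3 treats object keys as opaque strings, so `/a/b/c` and `/a/b//c` are distinct objects there. The sketch below uses Python's `posixpath.normpath` as a stand-in for file-system style normalization (an assumption; Ozone's actual normalization rules may differ) to show how distinct S3 keys collapse onto one name.

```python
import posixpath


def normalize(key: str) -> str:
    """File-system style normalization: collapse '//' and resolve '..'."""
    return posixpath.normpath(key)


def find_collisions(keys: list) -> dict:
    """Group the original keys that end up with the same normalized name."""
    seen = {}
    for key in keys:
        seen.setdefault(normalize(key), []).append(key)
    return {norm: orig for norm, orig in seen.items() if len(orig) > 1}


print(find_collisions(["/a/b/c", "/a/b//c", "/a/x/../b/c", "/z"]))
# {'/a/b/c': ['/a/b/c', '/a/b//c', '/a/x/../b/c']}
```

With forced normalization, a client that writes `/a/b//c` and then reads `/a/b/c` gets the same object, which a client written against AWS S3 would not expect.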

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both the S3 and HCFS interfaces without any settings. (This is possible only for regular, fs compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`; with this setting there are two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure _normalization_ and _intermediate dir creation_ independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |

Review comment:
       bq. Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?
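
To make the combinations explicit, the two behaviors can be enumerated as a 2x2 matrix. This is only an illustration: `describe` is a hypothetical helper, the mapping follows the two-option table in the design doc, and `ozone.om.enable.intermediate.dirs` is a proposed name rather than an existing setting.

```python
from itertools import product


def describe(create_dirs: bool, normalize: bool) -> str:
    """Map a (CREATE_DIR, NORMALIZE) pair to the option that would enable it."""
    if create_dirs and normalize:
        return "ozone.om.enable.filesystem.paths=true"
    if create_dirs:
        return "ozone.om.enable.intermediate.dirs=true (proposed)"
    if normalize:
        return "no matching option in the proposal"
    return "both options off (pure object store behavior)"


# Enumerate all four combinations of the two flags.
combinations = [(c, n, describe(c, n)) for c, n in product([True, False], repeat=2)]
for create_dirs, normalize, option in combinations:
    print(f"CREATE_DIR={create_dirs!s:5} NORMALIZE={normalize!s:5} -> {option}")
```

The only combination without a corresponding option is NORMALIZE without CREATE_DIR.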

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**In case of intermediate directory generation is enabled (with either of the configuraiton keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs//` or `o3://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done with persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicit created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `of3fs://`  create explicit directories all the time.

Review comment:
       bq. S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as /a/b/d, via the FS interface? Do we still want to show them in this case for interop?
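
To make the question concrete, here is a minimal sketch of one possible reading of the quoted paragraph. It is not Ozone code: the `Entry` class, the `implicit` flag and the helper names are all hypothetical, and the key table is just an in-memory dict. It assumes that directory entries generated for an S3 key carry the flag, and that an FS create turns the parents it touches into explicit directories.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    is_dir: bool
    implicit: bool = False  # hypothetical flag: dir entry generated for an S3 key


def parents(key):
    """Return the parent prefixes of a key: '/a/b/c/d' -> ['/a', '/a/b', '/a/b/c']."""
    parts = key.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def s3_put(table, key):
    # S3 create: missing parents are added as implicit directory entries.
    for p in parents(key):
        table.setdefault(p, Entry(is_dir=True, implicit=True))
    table[key] = Entry(is_dir=False)


def fs_create(table, path):
    # FS create: parents become explicit directories (the flag is cleared).
    for p in parents(path):
        table[p] = Entry(is_dir=True, implicit=False)
    table[path] = Entry(is_dir=False)


def s3_list(table):
    # The S3 listing hides only the implicit directory entries.
    return sorted(k for k, e in table.items() if not e.implicit)


table = {}
s3_put(table, "/a/b/c/d")
print(s3_list(table))   # ['/a/b/c/d']
fs_create(table, "/a/b/d")
print(s3_list(table))   # ['/a', '/a/b', '/a/b/c/d', '/a/b/d']
```

Under these assumptions the file `/a/b/d` is listed from S3, and `/a` and `/a/b` become visible as well because the FS call made them explicit, while `/a/b/c` stays hidden. Whether that is the intended behavior is exactly what I am asking.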
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**In case of intermediate directory generation is enabled (with either of the configuraiton keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs//` or `o3://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done with persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicit created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `of3fs://`  create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is choosen**: (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible name. It may cause a conflict (error) if the normalized key is already exists (or exists as a file instead of directory)
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.

Review comment:
       Illegal characters in an S3 path, except for `/`, can be encoded into the FS path.
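
A minimal sketch of what I mean, assuming percent-encoding per path segment is acceptable. The function names are hypothetical and the choice of percent-encoding is only an illustration, not something the design doc specifies:

```python
from urllib.parse import quote, unquote


def encode_key(key):
    """Percent-encode every segment of an S3 key; '/' stays the separator."""
    return "/".join(quote(seg, safe="") for seg in key.split("/"))


def decode_path(path):
    """Inverse of encode_key: recover the original S3 key from the FS path."""
    return "/".join(unquote(seg) for seg in path.split("/"))


print(encode_key("/a/b:c/d e"))               # /a/b%3Ac/d%20e
print(decode_path(encode_key("/a/b:c/d e")))  # /a/b:c/d e
print(encode_key("/a/b//c"))                  # /a/b//c (unchanged)
```

Note that this does not help with keys such as `/a/b//c` or `/a/b/../c`: the empty segment and `..` pass through the encoding unchanged, so those keys would still need normalization or would stay invisible from HCFS.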

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       /b/c/c => /a/b/c

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES

Review comment:
       Is there any pointer to the S3 compatibility requirements, e.g., path handling, normalization, etc.?

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 

Review comment:
       bq. Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?
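
To spell out the combinations, here is a rough sketch with the two behaviors as independent switches. The function and parameter names are hypothetical, and using `posixpath.normpath` as the normalization is my assumption for illustration, not necessarily what the OM does:

```python
import posixpath


def entries_for_s3_key(key, create_dirs, normalize):
    """Return (stored_key, generated_dirs) for a key created via S3."""
    stored = posixpath.normpath(key) if normalize else key
    dirs = []
    if create_dirs:
        parts = stored.strip("/").split("/")
        # Only fs-compatible keys get intermediate directories; others stay
        # invisible from HCFS, as the design doc describes.
        if all(p not in ("", ".", "..") for p in parts):
            dirs = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    return stored, dirs


for create_dirs in (True, False):
    for normalize in (True, False):
        print(create_dirs, normalize,
              entries_for_s3_key("/a/b//c", create_dirs, normalize))
```

`create_dirs=True, normalize=True` corresponds to the existing config, and `create_dirs=True, normalize=False` to the proposed one. The combination `create_dirs=False, normalize=True` (normalized keys but no directory entries) is the one I am not sure anybody needs.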

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**In case of intermediate directory generation is enabled (with either of the configuraiton keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs//` or `o3://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done with persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicit created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `of3fs://`  create explicit directories all the time.

Review comment:
       bq. S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as `/a/b/d`, via the FS interface? Do we still want to show them in this case for interop?
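
One way to reason about this case is a small Python sketch of the "implicit directory" flag described in the draft. Everything here is an assumption for illustration: the `Entry` type, the `implicit` field, and the rule that an FS-side create clears the flag on existing parents are hypothetical, not Ozone's actual key table.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    is_dir: bool
    implicit: bool = False  # hypothetical flag: directory was generated, not requested


def parents(key: str) -> list[str]:
    parts = key.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def put(table: dict, key: str, via_s3: bool) -> None:
    """Store a key together with its parent directory entries."""
    for d in parents(key):
        if d not in table:
            table[d] = Entry(is_dir=True, implicit=via_s3)
        elif not via_s3:
            # Assumed rule: a create through the FS interface makes the parent explicit.
            table[d].implicit = False
    table[key] = Entry(is_dir=False)


def list_s3(table: dict) -> list[str]:
    """S3 listing: hide only the implicit directory entries."""
    return sorted(k for k, e in table.items() if not e.implicit)
```

Under these assumptions, a file created as `/a/b/d` through the FS interface is listed by S3, and `/a` and `/a/b` become visible too because they are now explicit, while `/a/b/c` stays hidden.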
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**In case of intermediate directory generation is enabled (with either of the configuraiton keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs//` or `o3://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done with persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicit created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `of3fs://`  create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is choosen**: (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible name. It may cause a conflict (error) if the normalized key is already exists (or exists as a file instead of directory)
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.

Review comment:
       Illegal characters in an S3 path, except `/`, can be encoded into the FS path.
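
A minimal sketch of such an encoding, assuming percent-encoding is acceptable as the scheme (the comment does not name one). The helper names `encode_for_fs` and `decode_from_fs` are hypothetical; `urllib.parse.quote` and `unquote` are standard Python library calls.

```python
from urllib.parse import quote, unquote


def encode_for_fs(key: str) -> str:
    """Percent-encode everything except letters, digits, '_.-~' and '/'."""
    return quote(key, safe="/")


def decode_from_fs(path: str) -> str:
    """Reverse the encoding to recover the original S3 key."""
    return unquote(path)
```

Note that this leaves `/` and `.` untouched, so keys such as `/a/b//c` or `/a/b/../c` are not made file-system compatible by encoding alone.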






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485794138



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 

Review comment:
       So even if the other setting is false (`ozone.om.enable.intermediate.dirs=false`), we still create intermediate directories?
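
Read literally, the draft table suggests the following resolution of the two keys. This is a Python sketch of that reading only, not confirmed behaviour: the function `effective_behavior` and the result field names are hypothetical, and both keys are assumed to default to false.

```python
def effective_behavior(conf: dict) -> dict:
    """Resolve the two proposed keys into the two behaviours."""
    fs_paths = conf.get("ozone.om.enable.filesystem.paths", False)
    only_dirs = conf.get("ozone.om.enable.intermediate.dirs", False)
    return {
        # Either key turns on intermediate directory generation.
        "create_intermediate_dirs": fs_paths or only_dirs,
        # Only the existing key turns on key name normalization.
        "normalize_keys": fs_paths,
    }
```

Under this reading the answer to the question is yes: `ozone.om.enable.filesystem.paths=true` alone is enough to create intermediate directories, whatever the value of the other key.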






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486424461



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
 
- * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular path)
- * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations in case of incompatible key names
- * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:

Review comment:
       Also, one disadvantage: bucket-level settings add complexity, because it is harder to define the expected behavior for a specific path. A cluster-level setting is easier, as there is one global behavior for the whole setup.






[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486761072



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       bq. S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as `/a/b/d`, via the FS interface? Do we still want to show them in this case for interop?
   






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486437934



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       If a key is created from the S3 interface but already exists with the `explicit=false` flag, the flag should be changed to `explicit=true` to make it visible from S3.
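
A minimal sketch of this flag rule, assuming a plain in-memory dictionary in place of the real key table. The names `put_from_s3`, `list_from_s3`, `list_from_hcfs` and the `explicit` field are hypothetical and only illustrate the idea; they are not taken from the Ozone code base.

```python
# Hypothetical in-memory model of the key table; not Ozone's real schema.

def put_from_s3(table, key):
    """Create `key` via S3 and add implicit (hidden) parent directory entries."""
    parts = key.strip("/").split("/")
    for i in range(1, len(parts)):
        parent = "/" + "/".join(parts[:i])
        # Never downgrade an entry that is already explicit.
        table.setdefault(parent, {"explicit": False})
    # The key itself is always explicit, even if it existed as an implicit entry.
    table[key] = {"explicit": True}


def list_from_s3(table):
    """S3 sees only explicit entries."""
    return sorted(k for k, v in table.items() if v["explicit"])


def list_from_hcfs(table):
    """HCFS sees every entry, implicit or explicit."""
    return sorted(table)


table = {}
put_from_s3(table, "/a/b/c/d")   # creates implicit /a, /a/b, /a/b/c
put_from_s3(table, "/a/b")       # existing implicit entry becomes explicit
print(list_from_s3(table))
print(list_from_hcfs(table))
```

After the second call, `/a/b` appears in the S3 listing while `/a` and `/a/b/c` stay hidden; the HCFS listing is unchanged.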






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486418379



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation

Review comment:
       > I don't think that is true. Paths are normalized already on the S3 interface when writing new keys.
   
   But not for reads, if I understood correctly. I'm happy to remove this line if it's confusing.
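
To illustrate why the write/read distinction matters, here is a small sketch. It assumes keys are normalized on write but looked up verbatim on read, and it uses Python's `posixpath.normpath` as a stand-in for the normalization step. The `store`, `put` and `get_raw` names are hypothetical; this is not how the Ozone Manager is implemented.

```python
import posixpath

# Hypothetical flat key space.
store = {}


def normalize_key(key):
    """Collapse '//' and resolve '.' and '..' segments (stand-in normalization)."""
    return posixpath.normpath(key)


def put(key, value):
    # Assumption: the key is normalized before it is written.
    store[normalize_key(key)] = value


def get_raw(key):
    # Assumption: the key is NOT normalized on read.
    return store.get(key)


put("/a/b//c", "data")
print(get_raw("/a/b//c"))  # the original, non-normalized name is not found
print(get_raw("/a/b/c"))   # the normalized name is found
```

Under these assumptions a client that writes `/a/b//c` cannot read it back under the same name, which is the kind of AWS S3 incompatibility listed in the table.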






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486438477



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations are enabled, the intermediate directories won't be created. But in this case, a consistent view of `ofs/o3fs` couldn't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (but Ozone can still be used as a pure S3 replacement, without being used as an HCFS).
+
+# Problematic cases
+
+As described in the previous section, there are some cases which couldn't be supported out-of-the-box due to the differences between the flat key space and the file-system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry

Review comment:
       Yes, exactly.
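
A rough sketch of the proposed behavior for `ozone.om.enable.intermediate.dirs=true`: keys that cannot be parsed as a file system path are simply skipped when HCFS lists a directory. The helper names `is_fs_compatible` and `hcfs_children` are hypothetical, and the compatibility rule used here (no empty, `.` or `..` path components) is an assumption for illustration only.

```python
def is_fs_compatible(key):
    """True if every path component is non-empty and is not '.' or '..'."""
    parts = key.strip("/").split("/")
    return all(p not in ("", ".", "..") for p in parts)


def hcfs_children(keys, directory):
    """Direct children of `directory` (non-recursive), ignoring incompatible keys."""
    prefix = directory.strip("/") + "/"
    children = set()
    for key in keys:
        k = key.strip("/")
        if is_fs_compatible(key) and k.startswith(prefix):
            # Keep only the first path component below the directory.
            children.add(k[len(prefix):].split("/")[0])
    return sorted(children)


keys = ["a/b//y", "a/b/x", "a/b/c/d"]
print(hcfs_children(keys, "a/b"))
```

With these keys, `a/b//y` stays in the flat key space (so S3 can still read it), but the HCFS view of `a/b` contains only `c` and `x`.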






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485801364



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
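
The intermediate entries implied by a key can be derived as in the following Python sketch. The function name `intermediate_dirs` is hypothetical and this is not the Ozone Manager code; it also assumes that empty path segments are simply skipped:

```python
def intermediate_dirs(key):
    """Return the parent directory entries implied by `key`, shallowest first."""
    segments = [segment for segment in key.split("/") if segment]
    return ["/" + "/".join(segments[:i]) for i in range(1, len(segments))]


print(intermediate_dirs("/a/b/c"))
print(intermediate_dirs("/a/b/c/d"))
```

A plain `CreateKey` call does not add these entries, which is why the file system view can miss them.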
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together out of the box.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non-fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs-compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make *normalization* and *intermediate dir creation* configurable independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created later.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
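
The implicit-directory flag described in this section can be sketched with a toy in-memory key table. This is a Python illustration under simplifying assumptions (no conflict checks, no buckets or volumes); the class `KeyTable` and its methods are hypothetical names, not the Ozone Manager API:

```python
class KeyTable:
    """Toy in-memory key table; not the Ozone Manager implementation."""

    def __init__(self):
        # key name -> {"is_dir": bool, "implicit": bool}
        self.entries = {}

    @staticmethod
    def _parents(key):
        segments = key.strip("/").split("/")
        return ["/" + "/".join(segments[:i]) for i in range(1, len(segments))]

    def put_from_s3(self, key):
        for parent in self._parents(key):
            # Keep an existing entry (explicit or implicit) untouched.
            self.entries.setdefault(parent, {"is_dir": True, "implicit": True})
        self.entries[key] = {"is_dir": False, "implicit": False}

    def mkdir_from_fs(self, path):
        # The file system interface always creates explicit directories,
        # which clears the implicit flag on entries that already exist.
        for directory in self._parents(path) + [path]:
            self.entries[directory] = {"is_dir": True, "implicit": False}

    def list_s3(self):
        # Implicit directory entries are hidden from the S3 view.
        return sorted(k for k, e in self.entries.items() if not e["implicit"])

    def list_fs(self):
        return sorted(self.entries)


table = KeyTable()
table.put_from_s3("/a/b/c/d")
print(table.list_s3())
print(table.list_fs())
```

After an explicit `mkdir_from_fs("/a/b")`, the `/a` and `/a/b` entries lose the flag and would show up in the S3 listing as well.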
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose to create only the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
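
The two modes can be illustrated with a short Python sketch. The helper names are hypothetical and the normalization rules (drop empty and `.` segments, resolve `..`) are an assumption about what "fs-compatible" means here, not the exact Ozone implementation. Conflict detection for already existing normalized keys is not modeled:

```python
INVALID_SEGMENTS = ("", ".", "..")


def is_fs_compatible(key):
    """True if every path segment of `key` is a plain name."""
    body = key[1:] if key.startswith("/") else key
    return all(segment not in INVALID_SEGMENTS for segment in body.split("/"))


def normalize_key(key):
    """Normalize `key` to an absolute, fs-compatible path."""
    parts = []
    for segment in key.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            if parts:
                parts.pop()
            continue
        parts.append(segment)
    return "/" + "/".join(parts)


def store_key(key, normalize):
    """Return (stored_name, visible_from_hcfs) for a key written through S3."""
    if normalize:
        return normalize_key(key), True
    return key, is_fs_compatible(key)


for key in ("a/b/c", "a/b//y", "a/b/../e", "a/b/./f"):
    print(key, store_key(key, normalize=True), store_key(key, normalize=False))
```

With normalization the stored name may differ from what the S3 client sent; without it the name is kept, but incompatible keys are hidden from HCFS.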
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification increases if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations is enabled, the intermediate directories won't be created. But in this case a consistent view from `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (Ozone can still be used as a pure S3 replacement, without HCFS).
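
The resulting three modes can be summarized in a small Python sketch. The function and field names are hypothetical, and `ozone.om.enable.intermediate.dirs` is only the imagined key from this proposal:

```python
from collections import namedtuple

Behavior = namedtuple(
    "Behavior", "create_intermediate_dirs normalize_keys hcfs_enabled"
)


def resolve_behavior(enable_filesystem_paths=False, enable_intermediate_dirs=False):
    """Map the two imagined configuration keys to the proposed behavior."""
    create_dirs = enable_filesystem_paths or enable_intermediate_dirs
    # Without intermediate directories a consistent HCFS view is not
    # possible, so the file system interface is treated as disabled.
    return Behavior(
        create_intermediate_dirs=create_dirs,
        normalize_keys=enable_filesystem_paths,
        hcfs_enabled=create_dirs,
    )


print(resolve_behavior())
print(resolve_behavior(enable_intermediate_dirs=True))
print(resolve_behavior(enable_filesystem_paths=True))
```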
+
+# Problematic cases
+
+As described in the previous section, there are some cases which can't be supported out of the box due to the differences between the flat key space and the file system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS CLI, the Hadoop S3A connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry

Review comment:
       `/y` is not accessible from FS, but will be accessible from S3 correct?






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485802830



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`  
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`  
+
+## Key and directory with the same name
+
+It is possible to create a directory and a key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md

Review comment:
       aws s3api put-object --bucket ozonetest --key a/b/ --body README.md
   Need one more put above to complete the example

Review comment:
       hdfs dfs -mkdir -p o3fs://ozonetest.s3v/a/b/h 
   Need mkdir above to complete the example






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486417710



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise:
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation

Review comment:
       > Here AWS S3 incompatibility means, is it because we are showing normalized keys?
   
   Yes, keys are normalized. Content can be found under different key names.
   
   I started to define the 100% compatibility here:
   
   https://github.com/elek/hadoop-ozone/blob/s3-compat/hadoop-ozone/dist/src/main/smoketest/s3/s3-vs-filepath.robot
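
To make "content can be found under different key names" concrete, here is a minimal sketch. It assumes normalization behaves like POSIX path normalization and uses Python's `posixpath.normpath` as a stand-in; `normalize_key` is a hypothetical name and the real Ozone rules may differ in details.

```python
import posixpath


def normalize_key(key):
    """Hypothetical stand-in for file-system style key normalization."""
    return posixpath.normpath(key)


# Four distinct S3 key names that a pure object store keeps apart.
keys = ["a/b/c", "a/b//c", "a/b/./c", "a/x/../b/c"]

# After normalization they all collapse to one stored name.
normalized = {k: normalize_key(k) for k in keys}
distinct_after = set(normalized.values())
```

An S3 client that wrote `a/b//c` would later find its content listed as `a/b/c`, which is the incompatibility with AWS S3 mentioned in the table.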






[GitHub] [hadoop-ozone] elek edited a comment on pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek edited a comment on pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#issuecomment-689541807


   Opened as a **DRAFT** pull request as this is only a proposal. 
   
   @arp7 and @bharatviswa504 still have concerns about the proposed approach.




[GitHub] [hadoop-ozone] bharatviswa504 commented on pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#issuecomment-690808157


   > > Thank You @elek for the design document.
   > > My understanding from this is the draft is as below. Let me know if I am missing something here.
   > > <img alt="Screen Shot 2020-09-09 at 10 59 11 AM" width="998" src="https://user-images.githubusercontent.com/8586345/92635994-8856fe80-f28b-11ea-95bf-8864d48e488f.png">
   > 
   > Correct. But this is not a matrix anymore. You should turn on either first or second of the configs, but not both.
   
   Not sure what is meant here. Because we have 2 configs, we can have 4 combinations; according to the proposal 3 are valid and the 4th one is not.
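
The four combinations can be enumerated in a short sketch. It rests on one reading of the discussion above: the combination with both settings enabled is the one to reject, and `effective_mode` is a hypothetical helper, not an existing Ozone API.

```python
from itertools import product


def effective_mode(filesystem_paths, intermediate_dirs):
    """Map the two proposed settings to the resulting behavior.

    Assumption: enabling both settings at once is the invalid case.
    """
    if filesystem_paths and intermediate_dirs:
        raise ValueError("enable only one of the two settings")
    if filesystem_paths:
        return {"intermediate_dirs": True, "normalize": True}
    if intermediate_dirs:
        return {"intermediate_dirs": True, "normalize": False}
    # Neither enabled: pure object-store mode.
    return {"intermediate_dirs": False, "normalize": False}


valid = {}
invalid = []
for combo in product([False, True], repeat=2):
    try:
        valid[combo] = effective_mode(*combo)
    except ValueError:
        invalid.append(combo)
```

Under this reading three combinations yield a defined mode and one is refused, which matches the "3 are valid, 4th one is not" count.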




[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485785304



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't easily be transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that by default S3 and HCFS can't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise:
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |

Review comment:
       There is no such force flag. If normalization is enabled, we normalize keys from all interfaces.
   We have 4 interfaces:
   1. CLI
   2. S3
   3. HCFS
   4. Java Native Client.
   
   So this flag is not something special for S3.






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485805715



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't easily be transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that by default S3 and HCFS can't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise:
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations are enabled, the intermediate directories won't be created. But in this case, a consistent view of `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (but Ozone can be used as a pure S3 replacement without being used as an HCFS).
+
+# Problematic cases
+
+As described in the previous section, there are some cases which can't be supported out-of-the-box due to the differences between the flat key-space and file-system hierarchy. These cases are collected here together with information on how existing tools (AWS console, AWS cli, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`
+
+## Key and directory with the same name
+
+It is possible to create a directory and a key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md
+aws s3api list-objects --bucket ozonetest --prefix=a/b/h/
+```
+
+Behavior:
+ * *S3 web console*: both directory and file are rendered
+ * **aws s3 ls**: prefix (`PRE h/`) and file (`h`) are both displayed
+ * S3A: both entries are visible with the name `/a/b/h` but the first is a file (with size) and the second is a directory (with directory attributes)
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: show both the file and the directory with the same name (similar to S3A)
+ * `ozone.om.enable.filesystem.paths=true`: throw an exception when the second one is created
+
+## Directory entry created with file content
+
+In this case we create a directory (key which ends with `/`) but with real file content:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/i/ --body README.md
+```
+
+Behavior:
+ * *S3 web console*: rendered as directory (couldn't be downloaded)
+ * **aws s3 ls**: shown as a prefix (`aws s3 ls s3://ozonetest/a/b`), but when the full path is used it is shown as a file without a name (`aws s3 ls s3://ozonetest/a/b/i/`)
+ * S3A: `./bin/hdfs dfs -ls s3a://ozonetest/a/b/` shows a directory `i`, `./bin/hdfs dfs -ls s3a://ozonetest/a/b/i` shows a file `i`
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: possible but `i/` is hidden from o3fs/ofs
+ * `ozone.om.enable.filesystem.paths=true`: key name is normalized to real key name
+
+## Create key and explicit create parent dir
+
+```
+aws s3api put-object --bucket ozonetest --key e/f/g/
+aws s3api put-object --bucket ozonetest --key e/f/
+```
+
+Behavior:
+
+ * S3 can support it without any problem
+ 
+Proposed behavior:
+
+After the first command, `/e/f/` and `/e/` entries are created in the key space (as they are required by `ofs`/`o3fs`) but **with a specific flag** (explicit=false).
+
+AWS S3 list-objects API should exclude those entries from the result (!).
+
+Second command execution should modify the flag of `/e/f/` key to (explicit=true).
+
+## Create parent dir AND key with S3a
+
+This is the problem which is reported by [HDDS-4209](https://issues.apache.org/jira/browse/HDDS-4209)
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 # -> fails with below error
+```
+
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: should work without error
+ * `ozone.om.enable.filesystem.paths=true`: should work without error.
+
+This is an `ofs`/`o3fs` question, not an S3 one. The directory created in the first step shouldn't block the creation of the file. This can be a **mandatory** normalization for `mkdir` directory creation. As it's an HCFS operation, S3 is not affected. Entries created from S3 can be visible from S3 without any problem.
+
+## Create file and directory with S3
+
+This problem is reported in HDDS-4209, thanks to @Bharat
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 
+```
+
+In this case first a `d11/d12/` key is created. The intermediate key creation logic in the second step should use it as a directory instead of throwing an exception.

Review comment:
       d11/d12 is created without a trailing "/"
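
The intermediate-entry bookkeeping described in the quoted design text (implicit parents flagged `explicit=false`, hidden from S3 listings, promoted when created explicitly) can be sketched as follows. This is a toy model under stated assumptions: the key table is a plain dict mapping key name to an `explicit` flag, directory entries are written with a trailing `/` (which, per the comment above, is not how `d11/d12` is actually stored), and `parent_prefixes`, `put_key` and `s3_list` are hypothetical names.

```python
def parent_prefixes(key):
    """Return the directory prefixes implied by `key`, shortest first."""
    parts = key.strip("/").split("/")
    return ["/".join(parts[:i]) + "/" for i in range(1, len(parts))]


def put_key(table, key):
    """Store `key` as explicit and add missing parents as implicit."""
    for prefix in parent_prefixes(key):
        # setdefault keeps the flag of an entry that already exists,
        # so an explicitly created directory is reused, not rejected.
        table.setdefault(prefix, False)
    table[key] = True


def s3_list(table):
    """List only explicit entries, as an S3 client would expect."""
    return sorted(k for k, explicit in table.items() if explicit)


table = {}
put_key(table, "e/f/g/")
after_first = s3_list(table)    # implicit e/ and e/f/ stay hidden
put_key(table, "e/f/")
after_second = s3_list(table)   # e/f/ is now explicit

# mkdir followed by a file put under the same directory (HDDS-4209 case).
table2 = {}
put_key(table2, "d11/d12/")
put_key(table2, "d11/d12/file1")
```

Because existing entries keep their flag, the second step of the HDDS-4209 scenario reuses the directory created by `mkdir` instead of failing on it.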






[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486761554



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't easily be transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that by default S3 and HCFS can't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 with full AWS compatibility (and reduced HCFS compatibility).
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise:
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure *normalization* and *intermediate dir creation* independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.

Review comment:
       Illegal characters in an S3 key can be encoded in the FS path, except for `/`.
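
A minimal sketch of what such an encoding could look like, assuming percent-encoding is meant (the comment above does not name a scheme) and using `:` only as an example of a problematic character. The function names are hypothetical and this is not Ozone code. Each segment is encoded on its own, so `/` stays a separator, which is also why an empty segment as in `/a/b//c` cannot be fixed this way.

```python
from urllib.parse import quote, unquote


def encode_key_for_fs(key: str) -> str:
    """Percent-encode every segment of an S3 key, keeping '/' as the separator."""
    # safe="" makes quote() encode everything except letters, digits and "_.-~".
    # Splitting first means the '/' separators themselves are never encoded.
    return "/".join(quote(segment, safe="") for segment in key.split("/"))


def decode_fs_path(path: str) -> str:
    """Reverse encode_key_for_fs segment by segment."""
    return "/".join(unquote(segment) for segment in path.split("/"))


encoded = encode_key_for_fs("a/b:c/d")   # the ':' is replaced, the '/' is kept
original = decode_fs_path(encoded)       # round-trips back to the S3 key
```

Note that segments such as `..` pass through unchanged, so encoding alone does not make every key file-system compatible.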






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486433552



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface) 
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 

Review comment:
       I am open to reorganizing the configuration in any way. In this model the two configurations are independent. The existing config means CREATE_DIR+NORMALIZE, the new config is just CREATE_DIR.
   
   They are independent settings, not a matrix. 
   
   But we can switch back to the original version.
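
To make the model concrete, here is a small sketch of how the two keys could combine, under the assumption that they behave exactly as described in the table of the design doc. `Behavior` and `effective_behavior` are hypothetical names for illustration, not Ozone code; the two booleans stand for `ozone.om.enable.filesystem.paths` and `ozone.om.enable.intermediate.dirs`.

```python
from typing import NamedTuple


class Behavior(NamedTuple):
    create_dirs: bool   # intermediate directory entries are generated
    normalize: bool     # key names are normalized to fs-compatible names


def effective_behavior(enable_filesystem_paths: bool,
                       enable_intermediate_dirs: bool) -> Behavior:
    """Combine the two keys: the existing one implies both behaviors,
    the new one only turns on intermediate directory creation."""
    return Behavior(
        create_dirs=enable_filesystem_paths or enable_intermediate_dirs,
        normalize=enable_filesystem_paths,
    )


# Existing config only: CREATE_DIR + NORMALIZE
print(effective_behavior(True, False))
# New config only: just CREATE_DIR
print(effective_behavior(False, True))
```

Setting both keys gives the same result as setting only the existing one, which is why the two are not a full 2x2 matrix.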






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485789295



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create itermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggest to use a 3rd option where 100% AWS compatiblity is guaranteed in exchange of a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|-|
+| create itermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |

Review comment:
       No special normalization is needed here: because the FS side uses the `Path` object, key names are already normalized before they are sent to OM.
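
As a rough illustration of what normalizing on the client side means for the key names discussed in the doc, the sketch below uses Python's `posixpath.normpath` as a stand-in. This is an assumption for illustration only: Hadoop's `Path` is a different implementation and its exact rules may differ. `normalize_key` is a hypothetical helper.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Collapse duplicate slashes, resolve '..' and drop a trailing slash."""
    return posixpath.normpath(key)


# The problematic key patterns listed in the design doc
for key in ["/a/b//c", "/a/b/../c", "/b/d/"]:
    print(key, "->", normalize_key(key))
```

The point is that a key arriving through the file system client is already in normalized form, while the same string sent through S3 reaches OM unchanged.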






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485778737



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)

Review comment:
       exapmle -> example






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486428560



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when s3 compatibilty is prefered)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for exapmle `/a/b` for the key `/b/c/c`)

Review comment:
       Fixed, thanks.






[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
arp7 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485693614



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. Note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can be used together only with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting there are two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation

Review comment:
       I don't think that is true. Paths are normalized already on the S3 interface when writing new keys.
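
For readers following the normalization discussion, here is a minimal Python sketch of what "normalizing" an S3 key to a file-system-compatible path could look like. It is not Ozone's actual implementation: the function name `normalize_key` is hypothetical, and the rules (drop empty segments and `.`, resolve `..`, strip leading and trailing slashes) are assumptions inferred from the example keys in the design doc (`a/b//y`, `a/b/../e`, `a/b/./f`).

```python
def normalize_key(key: str) -> str:
    """Hypothetical helper: rewrite an S3 key as a file-system-compatible path."""
    parts = []
    for segment in key.split("/"):
        if segment in ("", "."):
            # Empty segment (from `//` or a leading/trailing `/`) or `.`: skip it.
            continue
        if segment == "..":
            if not parts:
                # Assumption: a key that climbs above the bucket root is rejected.
                raise ValueError(f"key escapes the bucket root: {key!r}")
            parts.pop()
            continue
        parts.append(segment)
    return "/".join(parts)


if __name__ == "__main__":
    for key in ("a/b//y", "a/b/../e", "a/b/./f", "/a/b/c"):
        print(key, "->", normalize_key(key))
```

With normalization forced on the S3 interface, two distinct S3 keys such as `a/b//y` and `a/b/y` would collapse to the same stored key, which is the AWS S3 incompatibility the table refers to.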






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486434809



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes using a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. Note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can be used together only with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting there are two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |

Review comment:
       I think so, but we can remove `om`, as it's not necessary to know from a config name where it is used. It's an important Ozone config. But I am open to using any config name.
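
To make the two options in the table concrete, the following Python sketch shows one way the two keys could resolve into independent behaviors. It is only an illustration under stated assumptions: the `KeyspaceBehavior` type and `resolve_behavior` function are hypothetical, the key `ozone.om.enable.intermediate.dirs` exists only in this proposal, and the final config names are still open as noted above. The `hcfs_enabled` field reflects the proposal that `ofs/o3fs` should be disabled when neither option is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyspaceBehavior:
    """Hypothetical summary of what the Ozone Manager would do for new keys."""
    create_intermediate_dirs: bool
    normalize_key_names: bool
    hcfs_enabled: bool


def resolve_behavior(conf: dict) -> KeyspaceBehavior:
    # Existing key from HDDS-3955: enables dir generation AND normalization.
    fs_paths = conf.get("ozone.om.enable.filesystem.paths", "false") == "true"
    # Proposed key (assumption, name not final): enables dir generation only.
    dirs_only = conf.get("ozone.om.enable.intermediate.dirs", "false") == "true"

    create_dirs = fs_paths or dirs_only
    return KeyspaceBehavior(
        create_intermediate_dirs=create_dirs,
        normalize_key_names=fs_paths,
        # Without intermediate dirs a consistent ofs/o3fs view can't be guaranteed.
        hcfs_enabled=create_dirs,
    )


if __name__ == "__main__":
    print(resolve_behavior({"ozone.om.enable.intermediate.dirs": "true"}))
```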






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485817494



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. Note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can be used together only with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting there are two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
 
- * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular path)
- * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations in case of incompatible key names
- * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:

Review comment:
       There is also another way, right? An existing bucket created via the CLI can be exposed to S3; what semantics will that bucket get?
   
   >For buckets created via FS interface, the FS semantics will always take precedence
   
   Bucket creation is possible only via OFS; what about o3fs?
   
   >If the global setting is enabled, then the value of the setting at the time of bucket creation is sampled and that takes effect for the lifetime of the bucket.
   
   A bucket created via the shell while the global flag is enabled (assuming `ozone.om.enable.filesystem.paths=true`) will follow FS semantics, with slight S3 incompatibility.
   Likewise, a bucket created via the shell while the global flag is disabled (assuming `ozone.om.enable.filesystem.paths=false`) will follow S3 semantics, with FS access either broken or completely disallowed.
   
   This is written from my understanding, as I don't have the complete context of the proposal.
   
   I might be missing something here.
   
   






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486443111



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example the `createFile` call creates all the intermediate directories for a specific key (creating file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes using a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. Note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can be used together only with normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting there are two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory)
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes as we need an entry for each prefixes). This write-amplification can be handled with the current implementation: based on the measurements RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations are enabled, the intermediate directories won't be created. But in this case, a consistent view of `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (but Ozone can still be used as a pure S3 replacement without using it as an HCFS).
+
+# Problematic cases
+
+As described in the previous section there are some cases which can't be supported out-of-the-box due to the differences between the flat key-space and file-system hierarchy. These cases are collected here together with information on how existing tools (AWS console, AWS cli, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`
+
+## Key and directory with the same name
+
+It is possible to create directory and key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md
+aws s3api list-objects --bucket ozonetest --prefix=a/b/h/
+```
+
+Behavior:
+ * *S3 web console*: both directory and file are rendered
+ * **aws s3 ls**: prefix (`PRE h/`) and file (`h`) are both displayed
+ * S3A: both entries are visible with the name `/a/b/h` but the first is a file (with size) and the second is a directory (with directory attributes)
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: show both the file and the directory with the same name (similar to S3A)
+ * `ozone.om.enable.filesystem.paths=true`: throwing exception when the second one is created  
+
+## Directory entry created with file content
+
+In this case we create a directory (key which ends with `/`) but with real file content:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/i/ --body README.md
+```
+
+Behavior:
+ * *S3 web console*: rendered as directory (couldn't be downloaded)
+ * **aws s3 ls**: shown as a prefix (`aws s3 ls s3://ozonetest/a/b`), but when the full path is used it is shown as a file without a name (`aws s3 ls s3://ozonetest/a/b/i/`)
+ * S3A: `./bin/hdfs dfs -ls s3a://ozonetest/a/b/` shows a directory `h`, `./bin/hdfs dfs -ls s3a://ozonetest/a/b/i` shows a file `i`
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: possible but `i/` is hidden from o3fs/ofs
+ * `ozone.om.enable.filesystem.paths=true`: key name is normalized to real key name
+
+## Create key and explicit create parent dir
+
+```
+aws s3api put-object --bucket ozonetest --key e/f/g/
+aws s3api put-object --bucket ozonetest --key e/f/
+```
+
+Behavior:
+
+ * S3 can support it without any problem
+ 
+Proposed behavior:
+
+After the first command, `/e/f/` and `/e/` entries are created in the key space (as they are required by `ofs`/`o3fs`) but **with a specific flag** (explicit=false).
+
+AWS S3 list-objects API should exclude those entries from the result (!).
+
+Second command execution should modify the flag of `/e/f/` key to (explicit=true).
+
+## Create parent dir AND key with S3a
+
+This is the problem which is reported by [HDDS-4209](https://issues.apache.org/jira/browse/HDDS-4209)
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 # -> fails with below error
+```
+
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: should work without error
+ * `ozone.om.enable.filesystem.paths=true`: should work without error.
+
+This is an `ofs`/`o3fs` question, not an S3 one. The directory created in the first step shouldn't block the creation of the file. This can be a **mandatory** normalization for `mkdir` directory creation. As it's an HCFS operation, S3 is not affected. Entries created from S3 can be visible from S3 without any problem.
+
+## Create file and directory with S3
+
+This problem is reported in HDDS-4209, thanks to @Bharat
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 
+```
+
+In this case first a `d11/d12/` key is created. The intermediate key creation logic in the second step should use it as a directory instead of throwing an exception.

Review comment:
       I tested it locally and found the trailing `/`. Are you sure?
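
As an illustration of the intermediate-key logic discussed in the quoted section, the Python sketch below models implicit directory entries with an `explicit` flag. It is only a toy in-memory model under stated assumptions: the `KeySpace` class and its method names are hypothetical, directory entries are assumed to be stored with a trailing `/`, and only fs-compatible key names are handled. It is not how the Ozone Manager stores keys.

```python
class KeySpace:
    """Hypothetical in-memory model of a bucket's flat key space."""

    def __init__(self):
        # key name -> {"explicit": bool}; directory entries end with "/"
        self.entries = {}

    def put_from_s3(self, key: str) -> None:
        segments = [s for s in key.split("/") if s]
        # Create missing parents as implicit entries. An existing entry,
        # such as an explicit `d11/d12/` key, is reused instead of failing.
        for i in range(1, len(segments)):
            parent = "/".join(segments[:i]) + "/"
            self.entries.setdefault(parent, {"explicit": False})
        name = "/".join(segments) + ("/" if key.endswith("/") else "")
        # The key itself is always explicit; this also flips an implicit entry.
        self.entries[name] = {"explicit": True}

    def list_s3(self):
        # S3 listing hides entries that exist only to help ofs/o3fs.
        return sorted(k for k, v in self.entries.items() if v["explicit"])

    def list_hcfs_dirs(self):
        # HCFS sees every directory entry, implicit or explicit.
        return sorted(k for k in self.entries if k.endswith("/"))


if __name__ == "__main__":
    ks = KeySpace()
    ks.put_from_s3("d11/d12/")       # like `mkdir -p` through s3a
    ks.put_from_s3("d11/d12/file1")  # should not raise
    print(ks.list_s3())
    print(ks.list_hcfs_dirs())
```

In this model the second `put_from_s3` call finds `d11/d12/` already present and treats it as the parent directory, which is the behavior the quoted paragraph asks for.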






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485802418



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores keys and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulate a file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES (1) | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES | 
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | 
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO | 
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure **normalization** and **intermediate dir creation**, independent from each other**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration option
+
+| configuration | behavior | 
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization 
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As it's defined above the intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories).
+
+**If normalization is chosen**: (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory)
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes as we need an entry for each prefixes). This write-amplification can be handled with the current implementation: based on the measurements RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations are enabled, the intermediate directories won't be created. But in this case, the consistent view of `ofs/o3fs` couldn't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (But Ozone can be used as a pure S3 replacement without using as a HCFS).
+
+# Problematic cases
+
+As described in the previous section there are some cases which couldn't be supported out-of-the-box due to the differences between the flat key-space and file-system hierarchy. These cases are collected here together with the information how existing tools (AWS console, AWS cli, AWS S3A Hadoop connector) behaves.
+
+## Empty directory path
+
+With a pure object store keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`  
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which has file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`  

Review comment:
       Can we also add info on visibility and how it will be displayed?
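
For illustration, the stored names proposed above for the `..` and `.` cases can be reproduced with a small sketch. It assumes POSIX-style normalization through Python's `posixpath.normpath`; `normalize_key` is a hypothetical helper and not Ozone's actual implementation.

```python
import posixpath


def normalize_key(key: str) -> str:
    """Hypothetical helper: normalize an S3 key as an absolute POSIX path.

    Assumption: this approximates the normalization discussed in the
    design doc; it is not the code Ozone uses.
    """
    # Anchor the key at "/" so that ".." can never climb above the bucket root.
    return posixpath.normpath("/" + key.lstrip("/"))


# Keys taken from the examples in the design doc.
for key in ("a/b/../e", "a/b/./f", "a/b//y"):
    print(key, "->", normalize_key(key))
```

Under this assumption `a/b//y` and `a/b/y` map to the same stored name, which is the kind of conflict the doc mentions for the normalized mode.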






[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486757610



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       /b/c/c => /a/b/c
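
To make the corrected example concrete, here is a minimal sketch of which intermediate directory entries a key implies. `intermediate_dirs` is a hypothetical helper; skipping empty path segments is an assumption of this sketch, not behavior confirmed by the design doc.

```python
from typing import List


def intermediate_dirs(key: str) -> List[str]:
    """Hypothetical helper: list the parent directories implied by a key."""
    # Assumption: empty segments (leading or doubled "/") are skipped.
    parts = [p for p in key.split("/") if p]
    # Every proper prefix of the path is an intermediate directory.
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


print(intermediate_dirs("/a/b/c"))
print(intermediate_dirs("/a/b/c/d"))
```

With this sketch the key `/a/b/c` implies `/a` and `/a/b`, which matches the correction suggested in this comment.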

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second couldn't be done with compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`, with this setting we will have two possible usage pattern:
+
+| ozone.om.enable.filesystem.paths= | true | false
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO
+| force to normalize key names of `s3` interface | YES | NO 
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES

Review comment:
       Any pointer to S3 compatibility requirement? e.g., path handling, normalization, etc.
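
As a rough illustration of the key names at stake, the sketch below flags keys that cannot be mapped one-to-one onto a file system path. The rule set (no empty, `.` or `..` segments, no trailing slash) is an assumption drawn from the examples in the quoted doc, and `is_fs_compatible` is a hypothetical name rather than an existing Ozone or S3 API.

```python
def is_fs_compatible(key: str) -> bool:
    """Hypothetical check: can this key be used unchanged as a file path?"""
    # Allow a single leading "/" as used by the examples in the doc.
    path = key[1:] if key.startswith("/") else key
    # An empty segment comes from "//", a trailing "/", or an empty key.
    return all(seg not in ("", ".", "..") for seg in path.split("/"))


for key in ("/a/b/c", "/a/b//c", "a/b/../e", "/b/d/"):
    print(key, is_fs_compatible(key))
```

Keys that fail such a check are the ones that would either be normalized (changing what S3 clients see) or stay invisible from `ofs/o3fs`, depending on the chosen configuration.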






[GitHub] [hadoop-ozone] xiaoyuyao commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
xiaoyuyao commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486757610



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of the directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) is created, which propose to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 but with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require to create intermediate directory entries (for example `/a/b` for the key `/b/c/c`)

Review comment:
       /b/c/c => /a/b/c

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS and the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object-store for Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (will be called as *HCFS* in the remaining of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From container orchestrator as mounted volume (CSI, alpha feature)
+
+As Ozone is an object store it stores key and values in a flat hierarchy which is enough to support S3 (2). But to support Hadoop Compatible File System (and CSI), Ozone should simulated file system hierarchy.
+
+There are multiple challenges when file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns couldn't be easily transformed to file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with directory path like `/b/d/`)
+ 2. Directory entries (which may have own properties) require special handling as file system interface requires a dir entry even if it's not created explicitly (for example if key `/a/b/c` is created `/a/b` supposed to be a visible directory entry for file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires to rename many keys (renaming a first level directory means a rename of all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problem when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions uses the same flat hierarchy under the hood but includes additional functionalities. For example the `createFile` call creates all the intermediate directories for a specific key (create file `/a/b/c` will create `/a/b` and `/a` entries in the key space)
+
+Today, if a key is created from the S3 interface can cause exceptions if the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together by default.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 together with full AWS compatibility (and reduced HCFS compatibility).
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we normalize the key names to be fs-compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |

Review comment:
       Any pointers to the S3 compatibility requirements (e.g., path handling, normalization, etc.)?
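
To make the question concrete, here is a minimal Python sketch of what file-system style normalization does to the key names from the table. It uses `posixpath.normpath` as a stand-in and is not Ozone's actual normalization code (the helper name `normalize_key` is hypothetical). AWS S3 treats keys as opaque strings, so the three keys below are distinct objects there, while after normalization they collapse to one path.

```python
import posixpath


def normalize_key(key: str) -> str:
    """File-system style normalization (stand-in, not Ozone's implementation)."""
    return posixpath.normpath(key)


# Three distinct S3 object keys that map to the same file-system path.
keys = ["/a/b/c", "/a/b//c", "/a/b/x/../c"]
normalized = {k: normalize_key(k) for k in keys}
for original, fs_path in normalized.items():
    print(f"{original!r} -> {fs_path!r}")
```

This collision is why forced normalization shows up as "AWS S3 incompatibility" in the last table row.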

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat namespace, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory-like path such as `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a directory entry even if it was not created explicitly (for example, if the key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together by default.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we normalize the key names to be fs-compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |

Review comment:
       bq. Existing config means CREATE_DIR+NORMALIZE, new config is just CREATE_DIR.
   
   Are these the only two cases that are useful? Do we need to support other combinations?
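
As a rough aid for this question, the sketch below enumerates all four combinations of the two behaviors and marks which ones the design doc names. The labels and the `describe` helper are hypothetical, and treating "both off" as today's default follows the doc's statement that the existing key is off by default.

```python
from itertools import product

# Hypothetical labels; only the first two entries correspond to settings
# named in the design doc.
KNOWN = {
    (True, True): "ozone.om.enable.filesystem.paths=true",
    (True, False): "ozone.om.enable.intermediate.dirs=true (proposed)",
    (False, False): "both settings off (current default)",
}


def describe(create_dirs: bool, normalize: bool) -> str:
    """Map a (create_dirs, normalize) pair to the setting that produces it."""
    return KNOWN.get((create_dirs, normalize), "not covered by the proposal")


for create_dirs, normalize in product([True, False], repeat=2):
    print(f"create_dirs={create_dirs!s:5} normalize={normalize!s:5} -> "
          f"{describe(create_dirs, normalize)}")
```

The only combination without a setting is normalization without intermediate directory creation.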

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat namespace, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory-like path such as `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a directory entry even if it was not created explicitly (for example, if the key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together by default.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we normalize the key names to be fs-compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**In case intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.

Review comment:
       bq. S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
   
   What if a user creates additional files under the intermediate dirs, such as `/a/b/d`, via the FS interface? Do we still want to show them in this case for interop?
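
One possible reading of the proposed flag, as a minimal Python sketch: the `KeyTable` class, its method names, and the `implicit` field are all hypothetical, and the sketch assumes that an FS-side create turns the parent entries into explicit directories (the doc says `ofs://` and `o3fs://` "create explicit directories all the time"). Under that assumption `/a/b/d` and its parents become visible from S3, while `/a/b/c` stays hidden.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    is_dir: bool
    implicit: bool = False  # True only for dirs generated for an S3 key


def parents(key: str) -> list:
    """Return all ancestor paths of a key, e.g. /a/b/c -> [/a, /a/b]."""
    parts = key.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


class KeyTable:
    def __init__(self):
        self.entries = {}

    def s3_put(self, key: str) -> None:
        # Parents are created as implicit entries unless they already exist.
        for p in parents(key):
            self.entries.setdefault(p, Entry(is_dir=True, implicit=True))
        self.entries[key] = Entry(is_dir=False)

    def fs_create(self, path: str) -> None:
        # Assumption: the FS interface always makes parents explicit.
        for p in parents(path):
            self.entries[p] = Entry(is_dir=True, implicit=False)
        self.entries[path] = Entry(is_dir=False)

    def s3_list(self) -> list:
        return sorted(k for k, e in self.entries.items() if not e.implicit)

    def fs_list(self, directory: str) -> list:
        prefix = directory.rstrip("/") + "/"
        return sorted(k for k in self.entries
                      if k.startswith(prefix) and "/" not in k[len(prefix):])


table = KeyTable()
table.s3_put("/a/b/c/d")
print(table.s3_list())      # only the S3-created key
table.fs_create("/a/b/d")
print(table.s3_list())      # FS-created file and its explicit parents appear
```

Whether the explicit parents should really show up in S3 listings is exactly the open point of this comment; the sketch only makes one candidate behavior testable.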
   

##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat namespace, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory-like path such as `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a directory entry even if it was not created explicitly (for example, if the key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together by default.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 together with full AWS compatibility (and reduced HCFS compatibility).
+
+# Goals
+
+ * Out of the box, Ozone should support both S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs-compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we normalize the key names to be fs-compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we will have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+This proposal suggests a 3rd option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**In case intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` keys, (`/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3)
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS S3 behavior (when the `/a/b/c` key is created, `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose to create only the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.

Review comment:
       Illegal characters in an S3 path can be encoded into the FS path, except for `/`.
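
A minimal sketch of that idea in Python, assuming percent-encoding per path segment: the helper names are hypothetical, and the colon is used only as an illustrative "illegal" character, since the doc does not list which characters are rejected. The second call shows the limitation mentioned above, because an empty segment produced by `//` cannot be fixed by encoding.

```python
from urllib.parse import quote, unquote


def encode_key(key: str) -> str:
    """Percent-encode every path segment; `/` stays the separator."""
    return "/".join(quote(segment, safe="") for segment in key.split("/"))


def decode_key(path: str) -> str:
    """Reverse encode_key segment by segment."""
    return "/".join(unquote(segment) for segment in path.split("/"))


print(encode_key("a/b:c"))   # the colon is encoded
print(encode_key("a//b"))    # the empty segment survives unchanged
```

Such an encoding is reversible, so the original S3 key could be restored on the way back, but keys like `/a/b//c` would still need either normalization or the "invisible from HCFS" rule.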






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r487496013



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the rest of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat namespace, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory-like path such as `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a directory entry even if it was not created explicitly (for example, if the key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together by default.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second cannot be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. The flag can be changed if these entries are later created explicitly.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. It may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations is enabled, the intermediate directories won't be created. But in this case the consistent view of `ofs/o3fs` cannot be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (Ozone can still be used as a pure S3 replacement, without using it as an HCFS).
+
+# Problematic cases
+
+As described in the previous section, there are some cases which can't be supported out of the box due to the differences between the flat key space and the file-system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`  
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`  
+
+## Key and directory with the same name
+
+It is possible to create directory and key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md
+aws s3api list-objects --bucket ozonetest --prefix=a/b/h/
+```
+
+Behavior:
+ * *S3 web console*: both directory and file are rendered
+ * **aws s3 ls**: prefix (`PRE h/`) and file (`h`) are both displayed
+ * S3A: both entries are visible with the name `/a/b/h`, but the first is a file (with size) and the second is a directory (with directory attributes)
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: show both the file and the directory with the same name (similar to S3A)
+ * `ozone.om.enable.filesystem.paths=true`: throwing exception when the second one is created  
+
+## Directory entry created with file content
+
+In this case we create a directory (key which ends with `/`) but with real file content:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/i/ --body README.md
+```
+
+Behavior:
+ * *S3 web console*: rendered as directory (couldn't be downloaded)
+ * **aws s3 ls**: shown as a prefix (`aws s3 ls s3://ozonetest/a/b`), but when the full path is used it is shown as a file without a name (`aws s3 ls s3://ozonetest/a/b/i/`)
+ * S3A: `./bin/hdfs dfs -ls s3a://ozonetest/a/b/` shows a directory `h`, `./bin/hdfs dfs -ls s3a://ozonetest/a/b/i` shows a file `i`
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: possible but `i/` is hidden from o3fs/ofs
+ * `ozone.om.enable.filesystem.paths=true`: key name is normalized to real key name
+
+## Create key and explicit create parent dir
+
+```
+aws s3api put-object --bucket ozonetest --key e/f/g/
+aws s3api put-object --bucket ozonetest --key e/f/
+```
+
+Behavior:
+
+ * S3 can support it without any problem
+ 
+Proposed behavior:
+
+After the first command, the `/e/f/` and `/e/` entries are created in the key space (as they are required by `ofs`/`o3fs`) but **with a specific flag** (explicit=false). 
+
+AWS S3 list-objects API should exclude those entries from the result (!).
+
+Second command execution should modify the flag of `/e/f/` key to (explicit=true).
+
+## Create parent dir AND key with S3a
+
+This is the problem which is reported by [HDDS-4209](https://issues.apache.org/jira/browse/HDDS-4209)
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 # -> fails with below error
+```
+
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: should work without error
+ * `ozone.om.enable.filesystem.paths=true`: should work without error.
+
+This is an `ofs`/`o3fs` question, not an S3 one. The directory created in the first step shouldn't block the creation of the file. This can be a **mandatory** normalization for `mkdir` directory creation. As it's an HCFS operation, S3 is not affected. Entries created from S3 can be visible from S3 without any problem.
+
+## Create file and directory with S3
+
+This problem is reported in HDDS-4209, thanks to @Bharat
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 
+```
+
+In this case first a `d11/d12/` key is created. The intermediate key creation logic in the second step should use it as a directory instead of throwing an exception.

Review comment:
       Thanks for explaining it. I tested it with pure AWS S3 and `s3a`. The trailing `/` seems to be missing due to a bug in our normalization, and it's not an `s3a` behavior. I think we should fix the normalization.
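
The flag handling and the directory reuse described in the last sections of the quoted document can be sketched as follows. This is a minimal Python sketch under stated assumptions, not Ozone code: the `Entry` type, its `explicit` field, and the `parent_dirs`, `put_key` and `s3_list` helpers are hypothetical names, the key table is modeled as a plain dictionary, and key names are assumed to be fs-compatible and written without a leading `/`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    is_dir: bool
    # Hypothetical flag: False for directory entries generated only to keep
    # the file system view consistent, True for anything created on purpose.
    explicit: bool


def parent_dirs(key: str) -> List[str]:
    """Directory keys implied by `key`, shortest first, each ending in '/'."""
    segments = key.strip("/").split("/")
    return ["/".join(segments[:i]) + "/" for i in range(1, len(segments))]


def put_key(table: Dict[str, Entry], key: str) -> None:
    """Store `key` the way an S3 put could, generating implicit parents."""
    for parent in parent_dirs(key):
        # An existing directory entry is reused as is; it is not an error.
        table.setdefault(parent, Entry(is_dir=True, explicit=False))
    # Writing the key itself always makes it explicit, even if an implicit
    # directory entry with the same name was generated earlier.
    table[key] = Entry(is_dir=key.endswith("/"), explicit=True)


def s3_list(table: Dict[str, Entry]) -> List[str]:
    """Keys an S3 listing would return: implicit entries are hidden."""
    return sorted(name for name, entry in table.items() if entry.explicit)
```

In this model, `put_key(table, "e/f/g/")` stores `e/` and `e/f/` as hidden entries, and a later `put_key(table, "e/f/")` makes `e/f/` visible to `s3_list`. Likewise, `put_key(table, "d11/d12/")` followed by `put_key(table, "d11/d12/file1")` reuses the existing `d11/d12/` entry as the parent directory instead of failing.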






[GitHub] [hadoop-ozone] elek commented on pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#issuecomment-703504639


   We had an offline conversation with @bharatviswa504 @arp7
   
   Got feedback from Arpit: the 3rd option can be useful (we disagreed on how useful it is), but it was requested to fill in the behavior in `Ozone filesystem path enabled` attached to the design doc.
   
   I uploaded the updated version: `Ozone filesystem path enabled v3.xlsx` to the design doc.
   
   @arp7 Would you be so kind as to have a look and give me feedback?




[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486429661



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS cannot be used together out of the box.
+
+To solve the performance problems of directory listing and rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS cannot be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second cannot be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |

Review comment:
       Yes, agree. I tried to explain this behavior with the two lines to show that both are normalized. This is not because we have two flags. 
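
To make "normalized" concrete, here is a minimal Python sketch of key name normalization as the quoted document describes it (empty segments collapsed, `.` dropped, `..` resolved). It is only an illustration under stated assumptions, not the Ozone implementation: `normalize_key` is a hypothetical helper, a `..` that would climb above the bucket root is silently ignored, and leading and trailing `/` are dropped, which may or may not be the desired behavior.

```python
def normalize_key(key: str) -> str:
    """Return a file-system style form of an object store key (sketch only)."""
    stack = []
    for segment in key.split("/"):
        if segment in ("", "."):
            # Empty segments (from '//', or a leading/trailing '/') and '.'
            # carry no path information.
            continue
        if segment == "..":
            # Assumption: '..' above the root is ignored instead of rejected.
            if stack:
                stack.pop()
            continue
        stack.append(segment)
    return "/".join(stack)
```

With this sketch, the keys `a/b//y`, `a/b/../e` and `a/b/./f` map to `a/b/y`, `a/e` and `a/b/f`. Two different S3 keys can map to the same normalized name, which is the source of the conflicts and of the AWS S3 incompatibility noted in the table.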






[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
arp7 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485820636



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS cannot be used together without normalization.
 
-## Goals
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second cannot be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
 
- * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular path)
- * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations in case of incompatible key names
- * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:

Review comment:
       > an existing bucket can be exposed created via CLI to be exposed to S3, what semantics that bucket will get?
   
   If the bucket was created via FS interface, it will support FS semantics.
   
   > Buckets creation is possible via only OFS, what about O3fs?
   
   Good point, for buckets created via the Ozone shell, we could accept a command-line flag. The default can be filesystem because S3 buckets are traditionally created via the S3 API. You're right this needs some more discussion.






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r487497090



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,280 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used from multiple interfaces: 
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone has to simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated by a flat namespace:
+
+ 1. Some key patterns can't be easily transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it's not created explicitly (for example, if key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface) 
+ 3. Non-recursive listing of directories can be hard (Listing direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys) 
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client) which describes some of these problems when AWS S3 is used. (*Warnings* section)
+
+# Current status
+
+As of today *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`): 
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*,...)  
+ 2. file system related functions (like *CreateFile*, *LookupFile*,...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS cannot be used together out of the box.
+
+To solve the performance problems of directory listing and rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used. It means that if both S3 and HCFS are used, normalization is forced, and the S3 interface is not fully AWS S3 compatible. There is no option to use HCFS and S3 together with full AWS compatibility (and reduced HCFS compatibility). 
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` requires intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second cannot be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+This proposal suggests a 3rd option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure *normalization* and *intermediate dir creation* independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key. (The `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3.)
+
+This can be done by persisting an extra flag with the implicit directory entries. The flag on these entries can be modified if they are later created explicitly.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose to create only the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.

Review comment:
       `..` also can be tricky.
   
   But in general, I agree. We might be more permissive and show some of these elements even in `ofs/o3fs`. For example `a/b/c//d` might be possible to show as `/a/b/c/d`. But even if some of these are visible, we should accept that `ofs/o3fs` doesn't provide a *full* view when 100% AWS compatibility is requested.
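
As a rough illustration of the more permissive reading mentioned here, the following Python sketch shows one way a key could be classified as fs-compatible and, where possible, mapped to a normalized path. It is not the Ozone implementation; `is_fs_compatible` and `normalize_key` are hypothetical helper names, and the rules (drop empty and `.` segments, resolve `..`) are assumptions taken from the examples in this thread.

```python
def is_fs_compatible(key: str) -> bool:
    """Return True if every path segment of the key is a plain name.

    Empty segments (from `//`), `.` and `..` are treated as incompatible.
    """
    segments = key.strip("/").split("/")
    return all(s not in ("", ".", "..") for s in segments)


def normalize_key(key: str) -> str:
    """Map a key to an absolute, fs-style path (hypothetical rules)."""
    parts = []
    for segment in key.split("/"):
        if segment in ("", "."):
            # Empty segments and `.` carry no information for a file system.
            continue
        if segment == "..":
            if not parts:
                raise ValueError(f"key escapes the bucket root: {key!r}")
            parts.pop()
        else:
            parts.append(segment)
    return "/" + "/".join(parts)


print(normalize_key("a/b/c//d"))
print(normalize_key("a/b/../e"))
```

With these rules `a/b/c//d` is not fs-compatible as written, but it can still be presented as `/a/b/c/d`. Two distinct S3 keys may map to the same normalized path, which is why the `ofs/o3fs` view can't be complete when 100% AWS compatibility is requested.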






[GitHub] [hadoop-ozone] elek commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r486442745



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone must simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't easily be transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it was not created explicitly (for example, if the key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today, *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`):
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*, ...)
+ 2. file system related functions (like *CreateFile*, *LookupFile*, ...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key was introduced (`ozone.om.enable.filesystem.paths`). If it is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS can't be used together by default.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. Note that `ozone.om.enable.filesystem.paths` must always be turned on if both S3 and HCFS are used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box, Ozone should support both the S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names.
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred).
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be solved without a compromise:
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure *normalization* and *intermediate dir creation* independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key. (The `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3.)
+
+This can be done by persisting an extra flag with the implicit directory entries. The flag on these entries can be modified if they are later created explicitly.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS s3 behavior (when `/a/b/c` key is created `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose to create only the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`), all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file list.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification increases if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled by the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations is enabled, the intermediate directories won't be created. But in this case a consistent view from `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception**. (Ozone can still be used as a pure S3 replacement without being used as an HCFS.)
+
+# Problematic cases
+
+As described in the previous section, there are some cases which can't be supported out of the box due to the differences between the flat key space and the file system hierarchy. These cases are collected here, together with information on how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`
+ 
+## Path with invalid characters (`..`,`.`) 
+
+Path segments might include parts which have file system semantics:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/../e --body README.md
+aws s3api put-object --bucket ozonetest --key a/b/./f --body README.md
+```
+
+Behavior:
+ * *S3 web console*: `.` and `..` are rendered as directories to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE ../` and `PRE ./`
+ * S3A: Entries are not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `e` and `f` are not visible
+ * `ozone.om.enable.filesystem.paths=true`: keys stored as `/a/e` and `/a/b/f`
+
+## Key and directory with the same name
+
+It is possible to create directory and key with the same name in AWS:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/h --body README.md
+aws s3api list-objects --bucket ozonetest --prefix=a/b/h/
+```
+
+Behavior:
+ * *S3 web console*: both directory and file are rendered
+ * **aws s3 ls**: prefix (`PRE h/`) and file (`h`) are both displayed
+ * S3A: both entries are visible with the name `/a/b/h`, but the first is a file (with size) and the second is a directory (with directory attributes)
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: show both the file and the directory with the same name (similar to S3A)
+ * `ozone.om.enable.filesystem.paths=true`: throwing exception when the second one is created  
+
+## Directory entry created with file content
+
+In this case we create a directory (key which ends with `/`) but with real file content:
+
+```
+aws s3api put-object --bucket ozonetest --key a/b/i/ --body README.md
+```
+
+Behavior:
+ * *S3 web console*: rendered as directory (couldn't be downloaded)
+ * **aws s3 ls**: shown as a prefix (`aws s3 ls s3://ozonetest/a/b`), but when the full path is used it is shown as a file without a name (`aws s3 ls s3://ozonetest/a/b/i/`)
+ * S3A: `./bin/hdfs dfs -ls s3a://ozonetest/a/b/` shows a directory `h`, `./bin/hdfs dfs -ls s3a://ozonetest/a/b/i` shows a file `i`
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: possible but `i/` is hidden from o3fs/ofs
+ * `ozone.om.enable.filesystem.paths=true`: key name is normalized to real key name
+
+## Create key and explicit create parent dir
+
+```
+aws s3api put-object --bucket ozonetest --key e/f/g/
+aws s3api put-object --bucket ozonetest --key e/f/
+```
+
+Behavior:
+
+ * S3 can support it without any problem
+ 
+Proposed behavior:
+
+After the first command, the `/e/f/` and `/e/` entries are created in the key space (as they are required by `ofs`/`o3fs`) but **with a specific flag** (explicit=false).
+
+The AWS S3 list-objects API should exclude those entries from the result (!).
+
+Executing the second command should change the flag of the `/e/f/` key to (explicit=true).
+
+## Create parent dir AND key with S3a
+
+This is the problem which is reported by [HDDS-4209](https://issues.apache.org/jira/browse/HDDS-4209)
+
+```
+hdfs dfs -mkdir -p s3a://b12345/d11/d12 # -> Success
+
+hdfs dfs -put /tmp/file1 s3a://b12345/d11/d12/file1 # -> fails with below error
+```
+
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: should work without error
+ * `ozone.om.enable.filesystem.paths=true`: should work without error.
+
+This is an `ofs`/`o3fs` question, not an S3 one. The directory created in the first step shouldn't block the creation of the file. This can be a **mandatory** normalization for `mkdir` directory creation. As it's an HCFS operation, S3 is not affected. Entries created from S3 can be visible from S3 without any problem.

Review comment:
       Can we change the type to a directory when it's empty and name ends with `/` ?
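
To make this suggestion and the `explicit` flag from the quoted section concrete, here is a minimal Python sketch of a toy key space. It assumes (as a reading of the proposal, not as existing Ozone behavior) that an empty object whose name ends with `/` is stored as a directory, that parents created on behalf of an S3 write are flagged `explicit=False`, and that the S3 listing hides those implicit entries. `KeySpace`, `Entry`, `put_object`, `s3_list` and `fs_list` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    is_dir: bool
    explicit: bool
    size: int = 0


class KeySpace:
    """Toy flat key space with implicit/explicit directory entries."""

    def __init__(self):
        self.entries = {}

    def put_object(self, key: str, body: bytes = b"") -> None:
        key = key.lstrip("/")
        # Assumption: an empty object ending with "/" is a directory.
        is_dir = key.endswith("/") and len(body) == 0
        segments = key.rstrip("/").split("/")
        # Create missing parents as implicit directories; existing
        # entries (implicit or explicit) are left untouched.
        for i in range(1, len(segments)):
            parent = "/".join(segments[:i]) + "/"
            self.entries.setdefault(parent, Entry(is_dir=True, explicit=False))
        # The key itself is always explicit, which also flips the flag
        # of a previously implicit directory.
        self.entries[key] = Entry(is_dir=is_dir, explicit=True, size=len(body))

    def s3_list(self):
        """S3 view: only explicitly created keys."""
        return sorted(k for k, e in self.entries.items() if e.explicit)

    def fs_list(self):
        """HCFS view: all entries, including implicit directories."""
        return sorted(self.entries)


ks = KeySpace()
ks.put_object("e/f/g/")
print(ks.s3_list(), ks.fs_list())
ks.put_object("e/f/")
print(ks.s3_list())
```

In this sketch a later `put_object("e/f/g/file1", b"data")` succeeds because `e/f/g/` is already a directory entry, which is the behavior asked for in the "Create parent dir AND key with S3a" case.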






[GitHub] [hadoop-ozone] arp7 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
arp7 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485820636



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box, Ozone should support both the S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names.
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred).
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be solved without a compromise:
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
 
- * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular path)
- * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations in case of incompatible key names
- * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:

Review comment:
       bq. an existing bucket can be exposed created via CLI to be exposed to S3, what semantics that bucket will get?
   If the bucket was created via FS interface, it will support FS semantics.
   
   bq. Buckets creation is possible via only OFS, what about O3fs?
   Good point, for buckets created via the Ozone shell, we could accept a command-line flag. The default can be filesystem because S3 buckets are traditionally created via the S3 API. You're right this needs some more discussion.






[GitHub] [hadoop-ozone] elek commented on pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#issuecomment-690376470


   > Thank You @elek for the design document.
   > 
   > My understanding from this is the draft is as below. Let me know if I am missing something here.
   > <img alt="Screen Shot 2020-09-09 at 10 59 11 AM" width="998" src="https://user-images.githubusercontent.com/8586345/92635994-8856fe80-f28b-11ea-95bf-8864d48e488f.png">
   
   Correct. But this is not a matrix anymore. You should turn on either the first or the second of the configs, but not both.




[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485817494



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -67,45 +66,100 @@ To solve the performance problems of the directory listing / rename, [HDDS-2939]
 
 [HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) is created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` should always be turned on if S3 and HCFS are both used which means that S3 and HCFS couldn't be used together with normalization.
 
-## Goals
+# Goals
+
+ * Out of the box, Ozone should support both the S3 and HCFS interfaces without any settings. (This is possible only for regular, fs-compatible key names.)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names.
+ * The default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred).
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be solved without a compromise:
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
 
- * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular path)
- * As 100% compatibility couldn't be achieved on both side we need a configuration to set the expectations in case of incompatible key names
- * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible
+This proposal suggests a third option where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:

Review comment:
       We also have another way, right? An existing bucket created via the CLI can be exposed to S3; what semantics will that bucket get?
   
   >For buckets created via FS interface, the FS semantics will always take precedence
   Bucket creation is possible only via OFS; what about O3fs?
   
   >If the global setting is enabled, then the value of the setting at the time of bucket creation is sampled and that takes effect for the lifetime of the bucket.
   
   A bucket created via the shell while the global flag is on (assuming ozone.om.enable.filesystem.paths=true) will follow FS semantics, with slight S3 incompatibility.
   A bucket created via the shell while the global flag is off (assuming ozone.om.enable.filesystem.paths=false) will follow S3 semantics, with FS access either broken or completely disallowed.
   
   This is written from my understanding, as I don't have the complete context of the proposal.
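
As a way to check the understanding above, the "sampled at bucket creation" rule could be sketched in Python as follows. This is only an illustration of the quoted idea, not existing Ozone code; `OzoneManagerSketch`, `Bucket` and `fs_paths_enabled` are hypothetical names, and the single boolean stands in for `ozone.om.enable.filesystem.paths`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    name: str
    # Value of the global flag at creation time; immutable afterwards.
    fs_paths_enabled: bool


class OzoneManagerSketch:
    """Toy model: the global flag is sampled once per bucket."""

    def __init__(self, enable_filesystem_paths: bool):
        self.enable_filesystem_paths = enable_filesystem_paths
        self.buckets = {}

    def create_bucket(self, name: str) -> Bucket:
        bucket = Bucket(name, self.enable_filesystem_paths)
        self.buckets[name] = bucket
        return bucket


om = OzoneManagerSketch(enable_filesystem_paths=True)
fs_bucket = om.create_bucket("bucket1")
om.enable_filesystem_paths = False  # global setting changes later
s3_bucket = om.create_bucket("bucket2")
print(fs_bucket.fs_paths_enabled, s3_bucket.fs_paths_enabled)
```

Under this reading, `bucket1` keeps FS semantics for its whole lifetime even after the global flag is turned off, while `bucket2` gets S3 semantics.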
   
   






[GitHub] [hadoop-ozone] bharatviswa504 commented on a change in pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#discussion_r485801742



##########
File path: hadoop-hdds/docs/content/design/s3_hcfs.md
##########
@@ -0,0 +1,282 @@
+---
+title: S3/Ozone Filesystem inter-op 
+summary: How to support both S3 and HCFS at the same time
+date: 2020-09-09
+jira: HDDS-4097
+status: draft
+author: Marton Elek, 
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone S3 vs file-system semantics
+
+Ozone is an object store for the Hadoop ecosystem which can be used through multiple interfaces:
+
+ 1. From Hadoop Compatible File Systems (called *HCFS* in the remainder of this document) (RPC)
+ 2. From S3 compatible applications (REST)
+ 3. From a container orchestrator as a mounted volume (CSI, alpha feature)
+
+As Ozone is an object store, it stores keys and values in a flat hierarchy, which is enough to support S3 (2). But to support Hadoop Compatible File Systems (and CSI), Ozone must simulate a file system hierarchy.
+
+There are multiple challenges when a file system hierarchy is simulated on top of a flat namespace:
+
+ 1. Some key patterns can't easily be transformed to a file system path (e.g. `/a/b/../c`, `/a/b//d`, or a real key with a directory path like `/b/d/`)
+ 2. Directory entries (which may have their own properties) require special handling, as the file system interface requires a dir entry even if it was not created explicitly (for example, if the key `/a/b/c` is created, `/a/b` is supposed to be a visible directory entry for the file system interface)
+ 3. Non-recursive listing of directories can be hard (listing the direct entries under `/a` should ignore all the `/a/b/...`, `/a/b/c/...` keys)
+ 4. Similar to listing, rename can be a costly operation as it requires renaming many keys (renaming a first-level directory means renaming all the keys with the same prefix)
+
+See also the [Hadoop S3A documentation](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Introducing_the_Hadoop_S3A_client), which describes some of these problems when AWS S3 is used (*Warnings* section).
+
+# Current status
+
+As of today, *Ozone Manager* has two different interfaces (both are defined in `OmClientProtocol.proto`):
+
+ 1. object store related functions (like *CreateKey*, *LookupKey*, *DeleteKey*, ...)
+ 2. file system related functions (like *CreateFile*, *LookupFile*, ...)
+
+File system related functions use the same flat hierarchy under the hood but include additional functionality. For example, the `createFile` call creates all the intermediate directories for a specific key (creating the file `/a/b/c` will create the `/a/b` and `/a` entries in the key space).
+
+Today, a key created from the S3 interface can cause exceptions when the intermediate directories are checked from HCFS:
+
+
+```shell
+$ aws s3api put-object --endpoint http://localhost:9878 --bucket bucket1 --key /a/b/c/d
+
+$ ozone fs -ls o3fs://bucket1.s3v/a/
+ls: `o3fs://bucket1.s3v/a/': No such file or directory
+```
+
+This problem is reported in [HDDS-3955](https://issues.apache.org/jira/browse/HDDS-3955), where a new configuration key is introduced (`ozone.om.enable.filesystem.paths`). If this is enabled, intermediate directories are created even if the object store interface is used.
+
+This configuration is turned off by default, which means that S3 and HCFS couldn't be used together.
+
+To solve the performance problems of directory listing / rename, [HDDS-2939](https://issues.apache.org/jira/browse/HDDS-2939) was created, which proposes to use a new prefix table to store the "directory" entries (=prefixes).
+
+[HDDS-4097](https://issues.apache.org/jira/browse/HDDS-4097) was created to normalize the key names based on file-system semantics if `ozone.om.enable.filesystem.paths` is enabled. But please note that `ozone.om.enable.filesystem.paths` must always be turned on if S3 and HCFS are both used, which means that S3 and HCFS can't be used together without normalization.
+
+# Goals
+
+ * Out of the box Ozone should support both S3 and HCFS interfaces without any settings. (It's possible only for the regular, fs compatible key names)
+ * As 100% compatibility can't be achieved on both sides, we need a configuration to set the expectations for incompatible key names
+ * Default behavior of `o3fs` and `ofs` should be as close to `s3a` as possible (when S3 compatibility is preferred)
+
+# Possible cases to support
+
+There are two main aspects of supporting both `ofs/o3fs` and `s3` together:
+
+ 1. `ofs/o3fs` require intermediate directory entries to be created (for example `/a/b` for the key `/a/b/c`)
+ 2. Special file-system incompatible key names require special attention
+
+The second can't be done without compromise.
+
+ 1. We either support all key names (including non fs compatible key names), which means `ofs/o3fs` can provide only a partial view
+ 2. Or we can normalize the key names to be fs compatible (which makes it possible to create inconsistent S3 keys)
+
+HDDS-3955 introduced `ozone.om.enable.filesystem.paths`. With this setting we have two possible usage patterns:
+
+| ozone.om.enable.filesystem.paths= | true | false |
+|-|-|-|
+| create intermediate dirs | YES | NO |
+| normalize key names from `ofs/o3fs` | YES | NO |
+| force to normalize key names of `s3` interface | YES (1) | NO |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | YES | NO |
+| `s3` key `/a/b//c` available from `s3` | AWS S3 incompatibility | YES |
+
+(1): Under implementation
+
+This proposal suggests a third option, where 100% AWS compatibility is guaranteed in exchange for a limited `ofs/o3fs` view:
+
+| ozone.om.intermediate.dir.generation= | true |
+|-|-|
+| create intermediate dirs | YES |
+| normalize key names from `ofs/o3fs` | YES |
+| force to normalize key names of `s3` interface | **NO** |
+| `s3` key `/a/b/c` available from `ofs/o3fs` | YES |
+| `s3` key `/a/b//c` available from `ofs/o3fs` | NO |
+| `s3` key `/a/b//c` available from `s3` | *YES* (100% AWS compatibility) |
+
+
+# Proposed solution
+
+In short: 
+
+ **I propose to make it possible to configure normalization and intermediate dir creation independently of each other.**
+ 
+It can be done in multiple ways. For the sake of simplicity, let's imagine two configuration options:
+
+| configuration | behavior |
+|-|-|
+| `ozone.om.enable.filesystem.paths=true`  | Enable intermediate dir generation **AND** key name normalization |
+| `ozone.om.enable.intermediate.dirs=true` | Enable only the intermediate dir generation |
+
+## S3/HCFS Interoperability
+
+**If intermediate directory generation is enabled (with either of the configuration keys)**:
+
+When somebody creates a new key like `/a/b/c/d`, the same key should be visible from HCFS (`o3fs://` or `ofs://`). `/a`, `/a/b` and `/a/b/c` should be visible as directories from HCFS.
+
+S3 should list only the `/a/b/c/d` key (the `/a`, `/a/b`, `/a/b/c` keys, created to help HCFS, **won't be visible** if the key is created from S3).
+
+This can be done by persisting an extra flag with the implicit directory entries. These entries can be modified if they are explicitly created.
+
+This flag should be added only for the keys which are created by S3. `ofs://` and `o3fs://` create explicit directories all the time.
+
+Advantages of this approach:
+
+ 1. HCFS and S3 can work together
+ 2. S3 behavior is closer to the original AWS S3 behavior (when the `/a/b/c` key is created, `/a/b` won't be visible)
+
+## Handling of the incompatible paths
+
+As defined above, intermediate directory generation and normalization are two independent settings. (It's possible to choose only to create the intermediate directories.)
+
+**If normalization is chosen** (`ozone.om.enable.filesystem.paths=true`): all the key names will be normalized to fs-compatible names. This may cause a conflict (error) if the normalized key already exists (or exists as a file instead of a directory).
+
+**Without normalization (`ozone.om.enable.intermediate.dirs=true`)**:
+
+Creating intermediate directories might not be possible if the path contains illegal characters or can't be parsed as a file system path. **These keys will be invisible from HCFS** by default. They will be ignored during the normal file listing.
+
+## Using Ozone in object-store only mode
+
+Creating intermediate directories can have some overhead (write amplification is increased if many keys are written with different prefixes, as we need an entry for each prefix). This write amplification can be handled with the current implementation: based on the measurements, RocksDB has no problems with billions of keys.
+
+If none of the mentioned configurations are enabled, the intermediate directories won't be created. But in this case a consistent view from `ofs/o3fs` can't be guaranteed, so `ofs/o3fs` **should be disabled and throw an exception** (Ozone can still be used as a pure S3 replacement without using it as an HCFS).
+
+# Problematic cases
+
+As described in the previous section, there are some cases which can't be supported out of the box due to the differences between the flat key space and the file-system hierarchy. These cases are collected here, together with information about how existing tools (AWS console, AWS CLI, AWS S3A Hadoop connector) behave.
+
+## Empty directory path
+
+With a pure object store, keys with empty directory names can be created.
+
+```
+aws s3api put-object --bucket ozonetest --key a/b//y --body README.md
+```
+
+Behavior:
+ * *S3 web console*: empty dir is rendered as `____` to make it possible to navigate in
+ * **aws s3 ls**: Prefix entry is visible as `PRE /`
+ * S3A: Not visible
+ 
+Proposed behavior:
+
+ * `ozone.om.enable.intermediate.dirs=true`: `/y` is not accessible, `/a/b` directory doesn't contain this entry
+ * `ozone.om.enable.filesystem.paths=true`: key stored as `/a/b/y`  

Review comment:
       With this, visible from both, but in S3 it will be shown with normalized name?
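
To make the question concrete, here is a minimal sketch of what normalization could do to the key from this example. It is not Ozone's implementation: `normalize_key` and `intermediate_dirs` are hypothetical helpers, and the sketch assumes that normalization only collapses empty path segments (the actual rules are whatever HDDS-4097 ends up implementing).

```python
from typing import List


def normalize_key(key: str) -> str:
    """Drop empty path segments so the key reads as a file system path.

    Assumption for illustration only: this is the whole normalization rule.
    """
    return "/".join(segment for segment in key.split("/") if segment)


def intermediate_dirs(key: str) -> List[str]:
    """Return the parent prefixes HCFS expects to see as directories."""
    segments = key.split("/")
    return ["/".join(segments[:i]) for i in range(1, len(segments))]


s3_key = "a/b//y"                      # key written through the S3 interface
stored_key = normalize_key(s3_key)     # name that would actually be stored
print(stored_key)                      # a/b/y
print(intermediate_dirs(stored_key))   # ['a', 'a/b']
```

Under that assumption the stored name differs from the name the S3 client wrote, which is the incompatibility the question points at: a later S3 lookup of `a/b//y` only succeeds if the request path is normalized in the same way.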






[GitHub] [hadoop-ozone] elek commented on pull request #1411: HDDS-4097. [DESIGN] S3/Ozone Filesystem inter-op

Posted by GitBox <gi...@apache.org>.
elek commented on pull request #1411:
URL: https://github.com/apache/hadoop-ozone/pull/1411#issuecomment-689541807


   Opened as a **DRAFT** pull request as this is only a proposal. 
   
   @arp7 and @bharatviswa504 still have concerns about the proposed approach.

