You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by "Attila Doroszlai (Jira)" <ji...@apache.org> on 2023/12/20 17:23:00 UTC

[jira] [Resolved] (HDDS-9807) [EC][SCM] Incorrect check of available space on datanodes in case of allocating blocks

     [ https://issues.apache.org/jira/browse/HDDS-9807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Doroszlai resolved HDDS-9807.
------------------------------------
    Fix Version/s: 1.5.0
       Resolution: Fixed

> [EC][SCM] Incorrect check of available space on datanodes in case of allocating blocks
> --------------------------------------------------------------------------------------
>
>                 Key: HDDS-9807
>                 URL: https://issues.apache.org/jira/browse/HDDS-9807
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: EC, SCM
>    Affects Versions: 1.4.0
>            Reporter: Vyacheslav Tutrinov
>            Assignee: Vyacheslav Tutrinov
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
>
> SCM checks the datanodes availability to allocate blocks incorrectly - it doesn't consider the committed space (created containers max size sum).
> Imagine the case:
> 1. The cluster has 10 datanodes with 2Gb storage mounted to /data
> ./hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone/docker-compose.yaml
> {code:yaml}
> version: "3.8"
> # reusable fragments (see https://docs.docker.com/compose/compose-file/#extension-fields)
> x-common-config:
>   &common-config
>   image: ${OZONE_RUNNER_IMAGE}:${OZONE_RUNNER_VERSION}
>   volumes:
>     - ../..:/opt/hadoop
>   env_file:
>     - docker-config
> x-replication:
>   &replication
>   OZONE-SITE.XML_ozone.server.default.replication: ${OZONE_REPLICATION_FACTOR:-1}
> services:
>   datanode1:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9001:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs1:/data
>       - ../..:/opt/hadoop
>   datanode2:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9002:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs2:/data
>       - ../..:/opt/hadoop
>   datanode3:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9003:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs3:/data
>       - ../..:/opt/hadoop
>   datanode4:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9004:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs4:/data
>       - ../..:/opt/hadoop
>   datanode5:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9005:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs5:/data
>       - ../..:/opt/hadoop
>   datanode6:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9006:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs6:/data
>       - ../..:/opt/hadoop
>   datanode7:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9007:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs7:/data
>       - ../..:/opt/hadoop
>   datanode8:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9008:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs8:/data
>       - ../..:/opt/hadoop
>   datanode9:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9009:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs9:/data
>       - ../..:/opt/hadoop
>   datanode10:
>     <<: *common-config
>     ports:
>       - 19864
>       - 9882
>       - 9010:5005
>     environment:
>       <<: *replication
>       OZONE_OPTS: -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
>     command: [ "ozone","datanode" ]
>     volumes:
>       - tmpfs10:/data
>       - ../..:/opt/hadoop
>   om:
>     <<: *common-config
>     environment:
>       ENSURE_OM_INITIALIZED: /data/metadata/om/current/VERSION
>       OZONE_OPTS:
>       <<: *replication
>     ports:
>       - 9874:9874
>       - 9862:9862
>     command: ["ozone","om"]
>   scm:
>     <<: *common-config
>     ports:
>       - 9876:9876
>       - 9860:9860
>     environment:
>       ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
>       OZONE-SITE.XML_hdds.scm.safemode.min.datanode: ${OZONE_SAFEMODE_MIN_DATANODES:-1}
>       OZONE_OPTS:
>       <<: *replication
>     command: ["ozone","scm"]
>   httpfs:
>     <<: *common-config
>     environment:
>       OZONE-SITE.XML_hdds.scm.safemode.min.datanode: ${OZONE_SAFEMODE_MIN_DATANODES:-1}
>       <<: *replication
>     ports:
>       - 14000:14000
>     command: [ "ozone","httpfs" ]
>   s3g:
>     <<: *common-config
>     environment:
>       OZONE_OPTS:
>       <<: *replication
>     ports:
>       - 9878:9878
>     command: ["ozone","s3g"]
>   recon:
>     <<: *common-config
>     ports:
>       - 9888:9888
>     environment:
>       OZONE_OPTS:
>       <<: *replication
>     command: ["ozone","recon"]
> volumes:
>   tmpfs1:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=1000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs2:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=2000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs3:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=3000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs4:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=4000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs5:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=5000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs6:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=6000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs7:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=7000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs8:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=8000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs9:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=9000"
>       device: tmpfs
>       type: tmpfs
>   tmpfs10:
>     driver: local
>     driver_opts:
>       o: "size=2g,uid=10000"
>       device: tmpfs
>       type: tmpfs
> {code}
> ./hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone/.env
> {code}
> ...
> OZONE_REPLICATION_FACTOR=3
> ...
> {code}
> ./hadoop-ozone/dist/target/ozone-1.4.0-SNAPSHOT/compose/ozone/docker-config
> {code}
> ...
> OZONE-SITE.XML_ozone.scm.pipeline.creation.auto.factor.one=false
> ...
> {code}
> 2. There is an EC-bucket with rs-6-3-1024k replication config
> {code:bash}
> ozone sh volume create data
> ozone sh bucket create data/bucket1 --type EC --replication rs-6-3-1024k --layout LEGACY
> ozone sh bucket link data/bucket1 s3v/bucket1
> {code}
> 3. Create 200KiB file and put it to the bucket
> {code:bash}
> head -c 200KiB </dev/urandom > /tmp/test_file_200KiB
> ozone sh key put s3v/bucket1/test_key_200KiB_001 /tmp/test_file_200KiB
> {code}
> A new EC-pipeline will be created:
> {code}
> #scm log
> 2023-11-30 08:33:26,124 [IPC Server handler 7 on default port 9863] INFO pipeline.WritableECContainerProvider: Created and opened new pipeline Pipeline[ Id: 70b771a8-0141-4447-8e0f-730b9fba2c34, Nodes: 05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5)3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3)dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6)1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7)afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2)9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12)65fd7e45-140b-4524-b0e3-800ca5fb0724(ozone-datanode8-1.ozone_default/192.168.176.8)993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13)8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10), ReplicationConfig: EC{rs-6-3-1024k}, State:ALLOCATED, leaderId:, CreationTimestamp2023-11-30T08:33:26.080416Z[UTC]]
> # ozone admin pipeline list
> Pipeline[ Id: 077f1a30-0dec-4538-a66f-509583223052, Nodes: 05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5)1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7)8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10), ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:05f06265-66e3-407d-9429-a31754686468, CreationTimestamp2023-11-30T08:30:47.873Z[UTC]]
> Pipeline[ Id: cda08d91-afee-4d31-ad16-02ea3313e502, Nodes: afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2)65fd7e45-140b-4524-b0e3-800ca5fb0724(ozone-datanode8-1.ozone_default/192.168.176.8)993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13), ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:afd41e81-1ead-4c5a-b087-8f1bb69e2574, CreationTimestamp2023-11-30T08:30:47.508Z[UTC]]
> Pipeline[ Id: 70b771a8-0141-4447-8e0f-730b9fba2c34, Nodes: 05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5)3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3)dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6)1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7)afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2)9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12)65fd7e45-140b-4524-b0e3-800ca5fb0724(ozone-datanode8-1.ozone_default/192.168.176.8)993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13)8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10), ReplicationConfig: EC{rs-6-3-1024k}, State:OPEN, leaderId:, CreationTimestamp2023-11-30T08:33:26.080Z[UTC]]
> Pipeline[ Id: d46e8c43-ed23-460a-8200-bb4af0599cae, Nodes: 9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12)e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15)3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3), ReplicationConfig: RATIS/THREE, State:OPEN, leaderId:3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa, CreationTimestamp2023-11-30T08:30:48.503Z[UTC]]
> {code}
> Datanodes usageinfo
> {code}
> Usage Information (1 Datanodes)
> UUID         : 8614d173-4001-46d4-a4e2-1a30339b8585 
> IP Address   : 192.168.176.10 
> Hostname     : ozone-datanode1-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 179290112 B (170.98 MB) 
> Total Used % : 8.35% 
> Ozone Used   : 204800 B (200 KB) 
> Ozone Used % : 0.01% 
> Remaining    : 1968193536 B (1.83 GB) 
> Remaining %  : 91.65% 
> Container(s) : 1 
> Usage Information (1 Datanodes)
> UUID         : 993705b9-1599-4901-a629-56fbc4c29971 
> IP Address   : 192.168.176.13 
> Hostname     : ozone-datanode2-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 179290112 B (170.98 MB) 
> Total Used % : 8.35% 
> Ozone Used   : 204800 B (200 KB) 
> Ozone Used % : 0.01% 
> Remaining    : 1968193536 B (1.83 GB) 
> Remaining %  : 91.65% 
> Container(s) : 1 
> Usage Information (1 Datanodes)
> UUID         : 9a144484-a05a-42e4-813e-4aaccf390ea8 
> IP Address   : 192.168.176.12 
> Hostname     : ozone-datanode9-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 179085312 B (170.79 MB) 
> Total Used % : 8.34% 
> Ozone Used   : 0 B (0 B) 
> Ozone Used % : 0.00% 
> Remaining    : 1968398336 B (1.83 GB) 
> Remaining %  : 91.66% 
> Container(s) : 1 
> Usage Information (1 Datanodes)
> UUID         : afd41e81-1ead-4c5a-b087-8f1bb69e2574 
> IP Address   : 192.168.176.2 
> Hostname     : ozone-datanode6-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 179085312 B (170.79 MB) 
> Total Used % : 8.34% 
> Ozone Used   : 0 B (0 B) 
> Ozone Used % : 0.00% 
> Remaining    : 1968398336 B (1.83 GB) 
> Remaining %  : 91.66% 
> Container(s) : 1 
> Usage Information (1 Datanodes)
> UUID         : dbb2a07e-5b7d-4aef-a7cc-aed3134563ae 
> IP Address   : 192.168.176.6 
> Hostname     : ozone-datanode10-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 174878720 B (166.78 MB) 
> Total Used % : 8.14% 
> Ozone Used   : 0 B (0 B) 
> Ozone Used % : 0.00% 
> Remaining    : 1972604928 B (1.84 GB) 
> Remaining %  : 91.86% 
> Container(s) : 1 
> Usage Information (1 Datanodes)
> UUID         : e68158dc-6f86-4304-b78d-86c4fa93cd7d 
> IP Address   : 192.168.176.15 
> Hostname     : ozone-datanode7-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 31440896 B (29.98 MB) 
> Total Used % : 1.46% 
> Ozone Used   : 0 B (0 B) 
> Ozone Used % : 0.00% 
> Remaining    : 2116042752 B (1.97 GB) 
> Remaining %  : 98.54% 
> Container(s) : 0 
> Usage Information (1 Datanodes)
> UUID         : 05f06265-66e3-407d-9429-a31754686468 
> IP Address   : 192.168.176.5 
> Hostname     : ozone-datanode4-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 179290112 B (170.98 MB) 
> Total Used % : 8.35% 
> Ozone Used   : 204800 B (200 KB) 
> Ozone Used % : 0.01% 
> Remaining    : 1968193536 B (1.83 GB) 
> Remaining %  : 91.65% 
> Container(s) : 1 
> Usage Information (1 Datanodes)
> UUID         : 1f6131be-4cec-465d-a5cd-cf7b87824b7f 
> IP Address   : 192.168.176.7 
> Hostname     : ozone-datanode3-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 179085312 B (170.79 MB) 
> Total Used % : 8.34% 
> Ozone Used   : 0 B (0 B) 
> Ozone Used % : 0.00% 
> Remaining    : 1968398336 B (1.83 GB) 
> Remaining %  : 91.66% 
> Container(s) : 1 
> Usage Information (1 Datanodes)
> UUID         : 3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa 
> IP Address   : 192.168.176.3 
> Hostname     : ozone-datanode5-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 179085312 B (170.79 MB) 
> Total Used % : 8.34% 
> Ozone Used   : 0 B (0 B) 
> Ozone Used % : 0.00% 
> Remaining    : 1968398336 B (1.83 GB) 
> Remaining %  : 91.66% 
> Container(s) : 1 
> Usage Information (1 Datanodes)
> UUID         : 65fd7e45-140b-4524-b0e3-800ca5fb0724 
> IP Address   : 192.168.176.8 
> Hostname     : ozone-datanode8-1.ozone_default 
> Capacity     : 2147483648 B (2 GB) 
> Total Used   : 179290112 B (170.98 MB) 
> Total Used % : 8.35% 
> Ozone Used   : 204800 B (200 KB) 
> Ozone Used % : 0.01% 
> Remaining    : 1968193536 B (1.83 GB) 
> Remaining %  : 91.65% 
> Container(s) : 1 
> {code}
> 4. Now let's try to create a 100MiB file and put it down to the same bucket
> {code:bash}
> head -c 100MiB </dev/urandom > /tmp/test_file_100MiB
> ozone sh key put s3v/bucket1/test_key_100MiB_001 /tmp/test_file_100MiB
> {code}
> The request will fail with the next error on the client side:
> {code}
> INTERNAL_ERROR No enough datanodes to choose. TotalNodes = 10 AvailableNodes = 0 RequiredNodes = 9 ExcludedNodes = 10 UsedNodes = 0
> {code}
> The SCM creates EC-pipelines up to the max pipeline count:
> {code}
> ozone-scm-1         | 2023-11-30 08:40:04,485 [IPC Server handler 20 on default port 9863] INFO algorithms.SCMContainerPlacementRackScatter: Chosen nodes: [1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7), 993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13), 9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12), 3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3), 8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10), e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15), 65fd7e45-140b-4524-b0e3-800ca5fb0724(ozone-datanode8-1.ozone_default/192.168.176.8), afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2), dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6)]. isPolicySatisfied: true.
> ozone-scm-1         | 2023-11-30 08:40:04,502 [IPC Server handler 20 on default port 9863] INFO pipeline.WritableECContainerProvider: Created and opened new pipeline Pipeline[ Id: 42d76b70-84f5-42a1-980a-e3fc3445edb6, Nodes: 1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7)993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13)9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12)3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3)8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10)e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15)65fd7e45-140b-4524-b0e3-800ca5fb0724(ozone-datanode8-1.ozone_default/192.168.176.8)afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2)dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6), ReplicationConfig: EC{rs-6-3-1024k}, State:ALLOCATED, leaderId:, CreationTimestamp2023-11-30T08:40:04.487343Z[UTC]]
> ozone-scm-1         | 2023-11-30 08:40:04,503 [IPC Server handler 20 on default port 9863] INFO algorithms.SCMContainerPlacementRackScatter: Chosen nodes: [afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2), 1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7), 8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10), 3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3), dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6), 05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5), e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15), 993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13), 9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12)]. isPolicySatisfied: true.
> ozone-scm-1         | 2023-11-30 08:40:04,510 [IPC Server handler 20 on default port 9863] INFO pipeline.WritableECContainerProvider: Created and opened new pipeline Pipeline[ Id: 498dfea3-17ee-4600-a3b9-94727c1cd729, Nodes: afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2)1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7)8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10)3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3)dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6)05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5)e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15)993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13)9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12), ReplicationConfig: EC{rs-6-3-1024k}, State:ALLOCATED, leaderId:, CreationTimestamp2023-11-30T08:40:04.503388Z[UTC]]
> ozone-scm-1         | 2023-11-30 08:40:04,511 [IPC Server handler 20 on default port 9863] INFO algorithms.SCMContainerPlacementRackScatter: Chosen nodes: [8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10), 993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13), 05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5), 9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12), 65fd7e45-140b-4524-b0e3-800ca5fb0724(ozone-datanode8-1.ozone_default/192.168.176.8), afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2), e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15), dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6), 3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3)]. isPolicySatisfied: true.
> ozone-scm-1         | 2023-11-30 08:40:04,518 [IPC Server handler 20 on default port 9863] INFO pipeline.WritableECContainerProvider: Created and opened new pipeline Pipeline[ Id: 93539a72-b48d-4a22-8d0d-bac58d217e42, Nodes: 8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10)993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13)05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5)9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12)65fd7e45-140b-4524-b0e3-800ca5fb0724(ozone-datanode8-1.ozone_default/192.168.176.8)afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2)e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15)dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6)3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3), ReplicationConfig: EC{rs-6-3-1024k}, State:ALLOCATED, leaderId:, CreationTimestamp2023-11-30T08:40:04.511440Z[UTC]]
> ozone-scm-1         | 2023-11-30 08:40:04,518 [IPC Server handler 20 on default port 9863] INFO algorithms.SCMContainerPlacementRackScatter: Chosen nodes: [1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7), afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2), dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6), 9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12), 993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13), 8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10), 05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5), 3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3), e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15)]. isPolicySatisfied: true.
> ozone-scm-1         | 2023-11-30 08:40:04,529 [IPC Server handler 20 on default port 9863] INFO pipeline.WritableECContainerProvider: Created and opened new pipeline Pipeline[ Id: 8f8cce33-8631-4e42-b3ac-34f8708be23a, Nodes: 1f6131be-4cec-465d-a5cd-cf7b87824b7f(ozone-datanode3-1.ozone_default/192.168.176.7)afd41e81-1ead-4c5a-b087-8f1bb69e2574(ozone-datanode6-1.ozone_default/192.168.176.2)dbb2a07e-5b7d-4aef-a7cc-aed3134563ae(ozone-datanode10-1.ozone_default/192.168.176.6)9a144484-a05a-42e4-813e-4aaccf390ea8(ozone-datanode9-1.ozone_default/192.168.176.12)993705b9-1599-4901-a629-56fbc4c29971(ozone-datanode2-1.ozone_default/192.168.176.13)8614d173-4001-46d4-a4e2-1a30339b8585(ozone-datanode1-1.ozone_default/192.168.176.10)05f06265-66e3-407d-9429-a31754686468(ozone-datanode4-1.ozone_default/192.168.176.5)3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa(ozone-datanode5-1.ozone_default/192.168.176.3)e68158dc-6f86-4304-b78d-86c4fa93cd7d(ozone-datanode7-1.ozone_default/192.168.176.15), ReplicationConfig: EC{rs-6-3-1024k}, State:ALLOCATED, leaderId:, CreationTimestamp2023-11-30T08:40:04.518973Z[UTC]]
> {code}
> But pipelines's reserved datanodes are unavailable to create new containers:
> {code}
> ozone-datanode8-1   | 2023-11-30 08:40:06,062 [65fd7e45-140b-4524-b0e3-800ca5fb0724-ChunkReader-6] INFO volume.CapacityVolumeChoosingPolicy: No volumes have enough space for a new container.  Most available space: 894656512 bytes; required space: 1073741824, volumes: {/data/hdds/hdds=free: 1968193536, committed: 1073537024}
> ozone-datanode1-1   | 2023-11-30 08:40:06,063 [8614d173-4001-46d4-a4e2-1a30339b8585-ChunkReader-5] INFO volume.CapacityVolumeChoosingPolicy: No volumes have enough space for a new container.  Most available space: 894656512 bytes; required space: 1073741824, volumes: {/data/hdds/hdds=free: 1968193536, committed: 1073537024}
> ozone-datanode9-1   | 2023-11-30 08:40:06,070 [9a144484-a05a-42e4-813e-4aaccf390ea8-ChunkReader-4] INFO volume.CapacityVolumeChoosingPolicy: No volumes have enough space for a new container.  Most available space: 894656512 bytes; required space: 1073741824, volumes: {/data/hdds/hdds=free: 1968398336, committed: 1073741824}
> ozone-datanode2-1   | 2023-11-30 08:40:06,078 [993705b9-1599-4901-a629-56fbc4c29971-ChunkReader-6] INFO volume.CapacityVolumeChoosingPolicy: No volumes have enough space for a new container.  Most available space: 894656512 bytes; required space: 1073741824, volumes: {/data/hdds/hdds=free: 1968193536, committed: 1073537024}
> ozone-datanode3-1   | 2023-11-30 08:40:06,093 [1f6131be-4cec-465d-a5cd-cf7b87824b7f-ChunkReader-4] INFO volume.CapacityVolumeChoosingPolicy: No volumes have enough space for a new container.  Most available space: 894656512 bytes; required space: 1073741824, volumes: {/data/hdds/hdds=free: 1968398336, committed: 1073741824}
> ozone-datanode5-1   | 2023-11-30 08:40:06,102 [3c4549f6-e3b9-44a5-8e8b-5c1078ddaffa-ChunkReader-4] INFO volume.CapacityVolumeChoosingPolicy: No volumes have enough space for a new container.  Most available space: 894656512 bytes; required space: 1073741824, volumes: {/data/hdds/hdds=free: 1968398336, committed: 1073741824}
> ozone-datanode6-1   | 2023-11-30 08:40:06,136 [afd41e81-1ead-4c5a-b087-8f1bb69e2574-ChunkReader-3] INFO volume.CapacityVolumeChoosingPolicy: No volumes have enough space for a new container.  Most available space: 894656512 bytes; required space: 1073741824, volumes: {/data/hdds/hdds=free: 1968398336, committed: 1073741824}
> ozone-datanode10-1  | 2023-11-30 08:40:06,150 [dbb2a07e-5b7d-4aef-a7cc-aed3134563ae-ChunkReader-4] INFO volume.CapacityVolumeChoosingPolicy: No volumes have enough space for a new container.  Most available space: 898863104 bytes; required space: 1073741824, volumes: {/data/hdds/hdds=free: 1972604928, committed: 1073741824}
> ozone-datanode8-1   | 2023-11-30 08:40:06,239 [65fd7e45-140b-4524-b0e3-800ca5fb0724-ChunkReader-6] WARN keyvalue.KeyValueHandler: Operation: CreateContainer , Trace ID:  , Message: Container creation failed, due to disk out of space , Result: DISK_OUT_OF_SPACE , StorageContainerException Occurred.
> ozone-datanode8-1   | org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Container creation failed, due to disk out of space
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:162)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:367)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.dispatchRequest(KeyValueHandler.java:239)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:222)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:469)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatchRequest(HddsDispatcher.java:275)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.lambda$dispatch$0(HddsDispatcher.java:179)
> ozone-datanode8-1   | 	at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:89)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:178)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:57)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:50)
> ozone-datanode8-1   | 	at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:262)
> ozone-datanode8-1   | 	at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> ozone-datanode8-1   | 	at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:49)
> ozone-datanode8-1   | 	at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailableInternal(ServerCallImpl.java:329)
> ozone-datanode8-1   | 	at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:314)
> ozone-datanode8-1   | 	at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:833)
> ozone-datanode8-1   | 	at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> ozone-datanode8-1   | 	at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
> ozone-datanode8-1   | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> ozone-datanode8-1   | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> ozone-datanode8-1   | 	at java.base/java.lang.Thread.run(Thread.java:829)
> ozone-datanode8-1   | Caused by: org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: No volumes have enough space for a new container.  Most available space: 894656512 bytes
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.common.volume.VolumeChoosingUtil.throwDiskOutOfSpace(VolumeChoosingUtil.java:38)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.common.volume.CapacityVolumeChoosingPolicy.chooseVolume(CapacityVolumeChoosingPolicy.java:68)
> ozone-datanode8-1   | 	at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:160)
> ozone-datanode8-1   | 	... 21 more
> {code}
> because the SCM and datanodes check the volume availability in a different manner:
> SCM (org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackScatter#chooseNode(java.lang.String, java.util.List<org.apache.hadoop.hdds.scm.net.Node>, long, long) -> org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy#isValidNode -> org.apache.hadoop.hdds.scm.SCMCommonPlacementPolicy#hasEnoughSpace)
> {code:java}
> if (dataSizeRequired > 0) {
>       for (StorageReportProto reportProto : datanodeInfo.getStorageReports()) {
>         if (reportProto.getRemaining() > dataSizeRequired) {
>           enoughForData = true;
>           break;
>         }
>       }
>     } else {
>       enoughForData = true;
>     }
> {code}
> Datanode (org.apache.hadoop.ozone.container.common.volume.AvailableSpaceFilter#test)
> {code:java}
> public boolean test(HddsVolume vol) {
>     long volumeCapacity = vol.getCapacity();
>     long free = vol.getAvailable();
>     long committed = vol.getCommittedBytes();
>     long available = free - committed;
>     long volumeFreeSpace =
>         VolumeUsage.getMinVolumeFreeSpace(vol.getConf(), volumeCapacity);
>     boolean hasEnoughSpace =
>         available > Math.max(requiredSpace, volumeFreeSpace);
>     mostAvailableSpace = Math.max(available, mostAvailableSpace);
>     if (!hasEnoughSpace) {
>       fullVolumes.put(vol, new AvailableSpace(free, committed));
>     }
>     return hasEnoughSpace;
>   }
> {code}
> The SCM doesn't take into account the committed space and guesses that the datanode is available to allocate new containers but it's not



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org