Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/12/21 14:07:36 UTC
[GitHub] [druid] alborotogarcia opened a new issue #12087: Deep storage on kubernetes
alborotogarcia opened a new issue #12087:
URL: https://github.com/apache/druid/issues/12087
The coordinator always restarts when I set MinIO/HDFS for deep storage.
### Affected Version
v0.22.1
### Description
I'm new to Druid, and I see that deep storage is needed in order to persist segments.
As the docs say, the "druid-s3-extensions" or "druid-hdfs-storage" extension must be enabled in the load list, so that it gets set from the configmap.
When using HDFS as deep storage, core-site.xml and hdfs-site.xml are also needed, but the coordinator pod always gets restarted with no trace.
- Cluster size
6 nodes
- Configurations in use
mostly defaults from helm/druid; everything is fine if I don't set S3/HDFS for deep storage
```
druid_storage_type: hdfs
druid_storage_storageDirectory: hdfs://hadoop-hdfs-nn.hdfs:8020/druid
# druid_storage_type: s3
# druid_storage_bucket: s3://druid
# druid_s3_endpointUrl: http://myminioinstance.svc.cluster.local:9000
# druid_s3_accessKey: miniokey
# druid_s3_secretKey: miniopass
```
- Steps to reproduce the problem
- The error message or stack traces encountered. Providing more context, such as nearby log messages or even entire logs, can be helpful.
- Any debugging that you have already done
I set core-site.xml and hdfs-site.xml in a configmap, the same as in my hadoop deployment:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop
data:
  core-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-hdfs-nn.hdfs:8020/</value>
        <description>NameNode URI</description>
      </property>
    </configuration>
  hdfs-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///root/hdfs/datanode</value>
        <description>DataNode directory</description>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///root/hdfs/namenode</value>
        <description>NameNode directory for namespace and transaction logs storage.</description>
      </property>
      <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
      </property>
      <!-- Bind to all interfaces -->
      <property>
        <name>dfs.namenode.rpc-bind-host</name>
        <value>0.0.0.0</value>
      </property>
      <property>
        <name>dfs.namenode.servicerpc-bind-host</name>
        <value>0.0.0.0</value>
      </property>
      <!-- /Bind to all interfaces -->
    </configuration>
```
It gets mounted on the `_common` subpath:
```
volumeMounts:
  - name: hadoop-config
    mountPath: /opt/druid/conf/druid/cluster/_common/core-site.xml
    subPath: core-site.xml
  - name: hadoop-config
    mountPath: /opt/druid/conf/druid/cluster/_common/hdfs-site.xml
    subPath: hdfs-site.xml
volumes:
  - name: hadoop-config
    configMap:
      name: hadoop
I tried creating the /druid root folder on HDFS just in case, but it has made no difference so far:
```
~ k get svc -nhdfs
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hadoop-hdfs-dn ClusterIP None <none> 9000/TCP,9864/TCP,8020/TCP 59m
hadoop-hdfs-nn ClusterIP None <none> 9000/TCP,9870/TCP,8020/TCP 59m
hadoop-yarn-nm ClusterIP None <none> 8088/TCP,8082/TCP,8042/TCP 59m
hadoop-yarn-rm ClusterIP None <none> 8088/TCP 59m
hadoop-yarn-ui ClusterIP 10.43.132.233 <none> 8088/TCP 59m
root@hadoop-hdfs-nn-0:/# hdfs dfs -ls /
Found 1 items
drwxrwxrwx - root supergroup 0 2021-12-21 13:21 /druid
```
Here is the coordinator trace:
```
+ druid druid-coordinator-6c8b48f5cd-nngjc › druid
druid druid-coordinator-6c8b48f5cd-nngjc druid 2021-12-21T14:47:19+0100 startup service coordinator
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.host=10.42.23.164 in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.storage.type=hdfs in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.metadata.storage.connector.connectURI=jdbc:postgresql://acid-minimal-cluster.storage:5432/druid in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.extensions.loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global","postgresql-metadata-storage","druid-kafka-indexing-service","druid-kafka-extraction-namespace","druid-avro-extensions","druid-basic-security","druid-s3-extensions","druid-hdfs-storage"] in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.indexer.logs.type=file in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.indexer.logs.directory=/opt/data/indexing-logs in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.zk.service.host=druid-zookeeper-headless:2181 in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.metadata.storage.type=postgresql in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.metadata.storage.connector.user=xxxxxxxx in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.metadata.storage.connector.password=xxxxxxxxxxxxxxx in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
druid druid-coordinator-6c8b48f5cd-nngjc druid Setting druid.storage.storageDirectory=hdfs://hadoop-hdfs-nn.hdfs:8020/druid in /tmp/conf/druid/cluster/master/coordinator-overlord/runtime.properties
- druid druid-coordinator-6c8b48f5cd-nngjc › druid
```
After a while it gets restarted.
Please let me know if there's more info I can provide. Sorry for the long issue!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] alborotogarcia closed issue #12087: Deep storage on kubernetes
Posted by GitBox <gi...@apache.org>.
alborotogarcia closed issue #12087:
URL: https://github.com/apache/druid/issues/12087
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] alborotogarcia commented on issue #12087: Deep storage on kubernetes
Posted by GitBox <gi...@apache.org>.
alborotogarcia commented on issue #12087:
URL: https://github.com/apache/druid/issues/12087#issuecomment-999051841
Hey @fhennig, thanks for the reply. Yes, I am aware of that; however, local deep storage seems to cause problems between the historical and middle manager ([see this issue](https://github.com/apache/druid/issues/10523#issuecomment-714201892)).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] fhennig commented on issue #12087: Deep storage on kubernetes
Posted by GitBox <gi...@apache.org>.
fhennig commented on issue #12087:
URL: https://github.com/apache/druid/issues/12087#issuecomment-998929620
Hey, there is also local deep storage; maybe you can use that instead: https://druid.apache.org/docs/latest/dependencies/deep-storage.html#local-mount
From what you wrote it seemed like you were unaware of that; maybe it helps.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] alborotogarcia edited a comment on issue #12087: Deep storage on kubernetes
Posted by GitBox <gi...@apache.org>.
alborotogarcia edited a comment on issue #12087:
URL: https://github.com/apache/druid/issues/12087#issuecomment-1000832314
Alright, it finally got solved. Looking at the middle-manager logs, it seems that druid-s3-extensions needs to be properly configured even if it's not used for deep storage (FWIW, I was missing the AWS region, as I was trying to use MinIO S3 buckets, even though I finally stuck with HDFS for deep storage). Mind also the required write permissions on HDFS/S3.
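For the record, the S3 settings that ended up mattering look roughly like this in the helm-chart env-var form (endpoint and credentials are placeholders from my earlier config, and the exact property names should be double-checked against the druid-s3-extensions docs):

```yaml
druid_storage_type: s3
druid_storage_bucket: druid                  # bucket name only, not an s3:// URL
druid_storage_baseKey: segments
druid_s3_accessKey: miniokey                 # placeholder
druid_s3_secretKey: miniopass                # placeholder
druid_s3_endpoint_url: http://myminioinstance.svc.cluster.local:9000   # placeholder
druid_s3_endpoint_signingRegion: us-east-1   # the region that was missing in my case
druid_s3_enablePathStyleAccess: "true"       # MinIO typically needs path-style access
```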
Question @asdf2014: even though I have all the segments listed on HDFS, not all of the segments are available in Druid after a while. How can I keep segments around for a period (e.g. 24h) when ingesting from Kafka? Should I read them back from HDFS as another datasource?
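For anyone finding this later: what I was after seems to be a retention (load/drop) rule, e.g. posting something like the following to the coordinator at `/druid/coordinator/v1/rules/<datasource>` (rule format per the Druid retention docs; the tier name and replicant count are just examples):

```json
[
  { "type": "loadByPeriod", "period": "P1D", "tieredReplicants": { "_default_tier": 2 } },
  { "type": "dropForever" }
]
```

This keeps the last day of segments loaded on historicals and drops everything older, while the data itself stays in deep storage.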
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org