Posted to user@accumulo.apache.org by "Samudrala, Ranganath [USA] via user" <us...@accumulo.apache.org> on 2023/02/02 18:03:30 UTC
Re: [External] Re: Accumulo with S3
This was very helpful.
Thanks
Ranga
________________________________
From: Jeff Kubina <je...@gmail.com>
Sent: Friday, January 20, 2023 2:03 PM
To: user@accumulo.apache.org <us...@accumulo.apache.org>; Samudrala, Ranganath [USA] <Sa...@bah.com>
Subject: Re: [External] Re: Accumulo with S3
You might want to look at this repo: https://github.com/Accumulo-S3/accumulo-s3-fs/tree/main
Jeff
On Fri, Jan 20, 2023 at 1:02 PM Samudrala, Ranganath [USA] via user <us...@accumulo.apache.org>> wrote:
In accumulo-env.sh, we are setting the location of HADOOP_CONF_DIR as below and adding it to the classpath.
## Accumulo logs directory. Referenced by logger config.
ACCUMULO_LOG_DIR="${ACCUMULO_LOG_DIR:-${basedir}/logs}"
## Hadoop installation
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
## Hadoop configuration
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME}/etc/hadoop}"
## Zookeeper installation
ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/opt/zookeeper}"
.
.
CLASSPATH="${CLASSPATH}:${lib}/*:${HADOOP_CONF_DIR}:${ZOOKEEPER_HOME}/*:${ZK_JARS}:${HADOOP_HOME}/share/hadoop/client/*:${HADOOP_HOME}/share/hadoop/common/*:${HADOOP_HOME}/share/hadoop/hdfs/*"
export CLASSPATH
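For anyone reading the snippet above: the `${VAR:-default}` expansions mean each of those paths can be overridden from the environment before the script runs, and falls back to the stated default otherwise. A standalone sketch of that fallback behavior (illustrative only, not part of the thread):

```shell
# Demonstrates the ${VAR:-default} fallback pattern used in accumulo-env.sh:
# when the variable is unset (or empty), the default after ':-' is used.
unset HADOOP_HOME HADOOP_CONF_DIR

HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
echo "$HADOOP_HOME"            # prints: /opt/hadoop

# Dependent defaults build on earlier ones:
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-${HADOOP_HOME}/etc/hadoop}"
echo "$HADOOP_CONF_DIR"        # prints: /opt/hadoop/etc/hadoop

# An explicit value always wins over the default:
HADOOP_HOME=/custom/hadoop
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
echo "$HADOOP_HOME"            # prints: /custom/hadoop
```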
.
.
From: Arvind Shyamsundar <ar...@microsoft.com>>
Date: Friday, January 20, 2023 at 12:55 PM
To: user@accumulo.apache.org<ma...@accumulo.apache.org> <us...@accumulo.apache.org>>, Samudrala, Ranganath [USA] <Sa...@bah.com>>
Subject: RE: [External] Re: Accumulo with S3
Vaguely rings a bell - in case it's a classpath issue, double-check that your accumulo-site includes the HADOOP_CONF folder in the classpath: https://github.com/apache/fluo-muchos/blob/3c5d48958b27a6d38226aba286f1fb275aceac90/ansible/roles/accumulo/templates/accumulo-site.xml#L95
Arvind Shyamsundar (HE / HIM)
From: Samudrala, Ranganath [USA] via user <us...@accumulo.apache.org>>
Sent: Friday, January 20, 2023 9:46 AM
To: user@accumulo.apache.org<ma...@accumulo.apache.org>
Subject: Re: [External] Re: Accumulo with S3
The logic is using “org.apache.hadoop.fs.s3a.S3AFileSystem”, as we can see in the stack trace. Shouldn’t it then be using the S3-related configuration in HADOOP_CONF_DIR? In Hadoop’s core-site.xml, we have the S3-related configuration parameters as below:
<property>
<name>fs.s3a.endpoint</name>
<value>http://accumulo-minio:9000</value>
</property>
<property>
<name>fs.s3a.access.key</name>
<value>YYYYYYY</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>XXXXXXX</value>
</property>
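An editorial aside, not something raised in the thread: with a single-host MinIO service, S3A usually also needs path-style addressing enabled, because the default virtual-hosted-style bucket URLs (bucket.hostname) generally do not resolve against a MinIO endpoint. This is a suggestion worth verifying against the S3A documentation for your Hadoop version; a sketch of the extra core-site.xml property:

```xml
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
</property>
```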
So, why do we need to create an AWS credentials file? Where do we create it, and what is the format?
Thanks
Ranga
From: Christopher <ct...@apache.org>>
Date: Friday, January 20, 2023 at 12:19 PM
To: accumulo-user <us...@accumulo.apache.org>>, Samudrala, Ranganath [USA] <Sa...@bah.com>>
Subject: [External] Re: Accumulo with S3
Based on the error message, it looks like you might need to configure each of the Accumulo nodes with the AWS credentials file.
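For reference, the credentials file mentioned here is the AWS SDK's conventional one at ~/.aws/credentials (in the home directory of the user running each Accumulo process), in INI-style profiles. A minimal sketch with placeholder keys:

```ini
[default]
aws_access_key_id = YYYYYYY
aws_secret_access_key = XXXXXXX
```

That said, when fs.s3a.access.key and fs.s3a.secret.key are set in the core-site.xml that S3A actually reads, S3A normally resolves those properties before falling back to environment variables or the profile file, so a 403 InvalidAccessKeyId can also indicate that the Hadoop configuration on the process's classpath is not the one you think it is.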
On Fri, Jan 20, 2023, 11:43 Samudrala, Ranganath [USA] via user <us...@accumulo.apache.org>> wrote:
Hello again!
The next problem I am facing is configuring MinIO S3 with Accumulo. I am referring to this document: https://accumulo.apache.org/blog/2019/09/10/accumulo-S3-notes.html
I have already invoked the command “accumulo init” with and without the option “--upload-accumulo-props”, using accumulo.properties as below:
instance.volumes=hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
instance.zookeeper.host=accumulo-zookeeper
general.volume.chooser=org.apache.accumulo.core.spi.fs.PreferredVolumeChooser
general.custom.volume.preferred.logger=hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
general.custom.volume.preferred.default=hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
Next, when I run the command “accumulo init --add-volumes” with accumulo.properties as below:
instance.volumes=s3a://minio-s3/accumulo,hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
instance.zookeeper.host=accumulo-zookeeper
general.volume.chooser=org.apache.accumulo.core.spi.fs.PreferredVolumeChooser
general.custom.volume.preferred.logger=hdfs://accumulo-hdfs-namenode-0.accumulo-hdfs-namenodes:8020/accumulo
general.custom.volume.preferred.default=s3a://minio-s3/accumulo
I see error as below:
ERROR StatusLogger An exception occurred processing Appender MonitorLog
java.lang.RuntimeException: Can't tell if Accumulo is initialized; can't read instance id at s3a://minio-s3/accumulo/instance_id
at org.apache.accumulo.server.fs.VolumeManager.getInstanceIDFromHdfs(VolumeManager.java:229)
at org.apache.accumulo.server.ServerInfo.<init>(ServerInfo.java:102)
at org.apache.accumulo.server.ServerContext.<init>(ServerContext.java:106)
at org.apache.accumulo.monitor.util.logging.AccumuloMonitorAppender.lambda$new$1(AccumuloMonitorAppender.java:93)
at org.apache.accumulo.monitor.util.logging.AccumuloMonitorAppender.append(AccumuloMonitorAppender.java:111)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:161)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:683)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:641)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
at org.apache.logging.log4j.core.config.LoggerConfig.logParent(LoggerConfig.java:674)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:643)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:624)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:612)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:98)
at org.apache.logging.log4j.core.async.AsyncLogger.actualAsyncLog(AsyncLogger.java:488)
at org.apache.logging.log4j.core.async.RingBufferLogEvent.execute(RingBufferLogEvent.java:156)
at org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:51)
at org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:29)
at com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.nio.file.AccessDeniedException: s3a://minio-s3/accumulo/instance_id: listStatus on s3a://minio-s3/accumulo/instance_id: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 71HC1ZM3D43W0H67; S3 Extended Request ID: OsRVgg057cm+M7EP+P069hY97mA6na8rkhnNVunVRTUmttCDc5Sm5aKqodS+oogU5/UupgsEy1A=; Proxy: null), S3 Extended Request ID: OsRVgg057cm+M7EP+P069hY97mA6na8rkhnNVunVRTUmttCDc5Sm5aKqodS+oogU5/UupgsEy1A=:InvalidAccessKeyId
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:255)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:119)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$21(S3AFileSystem.java:3263)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444)
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337)
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3262)
at org.apache.accumulo.server.fs.VolumeManager.getInstanceIDFromHdfs(VolumeManager.java:211)
... 23 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 71HC1ZM3D43W0H67; S3 Extended Request ID: OsRVgg057cm+M7EP+P069hY97mA6na8rkhnNVunVRTUmttCDc5Sm5aKqodS+oogU5/UupgsEy1A=; Proxy: null), S3 Extended Request ID: OsRVgg057cm+M7EP+P069hY97mA6na8rkhnNVunVRTUmttCDc5Sm5aKqodS+oogU5/UupgsEy1A=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5456)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5397)
at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$11(S3AFileSystem.java:2595)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2586)
at org.apache.hadoop.fs.s3a.S3AFileSystem$ListingOperationCallbacksImpl.lambda$listObjectsAsync$0(S3AFileSystem.java:2153)
at org.apache.hadoop.fs.s3a.impl.CallableSupplier.get(CallableSupplier.java:87)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
... 1 more
When I invoke commands from HDFS, I see no problems though:
* hdfs dfs -fs s3a://minio-s3 -ls /
2023-01-20 16:38:51,319 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser]: Sanitizing XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListObjectsV2Handler
2023-01-20 16:38:51,321 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager]: Connection [id: 0][route: {}->http://accumulo-minio:9000] can be kept alive for 60.0 seconds
2023-01-20 16:38:51,321 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.thirdparty.apache.http.impl.conn.DefaultManagedHttpClientConnection]: http-outgoing-0: set socket timeout to 0
2023-01-20 16:38:51,321 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager]: Connection released: [id: 0][route: {}->http://accumulo-minio:9000][total available: 1; route allocated: 1 of 128; total allocated: 1 of 128]
2023-01-20 16:38:51,321 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser]: Parsing XML response document with handler: class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListObjectsV2Handler
2023-01-20 16:38:51,328 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser]: Examining listing for bucket: minio-s3
2023-01-20 16:38:51,329 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.request]: Received successful response: 200, AWS Request ID: 173C11CC6FEF29A0
2023-01-20 16:38:51,329 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.requestId]: x-amzn-RequestId: not available
2023-01-20 16:38:51,329 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.requestId]: AWS Request ID: 173C11CC6FEF29A0
2023-01-20 16:38:51,338 DEBUG [s3a-transfer-minio-s3-unbounded-pool2-t1] [com.amazonaws.latency]: ServiceName=[Amazon S3], StatusCode=[200], ServiceEndpoint=[http://accumulo-minio:9000], RequestType=[ListObjectsV2Request], AWSRequestID=[173C11CC6FEF29A0], HttpClientPoolPendingCount=0, RetryCapacityConsumed=0, HttpClientPoolAvailableCount=0, RequestCount=1, HttpClientPoolLeasedCount=0, ResponseProcessingTime=[71.198], ClientExecuteTime=[297.496], HttpClientSendRequestTime=[7.255], HttpRequestTime=[119.87], ApiCallLatency=[279.779], RequestSigningTime=[56.006], CredentialsRequestTime=[5.091, 0.015], HttpClientReceiveResponseTime=[12.849]