You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2019/03/27 18:10:25 UTC
[GitHub] [hadoop] noslowerdna commented on a change in pull request #647: HADOOP-16118. S3Guard to support on-demand DDB tables.

noslowerdna commented on a change in pull request #647: HADOOP-16118. S3Guard to support on-demand DDB tables.
URL: https://github.com/apache/hadoop/pull/647#discussion_r269702082
 
 

 ##########
 File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
 ##########
 @@ -906,22 +909,102 @@ If operations, especially directory operations, are slow, check the AWS
 console. It is also possible to set up AWS alerts for capacity limits
 being exceeded.
 
+### <a name="on-demand"></a> On-Demand Dynamo Capacity
+
+[Amazon DynamoDB On-Demand](https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-no-capacity-planning-and-pay-per-request-pricing/)
+removes the need to pre-allocate I/O capacity for S3Guard tables.
+Instead the caller is _only_ charged per I/O Operation.
+
+* There are no SLA capacity guarantees. This is generally not an issue
+for S3Guard applications.
+* There's no explicit limit on I/O capacity, so operations which make
+heavy use of S3Guard tables (for example: SQL query planning) do not
+get throttled.
+* There's no way put a limit on the I/O; you may unintentionally run up
+large bills through sustained heavy load.
+* The `s3guard set-capacity` command fails: it does not make sense any more.
+
+When idle, S3Guard tables are only billed for the data stored, not for
+any unused capacity. For this reason, there is no benefit from sharing
+a single S3Guard table across multiple buckets.
+
+*Enabling DynamoDB On-Demand for a S3Guard table*
+
+You cannot currently enable DynamoDB on-demand from the `s3guard` command
+when creating or updating a bucket.
+
+Instead it must be done through the AWS console or [the CLI](https://docs.aws.amazon.com/cli/latest/reference/dynamodb/update-table.html).
+From the Web console or the command line, switch the billing to pay-per-request.
+
+Once enabled, the read and write capacities of the table listed in the
+`hadoop s3guard bucket-info` command become "0", and the "billing-mode"
+attribute changes to "per-request":
+
+```
+> hadoop s3guard bucket-info s3a://example-bucket/
+
+Filesystem s3a://example-bucket
+Location: eu-west-1
+Filesystem s3a://example-bucket is using S3Guard with store
+  DynamoDBMetadataStore{region=eu-west-1, tableName=example-bucket,
+  tableArn=arn:aws:dynamodb:eu-west-1:11111122223333:table/example-bucket}
+Authoritative S3Guard: fs.s3a.metadatastore.authoritative=false
+Metadata Store Diagnostics:
+  ARN=arn:aws:dynamodb:eu-west-1:11111122223333:table/example-bucket
+  billing-mode=per-request
+  description=S3Guard metadata store in DynamoDB
+  name=example-bucket
+  persist.authoritative.bit=true
+  read-capacity=0
+  region=eu-west-1
+  retryPolicy=ExponentialBackoffRetry(maxRetries=9, sleepTime=250 MILLISECONDS)
+  size=66797
+  status=ACTIVE
+  table={AttributeDefinitions:
+    [{AttributeName: child,AttributeType: S},
+     {AttributeName: parent,AttributeType: S}],
+     TableName: example-bucket,
+     KeySchema: [{
+       AttributeName: parent,KeyType: HASH},
+       {AttributeName: child,KeyType: RANGE}],
+     TableStatus: ACTIVE,
+     CreationDateTime: Thu Oct 11 18:51:14 BST 2018,
+     ProvisionedThroughput: {
+       LastIncreaseDateTime: Tue Oct 30 16:48:45 GMT 2018,
+       LastDecreaseDateTime: Tue Oct 30 18:00:03 GMT 2018,
+       NumberOfDecreasesToday: 0,
+       ReadCapacityUnits: 0,
+       WriteCapacityUnits: 0},
+     TableSizeBytes: 66797,
+     ItemCount: 415,
+     TableArn: arn:aws:dynamodb:eu-west-1:11111122223333:table/example-bucket,
+     TableId: a7b0728a-f008-4260-b2a0-aaaaabbbbb,}
+  write-capacity=0
+The "magic" committer is supported
+```
+
+### <a name="autoscaling"></a> Autoscaling S3Guard tables.
+
 [DynamoDB Auto Scaling](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/AutoScaling.html)
 can automatically increase and decrease the allocated capacity.
-This is good for keeping capacity high when needed, but avoiding large
-bills when it is not.
+
+Before DynamoDB On-Demand was introduced, autoscaling was the sole form
+of dynamic scaling. 
 
 Experiments with S3Guard and DynamoDB Auto Scaling have shown that any Auto Scaling
 operation will only take place after callers have been throttled for a period of
 time. The clients will still need to be configured to retry when overloaded
 until any extra capacity is allocated. Furthermore, as this retrying will
-block the threads from performing other operations -including more IO, the
+block the threads from performing other operations -including more I/O, the
 the autoscale may not scale fast enough.
 
-We recommend experimenting with this, based on usage information collected
-from previous days, and and choosing a combination of
-retry counts and an interval which allow for the clients to cope with
-some throttling, but not to time out other applications.
+This is why the DynamoDB On-Demand appears to be a better option for
+workloads with Hadoop, Spark, Hive and othe rapplications.
 
 Review comment:
   typo, "other applications"

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org