Posted to user@hbase.apache.org by Austin Heyne <ah...@ccri.com> on 2018/07/23 13:56:51 UTC

Ingest performance after upgrade

Hey all, hopefully this is a simple one.

We've migrated from HBase (on S3) 1.3.2 to 1.4.4, creating a fresh
instance with all of our configuration carried over from the previous
one with only a few changes. However, ingest is running ~100x slower
than it did on 1.3.2, and requests per second seem very low at around
3k total for a 10-node cluster. We're not seeing any atypical errors,
but we are seeing a ton of flushes, about one every 3 or 4 seconds,
and the logs indicate the flushes are full size (~128MB). I've
included our configuration below.

Thanks for the help,
Austin

[
   {
     "classification": "hbase-site",
     "properties": {
       "fs.s3.consistent.retryPeriodSeconds": "10",
       "hbase.regionserver.thread.compaction.large": "3",
       "fs.s3.consistent.retryPolicyType": "fixed",
       "hbase.hstore.blockingStoreFiles": "1000",
       "fs.s3.consistent.throwExceptionOnInconsistency": "false",
       "hbase.bucketcache.size": "27000",
       "hbase.ipc.server.callqueue.read.ratio": "0.25",
       "hbase.bucketcache.combinedcache.enabled": "true",
       "fs.s3a.threads.max": "50",
       "hbase.regionserver.thread.compaction.small": "2",
       "hbase.hregion.memstore.flush.size": "134217728",
       "hbase.hregion.max.filesize": "21474836480",
       "hbase.regionserver.regionSplitLimit": "10000",
       "fs.s3.consistent.metadata.tableName": "redacted",
       "hbase.hstore.compaction.max": "1000",
       "hbase.regionserver.global.memstore.size": "0.4",
       "hbase.ipc.server.callqueue.handler.factor": "0.5",
       "hbase.regionserver.logroll.period": "100000",
       "hbase.hregion.majorcompaction": "0",
       "hbase.hstore.compactionThreshold": "1000",
       "hbase.hregion.memstore.mslab.enabled": "false",
       "hbase.regionserver.handler.count": "50",
       "fs.s3a.connection.maximum": "100",
       "hbase.hstore.flusher.count": "10",
       "hbase.hstore.blockingWaitTime": "0",
       "hbase.hregion.memstore.block.multiplier": "10",
       "hbase.bucketcache.ioengine": "offheap"
     }
   },
   {
     "configurations": [
       {
         "classification": "export",
         "properties": {
           "HBASE_REGIONSERVER_OPTS": "\"-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.authenticate\u003dfalse 
-Dcom.sun.management.jmxremote.port\u003d10102 
-Dcom.sun.management.jmxremote.ssl\u003dfalse -Xmx28G 
-XX:MaxDirectMemorySize\u003d28G\"",
           "HBASE_MASTER_OPTS": "\"-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.authenticate\u003dfalse 
-Dcom.sun.management.jmxremote.port\u003d10101 
-Dcom.sun.management.jmxremote.ssl\u003dfalse -Xmx28G 
-XX:MaxDirectMemorySize\u003d28G\""
         }
       }
     ],
     "classification": "hbase-env",
     "properties": {

     }
   },
   {
     "classification": "hbase-metrics",
     "properties": {
       "rpc.period": "60",
       "hbase.period": "60",
       "rpc.class": 
"org.apache.hadoop.metrics.spi.NullContextWithUpdateThread",
       "hbase.class": 
"org.apache.hadoop.metrics.spi.NullContextWithUpdateThread",
       "jvm.class": 
"org.apache.hadoop.metrics.spi.NullContextWithUpdateThread",
       "jvm.period": "60"
     }
   },
   {
     "classification": "emrfs-site",
     "properties": {
       "fs.s3.consistent": "true"
     }
   }
]
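
For anyone trying to reproduce the setup: the block above is standard
EMR configuration-classification JSON. A minimal boto3 sketch of how
such a file could be passed at cluster creation follows; the cluster
name, release label, roles, and instance settings are placeholders,
not our actual values.

import json

import boto3  # assumes AWS credentials and region are configured in the environment

def to_api_form(cfgs):
    # The console-style JSON above uses lowercase keys; the EMR API expects
    # "Classification"/"Properties"/"Configurations", so convert recursively.
    out = []
    for c in cfgs:
        item = {"Classification": c["classification"],
                "Properties": c.get("properties", {})}
        if "configurations" in c:
            item["Configurations"] = to_api_form(c["configurations"])
        out.append(item)
    return out

with open("hbase-config.json") as f:   # the JSON above, saved locally
    configurations = to_api_form(json.load(f))

emr = boto3.client("emr")
response = emr.run_job_flow(
    Name="hbase-on-s3",                      # placeholder name
    ReleaseLabel="emr-5.16.0",               # any EMR release that ships HBase 1.4.x
    Applications=[{"Name": "HBase"}],
    Configurations=configurations,
    Instances={
        "InstanceCount": 10,
        "MasterInstanceType": "m4.2xlarge",  # placeholder instance types
        "SlaveInstanceType": "m4.2xlarge",
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",       # default EMR roles; adjust as needed
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])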

-- 
Austin L. Heyne


Re: Ingest performance after upgrade

Posted by Sean Busbey <bu...@apache.org>.
It sounds like it needs to get moved under the "what's changed" section for 1.4, though.

On Tue, Jul 31, 2018 at 2:31 PM, Stack <st...@duboce.net> wrote:
> Thanks for coming back to the list with your finding Austin.
>
> Pleased to say we'd called this out in "Changed Metrics" under
> "Changes of Note" for hbase-2.0.0 [1], not that anyone reads the
> manual (smile).
>
> St.Ack
>
> 1. http://hbase.apache.org/book.html#_changes_of_note
> On Mon, Jul 30, 2018 at 8:58 AM Austin Heyne <ah...@ccri.com> wrote:
>>
>> I ran some benchmarks comparing the two and actual performance is the
>> same, the perception came from HBASE 18469 [1] where the request/s
>> numbers were changed for 1.4.0 and 2.0.0.
>>
>> Thanks,
>> Austin
>>
>> [1] https://issues.apache.org/jira/browse/HBASE-18469
>>
>>
>> On 07/26/2018 06:41 PM, Stack wrote:
>> > On Mon, Jul 23, 2018 at 6:57 AM Austin Heyne <ah...@ccri.com> wrote:
>> >
>> >> Hey all, hopefully this is a simple one.
>> >>
>> >> We've migrated from HBase (on s3) 1.3.2 to 1.4.4, creating a fresh
>> >> instance with all our configuration from the previous one with few
>> >> changes. However, we're seeing ingest run ~100x slower than we saw on
>> >> 1.3.2 and the requests per second seem really low at around 3k total for
>> >> a 10 node cluster. We're not seeing any atypical errors but it does seem
>> >> we're seeing a ton of flushes, 1 about every 3 or 4 seconds but the logs
>> >> indicate the flushes are full size ~128MB. I've included our
>> >> configuration below.
>> >>
>> >>
>> > Check old logs to see what rate you used to flush at?
>> >
>> > Is it possible that you are just writing way more data now or the data
>> > character is different now? Larger values?
>> >
>> > S
>> >
>> >
>> >
>> >> Thanks for the help,
>> >> Austin
>> >>
>> >> [...]
>> >>
>> >> --
>> >> Austin L. Heyne
>> >>
>> >>
>>
>> --
>> Austin L. Heyne
>>

Re: Ingest performance after upgrade

Posted by Stack <st...@duboce.net>.
Thanks for coming back to the list with your finding Austin.

Pleased to say we'd called this out in "Changed Metrics" under
"Changes of Note" for hbase-2.0.0 [1], not that anyone reads the
manual (smile).

St.Ack

1. http://hbase.apache.org/book.html#_changes_of_note
On Mon, Jul 30, 2018 at 8:58 AM Austin Heyne <ah...@ccri.com> wrote:
>
> I ran some benchmarks comparing the two and actual performance is the
> same, the perception came from HBASE 18469 [1] where the request/s
> numbers were changed for 1.4.0 and 2.0.0.
>
> Thanks,
> Austin
>
> [1] https://issues.apache.org/jira/browse/HBASE-18469
>
>
> On 07/26/2018 06:41 PM, Stack wrote:
> > On Mon, Jul 23, 2018 at 6:57 AM Austin Heyne <ah...@ccri.com> wrote:
> >
> >> Hey all, hopefully this is a simple one.
> >>
> >> We've migrated from HBase (on s3) 1.3.2 to 1.4.4, creating a fresh
> >> instance with all our configuration from the previous one with few
> >> changes. However, we're seeing ingest run ~100x slower than we saw on
> >> 1.3.2 and the requests per second seem really low at around 3k total for
> >> a 10 node cluster. We're not seeing any atypical errors but it does seem
> >> we're seeing a ton of flushes, 1 about every 3 or 4 seconds but the logs
> >> indicate the flushes are full size ~128MB. I've included our
> >> configuration below.
> >>
> >>
> > Check old logs to see what rate you used to flush at?
> >
> > Is it possible that you are just writing way more data now or the data
> > character is different now? Larger values?
> >
> > S
> >
> >
> >
> >> Thanks for the help,
> >> Austin
> >>
> >> [...]
> >>
> >> --
> >> Austin L. Heyne
> >>
> >>
>
> --
> Austin L. Heyne
>

Re: Ingest performance after upgrade

Posted by Austin Heyne <ah...@ccri.com>.
I ran some benchmarks comparing the two and actual performance is the
same; the perception came from HBASE-18469 [1], where the request/s
numbers were changed for 1.4.0 and 2.0.0.

Thanks,
Austin

[1] https://issues.apache.org/jira/browse/HBASE-18469
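
One way to sanity-check the actual write rate, independent of how the
UI now computes requests per second, is to diff the writeRequestCount
counter exposed by each regionserver's /jmx servlet over an interval.
A rough sketch, assuming the default regionserver info port (16030)
and placeholder hostnames:

import json
import time
import urllib.request

# Placeholder hostnames; list the regionservers of the cluster being measured.
REGIONSERVERS = ["rs1.example.com", "rs2.example.com"]
JMX_QUERY = "/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Server"

def write_request_count(host, port=16030):
    # Read the cumulative writeRequestCount counter from the regionserver's /jmx servlet.
    with urllib.request.urlopen(f"http://{host}:{port}{JMX_QUERY}") as resp:
        beans = json.load(resp)["beans"]
    return sum(b.get("writeRequestCount", 0) for b in beans)

def cluster_write_rate(interval_s=60):
    # Approximate cluster-wide writes/sec by diffing the counters over an interval.
    before = sum(write_request_count(h) for h in REGIONSERVERS)
    time.sleep(interval_s)
    after = sum(write_request_count(h) for h in REGIONSERVERS)
    return (after - before) / interval_s

print("~%.0f writes/sec across the cluster" % cluster_write_rate())

Running something like that against both clusters should show whether
the throughput actually differs, whatever the UI gauge says.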


On 07/26/2018 06:41 PM, Stack wrote:
> On Mon, Jul 23, 2018 at 6:57 AM Austin Heyne <ah...@ccri.com> wrote:
>
>> Hey all, hopefully this is a simple one.
>>
>> We've migrated from HBase (on s3) 1.3.2 to 1.4.4, creating a fresh
>> instance with all our configuration from the previous one with few
>> changes. However, we're seeing ingest run ~100x slower than we saw on
>> 1.3.2 and the requests per second seem really low at around 3k total for
>> a 10 node cluster. We're not seeing any atypical errors but it does seem
>> we're seeing a ton of flushes, 1 about every 3 or 4 seconds but the logs
>> indicate the flushes are full size ~128MB. I've included our
>> configuration below.
>>
>>
> Check old logs to see what rate you used to flush at?
>
> Is it possible that you are just writing way more data now or the data
> character is different now? Larger values?
>
> S
>
>
>
>> Thanks for the help,
>> Austin
>>
>> [...]
>>
>> --
>> Austin L. Heyne
>>
>>

-- 
Austin L. Heyne


Re: Ingest performance after upgrade

Posted by Stack <st...@duboce.net>.
On Mon, Jul 23, 2018 at 6:57 AM Austin Heyne <ah...@ccri.com> wrote:

> Hey all, hopefully this is a simple one.
>
> We've migrated from HBase (on s3) 1.3.2 to 1.4.4, creating a fresh
> instance with all our configuration from the previous one with few
> changes. However, we're seeing ingest run ~100x slower than we saw on
> 1.3.2 and the requests per second seem really low at around 3k total for
> a 10 node cluster. We're not seeing any atypical errors but it does seem
> we're seeing a ton of flushes, 1 about every 3 or 4 seconds but the logs
> indicate the flushes are full size ~128MB. I've included our
> configuration below.
>
>
Check old logs to see what rate you used to flush at?

Is it possible that you are just writing way more data now or the data
character is different now? Larger values?

S
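
A rough way to compare flush rates between the old and new clusters is
to bucket the flush-completion lines in a regionserver log by minute.
A sketch, assuming a 1.x log and the "Finished memstore flush" message
those regionservers print at INFO (the default path below is a
placeholder; the wording differs in other versions):

import re
import sys
from collections import Counter

# Placeholder path; point this at a regionserver log from either cluster.
LOG_PATH = sys.argv[1] if len(sys.argv) > 1 else "hbase-hbase-regionserver.log"

# 1.x regionservers log "Finished memstore flush of ~<size> ... in <N>ms" at INFO.
FLUSH_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}).*Finished memstore flush")

per_minute = Counter()
with open(LOG_PATH) as log:
    for line in log:
        m = FLUSH_RE.match(line)
        if m:
            per_minute[m.group(1)] += 1   # key is the timestamp truncated to the minute

for minute, count in sorted(per_minute.items()):
    print(minute, count, "flushes")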



> Thanks for the help,
> Austin
>
> [...]
>
> --
> Austin L. Heyne
>
>