You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2013/09/01 00:13:07 UTC

Re: [VOTE] The 1st hbase-0.96.0 release candidate is available for download

On Sat, Aug 31, 2013 at 4:15 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi St.Ack,
>
> Thanks for this release!
>
> Is there a way to run the test suite on the binary packages? Or we can only
> run the tests on the source distribution?
>

The bin packages don't have poms so mvn is all confused if you ask it to
run the tests.

You might be able to fake out junit to run them since the bin package
includes the tests jar but it looks like we would need to bundle the
hamcrest jar (IIRC, we exclude it explicitly because could not see why we'd
want it at runtime -- smile):

durruti:hbase-0.96.0-hadoop2 stack$ ./bin/hbase  org.junit.runner.JUnitCore
org.apache.hadoop.hbase.TestKeyValue
JUnit version 4.11
Exception in thread "main" java.lang.NoClassDefFoundError:
org/hamcrest/SelfDescribing
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
...

You'd have name each test explicitly also.

So, yeah, use the -src package to run unit tests since it has poms, etc.

For integration tests, hbase-it tests, see 16.7.5 section in here
http://hbase.apache.org/book.html#hbase.tests  You should be able to run
those in binary package.

Thanks JMS,
St.Ack

Re: [VOTE] The 1st hbase-0.96.0 release candidate is available for download

Posted by Devaraj Das <dd...@hortonworks.com>.

Yeah, will take a look at this.


On Tue, Sep 3, 2013 at 12:56 PM, Elliott Clark <ec...@apache.org> wrote:

> Can someone take a look at running Test Big Linked for > 5 iterations
> with slowDeterministic chaos monkey on a distributed cluster.  I'm
> pretty concerned about HBASE-9338
>
> On Tue, Sep 3, 2013 at 6:57 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
> > There was a typo in my log4j.properties :(
> >
> > So it's working fine.
> >
> > The only INFO logs I still see are those one:
> > 2013-09-03 09:53:07,313 INFO  [M:0;t430s:45176] mortbay.log: Logging to
> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > org.mortbay.log.Slf4jLog
> > 2013-09-03 09:53:07,350 INFO  [M:0;t430s:45176] mortbay.log: jetty-6.1.26
> > But there is only very few of them.
> >
> > Performances wise, here are the numbers (the higher, the better. Rows per
> > seconds, expect for scans where it's rows/min). As you will see, 0.96 is
> > slower only for RandomSeekScanTest (way slower) and RandomScanWithRange10
> > but is faster for everything else. I ran the tests with the default
> > settings. So I think we should look at RandomSeekScanTest but expect this
> > one, everything else is pretty good.
> >
> > Also, I have been able to reproduce this exception:
> > 2013-09-03 09:55:36,718 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181
> ]
> > server.NIOServerCnxn: caught end of stream exception
> > EndOfStreamException: Unable to read additional data from client
> sessionid
> > 0x140e4191edb0009, likely client has closed socket
> >     at
> > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
> >     at
> >
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >     at java.lang.Thread.run(Thread.java:662)
> >
> > Just had to run PE and kill it in the middle.
> >
> > JM
> >
> >
> >
> > 0.96.0RC0 0.94.11
> >  org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest 10.28
> 10.17
> > 101.12%  org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest
> 966.01
> > 810.58 119.18%
> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 98.50
> > 255.71 38.52%
>  org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest
> > 39251.17 25682.11 152.83%
> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange10Test
> > 25844.88 28715.29 90.00%
> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
> > 20029.48 18022.39 111.14%
> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange1000Test
> > 2692.16 2346.85 114.71%
> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialReadTest 3002.18
> > 2875.83 104.39%
> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest
> 38995.50
> > 26693.23 146.09%
> >
> >
> > 2013/9/3 Stack <st...@duboce.net>
> >
> >> On Mon, Sep 2, 2013 at 10:51 AM, Jean-Marc Spaggiari <
> >> jean-marc@spaggiari.org> wrote:
> >>
> >> > I have created:
> >> >  HBASE-9412
> >> >  HBASE-9413
> >> >  HBASE-9414
> >> >
> >> > I have not been able yet to reproduce the ZK error. I'm trying.
> >> >
> >> >
> >> Is it when you have a shell connection and then kill it?
> >>
> >>
> >>
> >> > Last, I tried, with no success, to set loglevel to WARN to remove all
> >> > DEBUG and INFO logs. Setting it to WARN remove the DEBUG lines, but I
> >> > keep getting the INFO. Seems that something is setting the log level
> >> > somewhere else, or it's not read.
> >> >
> >> > Here is my log4j.properties file. I removed all the customed log level
> >> > to setup WARN for org.apache. And it's still showing INFO...
> >> >
> >> >
> >>
> >> You did it by editing log4j and restarting or in the UI?  I think the UI
> >> log level setting is broke.... (new issue!)
> >>
> >> Thanks for trying it out JMS,
> >>
> >> So everything is slower in 0.96?
> >> St.Ack
> >>
> >>
> >>
> >> > JM
> >> >
> >> >
> >> > # Define some default values that can be overridden by system
> properties
> >> > hbase.root.logger=WARN,console
> >> > hbase.security.logger=WARN,console
> >> > hbase.log.dir=.
> >> > hbase.log.file=hbase.log
> >> >
> >> > # Define the root logger to the system property "hbase.root.logger".
> >> > log4j.rootLogger=${hbase.root.logger}
> >> >
> >> > # Logging Threshold
> >> > log4j.threshold=ALL
> >> >
> >> > #
> >> > # Daily Rolling File Appender
> >> > #
> >> > log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
> >> > log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
> >> >
> >> > # Rollver at midnight
> >> > log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
> >> >
> >> > # 30-day backup
> >> > #log4j.appender.DRFA.MaxBackupIndex=30
> >> > log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
> >> >
> >> > # Pattern format: Date LogLevel LoggerName LogMessage
> >> > log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
> %c{2}:
> >> > %m%n
> >> >
> >> > # Rolling File Appender properties
> >> > hbase.log.maxfilesize=256MB
> >> > hbase.log.maxbackupindex=20
> >> >
> >> > # Rolling File Appender
> >> > log4j.appender.RFA=org.apache.log4j.RollingFileAppender
> >> > log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
> >> >
> >> > log4j.appender.RFA.MaxFileSize=${hbase.log.maxfilesize}
> >> > log4j.appender.RFA.MaxBackupIndex=${hbase.log.maxbackupindex}
> >> >
> >> > log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
> >> > log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
> %c{2}:
> >> > %m%n
> >> >
> >> > #
> >> > # Security audit appender
> >> > #
> >> > hbase.security.log.file=SecurityAuth.audit
> >> > hbase.security.log.maxfilesize=256MB
> >> > hbase.security.log.maxbackupindex=20
> >> > log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
> >> > log4j.appender.RFAS.File=${hbase.log.dir}/${hbase.security.log.file}
> >> > log4j.appender.RFAS.MaxFileSize=${hbase.security.log.maxfilesize}
> >> >
> log4j.appender.RFAS.MaxBackupIndex=${hbase.security.log.maxbackupindex}
> >> > log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
> >> > log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
> >> > log4j.category.SecurityLogger=${hbase.security.logger}
> >> > log4j.additivity.SecurityLogger=false
> >> >
> >> >
> >>
> #log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE
> >> >
> >> > #
> >> > # Null Appender
> >> > #
> >> > log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
> >> >
> >> > #
> >> > # console
> >> > # Add "console" to rootlogger above if you want to use this
> >> > #
> >> > log4j.appender.console=org.apache.log4j.ConsoleAppender
> >> > log4j.appender.console.target=System.err
> >> > log4j.appender.console.layout=org.apache.log4j.PatternLayout
> >> > log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
> >> > %c{2}: %m%n
> >> >
> >> > # Custom Logging levels
> >> >
> >> > log4j.logger.ore.apache=WARN
> >> >
> >> > 2013/9/2 Jean-Marc Spaggiari <je...@spaggiari.org>:
> >> > > Hi St.Ack,
> >> > >
> >> > > I will open the relate JIRAs in few minutes.
> >> > >
> >> > > Regarding performances, RandomSeekScanTest is way slower, and only
> >> > > RandomScanWithRange100Test was faster. Others were similar. For
> >> > > RandomScanWithRange100Test I suspect that I don't have the right
> >> > > number for 0.94 so 0.94.11 tests are running right now on the same
> >> > > server with the same configuration. I will start to have numbers by
> >> > > end of day, else, tomorrow morning, but will most probably take
> about
> >> > > 24h to get all of them.
> >> > >
> >> > > JM
> >> > >
> >> > > 2013/9/2 Stack <st...@duboce.net>:
> >> > >> On Mon, Sep 2, 2013 at 9:41 AM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org
> >> > >>> wrote:
> >> > >>
> >> > >>> Got it.
> >> > >>>
> >> > >>> I can't run the integration tests for now because I'm lacking some
> >> > >>> servers :( Need to complete some HBase on RAID tests before I can
> get
> >> > >>> those new servers
> >> > >>>
> >> > >>> First thing is, start-hbase.cmd has the execute flag set. I don't
> >> > >>> think it's required. And it will help with tabulation feature if
> we
> >> > >>> can un-set it.
> >> > >>>
> >> > >>> For 0.96.0RC0 here are my results:
> >> > >>> First, I get 2 .out files. each time I start the server, instead
> of
> >> > >>> usually one... With the same timestamp.
> >> > >>>
> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
> >> > >>> hbase-jmspaggiari-master-t430s.out
> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
> >> > >>> hbase-jmspaggiari-master-t430s.out.1
> >> > >>>
> >> > >>>
> >> > >>> In the UI, we say "The .META. table holds references to all User
> >> Table
> >> > >>> regions" but the table name is "hbase:meta" and not ".META."
> >> > >>>
> >> > >>> On the logs, I found this exception that I did not had before:
> >> > >>> 2013-08-31 18:45:05,490 WARN
> >> > >>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn:
> >> > >>> caught end of stream exception
> >> > >>> EndOfStreamException: Unable to read additional data from client
> >> > >>> sessionid 0x140d68bb9d50004, likely client has closed socket
> >> > >>>     at
> >> > >>>
> >> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
> >> > >>>     at
> >> > >>>
> >> >
> >>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >> > >>>     at java.lang.Thread.run(Thread.java:662)
> >> > >>>
> >> > >>>
> >> > >>> I ran PE over 48h. I don't have another 0.96 baseline to compare
> >> with,
> >> > >>> so I compared with 0.94.
> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest
> is
> >> > >>> about 3 times slower with 0.96.
> >> > >>>
> >> >
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
> >> > >>> seems to be 2 times faster.
> >> > >>>
> >> > >>> Writes also are faster but I changed my hard drive since I ran
> 0.94.
> >> > >>> So I'm currently re-running 0.94 and will provide another more
> >> > >>> accurate comparison soon.
> >> > >>>
> >> > >>> Ran LoadTestTool: Failed to write keys: 0
> >> > >>>
> >> > >>> I have not been able to run IntegrationTestLoadAndVerify nor
> >> > >>> IntegrationTestBigLinkedList because of the lack of servers.
> >> > >>>
> >> > >>> So overall, it seems to be working fine, but I have not been able
> to
> >> > >>> test this release as deeply as I'm usually testing the 0.94
> releases.
> >> > >>>
> >> > >>
> >> > >> Thank you for taking it for a spin JMS -- you the man.  If you
> don't
> >> > open
> >> > >> issues for the teething stuff, I will.
> >> > >>
> >> > >> So 0.96 is slower doing RandomSeekScanTest but faster on your other
> >> > tests?
> >> > >>  I can take a look too.
> >> > >>
> >> > >> St.Ack
> >> >
> >>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: [VOTE] The 1st hbase-0.96.0 release candidate is available for download

Posted by Elliott Clark <ec...@apache.org>.

Yeah I got that for a while as well.  Though once online schema change
was disabled I can complete the reduce step with 2.5 G of heap (still
failing :-/).

On Tue, Sep 3, 2013 at 6:30 PM, Devaraj Das <dd...@hortonworks.com> wrote:
> Well from the test code it seems like the problem is due to the fact that
> the reducer got unexpected data and it was trying to construct the log
> message for the user. So the job had already failed in reality.
>
>
> On Tue, Sep 3, 2013 at 6:17 PM, Devaraj Das <dd...@hortonworks.com> wrote:
>
>> Elliott, what are the heap sizes of the M/R tasks in your setup. I was
>> running the job like this (without chaosmonkey to start with):
>>
>> hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 5 12
>> 2500000 IntegrationTestBigLinkedList 10
>>
>> Even the above test failed with one reduce task failing with OOM, in the
>> verify step. The heap size was set to 3G.
>>
>> 2013-09-04 01:11:56,054 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
>>       at java.util.Arrays.copyOf(Arrays.java:2882)
>>       at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>>       at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>>       at java.lang.StringBuilder.append(StringBuilder.java:119)
>>       at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:576)
>>       at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:547)
>>       at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>>       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
>>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
>>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>       at java.security.AccessController.doPrivileged(Native Method)
>>       at javax.security.auth.Subject.doAs(Subject.java:396)
>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
>>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>
>>
>>
>>
>>
>> On Tue, Sep 3, 2013 at 12:56 PM, Elliott Clark <ec...@apache.org> wrote:
>>
>>> Can someone take a look at running Test Big Linked for > 5 iterations
>>> with slowDeterministic chaos monkey on a distributed cluster.  I'm
>>> pretty concerned about HBASE-9338
>>>
>>> On Tue, Sep 3, 2013 at 6:57 AM, Jean-Marc Spaggiari
>>> <je...@spaggiari.org> wrote:
>>> > There was a typo in my log4j.properties :(
>>> >
>>> > So it's working fine.
>>> >
>>> > The only INFO logs I still see are those one:
>>> > 2013-09-03 09:53:07,313 INFO  [M:0;t430s:45176] mortbay.log: Logging to
>>> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>> > org.mortbay.log.Slf4jLog
>>> > 2013-09-03 09:53:07,350 INFO  [M:0;t430s:45176] mortbay.log:
>>> jetty-6.1.26
>>> > But there is only very few of them.
>>> >
>>> > Performances wise, here are the numbers (the higher, the better. Rows
>>> per
>>> > seconds, expect for scans where it's rows/min). As you will see, 0.96 is
>>> > slower only for RandomSeekScanTest (way slower) and
>>> RandomScanWithRange10
>>> > but is faster for everything else. I ran the tests with the default
>>> > settings. So I think we should look at RandomSeekScanTest but expect
>>> this
>>> > one, everything else is pretty good.
>>> >
>>> > Also, I have been able to reproduce this exception:
>>> > 2013-09-03 09:55:36,718 WARN  [NIOServerCxn.Factory:
>>> 0.0.0.0/0.0.0.0:2181]
>>> > server.NIOServerCnxn: caught end of stream exception
>>> > EndOfStreamException: Unable to read additional data from client
>>> sessionid
>>> > 0x140e4191edb0009, likely client has closed socket
>>> >     at
>>> > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>>> >     at
>>> >
>>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>> >     at java.lang.Thread.run(Thread.java:662)
>>> >
>>> > Just had to run PE and kill it in the middle.
>>> >
>>> > JM
>>> >
>>> >
>>> >
>>> > 0.96.0RC0 0.94.11
>>> >  org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest 10.28
>>> 10.17
>>> > 101.12%  org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest
>>> 966.01
>>> > 810.58 119.18%
>>> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 98.50
>>> > 255.71 38.52%
>>>  org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest
>>> > 39251.17 25682.11 152.83%
>>> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange10Test
>>> > 25844.88 28715.29 90.00%
>>> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
>>> > 20029.48 18022.39 111.14%
>>> >
>>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange1000Test
>>> > 2692.16 2346.85 114.71%
>>> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialReadTest 3002.18
>>> > 2875.83 104.39%
>>> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest
>>> 38995.50
>>> > 26693.23 146.09%
>>> >
>>> >
>>> > 2013/9/3 Stack <st...@duboce.net>
>>> >
>>> >> On Mon, Sep 2, 2013 at 10:51 AM, Jean-Marc Spaggiari <
>>> >> jean-marc@spaggiari.org> wrote:
>>> >>
>>> >> > I have created:
>>> >> >  HBASE-9412
>>> >> >  HBASE-9413
>>> >> >  HBASE-9414
>>> >> >
>>> >> > I have not been able yet to reproduce the ZK error. I'm trying.
>>> >> >
>>> >> >
>>> >> Is it when you have a shell connection and then kill it?
>>> >>
>>> >>
>>> >>
>>> >> > Last, I tried, with no success, to set loglevel to WARN to remove all
>>> >> > DEBUG and INFO logs. Setting it to WARN remove the DEBUG lines, but I
>>> >> > keep getting the INFO. Seems that something is setting the log level
>>> >> > somewhere else, or it's not read.
>>> >> >
>>> >> > Here is my log4j.properties file. I removed all the customed log
>>> level
>>> >> > to setup WARN for org.apache. And it's still showing INFO...
>>> >> >
>>> >> >
>>> >>
>>> >> You did it by editing log4j and restarting or in the UI?  I think the
>>> UI
>>> >> log level setting is broke.... (new issue!)
>>> >>
>>> >> Thanks for trying it out JMS,
>>> >>
>>> >> So everything is slower in 0.96?
>>> >> St.Ack
>>> >>
>>> >>
>>> >>
>>> >> > JM
>>> >> >
>>> >> >
>>> >> > # Define some default values that can be overridden by system
>>> properties
>>> >> > hbase.root.logger=WARN,console
>>> >> > hbase.security.logger=WARN,console
>>> >> > hbase.log.dir=.
>>> >> > hbase.log.file=hbase.log
>>> >> >
>>> >> > # Define the root logger to the system property "hbase.root.logger".
>>> >> > log4j.rootLogger=${hbase.root.logger}
>>> >> >
>>> >> > # Logging Threshold
>>> >> > log4j.threshold=ALL
>>> >> >
>>> >> > #
>>> >> > # Daily Rolling File Appender
>>> >> > #
>>> >> > log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
>>> >> > log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
>>> >> >
>>> >> > # Rollver at midnight
>>> >> > log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
>>> >> >
>>> >> > # 30-day backup
>>> >> > #log4j.appender.DRFA.MaxBackupIndex=30
>>> >> > log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
>>> >> >
>>> >> > # Pattern format: Date LogLevel LoggerName LogMessage
>>> >> > log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
>>> %c{2}:
>>> >> > %m%n
>>> >> >
>>> >> > # Rolling File Appender properties
>>> >> > hbase.log.maxfilesize=256MB
>>> >> > hbase.log.maxbackupindex=20
>>> >> >
>>> >> > # Rolling File Appender
>>> >> > log4j.appender.RFA=org.apache.log4j.RollingFileAppender
>>> >> > log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
>>> >> >
>>> >> > log4j.appender.RFA.MaxFileSize=${hbase.log.maxfilesize}
>>> >> > log4j.appender.RFA.MaxBackupIndex=${hbase.log.maxbackupindex}
>>> >> >
>>> >> > log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
>>> >> > log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
>>> %c{2}:
>>> >> > %m%n
>>> >> >
>>> >> > #
>>> >> > # Security audit appender
>>> >> > #
>>> >> > hbase.security.log.file=SecurityAuth.audit
>>> >> > hbase.security.log.maxfilesize=256MB
>>> >> > hbase.security.log.maxbackupindex=20
>>> >> > log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
>>> >> > log4j.appender.RFAS.File=${hbase.log.dir}/${hbase.security.log.file}
>>> >> > log4j.appender.RFAS.MaxFileSize=${hbase.security.log.maxfilesize}
>>> >> >
>>> log4j.appender.RFAS.MaxBackupIndex=${hbase.security.log.maxbackupindex}
>>> >> > log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
>>> >> > log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
>>> >> > log4j.category.SecurityLogger=${hbase.security.logger}
>>> >> > log4j.additivity.SecurityLogger=false
>>> >> >
>>> >> >
>>> >>
>>> #log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE
>>> >> >
>>> >> > #
>>> >> > # Null Appender
>>> >> > #
>>> >> > log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
>>> >> >
>>> >> > #
>>> >> > # console
>>> >> > # Add "console" to rootlogger above if you want to use this
>>> >> > #
>>> >> > log4j.appender.console=org.apache.log4j.ConsoleAppender
>>> >> > log4j.appender.console.target=System.err
>>> >> > log4j.appender.console.layout=org.apache.log4j.PatternLayout
>>> >> > log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
>>> >> > %c{2}: %m%n
>>> >> >
>>> >> > # Custom Logging levels
>>> >> >
>>> >> > log4j.logger.ore.apache=WARN
>>> >> >
>>> >> > 2013/9/2 Jean-Marc Spaggiari <je...@spaggiari.org>:
>>> >> > > Hi St.Ack,
>>> >> > >
>>> >> > > I will open the relate JIRAs in few minutes.
>>> >> > >
>>> >> > > Regarding performances, RandomSeekScanTest is way slower, and only
>>> >> > > RandomScanWithRange100Test was faster. Others were similar. For
>>> >> > > RandomScanWithRange100Test I suspect that I don't have the right
>>> >> > > number for 0.94 so 0.94.11 tests are running right now on the same
>>> >> > > server with the same configuration. I will start to have numbers by
>>> >> > > end of day, else, tomorrow morning, but will most probably take
>>> about
>>> >> > > 24h to get all of them.
>>> >> > >
>>> >> > > JM
>>> >> > >
>>> >> > > 2013/9/2 Stack <st...@duboce.net>:
>>> >> > >> On Mon, Sep 2, 2013 at 9:41 AM, Jean-Marc Spaggiari <
>>> >> > jean-marc@spaggiari.org
>>> >> > >>> wrote:
>>> >> > >>
>>> >> > >>> Got it.
>>> >> > >>>
>>> >> > >>> I can't run the integration tests for now because I'm lacking
>>> some
>>> >> > >>> servers :( Need to complete some HBase on RAID tests before I
>>> can get
>>> >> > >>> those new servers
>>> >> > >>>
>>> >> > >>> First thing is, start-hbase.cmd has the execute flag set. I don't
>>> >> > >>> think it's required. And it will help with tabulation feature if
>>> we
>>> >> > >>> can un-set it.
>>> >> > >>>
>>> >> > >>> For 0.96.0RC0 here are my results:
>>> >> > >>> First, I get 2 .out files. each time I start the server, instead
>>> of
>>> >> > >>> usually one... With the same timestamp.
>>> >> > >>>
>>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
>>> >> > >>> hbase-jmspaggiari-master-t430s.out
>>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
>>> >> > >>> hbase-jmspaggiari-master-t430s.out.1
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> In the UI, we say "The .META. table holds references to all User
>>> >> Table
>>> >> > >>> regions" but the table name is "hbase:meta" and not ".META."
>>> >> > >>>
>>> >> > >>> On the logs, I found this exception that I did not had before:
>>> >> > >>> 2013-08-31 18:45:05,490 WARN
>>> >> > >>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
>>> server.NIOServerCnxn:
>>> >> > >>> caught end of stream exception
>>> >> > >>> EndOfStreamException: Unable to read additional data from client
>>> >> > >>> sessionid 0x140d68bb9d50004, likely client has closed socket
>>> >> > >>>     at
>>> >> > >>>
>>> >> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>>> >> > >>>     at
>>> >> > >>>
>>> >> >
>>> >>
>>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>>> >> > >>>     at java.lang.Thread.run(Thread.java:662)
>>> >> > >>>
>>> >> > >>>
>>> >> > >>> I ran PE over 48h. I don't have another 0.96 baseline to compare
>>> >> with,
>>> >> > >>> so I compared with 0.94.
>>> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest
>>> is
>>> >> > >>> about 3 times slower with 0.96.
>>> >> > >>>
>>> >> >
>>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
>>> >> > >>> seems to be 2 times faster.
>>> >> > >>>
>>> >> > >>> Writes also are faster but I changed my hard drive since I ran
>>> 0.94.
>>> >> > >>> So I'm currently re-running 0.94 and will provide another more
>>> >> > >>> accurate comparison soon.
>>> >> > >>>
>>> >> > >>> Ran LoadTestTool: Failed to write keys: 0
>>> >> > >>>
>>> >> > >>> I have not been able to run IntegrationTestLoadAndVerify nor
>>> >> > >>> IntegrationTestBigLinkedList because of the lack of servers.
>>> >> > >>>
>>> >> > >>> So overall, it seems to be working fine, but I have not been
>>> able to
>>> >> > >>> test this release as deeply as I'm usually testing the 0.94
>>> releases.
>>> >> > >>>
>>> >> > >>
>>> >> > >> Thank you for taking it for a spin JMS -- you the man.  If you
>>> don't
>>> >> > open
>>> >> > >> issues for the teething stuff, I will.
>>> >> > >>
>>> >> > >> So 0.96 is slower doing RandomSeekScanTest but faster on your
>>> other
>>> >> > tests?
>>> >> > >>  I can take a look too.
>>> >> > >>
>>> >> > >> St.Ack
>>> >> >
>>> >>
>>>
>>
>>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.

Re: [VOTE] The 1st hbase-0.96.0 release candidate is available for download

Posted by Devaraj Das <dd...@hortonworks.com>.

Well from the test code it seems like the problem is due to the fact that
the reducer got unexpected data and it was trying to construct the log
message for the user. So the job had already failed in reality.


On Tue, Sep 3, 2013 at 6:17 PM, Devaraj Das <dd...@hortonworks.com> wrote:

> Elliott, what are the heap sizes of the M/R tasks in your setup. I was
> running the job like this (without chaosmonkey to start with):
>
> hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 5 12
> 2500000 IntegrationTestBigLinkedList 10
>
> Even the above test failed with one reduce task failing with OOM, in the
> verify step. The heap size was set to 3G.
>
> 2013-09-04 01:11:56,054 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2882)
> 	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> 	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> 	at java.lang.StringBuilder.append(StringBuilder.java:119)
> 	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:576)
> 	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:547)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
> 	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>
>
>
>
>
> On Tue, Sep 3, 2013 at 12:56 PM, Elliott Clark <ec...@apache.org> wrote:
>
>> Can someone take a look at running Test Big Linked for > 5 iterations
>> with slowDeterministic chaos monkey on a distributed cluster.  I'm
>> pretty concerned about HBASE-9338
>>
>> On Tue, Sep 3, 2013 at 6:57 AM, Jean-Marc Spaggiari
>> <je...@spaggiari.org> wrote:
>> > There was a typo in my log4j.properties :(
>> >
>> > So it's working fine.
>> >
>> > The only INFO logs I still see are those one:
>> > 2013-09-03 09:53:07,313 INFO  [M:0;t430s:45176] mortbay.log: Logging to
>> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> > org.mortbay.log.Slf4jLog
>> > 2013-09-03 09:53:07,350 INFO  [M:0;t430s:45176] mortbay.log:
>> jetty-6.1.26
>> > But there is only very few of them.
>> >
>> > Performances wise, here are the numbers (the higher, the better. Rows
>> per
>> > seconds, expect for scans where it's rows/min). As you will see, 0.96 is
>> > slower only for RandomSeekScanTest (way slower) and
>> RandomScanWithRange10
>> > but is faster for everything else. I ran the tests with the default
>> > settings. So I think we should look at RandomSeekScanTest but expect
>> this
>> > one, everything else is pretty good.
>> >
>> > Also, I have been able to reproduce this exception:
>> > 2013-09-03 09:55:36,718 WARN  [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:2181]
>> > server.NIOServerCnxn: caught end of stream exception
>> > EndOfStreamException: Unable to read additional data from client
>> sessionid
>> > 0x140e4191edb0009, likely client has closed socket
>> >     at
>> > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>> >     at
>> >
>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> >     at java.lang.Thread.run(Thread.java:662)
>> >
>> > Just had to run PE and kill it in the middle.
>> >
>> > JM
>> >
>> >
>> >
>> > 0.96.0RC0 0.94.11
>> >  org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest 10.28
>> 10.17
>> > 101.12%  org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest
>> 966.01
>> > 810.58 119.18%
>> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 98.50
>> > 255.71 38.52%
>>  org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest
>> > 39251.17 25682.11 152.83%
>> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange10Test
>> > 25844.88 28715.29 90.00%
>> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
>> > 20029.48 18022.39 111.14%
>> >
>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange1000Test
>> > 2692.16 2346.85 114.71%
>> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialReadTest 3002.18
>> > 2875.83 104.39%
>> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest
>> 38995.50
>> > 26693.23 146.09%
>> >
>> >
>> > 2013/9/3 Stack <st...@duboce.net>
>> >
>> >> On Mon, Sep 2, 2013 at 10:51 AM, Jean-Marc Spaggiari <
>> >> jean-marc@spaggiari.org> wrote:
>> >>
>> >> > I have created:
>> >> >  HBASE-9412
>> >> >  HBASE-9413
>> >> >  HBASE-9414
>> >> >
>> >> > I have not been able yet to reproduce the ZK error. I'm trying.
>> >> >
>> >> >
>> >> Is it when you have a shell connection and then kill it?
>> >>
>> >>
>> >>
>> >> > Last, I tried, with no success, to set loglevel to WARN to remove all
>> >> > DEBUG and INFO logs. Setting it to WARN remove the DEBUG lines, but I
>> >> > keep getting the INFO. Seems that something is setting the log level
>> >> > somewhere else, or it's not read.
>> >> >
>> >> > Here is my log4j.properties file. I removed all the customed log
>> level
>> >> > to setup WARN for org.apache. And it's still showing INFO...
>> >> >
>> >> >
>> >>
>> >> You did it by editing log4j and restarting or in the UI?  I think the
>> UI
>> >> log level setting is broke.... (new issue!)
>> >>
>> >> Thanks for trying it out JMS,
>> >>
>> >> So everything is slower in 0.96?
>> >> St.Ack
>> >>
>> >>
>> >>
>> >> > JM
>> >> >
>> >> >
>> >> > # Define some default values that can be overridden by system
>> properties
>> >> > hbase.root.logger=WARN,console
>> >> > hbase.security.logger=WARN,console
>> >> > hbase.log.dir=.
>> >> > hbase.log.file=hbase.log
>> >> >
>> >> > # Define the root logger to the system property "hbase.root.logger".
>> >> > log4j.rootLogger=${hbase.root.logger}
>> >> >
>> >> > # Logging Threshold
>> >> > log4j.threshold=ALL
>> >> >
>> >> > #
>> >> > # Daily Rolling File Appender
>> >> > #
>> >> > log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
>> >> > log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
>> >> >
>> >> > # Rollver at midnight
>> >> > log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
>> >> >
>> >> > # 30-day backup
>> >> > #log4j.appender.DRFA.MaxBackupIndex=30
>> >> > log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
>> >> >
>> >> > # Pattern format: Date LogLevel LoggerName LogMessage
>> >> > log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
>> %c{2}:
>> >> > %m%n
>> >> >
>> >> > # Rolling File Appender properties
>> >> > hbase.log.maxfilesize=256MB
>> >> > hbase.log.maxbackupindex=20
>> >> >
>> >> > # Rolling File Appender
>> >> > log4j.appender.RFA=org.apache.log4j.RollingFileAppender
>> >> > log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
>> >> >
>> >> > log4j.appender.RFA.MaxFileSize=${hbase.log.maxfilesize}
>> >> > log4j.appender.RFA.MaxBackupIndex=${hbase.log.maxbackupindex}
>> >> >
>> >> > log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
>> %c{2}:
>> >> > %m%n
>> >> >
>> >> > #
>> >> > # Security audit appender
>> >> > #
>> >> > hbase.security.log.file=SecurityAuth.audit
>> >> > hbase.security.log.maxfilesize=256MB
>> >> > hbase.security.log.maxbackupindex=20
>> >> > log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
>> >> > log4j.appender.RFAS.File=${hbase.log.dir}/${hbase.security.log.file}
>> >> > log4j.appender.RFAS.MaxFileSize=${hbase.security.log.maxfilesize}
>> >> >
>> log4j.appender.RFAS.MaxBackupIndex=${hbase.security.log.maxbackupindex}
>> >> > log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
>> >> > log4j.category.SecurityLogger=${hbase.security.logger}
>> >> > log4j.additivity.SecurityLogger=false
>> >> >
>> >> >
>> >>
>> #log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE
>> >> >
>> >> > #
>> >> > # Null Appender
>> >> > #
>> >> > log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
>> >> >
>> >> > #
>> >> > # console
>> >> > # Add "console" to rootlogger above if you want to use this
>> >> > #
>> >> > log4j.appender.console=org.apache.log4j.ConsoleAppender
>> >> > log4j.appender.console.target=System.err
>> >> > log4j.appender.console.layout=org.apache.log4j.PatternLayout
>> >> > log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
>> >> > %c{2}: %m%n
>> >> >
>> >> > # Custom Logging levels
>> >> >
>> >> > log4j.logger.ore.apache=WARN
>> >> >
>> >> > 2013/9/2 Jean-Marc Spaggiari <je...@spaggiari.org>:
>> >> > > Hi St.Ack,
>> >> > >
>> >> > > I will open the relate JIRAs in few minutes.
>> >> > >
>> >> > > Regarding performances, RandomSeekScanTest is way slower, and only
>> >> > > RandomScanWithRange100Test was faster. Others were similar. For
>> >> > > RandomScanWithRange100Test I suspect that I don't have the right
>> >> > > number for 0.94 so 0.94.11 tests are running right now on the same
>> >> > > server with the same configuration. I will start to have numbers by
>> >> > > end of day, else, tomorrow morning, but will most probably take
>> about
>> >> > > 24h to get all of them.
>> >> > >
>> >> > > JM
>> >> > >
>> >> > > 2013/9/2 Stack <st...@duboce.net>:
>> >> > >> On Mon, Sep 2, 2013 at 9:41 AM, Jean-Marc Spaggiari <
>> >> > jean-marc@spaggiari.org
>> >> > >>> wrote:
>> >> > >>
>> >> > >>> Got it.
>> >> > >>>
>> >> > >>> I can't run the integration tests for now because I'm lacking
>> some
>> >> > >>> servers :( Need to complete some HBase on RAID tests before I
>> can get
>> >> > >>> those new servers
>> >> > >>>
>> >> > >>> First thing is, start-hbase.cmd has the execute flag set. I don't
>> >> > >>> think it's required. And it will help with tabulation feature if
>> we
>> >> > >>> can un-set it.
>> >> > >>>
>> >> > >>> For 0.96.0RC0 here are my results:
>> >> > >>> First, I get 2 .out files. each time I start the server, instead
>> of
>> >> > >>> usually one... With the same timestamp.
>> >> > >>>
>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
>> >> > >>> hbase-jmspaggiari-master-t430s.out
>> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
>> >> > >>> hbase-jmspaggiari-master-t430s.out.1
>> >> > >>>
>> >> > >>>
>> >> > >>> In the UI, we say "The .META. table holds references to all User
>> >> Table
>> >> > >>> regions" but the table name is "hbase:meta" and not ".META."
>> >> > >>>
>> >> > >>> On the logs, I found this exception that I did not had before:
>> >> > >>> 2013-08-31 18:45:05,490 WARN
>> >> > >>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181]
>> server.NIOServerCnxn:
>> >> > >>> caught end of stream exception
>> >> > >>> EndOfStreamException: Unable to read additional data from client
>> >> > >>> sessionid 0x140d68bb9d50004, likely client has closed socket
>> >> > >>>     at
>> >> > >>>
>> >> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>> >> > >>>     at
>> >> > >>>
>> >> >
>> >>
>> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> >> > >>>     at java.lang.Thread.run(Thread.java:662)
>> >> > >>>
>> >> > >>>
>> >> > >>> I ran PE over 48h. I don't have another 0.96 baseline to compare
>> >> with,
>> >> > >>> so I compared with 0.94.
>> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest
>> is
>> >> > >>> about 3 times slower with 0.96.
>> >> > >>>
>> >> >
>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
>> >> > >>> seems to be 2 times faster.
>> >> > >>>
>> >> > >>> Writes also are faster but I changed my hard drive since I ran
>> 0.94.
>> >> > >>> So I'm currently re-running 0.94 and will provide another more
>> >> > >>> accurate comparison soon.
>> >> > >>>
>> >> > >>> Ran LoadTestTool: Failed to write keys: 0
>> >> > >>>
>> >> > >>> I have not been able to run IntegrationTestLoadAndVerify nor
>> >> > >>> IntegrationTestBigLinkedList because of the lack of servers.
>> >> > >>>
>> >> > >>> So overall, it seems to be working fine, but I have not been
>> able to
>> >> > >>> test this release as deeply as I'm usually testing the 0.94
>> releases.
>> >> > >>>
>> >> > >>
>> >> > >> Thank you for taking it for a spin JMS -- you the man.  If you
>> don't
>> >> > open
>> >> > >> issues for the teething stuff, I will.
>> >> > >>
>> >> > >> So 0.96 is slower doing RandomSeekScanTest but faster on your
>> other
>> >> > tests?
>> >> > >>  I can take a look too.
>> >> > >>
>> >> > >> St.Ack
>> >> >
>> >>
>>
>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: [VOTE] The 1st hbase-0.96.0 release candidate is available for download

Posted by Devaraj Das <dd...@hortonworks.com>.

Elliott, what are the heap sizes of the M/R tasks in your setup. I was
running the job like this (without chaosmonkey to start with):

hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 5 12
2500000 IntegrationTestBigLinkedList 10

Even the above test failed with one reduce task failing with OOM, in the
verify step. The heap size was set to 3G.

2013-09-04 01:11:56,054 FATAL [main]
org.apache.hadoop.mapred.YarnChild: Error running child :
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2882)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
	at java.lang.StringBuilder.append(StringBuilder.java:119)
	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:576)
	at org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$VerifyReducer.reduce(IntegrationTestBigLinkedList.java:547)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:645)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:405)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)





On Tue, Sep 3, 2013 at 12:56 PM, Elliott Clark <ec...@apache.org> wrote:

> Can someone take a look at running Test Big Linked for > 5 iterations
> with slowDeterministic chaos monkey on a distributed cluster.  I'm
> pretty concerned about HBASE-9338
>
> On Tue, Sep 3, 2013 at 6:57 AM, Jean-Marc Spaggiari
> <je...@spaggiari.org> wrote:
> > There was a typo in my log4j.properties :(
> >
> > So it's working fine.
> >
> > The only INFO logs I still see are those one:
> > 2013-09-03 09:53:07,313 INFO  [M:0;t430s:45176] mortbay.log: Logging to
> > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> > org.mortbay.log.Slf4jLog
> > 2013-09-03 09:53:07,350 INFO  [M:0;t430s:45176] mortbay.log: jetty-6.1.26
> > But there is only very few of them.
> >
> > Performances wise, here are the numbers (the higher, the better. Rows per
> > seconds, expect for scans where it's rows/min). As you will see, 0.96 is
> > slower only for RandomSeekScanTest (way slower) and RandomScanWithRange10
> > but is faster for everything else. I ran the tests with the default
> > settings. So I think we should look at RandomSeekScanTest but expect this
> > one, everything else is pretty good.
> >
> > Also, I have been able to reproduce this exception:
> > 2013-09-03 09:55:36,718 WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181
> ]
> > server.NIOServerCnxn: caught end of stream exception
> > EndOfStreamException: Unable to read additional data from client
> sessionid
> > 0x140e4191edb0009, likely client has closed socket
> >     at
> > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
> >     at
> >
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >     at java.lang.Thread.run(Thread.java:662)
> >
> > Just had to run PE and kill it in the middle.
> >
> > JM
> >
> >
> >
> > 0.96.0RC0 0.94.11
> >  org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest 10.28
> 10.17
> > 101.12%  org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest
> 966.01
> > 810.58 119.18%
> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 98.50
> > 255.71 38.52%
>  org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest
> > 39251.17 25682.11 152.83%
> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange10Test
> > 25844.88 28715.29 90.00%
> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
> > 20029.48 18022.39 111.14%
> > org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange1000Test
> > 2692.16 2346.85 114.71%
> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialReadTest 3002.18
> > 2875.83 104.39%
> > org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest
> 38995.50
> > 26693.23 146.09%
> >
> >
> > 2013/9/3 Stack <st...@duboce.net>
> >
> >> On Mon, Sep 2, 2013 at 10:51 AM, Jean-Marc Spaggiari <
> >> jean-marc@spaggiari.org> wrote:
> >>
> >> > I have created:
> >> >  HBASE-9412
> >> >  HBASE-9413
> >> >  HBASE-9414
> >> >
> >> > I have not been able yet to reproduce the ZK error. I'm trying.
> >> >
> >> >
> >> Is it when you have a shell connection and then kill it?
> >>
> >>
> >>
> >> > Last, I tried, with no success, to set loglevel to WARN to remove all
> >> > DEBUG and INFO logs. Setting it to WARN remove the DEBUG lines, but I
> >> > keep getting the INFO. Seems that something is setting the log level
> >> > somewhere else, or it's not read.
> >> >
> >> > Here is my log4j.properties file. I removed all the customed log level
> >> > to setup WARN for org.apache. And it's still showing INFO...
> >> >
> >> >
> >>
> >> You did it by editing log4j and restarting or in the UI?  I think the UI
> >> log level setting is broke.... (new issue!)
> >>
> >> Thanks for trying it out JMS,
> >>
> >> So everything is slower in 0.96?
> >> St.Ack
> >>
> >>
> >>
> >> > JM
> >> >
> >> >
> >> > # Define some default values that can be overridden by system
> properties
> >> > hbase.root.logger=WARN,console
> >> > hbase.security.logger=WARN,console
> >> > hbase.log.dir=.
> >> > hbase.log.file=hbase.log
> >> >
> >> > # Define the root logger to the system property "hbase.root.logger".
> >> > log4j.rootLogger=${hbase.root.logger}
> >> >
> >> > # Logging Threshold
> >> > log4j.threshold=ALL
> >> >
> >> > #
> >> > # Daily Rolling File Appender
> >> > #
> >> > log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
> >> > log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
> >> >
> >> > # Rollver at midnight
> >> > log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
> >> >
> >> > # 30-day backup
> >> > #log4j.appender.DRFA.MaxBackupIndex=30
> >> > log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
> >> >
> >> > # Pattern format: Date LogLevel LoggerName LogMessage
> >> > log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
> %c{2}:
> >> > %m%n
> >> >
> >> > # Rolling File Appender properties
> >> > hbase.log.maxfilesize=256MB
> >> > hbase.log.maxbackupindex=20
> >> >
> >> > # Rolling File Appender
> >> > log4j.appender.RFA=org.apache.log4j.RollingFileAppender
> >> > log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
> >> >
> >> > log4j.appender.RFA.MaxFileSize=${hbase.log.maxfilesize}
> >> > log4j.appender.RFA.MaxBackupIndex=${hbase.log.maxbackupindex}
> >> >
> >> > log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
> >> > log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
> %c{2}:
> >> > %m%n
> >> >
> >> > #
> >> > # Security audit appender
> >> > #
> >> > hbase.security.log.file=SecurityAuth.audit
> >> > hbase.security.log.maxfilesize=256MB
> >> > hbase.security.log.maxbackupindex=20
> >> > log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
> >> > log4j.appender.RFAS.File=${hbase.log.dir}/${hbase.security.log.file}
> >> > log4j.appender.RFAS.MaxFileSize=${hbase.security.log.maxfilesize}
> >> >
> log4j.appender.RFAS.MaxBackupIndex=${hbase.security.log.maxbackupindex}
> >> > log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
> >> > log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
> >> > log4j.category.SecurityLogger=${hbase.security.logger}
> >> > log4j.additivity.SecurityLogger=false
> >> >
> >> >
> >>
> #log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE
> >> >
> >> > #
> >> > # Null Appender
> >> > #
> >> > log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
> >> >
> >> > #
> >> > # console
> >> > # Add "console" to rootlogger above if you want to use this
> >> > #
> >> > log4j.appender.console=org.apache.log4j.ConsoleAppender
> >> > log4j.appender.console.target=System.err
> >> > log4j.appender.console.layout=org.apache.log4j.PatternLayout
> >> > log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t]
> >> > %c{2}: %m%n
> >> >
> >> > # Custom Logging levels
> >> >
> >> > log4j.logger.ore.apache=WARN
> >> >
> >> > 2013/9/2 Jean-Marc Spaggiari <je...@spaggiari.org>:
> >> > > Hi St.Ack,
> >> > >
> >> > > I will open the relate JIRAs in few minutes.
> >> > >
> >> > > Regarding performances, RandomSeekScanTest is way slower, and only
> >> > > RandomScanWithRange100Test was faster. Others were similar. For
> >> > > RandomScanWithRange100Test I suspect that I don't have the right
> >> > > number for 0.94 so 0.94.11 tests are running right now on the same
> >> > > server with the same configuration. I will start to have numbers by
> >> > > end of day, else, tomorrow morning, but will most probably take
> about
> >> > > 24h to get all of them.
> >> > >
> >> > > JM
> >> > >
> >> > > 2013/9/2 Stack <st...@duboce.net>:
> >> > >> On Mon, Sep 2, 2013 at 9:41 AM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org
> >> > >>> wrote:
> >> > >>
> >> > >>> Got it.
> >> > >>>
> >> > >>> I can't run the integration tests for now because I'm lacking some
> >> > >>> servers :( Need to complete some HBase on RAID tests before I can
> get
> >> > >>> those new servers
> >> > >>>
> >> > >>> First thing is, start-hbase.cmd has the execute flag set. I don't
> >> > >>> think it's required. And it will help with tabulation feature if
> we
> >> > >>> can un-set it.
> >> > >>>
> >> > >>> For 0.96.0RC0 here are my results:
> >> > >>> First, I get 2 .out files. each time I start the server, instead
> of
> >> > >>> usually one... With the same timestamp.
> >> > >>>
> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
> >> > >>> hbase-jmspaggiari-master-t430s.out
> >> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari     0 Aug 31 15:38
> >> > >>> hbase-jmspaggiari-master-t430s.out.1
> >> > >>>
> >> > >>>
> >> > >>> In the UI, we say "The .META. table holds references to all User
> >> Table
> >> > >>> regions" but the table name is "hbase:meta" and not ".META."
> >> > >>>
> >> > >>> On the logs, I found this exception that I did not had before:
> >> > >>> 2013-08-31 18:45:05,490 WARN
> >> > >>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn:
> >> > >>> caught end of stream exception
> >> > >>> EndOfStreamException: Unable to read additional data from client
> >> > >>> sessionid 0x140d68bb9d50004, likely client has closed socket
> >> > >>>     at
> >> > >>>
> >> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
> >> > >>>     at
> >> > >>>
> >> >
> >>
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >> > >>>     at java.lang.Thread.run(Thread.java:662)
> >> > >>>
> >> > >>>
> >> > >>> I ran PE over 48h. I don't have another 0.96 baseline to compare
> >> with,
> >> > >>> so I compared with 0.94.
> >> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest
> is
> >> > >>> about 3 times slower with 0.96.
> >> > >>>
> >> >
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
> >> > >>> seems to be 2 times faster.
> >> > >>>
> >> > >>> Writes also are faster but I changed my hard drive since I ran
> 0.94.
> >> > >>> So I'm currently re-running 0.94 and will provide another more
> >> > >>> accurate comparison soon.
> >> > >>>
> >> > >>> Ran LoadTestTool: Failed to write keys: 0
> >> > >>>
> >> > >>> I have not been able to run IntegrationTestLoadAndVerify nor
> >> > >>> IntegrationTestBigLinkedList because of the lack of servers.
> >> > >>>
> >> > >>> So overall, it seems to be working fine, but I have not been able
> to
> >> > >>> test this release as deeply as I'm usually testing the 0.94
> releases.
> >> > >>>
> >> > >>
> >> > >> Thank you for taking it for a spin JMS -- you the man.  If you
> don't
> >> > open
> >> > >> issues for the teething stuff, I will.
> >> > >>
> >> > >> So 0.96 is slower doing RandomSeekScanTest but faster on your other
> >> > tests?
> >> > >>  I can take a look too.
> >> > >>
> >> > >> St.Ack
> >> >
> >>
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.