Posted to dev@accumulo.apache.org by Hayden Marchant <HA...@il.ibm.com> on 2014/06/19 12:17:58 UTC

Running Accumulo on the IBM JVM

Hi there,

I have been working on getting Accumulo running on the IBM JDK, in
preparation for including Accumulo in an upcoming version of BigInsights
(IBM's Hadoop distribution). I have come across a number of issues, for
which I have made some local fixes in my own environment. Since I'm a
newbie to Accumulo, I want to make sure that the approach I have taken to
resolving these issues is aligned with the design intent of Accumulo.

Some of the issues are real defects, and some are instances in which the
assumption that the Sun/Oracle JDK is the JVM in use is hard-coded into
the source code.

I have grouped the issues into 2 sections: unit test failures and
Sun-specific dependencies (though there is some overlap).

1. Unit test failures - these tests should run consistently no matter
which OS, Java vendor/version etc.
        a. 
org.apache.accumulo.core.util.format.ShardedTableDistributionFormatterTest.testAggregate
        This fails on the IBM JRE, since the test asserts the order of
elements in a HashMap, and HashMap iteration order is not guaranteed. It
consistently passes on Sun/Oracle, and consistently fails on IBM.
        Proposal: Change ShardedTableDistributionFormatter.countsByDay to
a TreeMap.
 
        b. 
org.apache.accumulo.core.security.crypto.BlockedIOStreamTest.testGiantWrite
        This test assumes a max heap of about 1GB. It fails on the IBM
JRE because no max heap is specified for the test run, and the IBM JRE's
default max heap depends on the OS (see 
http://www-01.ibm.com/support/knowledgecenter/SSYKE2_6.0.0/com.ibm.java.doc.diagnostics.60/diag/appendixes/defaults.html?lang=en
). 
        Proposal: add -Xmx1g to the surefire maven plugin configuration
in the parent maven pom.
 
        c. Both org.apache.accumulo.core.security.crypto.CryptoTest and 
org.apache.accumulo.core.file.rfile.RFileTest have lots of failures due to 
calls to SecureRandom with the Random Number Generator provider hard-coded 
as Sun. The IBM JRE has its own built-in RNG provider called IBMJCE. There 
are 2 issues: hard-coded calls to SecureRandom.getInstance(<algo>,"SUN"), 
and the default value in the Property class, which is also "SUN". 
        Proposal: Add a mechanism to override a default Property value 
through a System property, by means of a new annotation in the Property 
class. The only usage will be by Property.CRYPTO_SECURE_RNG_PROVIDER.
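
To illustrate the ordering issue in 1a, here is a minimal sketch. It is not
taken from the actual patch: the class CountsByDay and the method aggregate
are hypothetical stand-ins for ShardedTableDistributionFormatter and its
countsByDay map. It only shows that a TreeMap iterates in sorted key order
on any JVM, whereas a HashMap makes no such guarantee.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the countsByDay aggregation (names assumed).
class CountsByDay {

    // Counts occurrences per day; TreeMap keeps keys sorted, so the
    // iteration order is the same on every JVM vendor.
    static Map<String, Integer> aggregate(String[] days) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String day : days) {
            counts.merge(day, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] days = {"20140619", "20140617", "20140619", "20140618"};
        // prints {20140617=1, 20140618=1, 20140619=2}
        System.out.println(aggregate(days));
    }
}
```

A test that asserts on the printed order of such a map no longer depends on
the HashMap implementation of a particular vendor.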
 
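For 1b, a quick way to see what default max heap a given JVM picks when no
-Xmx is passed is to ask the runtime. This is only a diagnostic sketch; the
class name HeapCheck and the 1024 MiB threshold are assumptions chosen to
match the "about 1GB" mentioned above.

```java
// Diagnostic sketch: report the max heap the running JVM will use.
class HeapCheck {

    // Max heap in MiB as reported by the JVM (reflects -Xmx or the default).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // True if the JVM reports at least the required max heap.
    static boolean hasAtLeast(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("enough for a ~1GB test: " + hasAtLeast(1024));
    }
}
```

Running it with and without -Xmx1g on each JVM shows whether the default is
large enough for a test like testGiantWrite.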
 
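For 1c, a minimal sketch of making the RNG provider overridable. The system
property name example.crypto.secure.rng.provider and the class
RngProviderLookup are hypothetical, not existing Accumulo settings. The
fallback to any installed provider is an extra safety net added here for
illustration, not part of the proposal above, and "SHA1PRNG" in main is just
an example algorithm name.

```java
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;
import java.security.SecureRandom;

class RngProviderLookup {

    // Hypothetical property name used only for this sketch.
    static final String PROVIDER_PROP = "example.crypto.secure.rng.provider";

    // Uses the provider from the system property if set, else the given
    // default. If that provider is missing or lacks the algorithm, falls
    // back to whichever installed provider offers the algorithm.
    static SecureRandom create(String algorithm, String defaultProvider) {
        String provider = System.getProperty(PROVIDER_PROP, defaultProvider);
        try {
            return SecureRandom.getInstance(algorithm, provider);
        } catch (NoSuchProviderException | NoSuchAlgorithmException e) {
            try {
                return SecureRandom.getInstance(algorithm);
            } catch (NoSuchAlgorithmException e2) {
                throw new IllegalStateException(
                    "No provider offers " + algorithm, e2);
            }
        }
    }

    public static void main(String[] args) {
        SecureRandom rng = create("SHA1PRNG", "SUN");
        System.out.println("provider in use: " + rng.getProvider().getName());
    }
}
```

With something like this, a JVM without the "SUN" provider could be pointed
at its own provider by passing -Dexample.crypto.secure.rng.provider=IBMJCE.
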
2. Environment/Configuration
        a. The generated configuration files contain references to GC 
params that are specific to the Sun JVM. In accumulo-env.sh, 
ACCUMULO_TSERVER_OPTS contains -XX:NewSize and -XX:MaxNewSize, and 
ACCUMULO_GENERAL_OPTS uses 
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75.
 
        b. In bin/accumulo, I get a ClassNotFoundException due to the 
specification of the JAXP Document Builder: 
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl 
        The Sun implementation of DocumentBuilderFactory does not exist 
in the IBM JDK, so a ClassNotFoundException is thrown when running the 
accumulo script.
 
        c. MiniAccumuloCluster - in MiniAccumuloClusterImpl, Sun-specific 
GC params are passed as params to the java process (similar to item a).
 
        Single proposal for solving all three issues above:
        Enhance bootstrap_config.sh with a prompt to select the Java 
vendor. The selection will set the correct values for the GC params (they 
differ between IBM and Sun) and control inclusion/omission of the JAXP 
setting. MiniAccumuloClusterImpl can read the same env variable that was 
set for the GC params, and use it in the exec command.
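
As a rough sketch of vendor-dependent options: the code below detects the
vendor from the java.vendor system property instead of the
bootstrap_config.sh prompt proposed above, so it is a different mechanism
shown only to illustrate the selection logic. The class VendorJvmOpts is
hypothetical, and -Xgcpolicy:gencon is only a placeholder for the IBM side;
the right IBM values would need to be taken from the IBM JVM documentation.

```java
import java.util.ArrayList;
import java.util.List;

class VendorJvmOpts {

    // The IBM JVM reports "IBM Corporation" as java.vendor.
    static boolean isIbm(String vendor) {
        return vendor != null && vendor.contains("IBM");
    }

    // Returns general JVM options appropriate for the given vendor.
    static List<String> generalOpts(String vendor) {
        List<String> opts = new ArrayList<>();
        if (isIbm(vendor)) {
            // Assumption: placeholder GC policy for the IBM JVM.
            opts.add("-Xgcpolicy:gencon");
            // The com.sun JAXP factory setting is omitted on purpose.
        } else {
            // Sun/Oracle-specific settings quoted in items a and b.
            opts.add("-XX:+UseConcMarkSweepGC");
            opts.add("-XX:CMSInitiatingOccupancyFraction=75");
            opts.add("-Djavax.xml.parsers.DocumentBuilderFactory="
                + "com.sun.org.apache.xerces.internal.jaxp."
                + "DocumentBuilderFactoryImpl");
        }
        return opts;
    }

    public static void main(String[] args) {
        String vendor = System.getProperty("java.vendor");
        System.out.println(vendor + " -> " + generalOpts(vendor));
    }
}
```

The same kind of lookup could feed both the generated accumulo-env.sh values
and the arguments MiniAccumuloClusterImpl passes to the child java process.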
 
 
 So far, my work has been focused on getting unit tests working for all 
Java vendors in a clean manner. I have not yet run intensive testing of 
real clusters following these changes, and would be happy to get pointers 
to what else might need treatment.
 
 I would also like to hear if these changes make sense, and if so, should 
I go ahead and create some JIRAs, and attach my patches for commit 
approval?
 
 Looking forward to hearing feedback!
 
 Regards,
 Hayden Marchant
 Software Architect
 IBM BigInsights, IBM
 

Re: Running Accumulo on the IBM JVM

Posted by Vicky Kak <vi...@gmail.com>.
Hi Hayden,

Most of the recommendations look okay to me. Since there are many changes
to be made, I think you should go ahead and create a main JIRA with
multiple subtasks addressing all the changes.
I am almost sure that you would run into similar issues if you ran other
Java-based NoSQL distributions, e.g. HBase/Cassandra, on the IBM JDK. I
personally had surprises with API calls related to ordering in my
application a long time ago. Your observations look reasonable to me.

Regards,
Vicky

