Posted to dev@accumulo.apache.org by Hayden Marchant <HA...@il.ibm.com> on 2014/06/19 12:17:58 UTC
Running Accumulo on the IBM JVM
Hi there,
I have been working on getting Accumulo running on the IBM JDK, in preparation
for including Accumulo in an upcoming version of BigInsights (IBM's Hadoop
distribution). I have come across a number of issues, for which I have made
some local fixes in my own environment. Since I'm a newbie in Accumulo, I
wanted to make sure that the approach I have taken to resolve these issues
is aligned with the design intent of Accumulo.
Some of the issues are real defects, and some are places where the
assumption that the Sun/Oracle JDK is the JVM in use is hard-coded into the
source code.
I have grouped the issues into 2 sections - unit test failures and
Sun-specific dependencies (though there is some overlap).
1. Unit test failures - tests should run consistently regardless of OS, Java
vendor/version, etc.
a. org.apache.accumulo.core.util.format.ShardedTableDistributionFormatterTest.testAggregate.
This fails on the IBM JRE, since the test asserts the order of elements in
a HashMap, whose iteration order is unspecified. It consistently passes on
Sun and consistently fails on IBM. Proposal: change
ShardedTableDistributionFormatter.countsByDay to a TreeMap.
b. org.apache.accumulo.core.security.crypto.BlockedIOStreamTest.testGiantWrite.
This test assumes a max heap of about 1GB. It fails on the IBM JRE because
the build does not specify a max heap, and on the IBM JRE the default max
heap depends on the OS (see
http://www-01.ibm.com/support/knowledgecenter/SSYKE2_6.0.0/com.ibm.java.doc.diagnostics.60/diag/appendixes/defaults.html?lang=en
).
Proposal: add -Xmx1g to the surefire Maven plugin configuration in the
parent Maven POM.
c. Both org.apache.accumulo.core.security.crypto.CryptoTest and
org.apache.accumulo.core.file.rfile.RFileTest have many failures due to
calls to SecureRandom with the random number generator (RNG) provider
hard-coded as "SUN". The IBM JRE has its own built-in RNG provider, IBMJCE.
There are 2 issues: hard-coded calls to SecureRandom.getInstance(<algo>,"SUN"),
and the default value in the Property class, which is also "SUN".
Proposal: add a mechanism, via a new annotation in the Property class,
that lets a system property override a property's default value. The only
initial usage would be Property.CRYPTO_SECURE_RNG_PROVIDER.
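A minimal Java sketch of the ideas in items a to c, for illustration only. countByDay is a hypothetical stand-in for ShardedTableDistributionFormatter.countsByDay (its real code is not shown here); maxHeapMegabytes just reports the heap ceiling the JVM chose, which is what item b depends on; and the system property name example.crypto.secure.rng.provider is made up, not an Accumulo setting. The sketch reads the override with a plain System.getProperty call rather than the annotation-based mechanism proposed above, and the fallback to SecureRandom.getInstance(algorithm) when the named provider is missing is an extra step the proposal does not include.

```java
import java.security.NoSuchAlgorithmException;
import java.security.NoSuchProviderException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.TreeMap;

public class PortableTestSketch {

    // Item a: a TreeMap iterates in key order on every JVM, whereas
    // HashMap iteration order is unspecified and may differ by vendor.
    static Map<String, Long> countByDay(String[] days) {
        Map<String, Long> counts = new TreeMap<>();
        for (String day : days) {
            counts.merge(day, 1L, Long::sum);
        }
        return counts;
    }

    // Item b: the heap ceiling the JVM actually chose; without -Xmx
    // this default is vendor- and OS-dependent.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Item c: hypothetical property name, used only for illustration.
    static final String RNG_PROVIDER_OVERRIDE = "example.crypto.secure.rng.provider";

    // Default stays "SUN", but a -D system property can override it.
    static String rngProvider() {
        return System.getProperty(RNG_PROVIDER_OVERRIDE, "SUN");
    }

    static SecureRandom newSecureRandom(String algorithm, String provider)
            throws NoSuchAlgorithmException {
        if (provider == null || provider.isEmpty()) {
            return SecureRandom.getInstance(algorithm);
        }
        try {
            return SecureRandom.getInstance(algorithm, provider);
        } catch (NoSuchProviderException e) {
            // Provider not installed (e.g. "SUN" on the IBM JRE): let the JCA
            // pick the highest-priority provider supporting the algorithm.
            return SecureRandom.getInstance(algorithm);
        }
    }

    public static void main(String[] args) throws Exception {
        // prints {20140618=1, 20140619=2}
        System.out.println(countByDay(new String[] {"20140619", "20140618", "20140619"}));
        System.out.println("max heap MB: " + maxHeapMegabytes());
        SecureRandom rng = newSecureRandom("SHA1PRNG", rngProvider());
        System.out.println(rng.getAlgorithm() + " via " + rng.getProvider().getName());
    }
}
```

The fallback trades a hard failure for whichever provider the JCA ranks highest, which a deployment may not want; relying only on the configurable default is the stricter choice.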
2. Environment/Configuration
a. The generated configuration files contain references to GC
parameters that are specific to the Sun JVM. In accumulo-env.sh,
ACCUMULO_TSERVER_OPTS contains -XX:NewSize and -XX:MaxNewSize, and
ACCUMULO_GENERAL_OPTS uses -XX:+UseConcMarkSweepGC and
-XX:CMSInitiatingOccupancyFraction=75.
b. bin/accumulo fails with a ClassNotFoundException because it specifies
the JAXP DocumentBuilderFactory implementation:
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
The Sun implementation of DocumentBuilderFactory does not exist in the
IBM JDK, so a ClassNotFoundException is thrown when running the accumulo
script.
c. MiniAccumuloCluster - MiniAccumuloClusterImpl passes Sun-specific GC
parameters to the Java process it launches (similar to item a).
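A small Java sketch of the check behind item b, assuming a launcher could probe whether the forced implementation class exists before adding the -D option. It checks the current JVM, whereas bin/accumulo would need the same test against the JVM it actually launches; isLoadable and jaxpOptions are hypothetical helpers, not part of Accumulo. When the option is omitted, DocumentBuilderFactory.newInstance() uses its normal lookup and ends up with the platform default.

```java
import java.util.ArrayList;
import java.util.List;

public class JaxpOptionSketch {
    static final String FACTORY_PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";
    static final String SUN_FACTORY_IMPL =
        "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    // True if this JVM can load the named class (without initializing it).
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, JaxpOptionSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Emit the -D option only when the forced implementation exists;
    // otherwise leave JAXP to pick the vendor's own default.
    static List<String> jaxpOptions() {
        List<String> opts = new ArrayList<>();
        if (isLoadable(SUN_FACTORY_IMPL)) {
            opts.add("-D" + FACTORY_PROPERTY + "=" + SUN_FACTORY_IMPL);
        }
        return opts;
    }

    public static void main(String[] args) {
        System.out.println(jaxpOptions());
    }
}
```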
A single proposal for solving all three issues above:
enhance bootstrap_config.sh to ask the user to select the Java vendor.
The selection would set the correct GC parameters (they differ between
IBM and Sun) and include or omit the JAXP setting. MiniAccumuloClusterImpl
could then read the same environment variable that holds the GC parameters
and use it in its exec command.
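A hedged Java sketch of this proposal, showing how code like MiniAccumuloClusterImpl might take GC options from an environment variable and fall back to a vendor-based default. ACCUMULO_GC_OPTS is a hypothetical variable name that a vendor-aware bootstrap_config.sh could export; Accumulo does not define it. The Sun/Oracle flags are the ones quoted from accumulo-env.sh, while -Xgcpolicy:gencon for IBM is an assumed rough counterpart, not a tested equivalent, and org.example.Main is a placeholder main class. Matching on java.vm.vendor is only a heuristic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class VendorGcOptsSketch {
    // Hypothetical env var a vendor-aware bootstrap_config.sh could export.
    static final String GC_OPTS_ENV = "ACCUMULO_GC_OPTS";

    static boolean isIbmJvm(String vmVendor) {
        return vmVendor != null && vmVendor.toLowerCase(Locale.ROOT).contains("ibm");
    }

    // Sun/Oracle flags as quoted from accumulo-env.sh; the IBM flag is an
    // assumed counterpart (generational concurrent GC), not a tested mapping.
    static List<String> defaultGcOpts(String vmVendor) {
        if (isIbmJvm(vmVendor)) {
            return Arrays.asList("-Xgcpolicy:gencon");
        }
        return Arrays.asList("-XX:+UseConcMarkSweepGC", "-XX:CMSInitiatingOccupancyFraction=75");
    }

    // Prefer the env var (as the proposal suggests); else detect the vendor.
    static List<String> gcOpts(String envValue, String vmVendor) {
        if (envValue != null && !envValue.trim().isEmpty()) {
            return Arrays.asList(envValue.trim().split("\\s+"));
        }
        return defaultGcOpts(vmVendor);
    }

    // Assemble a java command line, in the spirit of an exec call.
    static List<String> buildCommand(String javaBin, List<String> gcOpts, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.addAll(gcOpts);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> opts = gcOpts(System.getenv(GC_OPTS_ENV), System.getProperty("java.vm.vendor"));
        // org.example.Main is a placeholder for the process's real main class.
        System.out.println(buildCommand("java", opts, "org.example.Main"));
    }
}
```

Reading one variable in both the shell scripts and the Java launcher keeps a single source of truth for the GC flags, which is the point of the proposal.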
So far, my work has focused on getting the unit tests passing cleanly for
all Java vendors. I have not yet run intensive testing on real clusters with
these changes, and would be happy to get pointers to anything else that
might need attention.
I would also like to hear whether these changes make sense and, if so,
whether I should go ahead and create some JIRAs and attach my patches for
commit approval.
Looking forward to hearing feedback!
Regards,
Hayden Marchant
Software Architect
IBM BigInsights, IBM
Re: Running Accumulo on the IBM JVM
Posted by Vicky Kak <vi...@gmail.com>.
Hi Hayden,
Most of the recommendations look okay to me. Since there are many changes to
be made, I think you should go ahead and create a main JIRA with multiple
subtasks addressing all the changes.
I am almost sure you would run into similar issues if you ran other
Java-based NoSQL distributions, e.g. HBase or Cassandra, on the IBM JDK; I
personally had surprises with API calls related to ordering in my own
application a long time ago. Your observations look reasonable to me.
Regards,
Vicky