Posted to user@accumulo.apache.org by Keys Botzum <kb...@maprtech.com> on 2012/04/20 15:35:15 UTC

Re: Accumulo on MapR Continued

Keith,

I was able to get Accumulo to use an Accumulo-specific configuration of Hadoop. It was a bit of a hack. Basically, I created a fake Hadoop installation tree that is almost entirely symbolic links to the real tree under /opt/mapr/hadoop. The only real file in the tree is core-site.xml, where I set the two properties. The essential steps were:
	cd /opt/accumulo
	mkdir hadoop
	mkdir hadoop/hadoop-0.20.2
	cd hadoop/hadoop-0.20.2
	ln -s /opt/mapr/hadoop/hadoop-0.20.2/* .
	rm conf        # remove the conf symlink just created
	mkdir conf
	cd conf
	ln -s /opt/mapr/hadoop/hadoop-0.20.2/conf/* .
	cp core-site.xml t        # replace the core-site.xml symlink with a real copy
	mv t core-site.xml
	# edit core-site.xml as needed

Then I set the HADOOP_HOME in accumulo-env.sh to that directory and everything worked fine.
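
For anyone repeating this, the steps above can be sketched as a small re-runnable shell function. This is only a sketch under assumptions: make_shadow_hadoop is a hypothetical helper name (not part of Accumulo or MapR), the two paths are passed in rather than hard-coded, and it relies on an ln that supports -n (GNU and BSD ln do).

```shell
#!/bin/sh
# Sketch: build a "shadow" Hadoop tree made of symlinks, with a private
# core-site.xml. make_shadow_hadoop is a hypothetical helper name.
make_shadow_hadoop() {
    real_home=$1      # e.g. /opt/mapr/hadoop/hadoop-0.20.2
    shadow_home=$2    # e.g. /opt/accumulo/hadoop/hadoop-0.20.2

    mkdir -p "$shadow_home/conf" || return 1

    # Link every top-level entry of the real tree except conf.
    for entry in "$real_home"/*; do
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        if [ "$name" = "conf" ]; then
            continue
        fi
        ln -sfn "$entry" "$shadow_home/$name" || return 1
    done

    # conf is a real directory: link everything in it except core-site.xml,
    # which becomes a real, editable copy.
    for entry in "$real_home"/conf/*; do
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        target="$shadow_home/conf/$name"
        if [ "$name" = "core-site.xml" ]; then
            # Keep an existing real copy so local edits are not overwritten.
            if [ -f "$target" ] && [ ! -L "$target" ]; then
                continue
            fi
            rm -f "$target"
            cp "$entry" "$target" || return 1
        else
            ln -sfn "$entry" "$target" || return 1
        fi
    done
}
```

With the paths from this message, the call would be make_shadow_hadoop /opt/mapr/hadoop/hadoop-0.20.2 /opt/accumulo/hadoop/hadoop-0.20.2, after which HADOOP_HOME in accumulo-env.sh points at the second path.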

By the way, I tried setting HADOOP_CONF_DIR and that had no effect.

Since I plan to document these steps, I want to make sure I understood your intent and that I haven't missed something. Typically, in Hadoop components, the effective configuration is a combination of each component's *-site.xml file. As a result, I can set things in, for example, hbase-site.xml that are really Hadoop properties. Assuming I understood what you and Eric were saying, this is not true in Accumulo. That's fine by me, but I just want to make sure I'm not saying things that aren't true.
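
Since the Hadoop-level properties have to live in core-site.xml here, the "edit core-site.xml as needed" step amounts to adding the two MapR properties quoted further down to the private copy. A minimal sketch of that edit follows; add_site_property is a hypothetical helper name, and the sketch assumes the closing </configuration> tag sits on a line of its own.

```shell
#!/bin/sh
# Sketch: add a property to a Hadoop-style *-site.xml file unless it is
# already defined. add_site_property is a hypothetical helper name.
# Assumes </configuration> is on its own line.
add_site_property() {
    file=$1
    name=$2
    value=$3

    # Leave the file alone if the property is already present.
    if grep -qF "<name>$name</name>" "$file"; then
        return 0
    fi

    tmp_file=$(mktemp) || return 1
    # Print the new <property> block just before the closing tag.
    awk -v n="$name" -v v="$value" '
        /<\/configuration>/ {
            print "  <property>"
            print "    <name>" n "</name>"
            print "    <value>" v "</value>"
            print "  </property>"
        }
        { print }
    ' "$file" > "$tmp_file" && cat "$tmp_file" > "$file"
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

It would be called once per property on the private copy, for example add_site_property conf/core-site.xml fs.mapr.readbuffering false and then the same for fs.mapr.aggregate.writes.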

Thanks again for all of your help,
Keys

p.s. I'm running the random and ingest tests you and Eric suggested as we speak. The random test completed successfully.
________________________________
Keys Botzum
Senior Principal Technologist
WW Systems Engineering
kbotzum@maprtech.com
443-718-0098
MapR Technologies
http://www.mapr.com



On Apr 18, 2012, at 3:11 PM, Keith Turner wrote:

> I suppose accumulo could be pointed to a different hadoop config dir.
> 
> On Wed, Apr 18, 2012 at 1:58 PM, Keys Botzum <kb...@maprtech.com> wrote:
>> Eric and Keith,
>> 
>> I will attempt the additional tests you have suggested.
>> 
>> Any ideas on what to do regarding those configuration properties? With
>> HBase, we set those properties in hbase-site.xml and they work fine. Is
>> there some incantation I'm missing here? I really don't want those
>> properties to be global, as they will negatively impact performance and are
>> only relevant to HBase and Accumulo.
>> 
>> Thanks,
>> Keys
>> 
>> On Apr 18, 2012, at 1:42 PM, Keith Turner wrote:
>> 
>> Settings in accumulo-site.xml do not end up in the hadoop config
>> object, so setting them will probably have no effect.
>> 
>> I would suggest running the continuous ingest test and the random walk test
>> if you really want to stress it.  These are the tests we use prior to an
>> Accumulo release.  You would need to exclude the random walk security
>> test; it triggers known bugs in 1.4 that are not fixed.
>> 
>> Running the test on a cluster overnight would be good.
>> 
>> Keith
>> 
>> On Wed, Apr 18, 2012 at 1:17 PM, Keys Botzum <kb...@maprtech.com> wrote:
>> 
>> Thanks to the help of Keith, Todd, and Eric, as well as MapR engineering,
>> all of the Accumulo tests in test/system/auto are now passing. Note that the
>> latelastcontact test only passes if I actually install ZooKeeper on the
>> host. This is because of the dependency on zkCli.sh that I mentioned
>> earlier.
>> 
>> 
>> The final piece of the puzzle was that MapR does aggressive read-ahead
>> caching of data as well as aggregation of writes to improve performance. As
>> with HBase, we don't think this type of behavior is helpful with something
>> like Accumulo. In our specific case, the interaction between Accumulo and
>> MapR's behavior results in the large row test failing.
>> 
>> 
>> So now I have one more question. To disable the caching and aggregation
>> behavior, we need to set these properties:
>> 
>> <property>
>>   <name>fs.mapr.readbuffering</name>
>>   <value>false</value>
>> </property>
>> 
>> <property>
>>   <name>fs.mapr.aggregate.writes</name>
>>   <value>false</value>
>> </property>
>> 
>> 
>> If I set them in core-site.xml they of course work, but that's a global
>> setting. I want to affect only Accumulo. If I set them in accumulo-site.xml,
>> I presume they take effect for normal Accumulo usage, but I'm nearly certain
>> that settings in accumulo-site.xml do not affect the tests, as I posted
>> earlier. How can I set those two properties in a way that will cause the
>> tests' temporary configuration to take them into account? I tried editing
>> the TestUtilsMixin settings in TestUtils.py, which did work for the Accumulo
>> property table.file.compress.type, but the MapR-related properties don't
>> seem to take. Ideas?
>> 
>> 
>> Also, if all of the auto tests pass successfully do you feel comfortable
>> that the testing was sufficient or do you recommend running additional
>> tests?
>> 
>> 
>> Thanks!
>> 
>> Keys
>> 


Re: Accumulo on MapR Continued

Posted by Keys Botzum <kb...@maprtech.com>.
Eric,

Clever. I'll add that to the doc as an option.

Keys



On Apr 20, 2012, at 9:44 AM, Eric Newton wrote:

> You should be able to adjust the classpath in conf/accumulo-site.xml, and remove $HADOOP_HOME/conf and just put the updated core-site.xml in the accumulo/conf directory.
> 
> -Eric
> 


Re: Accumulo on MapR Continued

Posted by Eric Newton <er...@gmail.com>.
You should be able to adjust the classpath in conf/accumulo-site.xml, and
remove $HADOOP_HOME/conf and just put the updated core-site.xml in the
accumulo/conf directory.

-Eric
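
A rough sketch of this alternative, for comparison with the symlink tree: copy core-site.xml into Accumulo's conf directory and check whether accumulo-site.xml still mentions $HADOOP_HOME/conf. The helper name stage_core_site is hypothetical, and the sketch only detects the leftover classpath entry; editing the classpath itself is left to the reader, since the exact property contents depend on the installation.

```shell
#!/bin/sh
# Sketch: keep the private core-site.xml in Accumulo's own conf directory.
# stage_core_site is a hypothetical helper name.
stage_core_site() {
    hadoop_conf=$1     # the real Hadoop conf directory
    accumulo_conf=$2   # the Accumulo conf directory

    # Copy once; do not clobber a core-site.xml that was already customized.
    if [ ! -f "$accumulo_conf/core-site.xml" ]; then
        cp "$hadoop_conf/core-site.xml" "$accumulo_conf/core-site.xml" || return 1
    fi

    # Per the suggestion above, $HADOOP_HOME/conf should no longer appear in
    # the classpath configured in accumulo-site.xml. Report it if it does.
    if grep -qF '$HADOOP_HOME/conf' "$accumulo_conf/accumulo-site.xml" 2>/dev/null; then
        echo "accumulo-site.xml still references \$HADOOP_HOME/conf" >&2
        return 2
    fi
}
```

A return status of 2 means the copy is in place but the classpath entry still needs to be removed by hand.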
