Posted to user@accumulo.apache.org by "Roberts, Geoffry [USA]" <Ro...@bah.com> on 2020/12/23 17:19:14 UTC

Accumulo on HDInsight

All,

A quick question on something I've never tried before:

Does anyone have any experience with setting up Accumulo with HDInsight?  Can it be done?  Or am I better off just using a few Linux VMs, which is my first inclination and definitely my comfort zone?

The employer has me on MS Azure.  I am setting up an Accumulo cluster there.  I notice Azure offers a Hadoop service called HDInsight.  I looked into possibly using it (it has ZooKeeper) for the Hadoop and ZooKeeper (H & Z) part of my installation, but as yet I don't see how to bring Accumulo into the picture.

Any thoughts are appreciated.

RE: [External] Re: Accumulo on HDInsight

Posted by "Roberts, Geoffry [USA]" <Ro...@bah.com>.
So far, all I can see is that Accumulo cannot be colocated with H & Z.  Azure has documentation on how to set up third-party applications, but I don’t think something like Accumulo qualifies.  Something I haven’t mentioned is that I also need the programming language Julia to talk to Accumulo, but that’s another matter.

From: Christopher <ct...@apache.org>
Sent: Wednesday, December 23, 2020 1:22 PM
To: accumulo-user <us...@accumulo.apache.org>
Subject: [External] Re: Accumulo on HDInsight

I have not had experience with HDInsight. My first thoughts are that if it provides Hadoop and ZooKeeper for you, then that's a few less things to worry about from a maintenance perspective for your Accumulo cluster. On the other hand, if you can't run Accumulo nodes colocated with Hadoop DataNodes, then I wonder if you're losing some performance due to lack of data locality (on top of any performance hit from being in a virtual environment).

RE: [External] Re: Accumulo on HDInsight

Posted by "Roberts, Geoffry [USA]" <Ro...@bah.com>.
Good insight, Anagha.  I am now leaning toward rolling my own.  It makes distributed Julia easier as well.

From: anagha khanolkar <an...@gmail.com>
Sent: Wednesday, December 23, 2020 1:45 PM
To: user@accumulo.apache.org
Subject: [External] Re: Accumulo on HDInsight

I have experience with HDInsight (it used to be the Hortonworks Hadoop PaaS; Azure now has its own distro with the same look and feel).  Just some quick background.  HDInsight offers workload-based cluster types: Spark, Hadoop (MR), Kafka, Hive LLAP, HBase, etc.  It uses disaggregated compute and storage: virtual machines for compute, and configurable cloud-native storage (Azure Blob Storage or Azure Data Lake Store).  Only for Kafka does it use network-attached disks for storage, and the HBase WAL is on attached disks.  The secure flavor (Kerberized, and the only multi-user flavor) leverages a PaaS service called Azure Active Directory Domain Services (Active Directory under the hood).

The product team, if you reached out to them, would discourage running Accumulo on HDInsight, and going ahead and installing it would affect support and SLAs.  I would reach out to the Azure cloud solution architect assigned to the customer by Microsoft and get an email back from the product team to share with the customer, if the customer needs it.

I would go with VMs, and if they are using the secure version (Enterprise Security Package), I would domain-join those VMs to Azure Active Directory Domain Services.

Anagha Khanolkar


Re: Accumulo on HDInsight

Posted by anagha khanolkar <an...@gmail.com>.
I have experience with HDInsight (it used to be the Hortonworks Hadoop PaaS;
Azure now has its own distro with the same look and feel).  Just some quick
background.  HDInsight offers workload-based cluster types: Spark, Hadoop
(MR), Kafka, Hive LLAP, HBase, etc.  It uses disaggregated compute and
storage: virtual machines for compute, and configurable cloud-native storage
(Azure Blob Storage or Azure Data Lake Store).  Only for Kafka does it use
network-attached disks for storage, and the HBase WAL is on attached disks.
The secure flavor (Kerberized, and the only multi-user flavor) leverages a
PaaS service called Azure Active Directory Domain Services (Active Directory
under the hood).

The product team, if you reached out to them, would discourage running
Accumulo on HDInsight, and going ahead and installing it would affect
support and SLAs.  I would reach out to the Azure cloud solution architect
assigned to the customer by Microsoft and get an email back from the
product team to share with the customer, if the customer needs it.

I would go with VMs, and if they are using the secure version (Enterprise
Security Package), I would domain-join those VMs to Azure Active Directory
Domain Services.

Anagha Khanolkar


Re: Accumulo on HDInsight

Posted by Christopher <ct...@apache.org>.
I have not had experience with HDInsight. My first thoughts are that if it
provides Hadoop and ZooKeeper for you, then that's a few less things to
worry about from a maintenance perspective for your Accumulo cluster. On
the other hand, if you can't run Accumulo nodes colocated with Hadoop
DataNodes, then I wonder if you're losing some performance due to lack of
data locality (on top of any performance hit from being in a virtual
environment).
