Posted to user@metron.apache.org by deepak kumar <kd...@gmail.com> on 2018/10/19 11:18:24 UTC

HCP in cloud infrastructures such as AWS, GCP, and Azure

Hi All,
I have a quick question about HCP deployments on cloud infrastructure such
as AWS.
I am planning to run a persistent cluster for all event streaming and
processing, and then run a transient cluster, such as AWS EMR, for batch
workloads on the data ingested by the persistent cluster.
Has anyone tried this model?
Since the data volume is going to be humongous, the cloud provider charges
a lot of money for data I/O and storage.
With that in mind, what would be the best cloud deployment of the HCP
components, assuming an ingest rate of 10 TB per day?

Thanks in advance.


Regards,
Deepak

Re: HCP in cloud infrastructures such as AWS, GCP, and Azure

Posted by Ali Nazemian <al...@gmail.com>.
Depending on your security model, you may have some challenges integrating
Ranger with your cloud storage, especially if you are thinking of using TDE
for encryption at rest. Otherwise, using Metron in that way should be quite
feasible. You may face some performance issues depending on the required
SLA, but the cost savings will most probably convince you to decouple
storage from compute.

Cheers,
Ali

-- 
A.Nazemian

Re: HCP in cloud infrastructures such as AWS, GCP, and Azure

Posted by deepak kumar <kd...@gmail.com>.
Thanks, Carolyn.
Is there a defined reference architecture I can refer to?

Thanks
Deepak


Re: HCP in cloud infrastructures such as AWS, GCP, and Azure

Posted by Carolyn Duby <cd...@hortonworks.com>.
Hive 3.0 works well with block stores. You can either add it to your Metron
cluster or spin up an ephemeral cluster with Cloudbreak:

1. Metron streams events into HDFS as JSON.
2. Compact the JSON daily with Spark into ORC format and store the result
in a block store (S3, ADLS, etc.); see the sketch after this list.
3. Query the ORC data in the block store through external Hive 3.0 tables
in HDP 3, using LLAP.
4. If querying the block store externally is too slow, try adding more LLAP
cache, or load the data into HDFS prior to analysis.
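
A minimal PySpark sketch of steps 2 and 3, assuming Hive support is
enabled; the paths, bucket, table name, columns, and partition scheme are
illustrative assumptions, not part of the recipe above:

# Hypothetical daily compaction job: Metron JSON in HDFS -> ORC in S3.
# All paths, names, columns, and the repartition count are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("metron-daily-compaction")
         .enableHiveSupport()
         .getOrCreate())

# Step 2: read a day of indexed Metron output and compact it into ORC.
events = spark.read.json("hdfs:///apps/metron/indexing/indexed/bro/")
(events
 .withColumn("dt", F.to_date(F.from_unixtime(F.col("timestamp") / 1000)))
 .repartition(64)                  # fewer, larger files suit block stores
 .write.mode("append")
 .partitionBy("dt")
 .orc("s3a://example-metron-archive/bro_orc/"))

# Step 3: expose the compacted data as an external Hive table.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS bro_orc (
        source_ip STRING, dest_ip STRING, `timestamp` BIGINT
    )
    PARTITIONED BY (dt DATE)
    STORED AS ORC
    LOCATION 's3a://example-metron-archive/bro_orc/'
""")
spark.sql("MSCK REPAIR TABLE bro_orc")  # register newly written partitions

LLAP would then query the same table from HiveServer2; the MSCK REPAIR (or
an explicit ADD PARTITION) is what makes each day's output visible.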

If you are using the Metron Alerts UI, you will need Solr, which performs
well only on fast disks. To keep costs down, reduce the data stored in Solr
using the following techniques:
1. Index only the fields you might search on (see the sketch after this
list).
2. Reduce the formats you store in Solr to only those you want to see in
the Alerts UI.
3. Reduce the length of time you retain data in Solr.
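
A hedged sketch of technique 1, using Solr's Schema API to make a field
searchable but not stored; the host, collection, and field name are
assumptions for illustration:

# Hypothetical: mark a field as indexed (searchable) but not stored, via
# Solr's Schema API. Host, collection, and field name are assumptions.
import json
import urllib.request

SCHEMA_URL = "http://localhost:8983/solr/metron_alerts/schema"

payload = {
    "replace-field": {
        "name": "user_agent",  # searchable, but not returned with results
        "type": "string",
        "indexed": True,
        "stored": False,
    }
}

req = urllib.request.Request(
    SCHEMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())

Technique 3 is usually handled outside the schema, for example with a
periodic delete-by-query on the event timestamp field.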

Thanks
Carolyn Duby
Solutions Engineer, Northeast
cduby@hortonworks.com
+1.508.965.0584

Join my team!
Enterprise Account Manager – Boston - http://grnh.se/wepchv1
Solutions Engineer – Boston - http://grnh.se/8gbxy41
Need Answers? Try https://community.hortonworks.com/answers/index.html