Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/11/11 14:56:08 UTC

Possible DR solution

Hi,

Has anyone had experience of using WanDisco <https://www.wandisco.com/>
block replication to create a fault tolerant solution to DR in Hadoop?

The product claims that it starts replicating as soon as the first data
block lands on HDFS: it takes the block and sends it to the DR/replicate
site. The idea is that this is faster than going through traditional HDFS
copy tools, which are normally batch oriented.
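
To make the claim concrete, here is a rough sketch (not WanDisco's
implementation) comparing when a file finishes arriving at the DR site if
blocks are shipped as they land, versus copying only after the whole file
has been written. The function name, the throughput figures and the simple
two-stage pipeline model are all assumptions for illustration.

```python
def replication_finish_times(file_mb, block_mb, write_mb_s, wan_mb_s):
    """Return (batch_seconds, streaming_seconds) for one file.

    batch:     the WAN copy starts only after the whole file is written.
    streaming: each block is sent as soon as it is written, one block at
               a time over the WAN (a simple two-stage pipeline).
    All rates are hypothetical inputs, in MB per second.
    """
    # Split the file into blocks; the last block may be smaller.
    sizes = []
    remaining = file_mb
    while remaining > 0:
        size = min(block_mb, remaining)
        sizes.append(size)
        remaining -= size

    written_at = 0.0  # time at which the current block is fully written
    sent_at = 0.0     # time at which the current block has reached DR
    for size in sizes:
        written_at += size / write_mb_s
        # A block can be sent only once it is written and the link is free.
        sent_at = max(sent_at, written_at) + size / wan_mb_s

    batch = written_at + file_mb / wan_mb_s
    return batch, sent_at


batch, streaming = replication_finish_times(
    file_mb=1024, block_mb=256, write_mb_s=256, wan_mb_s=128)
print(batch, streaming)
```

In this model streaming never finishes later than batch, but when the WAN
is the slower stage the saving is only the time spent writing the file
locally, so the link speed still dominates.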

It also claims to replicate Hive metadata.

I wanted to gauge if anyone has used it or a competitor product. The claim
is that they do not have competitors!

Thanks


Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Re: Possible DR solution

Posted by Mich Talebzadeh <mi...@gmail.com>.

Hi,

I meant the way WanDisco does replication: streaming blocks of data one
after another.

You are correct that temporary directories need not be replicated.
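
As a minimal sketch of that kind of selective copy, the snippet below
builds one `hadoop distcp -update` command per directory and skips anything
that looks like scratch space. The directory names treated as temporary,
the NameNode URIs and the function names are hypothetical; the commands
are only constructed here, not executed.

```python
# Hypothetical names of scratch directories that need not be replicated.
TEMP_DIR_NAMES = {"tmp", "_temporary", ".staging", ".Trash"}


def is_temporary(path):
    """True if any component of the HDFS path is a known scratch directory."""
    return any(part in TEMP_DIR_NAMES for part in path.strip("/").split("/"))


def distcp_commands(paths, source_nn, target_nn):
    """Build one 'hadoop distcp -update' command per non-temporary path."""
    commands = []
    for path in paths:
        if is_temporary(path):
            continue  # skip intermediate or temporary data
        commands.append([
            "hadoop", "distcp", "-update",
            f"{source_nn}{path}", f"{target_nn}{path}",
        ])
    return commands


# Example with made-up NameNode addresses for the two sites.
for cmd in distcp_commands(
        ["/data/raw", "/tmp/hive", "/data/out/_temporary"],
        "hdfs://ny-nn:8020", "hdfs://sg-nn:8020"):
    print(" ".join(cmd))
```

Each command list could be handed to a scheduler or to `subprocess.run`
on an edge node; which directories count as temporary is site specific.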

One of their points is that one can replicate a cluster from, say, NY to
Singapore. I very much doubt that is doable given the volume of data growth.

We are back to the same old story: put new data into the active cluster from
anywhere. So have users log in from Singapore to the NY cluster and do the
work there. You could then replicate that data from NY to Singapore using
WanDisco. However, what about the latency of such operations, and how
*up-to-date* is the data at the replicate site going to be?

Bottom line: how worthwhile is it to deploy such a tool given the cost?
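
For a rough feel of the cost side, the arithmetic can be sketched using the
list price quoted elsewhere in this thread (from $4,000 per node per year,
possibly halved with a discount). The function and parameter names are
made up for illustration, and real quotes will differ.

```python
def annual_license_cost(primary_nodes, dr_nodes,
                        price_per_node=4000, discount=0.0):
    """Yearly license cost when every node on both sites is licensed.

    price_per_node is the list price quoted in the thread (USD per year);
    discount is a fraction, e.g. 0.5 for a halved price.
    """
    nodes = primary_nodes + dr_nodes
    return nodes * price_per_node * (1 - discount)


print(annual_license_cost(5, 5))                # 5 + 5 nodes at list price
print(annual_license_cost(5, 5, discount=0.5))  # same cluster, price halved
```

The cost scales linearly with node count, so it grows with the cluster.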

On 13 November 2016 at 06:49, Jörn Franke <jo...@gmail.com> wrote:

> One solution would be to be selective, e.g. transferring only the source
> data and/or the processed data and skipping intermediate data.
>
> With respect to streaming: are you referring to Spark Streaming? Then I
> would run the same Streaming job in both data centers.

Re: Possible DR solution

Posted by Mich Talebzadeh <mi...@gmail.com>.

Thanks for the links.

The difficulty with building DR for HDFS is the distributed nature of HDFS.
If each DataNode had a mirror copy in DR via something similar to SRDF
(assuming NameNode and others taken care of), then there would not be an
issue. The fail-over would be starting the mirror HDFS in DR site.

However, I agree with the points made that if your active cluster is busy,
then the job becomes more challenging due to the latency observed. It should
also be noted that in an enterprise such as a bank, the Prod-DR WAN is
shared among many applications, some transactional (Oracle, Sybase, MSSQL)
and others DW, including HDFS.

Maybe a smart solution would be to replicate active partitions using
streaming technologies and leave the dormant ones alone, as they hardly
change. However, we are still talking about potentially terabytes of data
through a gigabit WAN.

The problem, in my experience, is that if you replicate a few hundred
gigabytes of data daily, then you may just live with it. As your data
grows, the task of streaming data is going to be much more challenging. I
have seen these issues when replicating large rows with CLOB and BLOB
columns in Oracle and Sybase, trying to push data from London to Singapore.
It can become a nightmare.
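
A back-of-the-envelope sketch of the WAN arithmetic, assuming decimal units
(1 TB = 10^12 bytes, 1 Gbit/s = 10^9 bits/s) and a hypothetical
`usable_fraction` parameter to model the share of the Prod-DR link left
over by other applications. It ignores protocol overhead, latency and
retransmissions.

```python
def transfer_hours(data_tb, link_gbit_s, usable_fraction=1.0):
    """Hours needed to push data_tb terabytes over a WAN link.

    usable_fraction is the share of the link available to replication
    (an assumed figure; 1.0 means the whole link).
    """
    bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_gbit_s * 1e9 * usable_fraction
    return bits / usable_bits_per_second / 3600


# 1 TB over a dedicated 1 Gbit/s link, then with only a quarter of it.
print(round(transfer_hours(1, 1), 2))
print(round(transfer_hours(1, 1, usable_fraction=0.25), 2))
```

Even in the ideal case a terabyte takes a couple of hours per gigabit of
bandwidth, which is why daily volumes matter more than the tool.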



On 12 November 2016 at 17:17, Timur Shenkao <ts...@timshenkao.su> wrote:

> Hi guys!
>
> 1) Though it's quite interesting, I believe that this discussion is not
> about Spark :)
> 2) If you are interested, there is a solution by Cloudera:
> https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_bdr_replication_intro.html
> (requires that the *source cluster* has a Cloudera Enterprise license, so
> it's not free).
> Correct me if I'm wrong, but I don't remember a specialized replication
> solution by Hortonworks (Atlas, Falcon, etc. are not precisely about
> inter-cluster replication).
> Some solutions from the Hadoop ecosystem try to implement replication of
> their own:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462 ,
> http://highscalability.com/blog/2016/8/1/how-to-setup-a-highly-available-multi-az-cassandra-cluster-o.html
> 3) Read this discussion:
> https://community.hortonworks.com/questions/29645/hdfs-replication-for-dr.html
> 4) I prefer bash scripts / Python scripts / Oozie jobs + distcp: it's
> free and I control precisely what's going on. But in the case of huge
> clusters and sophisticated logic, this approach becomes cumbersome.
> 5) Don't forget about security and encryption: your sensitive data may be
> read by third-party agents during replication.
>
> On Sat, Nov 12, 2016 at 6:05 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Thanks Jorn.
>>
>> The way WanDisco promotes itself is as doing block-level replication. As I
>> understand it, you modify core-site.xml and add a couple of network server
>> locations there. They call this tool Fusion. There are at least two Fusion
>> servers for high availability, and each one, among other things, has a
>> database of its own. Once the client interacts with HDFS, the Fusion server
>> behaves like a sniffer with its own port. As soon as the first HDFS block
>> of 256 MB, out of say a 30 GB file, is written, it starts sending that
>> block to the recipient. The laws of physics, the pipeline size etc. apply
>> here; that is up to the consumer. It can handle 10 files at the same time,
>> etc. So that is all. It is a known technology, now labelled as streaming.
>> In summary, it does not have to wait for the full file to be written to
>> HDFS before replicating blocks. That is where it scores.
>>
>> It helps WAN work. Say the primary/active HDFS is in London and the
>> replicate is in Singapore, so users in Singapore can see replicated data
>> (eventually) when it gets there. It can obviously be used for DR, in which
>> case it is like a hot standby (borrowing terminology from Sybase). In
>> contrast, one can do the same with periodic loads using homemade tools or
>> tools like BDR from Cloudera.
>>
>> I mentioned that Hive is going to have its metastore on HBase as well, and
>> that could be a potential problem. The site is here
>> <https://www.wandisco.com/>
>>
>> They are claiming there are no competitors in the market for their
>> streaming HA product.
>>
>> HTH
>>
>> On 12 November 2016 at 11:17, Jörn Franke <jo...@gmail.com> wrote:
>>
>>> What is wrong with good old batch transfer for moving data from one
>>> cluster to another? I assume your use case is only business continuity in
>>> case of disasters such as data center loss, which are unlikely to happen
>>> (which does not mean they do not happen) and where you could afford to
>>> lose one day (or hour) of data (depends!).
>>>
>>> Nevertheless, I assume he refers to the Hadoop storage policies:
>>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html ,
>>> but this still only works within the same cluster.
>>>
>>> You could also develop a custom secondary file system, similar to the
>>> Ignite Cache filesystem, that sits on top of HDFS and, as soon as it
>>> receives data, sends it to another cluster and provides it to HDFS. Not
>>> knowing WanDisco, I assume this is what it does. Given the prices (and the
>>> fact that clusters tend to grow), you may want to evaluate whether buying
>>> or building makes sense. In any case, it also requires evaluating network
>>> throughput, because this may become the bottleneck somewhere (either
>>> within the cluster or, more likely, between data centers).
>>>
>>> As you mentioned, HBase & Co. may require special consideration for the
>>> case where data is in memory and not yet persisted.
>>>
>>> On Sat, Nov 12, 2016 at 12:04 PM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Thanks Vince.
>>>>
>>>> Can you provide more details on this, please?
>>>>
>>>> On 12 November 2016 at 09:52, vincent gromakowski <
>>>> vincent.gromakowski@gmail.com> wrote:
>>>>
>>>>> An HDFS tiering policy with good tags should be similar.
>>>>>
>>>>> On 11 Nov 2016 at 11:19 PM, "Mich Talebzadeh" <mi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I really don't see why one would want to set up streaming replication,
>>>>>> except in situations where functionality similar to transactional
>>>>>> databases is required in big data.
>>>>>>
>>>>>> On 11 November 2016 at 17:24, Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> I think it differs in that it starts streaming data through its own
>>>>>>> port as soon as the first block has landed, so the granularity is a
>>>>>>> block.
>>>>>>>
>>>>>>> However, think of it as Oracle GoldenGate replication or SAP
>>>>>>> replication for databases. The only difference is that if there is
>>>>>>> corruption in a block in HDFS, it will be replicated, much like SRDF.
>>>>>>>
>>>>>>> With Oracle or SAP, by contrast, it is log-based replication, which
>>>>>>> stops when it encounters corruption.
>>>>>>>
>>>>>>> Replication depends on the block, so it can replicate Hive metadata,
>>>>>>> fsimage etc., but cannot replicate the HBase memstore if HBase crashes.
>>>>>>>
>>>>>>> So that is the gist of it: streaming replication as opposed to
>>>>>>> snapshot.
>>>>>>>
>>>>>>> Sounds familiar. Think of it as log shipping in the old Oracle days
>>>>>>> versus GoldenGate etc.
>>>>>>>
>>>>>>> hth
>>>>>>>
>>>>>>> On 11 November 2016 at 17:14, Deepak Sharma <de...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The reason being that you can set up HDFS duplication to some other
>>>>>>>> cluster on your own.
>>>>>>>>
>>>>>>>> On Nov 11, 2016 22:42, "Mich Talebzadeh" <mi...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> reason being ?
>>>>>>>>>
>>>>>>>>> On 11 November 2016 at 17:11, Deepak Sharma <deepakmca05@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> This is a waste of money, I guess.
>>>>>>>>>>
>>>>>>>>>> On Nov 11, 2016 22:41, "Mich Talebzadeh" <
>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> It starts at $4,000 per node per year, all inclusive.
>>>>>>>>>>>
>>>>>>>>>>> With a discount it can be halved, but that is per node, so if you
>>>>>>>>>>> have 5 nodes in primary and 5 nodes in DR we are already talking
>>>>>>>>>>> about $40K.
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>> On 11 November 2016 at 16:43, Mudit Kumar <mkumar128@sapient.com
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Is it feasible cost wise?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Mudit
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *From:* Mich Talebzadeh [mailto:mich.talebzadeh@gmail.com]
>>>>>>>>>>>> *Sent:* Friday, November 11, 2016 2:56 PM
>>>>>>>>>>>> *To:* user @spark
>>>>>>>>>>>> *Subject:* Possible DR solution
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Has anyone had experience of using WanDisco
>>>>>>>>>>>> <https://www.wandisco.com/> block replication to create a
>>>>>>>>>>>> fault tolerant solution to DR in Hadoop?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The product claims that it starts replicating as soon as the
>>>>>>>>>>>> first data block lands on HDFS and takes the block and sends it to
>>>>>>>>>>>> DR/replicate site. The idea is that is faster than doing it through
>>>>>>>>>>>> traditional HDFS copy tools which are normally batch oriented.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It also claims to replicate Hive metadata as well.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I wanted to gauge if anyone has used it or a competitor
>>>>>>>>>>>> product. The claim is that they do not have competitors!
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> LinkedIn  *https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all
>>>>>>>>>>>> responsibility for any loss, damage or destruction of data or any other
>>>>>>>>>>>> property which may arise from relying on this email's technical content is
>>>>>>>>>>>> explicitly disclaimed. The author will in no case be liable for any
>>>>>>>>>>>> monetary damages arising from such loss, damage or destruction.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>

Re: Possible DR solution

Posted by Timur Shenkao <ts...@timshenkao.su>.

Hi guys!

1) Though it's quite interesting, I believe that this discussion is not
about Spark :)
2) If you are interested, there is a solution by Cloudera:
https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_bdr_replication_intro.html
(it requires that the *source cluster* has a Cloudera Enterprise license, so
it's not free).
Correct me if I'm wrong, but I don't remember a specialized replication
solution by Hortonworks (Atlas, Falcon, etc. are not precisely about
inter-cluster replication).
Some projects from the Hadoop ecosystem try to implement replication of their
own:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462 ,
http://highscalability.com/blog/2016/8/1/how-to-setup-a-highly-available-multi-az-cassandra-cluster-o.html
3) Read this discussion:
https://community.hortonworks.com/questions/29645/hdfs-replication-for-dr.html
4) I prefer bash scripts / Python scripts / Oozie jobs + distcp: it's
free & I control precisely what's going on. But in the case of huge clusters &
sophisticated logic, this approach becomes cumbersome.
5) Don't forget about security & encryption: your sensitive data may be
read by third parties during replication.
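
The script-plus-distcp approach from point 4 could look roughly like the
sketch below. It is only an illustration: the NameNode addresses, the path,
and the default map count and bandwidth cap are made-up assumptions, and the
helper names are hypothetical. The distcp options used (-update, -delete,
-pugp, -m, -bandwidth) are standard ones, but check them against your Hadoop
version before relying on this.

```python
import shlex
import subprocess


def build_distcp_command(src, dst, max_maps=20, bandwidth_mb=50,
                         delete_missing=False):
    """Build a `hadoop distcp` command line for an incremental sync.

    max_maps and bandwidth_mb are illustrative defaults, not recommendations.
    """
    cmd = [
        "hadoop", "distcp",
        "-update",                        # copy only files that differ on the target
        "-pugp",                          # preserve user, group and permissions
        "-m", str(max_maps),              # cap the number of parallel map tasks
        "-bandwidth", str(bandwidth_mb),  # MB/s limit per map task
    ]
    if delete_missing:
        # Remove files on the target that no longer exist on the source.
        # distcp only accepts -delete together with -update or -overwrite.
        cmd.append("-delete")
    cmd += [src, dst]
    return cmd


def run_distcp(cmd, dry_run=True):
    """Print the command (dry run) or execute it; return the exit code."""
    if dry_run:
        print(" ".join(shlex.quote(part) for part in cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical cluster addresses and path, for illustration only.
    command = build_distcp_command(
        "hdfs://nn-primary:8020/data/warehouse",
        "hdfs://nn-dr:8020/data/warehouse",
        delete_missing=True,
    )
    run_distcp(command, dry_run=True)
```

Scheduling this from cron or an Oozie coordinator gives the periodic batch
replication discussed in this thread; the bandwidth cap is what keeps the copy
from saturating the link between sites.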

On Sat, Nov 12, 2016 at 6:05 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Thanks Jorn.
>
> The way WanDisco promotes itself is as doing block-level replication. As I
> understand it, you modify core-site.xml and add a couple of network server
> locations there. They call this tool Fusion. There are at least two Fusion
> servers for high availability. Each one, among other things, has a database
> of its own. Once the client interacts with HDFS, the Fusion server behaves
> like a sniffer with its own port. As soon as the first HDFS block of
> 256 MB, out of say a file of 30 GB, is written, it starts sending that block
> to the recipient. The laws of physics, the pipeline size etc. apply here.
> That is up to the consumer. It can handle, say, 10 files at the same time.
> So that is all. It is a known technology, now labelled as streaming. In
> summary, it does not have to wait for the full file to be written to HDFS
> before replicating blocks. That is where it scores.
>
> It helps WAN work. Say the primary/active HDFS is in London and the
> replica is in Singapore. Users in Singapore can see the replicated data
> (eventually) when it gets there. It can obviously be used for DR; in that
> case it is like a hot standby (borrowing terminology from Sybase). In
> contrast, one can do the same with periodic loads using homemade tools or
> tools like BDR from Cloudera.
>
> I mentioned that Hive is going to have its metastore on HBase as well and
> that could cause problems. The site is here
> <https://www.wandisco.com/>
>
> They are claiming there are no competitors in the market for their
> streaming HA product.
>
> HTH

Re: Possible DR solution

Posted by Mich Talebzadeh <mi...@gmail.com>.

Thanks Jorn.

The way WanDisco promotes itself is as doing block-level replication. As I
understand it, you modify core-site.xml and add a couple of network server
locations there. They call this tool Fusion. There are at least two Fusion
servers for high availability. Each one, among other things, has a database
of its own. Once the client interacts with HDFS, the Fusion server behaves
like a sniffer with its own port. As soon as the first HDFS block of
256 MB, out of say a file of 30 GB, is written, it starts sending that block
to the recipient. The laws of physics, the pipeline size etc. apply here.
That is up to the consumer. It can handle, say, 10 files at the same time.
So that is all. It is a known technology, now labelled as streaming. In
summary, it does not have to wait for the full file to be written to HDFS
before replicating blocks. That is where it scores.
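
To put rough numbers on that, here is a back-of-the-envelope model of
block-by-block streaming versus copying only after the whole file has landed.
It is not a description of how Fusion works internally. The write and WAN
rates are invented, every block is treated as full size, and the function
names are hypothetical; only the 30 GB file and 256 MB block come from the
example above.

```python
import math


def block_count(file_mb, block_mb):
    """Number of HDFS blocks needed for a file of the given size."""
    return math.ceil(file_mb / block_mb)


def batch_finish_s(file_mb, write_mb_s, wan_mb_s):
    """Batch copy: the transfer starts only once the whole file is written."""
    return file_mb / write_mb_s + file_mb / wan_mb_s


def streaming_finish_s(file_mb, block_mb, write_mb_s, wan_mb_s):
    """Two-stage pipeline: each block is sent as soon as it is complete.

    Simplification: all blocks are assumed to be full size, so the result
    is a slight overestimate when the last block is partial.
    """
    n = block_count(file_mb, block_mb)
    t_write = block_mb / write_mb_s  # time to write one block locally
    t_send = block_mb / wan_mb_s     # time to ship one block over the WAN
    # First block passes through both stages; after that the slower stage
    # sets the pace for the remaining n - 1 blocks.
    return t_write + t_send + (n - 1) * max(t_write, t_send)


if __name__ == "__main__":
    file_mb = 30 * 1024   # the 30 GB file from the example
    block_mb = 256        # the 256 MB block size from the example
    write_mb_s = 200.0    # assumed local HDFS write rate
    wan_mb_s = 100.0      # assumed usable WAN rate

    print("blocks:", block_count(file_mb, block_mb))
    print("batch finish (s):", batch_finish_s(file_mb, write_mb_s, wan_mb_s))
    print("streaming finish (s):",
          streaming_finish_s(file_mb, block_mb, write_mb_s, wan_mb_s))
```

In this model the gain from streaming is at most the time it takes to write
the file locally. When the WAN is the slower stage, total time is still
dominated by file size divided by WAN rate, which is the "laws of physics"
point.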

It helps WAN work. Say the primary/active HDFS is in London and the
replica is in Singapore. Users in Singapore can see the replicated data
(eventually) when it gets there. It can obviously be used for DR; in that
case it is like a hot standby (borrowing terminology from Sybase). In
contrast, one can do the same with periodic loads using homemade tools or
tools like BDR from Cloudera.

I mentioned that Hive is going to have its metastore on HBase as well and
that could cause problems. The site is here <https://www.wandisco.com/>

They are claiming there are no competitors in the market for their streaming
HA product.

HTH

On 12 November 2016 at 11:17, Jörn Franke <jo...@gmail.com> wrote:

> What is wrong with good old batch transfer for moving data from one
> cluster to another? I assume your use case is only business continuity in
> case of disasters such as data center loss, which are unlikely to happen
> (well, that does not mean they do not happen) and where you could afford to
> lose one day (or hour) of data (depends!).
>
> Nevertheless, I assume he refers to the Hadoop storage policies:
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
> , but these still only work within the same cluster.
>
> You could also develop a custom secondary file system, similar to the
> Ignite Cache filesystem, that sits on top of HDFS and, as soon as it
> receives data, sends it to another cluster and provides it to HDFS. Not
> knowing Wandisco, I assume this is what it does. Given the prices (and the
> fact that clusters tend to grow) you may want to evaluate whether buying or
> building makes sense. In any case, it also requires evaluation of network
> throughput, because this may become the bottleneck somewhere (either within
> the cluster or, more likely, between data centers).
>
> As you mentioned, HBase & Co. may require special consideration for the
> case that data is in memory and not yet persisted.

Re: Possible DR solution

Posted by Jörn Franke <jo...@gmail.com>.

What is wrong with good old batch transfer for moving data from one
cluster to another? I assume your use case is only business continuity in
case of disasters such as data center loss, which are unlikely to happen
(well, that does not mean they do not happen) and where you could afford to
lose one day (or hour) of data (depends!).

Nevertheless, I assume he refers to the Hadoop storage policies:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
, but these still only work within the same cluster.

You could also develop a custom secondary file system, similar to the
Ignite Cache filesystem, that sits on top of HDFS and, as soon as it
receives data, sends it to another cluster and provides it to HDFS. Not
knowing Wandisco, I assume this is what it does. Given the prices (and the
fact that clusters tend to grow) you may want to evaluate whether buying or
building makes sense. In any case, it also requires evaluation of network
throughput, because this may become the bottleneck somewhere (either within
the cluster or, more likely, between data centers).
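
A first-pass throughput check can be plain arithmetic, as sketched below. The
daily growth, the replication window, the link speed, and the 70% usable
fraction are all assumptions chosen for illustration, and the function names
are hypothetical; plug in your own measurements.

```python
def required_mbit_per_s(daily_growth_gb, window_hours):
    """Average link rate needed to ship one day's new data within the window.

    Uses binary units: 1 GB = 1024 MB, 1 MB = 8 Mbit.
    """
    megabits = daily_growth_gb * 1024 * 8
    return megabits / (window_hours * 3600)


def link_is_sufficient(daily_growth_gb, window_hours, link_mbit_per_s,
                       usable_fraction=0.7):
    """True if the link can carry the daily volume within the window.

    usable_fraction is an assumed allowance for protocol overhead and
    other traffic sharing the inter-data-center link.
    """
    needed = required_mbit_per_s(daily_growth_gb, window_hours)
    return needed <= link_mbit_per_s * usable_fraction


if __name__ == "__main__":
    growth_gb = 2048   # assumed: about 2 TB of new data per day
    link_mbit = 1000   # assumed: 1 Gbit/s link between data centers

    for window in (24, 4):
        needed = required_mbit_per_s(growth_gb, window)
        ok = link_is_sufficient(growth_gb, window, link_mbit)
        print(f"window {window} h: need {needed:.1f} Mbit/s, sufficient: {ok}")
```

With these made-up figures the link copes when replication is spread over the
whole day but not when it is squeezed into a four-hour batch window, which is
the kind of result that should feed into the buy-or-build decision.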

As you mentioned, HBase & Co. may require special consideration for the
case that data is in memory and not yet persisted.

On Sat, Nov 12, 2016 at 12:04 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> thanks Vince
>
> can you provide more details on this pls
>
> On 12 November 2016 at 09:52, vincent gromakowski <
> vincent.gromakowski@gmail.com> wrote:
>
>> An HDFS tiering policy with good tags should be similar.
>>
>> On 11 Nov 2016 at 11:19 PM, "Mich Talebzadeh" <mi...@gmail.com>
>> wrote:
>>
>>> I really don't see why one wants to set up streaming replication unless
>>> for situations where similar functionality to transactional databases is
>>> required in big data?
>>>
>>> On 11 November 2016 at 17:24, Mich Talebzadeh <mich.talebzadeh@gmail.com
>>> > wrote:
>>>
>>>> I think it differs as it starts streaming data through its own port as
>>>> soon as the first block has landed. So the granularity is a block.
>>>>
>>>> However, think of it as Oracle GoldenGate replication or SAP
>>>> replication for databases. The only difference is that if there is
>>>> corruption in an HDFS block, it will be replicated too, much like SRDF.
>>>>
>>>> Whereas with Oracle or SAP it is log-based replication, which stops when
>>>> it encounters corruption.
>>>>
>>>> Replication depends on the block. So it can replicate Hive metadata and
>>>> fsimage etc., but cannot replicate the HBase memstore if HBase crashes.
>>>>
>>>> So that is the gist of it: streaming replication as opposed to
>>>> snapshot.
>>>>
>>>> Sounds familiar. Think of it as log shipping in the old Oracle days
>>>> versus GoldenGate etc.
>>>>
>>>> hth
>>>>
>>>> On 11 November 2016 at 17:14, Deepak Sharma <de...@gmail.com>
>>>> wrote:
>>>>
>>>>> Reason being you can set up hdfs duplication on your own to some other
>>>>> cluster .
>>>>>
>>>>> On Nov 11, 2016 22:42, "Mich Talebzadeh" <mi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> reason being ?
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>> for any loss, damage or destruction of data or any other property which may
>>>>>> arise from relying on this email's technical content is explicitly
>>>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>>>> arising from such loss, damage or destruction.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 11 November 2016 at 17:11, Deepak Sharma <de...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This is waste of money I guess.
>>>>>>>
>>>>>>> On Nov 11, 2016 22:41, "Mich Talebzadeh" <mi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> starts at $4,000 per node per year all inclusive.
>>>>>>>>
>>>>>>>> With discount it can be halved but we are talking a node itself so
>>>>>>>> if you have 5 nodes in primary and 5 nodes in DR we are talking about $40K
>>>>>>>> already.
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>>> for any loss, damage or destruction of data or any other property which may
>>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>>>>>> arising from such loss, damage or destruction.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 11 November 2016 at 16:43, Mudit Kumar <mk...@sapient.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Is it feasible cost wise?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Mudit
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *From:* Mich Talebzadeh [mailto:mich.talebzadeh@gmail.com]
>>>>>>>>> *Sent:* Friday, November 11, 2016 2:56 PM
>>>>>>>>> *To:* user @spark
>>>>>>>>> *Subject:* Possible DR solution
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Has anyone had experience of using WanDisco
>>>>>>>>> <https://www.wandisco.com/> block replication to create a fault
>>>>>>>>> tolerant solution to DR in Hadoop?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The product claims that it starts replicating as soon as the first
>>>>>>>>> data block lands on HDFS and takes the block and sends it to DR/replicate
>>>>>>>>> site. The idea is that is faster than doing it through traditional HDFS
>>>>>>>>> copy tools which are normally batch oriented.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It also claims to replicate Hive metadata as well.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I wanted to gauge if anyone has used it or a competitor product.
>>>>>>>>> The claim is that they do not have competitors!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> LinkedIn  *https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>>>> for any loss, damage or destruction of data or any other property which may
>>>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>>>>>>> arising from such loss, damage or destruction.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>

Re: Possible DR solution

Posted by Mich Talebzadeh <mi...@gmail.com>.

Thanks, Vince.

Can you provide more details on this, please?

Dr Mich Talebzadeh






On 12 November 2016 at 09:52, vincent gromakowski <
vincent.gromakowski@gmail.com> wrote:

> An HDFS tiering policy with good tags should be similar.
>
> On 11 Nov 2016 at 11:19 PM, "Mich Talebzadeh" <mi...@gmail.com> wrote:
>
>> I really don't see why one would want to set up streaming replication,
>> except where functionality similar to that of transactional databases
>> is required in big data.
>>
>> Dr Mich Talebzadeh

Re: Possible DR solution

Posted by vincent gromakowski <vi...@gmail.com>.

An HDFS tiering policy with good tags should be similar.

On 11 Nov 2016 at 11:19 PM, "Mich Talebzadeh" <mi...@gmail.com> wrote:

> I really don't see why one would want to set up streaming replication,
> except where functionality similar to that of transactional databases is
> required in big data.
>
> Dr Mich Talebzadeh

Re: Possible DR solution

Posted by Mich Talebzadeh <mi...@gmail.com>.

I really don't see why one would want to set up streaming replication,
except where functionality similar to that of transactional databases is
required in big data.

Dr Mich Talebzadeh






On 11 November 2016 at 17:24, Mich Talebzadeh <mi...@gmail.com>
wrote:

> I think it differs in that it starts streaming data through its own port
> as soon as the first block has landed, so the granularity is a block.
>
> However, think of it as Oracle GoldenGate replication or SAP replication
> for databases. The only difference is that if a block is corrupted in
> HDFS, the corruption will be replicated too, much like SRDF.
>
> With Oracle or SAP, by contrast, replication is log based and stops when
> it encounters corruption.
>
> Replication works at the block level, so it can replicate Hive metadata,
> the fsimage etc., but it cannot replicate the HBase memstore if HBase
> crashes.
>
> So that is the gist of it: streaming replication as opposed to snapshots.
>
> Sounds familiar? Think of log shipping in the old Oracle days versus
> GoldenGate etc.
>
> HTH
>
> Dr Mich Talebzadeh

Re: Possible DR solution

Posted by Mich Talebzadeh <mi...@gmail.com>.

I think it differs in that it starts streaming data through its own port as
soon as the first block has landed, so the granularity is a block.

However, think of it as Oracle GoldenGate replication or SAP replication
for databases. The only difference is that if a block is corrupted in HDFS,
the corruption will be replicated too, much like SRDF.

With Oracle or SAP, by contrast, replication is log based and stops when it
encounters corruption.

Replication works at the block level, so it can replicate Hive metadata,
the fsimage etc., but it cannot replicate the HBase memstore if HBase
crashes.

So that is the gist of it: streaming replication as opposed to snapshots.

Sounds familiar? Think of log shipping in the old Oracle days versus
GoldenGate etc.

HTH
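
One rough way to see the difference between block-level streaming and a
batch (snapshot-style) copy is to compare how long each block waits before
it is shipped to the DR site. The sketch below is a toy model only: the
arrival times, the 60-minute batch interval and the function names are
assumptions made for illustration, and it models neither the WANdisco
product nor any HDFS API.

```python
import math


def streaming_lag(arrivals, ship_delay=0.0):
    """Streaming: each block ships as soon as it lands, so lag is the ship delay."""
    return [ship_delay for _ in arrivals]


def batch_lag(arrivals, interval):
    """Batch: each block waits for the next copy run, at multiples of `interval`."""
    lags = []
    for t in arrivals:
        next_run = math.ceil(t / interval) * interval
        lags.append(next_run - t)
    return lags


# Assumed example: minutes at which blocks land on the primary cluster.
arrivals = [5, 20, 65, 110]
# Assumed example: the batch copy job runs every 60 minutes.
interval = 60

print("worst streaming lag:", max(streaming_lag(arrivals)))
print("worst batch lag:", max(batch_lag(arrivals, interval)))
```

In this model the worst-case exposure of the batch approach is bounded by
the batch interval, whereas the streaming approach is bounded only by how
quickly a block can be shipped.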

Dr Mich Talebzadeh






On 11 November 2016 at 17:14, Deepak Sharma <de...@gmail.com> wrote:

> The reason being that you can set up HDFS duplication to another cluster
> yourself.

Re: Possible DR solution

Posted by Deepak Sharma <de...@gmail.com>.

The reason being that you can set up HDFS duplication to another cluster
yourself.
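
For reference, the usual do-it-yourself route for copying HDFS data to
another cluster is a scheduled `hadoop distcp` job. The following is a
minimal sketch, assuming hypothetical NameNode addresses and paths
(`nn-primary`, `nn-dr` and `/data/warehouse` are placeholders, not taken
from this thread). It only builds the command line and does not run it.

```python
import shlex


def build_distcp_command(src, dst, max_maps=20):
    """Build a `hadoop distcp` command that incrementally syncs src to dst."""
    return [
        "hadoop", "distcp",
        "-update",            # copy only files that are missing or differ on the target
        "-delete",            # remove target files that no longer exist on the source
        "-p",                 # preserve file attributes
        "-m", str(max_maps),  # cap the number of simultaneous map tasks
        src, dst,
    ]


cmd = build_distcp_command(
    "hdfs://nn-primary:8020/data/warehouse",  # hypothetical source cluster path
    "hdfs://nn-dr:8020/data/warehouse",       # hypothetical DR cluster path
)
print(shlex.join(cmd))
```

A job like this would typically be run from cron or a workflow scheduler,
so the DR copy is only as fresh as the last run. That is the batch
behaviour the thread contrasts with streaming replication.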

On Nov 11, 2016 22:42, "Mich Talebzadeh" <mi...@gmail.com> wrote:

> Reason being?
>
> Dr Mich Talebzadeh

Re: Possible DR solution

Posted by Mich Talebzadeh <mi...@gmail.com>.

Reason being?

Dr Mich Talebzadeh

On 11 November 2016 at 17:11, Deepak Sharma <de...@gmail.com> wrote:

> This is a waste of money, I guess.

Re: Possible DR solution

Posted by Deepak Sharma <de...@gmail.com>.

This is a waste of money, I guess.

On Nov 11, 2016 22:41, "Mich Talebzadeh" <mi...@gmail.com> wrote:

> It starts at $4,000 per node per year, all inclusive.
>
> With a discount it can be halved, but the charge is per node, so with 5
> nodes in primary and 5 nodes in DR we are talking about $40K a year
> already, before any discount.
>
> HTH

Re: Possible DR solution

Posted by Mich Talebzadeh <mi...@gmail.com>.

It starts at $4,000 per node per year, all inclusive.

With a discount it can be halved, but the charge is per node, so with 5
nodes in primary and 5 nodes in DR we are talking about $40K a year
already, before any discount.
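
As a rough sketch of that arithmetic, assuming every node in both clusters
needs a licence and that any discount applies as a flat fraction of the
quoted price. The function and parameter names are made up for illustration
and are not part of any WanDisco tooling or official price list.

```python
def annual_licence_cost(primary_nodes, dr_nodes, price_per_node=4000, discount=0.0):
    """Return the yearly licence cost in dollars for both clusters.

    Assumption: every node in the primary and DR clusters is licensed at the
    same per-node price, and the discount is a fraction between 0 and 1.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be a fraction between 0 and 1")
    total_nodes = primary_nodes + dr_nodes
    return total_nodes * price_per_node * (1.0 - discount)


# 5 nodes in primary plus 5 nodes in DR at the quoted starting price
full_price = annual_licence_cost(5, 5)
# the same clusters if the price is halved by a discount
half_price = annual_licence_cost(5, 5, discount=0.5)

print(f"Undiscounted: ${full_price:,.0f} per year")
print(f"Halved:       ${half_price:,.0f} per year")
```

The cost grows linearly with the combined node count, so doubling both
clusters doubles the bill at either price.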

HTH

Dr Mich Talebzadeh

On 11 November 2016 at 16:43, Mudit Kumar <mk...@sapient.com> wrote:

> Is it feasible cost-wise?
>
> Thanks,
> Mudit

RE: Possible DR solution

Posted by Mudit Kumar <mk...@sapient.com>.

Is it feasible cost-wise?

Thanks,
Mudit

From: Mich Talebzadeh [mailto:mich.talebzadeh@gmail.com]
Sent: Friday, November 11, 2016 2:56 PM
To: user @spark
Subject: Possible DR solution

Hi,

Has anyone had experience of using WanDisco <https://www.wandisco.com/> block replication to create a fault-tolerant solution to DR in Hadoop?

The product claims that it starts replicating as soon as the first data block lands on HDFS: it takes the block and sends it to the DR/replica site. The idea is that it is faster than doing it through traditional HDFS copy tools, which are normally batch-oriented.

It also claims to replicate Hive metadata.

I wanted to gauge whether anyone has used it or a competing product. The claim is that they do not have competitors!

Thanks

Dr Mich Talebzadeh