Posted to user@hbase.apache.org by Udbhav Agarwal <ud...@syncoms.com> on 2017/07/13 05:57:15 UTC

Hbase on docker container with persistent storage

Hi All,
I need to run HBase 0.98 backed by HDFS in a Docker container and want to prevent data loss if the container restarts.
               As per my understanding, Docker containers work in such a way that if a container is stopped or killed, all information inside it is lost. This implies that if I am running HBase in a container and have stored data in some tables, and the container is then stopped, that data will be lost. I need a way to prevent this data loss.
               I have gone through the concept of volumes in Docker. Is it possible to prevent this data loss with that approach? What if the volume gets corrupted? Is there an instance of the volume running that could be stopped and cause data loss?
               Is there a possibility that I can use HDFS running on some external host outside Docker, with my HBase running inside Docker? Is such a scenario possible? If yes, how?
               Thank you in advance.


Thanks,
Udbhav Agarwal
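The two options raised in this post (Docker volumes, and HDFS outside Docker) can be sketched roughly as follows. Everything here is illustrative: the image name, paths, hostname, IP, and ports are assumptions, not details from this thread.

```shell
# Option 1: keep HBase data on the host in a named volume, so stopping,
# killing, or removing the container does not delete it.
docker volume create hbase-data
docker run -d --name hbase \
  -v hbase-data:/hbase-data \
  my-hbase-image   # hypothetical image configured with hbase.rootdir=/hbase-data

# Option 2: keep the data outside Docker entirely. Run HDFS on external
# hosts and point hbase.rootdir in hbase-site.xml at it:
#   <property>
#     <name>hbase.rootdir</name>
#     <value>hdfs://namenode.example.com:8020/hbase</value>
#   </property>
# The container then only needs network access to the NameNode and
# DataNodes; no HBase data lives inside the container at all.
docker run -d --name hbase \
  --add-host namenode.example.com:10.0.0.5 \
  my-hbase-image
```

Option 2 is closer to a conventional cluster layout, since HDFS already replicates data across machines.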


RE: Hbase on docker container with persistent storage

Posted by Udbhav Agarwal <ud...@syncoms.com>.
That’s great. I have used the HBase Docker image from the following link:
https://github.com/apache/hbase/tree/master/dev-support/hbase_docker
I have the following observation. I run the HBase shell using
docker run -it hbase_docker
and create a few tables in HBase using the shell, put data in them, and scan them. Things look fine. Now I close the shell and run it again. Now I can't find the tables which I created before.
So is this the normal behaviour while using HBase inside Docker?
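This behaviour follows from Docker's semantics rather than from HBase: each `docker run` creates a brand-new container, and the standalone HBase in the dev-support image writes its data inside that container's own writable layer. Two ways to keep the tables around, as a rough sketch (the in-image data path is an assumption; check the image's hbase-site.xml):

```shell
# Re-attach to the same container instead of creating a new one:
docker ps -a                     # find the stopped container's id
docker start -ai <container-id>

# Or mount a host directory over the data directory, so tables survive
# across separate `docker run` invocations (in-image path hypothetical):
docker run -it -v /srv/hbase-data:/tmp/hbase-root hbase_docker
```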


Thanks,
Udbhav
From: Dima Spivak [mailto:dimaspivak@apache.org]
Sent: Wednesday, July 19, 2017 7:22 PM
To: Udbhav Agarwal <ud...@syncoms.com>
Cc: user@hbase.apache.org
Subject: Re: Hbase on docker container with persistent storage

I've run HDFS/HBase in Docker containers across a handful of hosts while working on changes to the clusterdock project [1]. More often, though, I've worked with multiple Docker containers on a single machine (albeit with lots of storage) to test the components.

1. https://github.com/clusterdock/

-Dima

On Tue, Jul 18, 2017 at 9:52 PM, Udbhav Agarwal <ud...@syncoms.com> wrote:
Okay, at what scale do you have experience?

-----Original Message-----
From: Dima Spivak [mailto:dimaspivak@apache.org]
Sent: Monday, July 17, 2017 7:40 PM
To: user@hbase.apache.org
Subject: Re: Hbase on docker container with persistent storage

No, not at the scale you're looking at.

On Mon, Jul 17, 2017 at 6:36 AM Udbhav Agarwal <ud...@syncoms.com> wrote:

> Hi Dima,
> I have been unable to containerize HDFS so far. Do you have any reference
> which I can use to go ahead with that?
>
> Thanks,
> Udbhav
>
> -----Original Message-----
> From: Dima Spivak [mailto:dimaspivak@apache.org]
> Sent: Monday, July 17, 2017 6:37 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase on docker container with persistent storage
>
> Hi Udbhav,
>
> How have you containerized HDFS to run on Docker across 80 hosts? The
> answer to that would guide how you might add HBase into the mix.
>
> On Mon, Jul 17, 2017 at 5:33 AM Udbhav Agarwal <ud...@syncoms.com>
> wrote:
>
> > Hi Dima,
> > Hope you are doing well.
> > Using HBase on a single host is performant because I am not yet
> > dealing with terabytes of data. For now the data size is very small
> > (around 1 GB). I am using this setup to test my application.
> >                As a next step I have to grow the data as well as the
> > storage and check performance. So I will need HBase deployed on
> > 70-80 servers.
> >                Now can you please let me know how I can containerize
> > HBase so as to be able to use HBase backed by HDFS on 70-80 host
> > machines and not lose data if the container itself dies for some
> > reason?
> >
> > Thanks,
> > Udbhav
> >
> > From: Dima Spivak [mailto:dimaspivak@apache.org]
> > Sent: Friday, July 14, 2017 10:11 PM
> > To: Udbhav Agarwal <ud...@syncoms.com>;
> > user@hbase.apache.org
> > Cc: dimaspivak@apache.org
> > Subject: Re: Hbase on docker container with persistent storage
> >
> > If running HBase on a single host is performant enough for you, why
> > use HBase at all? How are you currently storing your data?
> >
> > On Fri, Jul 14, 2017 at 6:07 AM Udbhav Agarwal
> > <ud...@syncoms.com> wrote:
> > Additionally, can you please provide me some links which can guide
> > me to set up such a system with volumes? Thank you.
> >
> > Udbhav
> > -----Original Message-----
> > From: Udbhav Agarwal [mailto:udbhav.agarwal@syncoms.com]
> > Sent: Friday, July 14, 2017 6:31 PM
> > To: user@hbase.apache.org
> > Cc: dimaspivak@apache.org
> > Subject: RE: Hbase on docker container with persistent storage
> >
> > Thank you Dima for the response.
> >         Let me reiterate what I want to achieve in my case. I am
> > using HBase to persist my big data (terabytes and petabytes) coming
> > from various sources through Spark Streaming and Kafka. Spark
> > Streaming and Kafka are running as separate microservices inside
> > different and exclusive containers.
> > These containers communicate over HTTP.
> > Currently I am using an HBase setup on 4 VMs on a single host machine.
> > I have a microservice inside a container to connect to this HBase.
> > This whole setup is functional and I am able to persist data into as
> > well as get data from HBase into Spark Streaming. My use case is
> > real-time ingestion into HBase as well as real-time query from HBase.
> >         Now I am planning to deploy HBase itself inside a container. I
> > want to know what the options are for this. In how many possible
> > ways can I achieve this? If I use container volumes, will they
> > be able to hold such an amount of data (TBs & PBs)? How will I set
> > up HDFS inside volumes?
> > How can I use the power of a distributed file system there? Is this
> > the best way?
> >
> >
> > Thanks,
> > Udbhav
> > -----Original Message-----
> > From: Dima Spivak [mailto:dimaspivak@apache.org<ma...@apache.org><mailto:
> > dimaspivak@apache.org<ma...@apache.org>>]
> > Sent: Friday, July 14, 2017 3:44 AM
> > To: hbase-user <us...@hbase.apache.org>>>
> > Subject: Re: Hbase on docker container with persistent storage
> >
> > Udbhav,
> >
> > Volumes are Docker's way of having folders or files from the host
> > machine bypass the union filesystem used within a Docker container.
> > As such, if a container with a volume is killed, the data from that
> > volume should remain there. That said, if whatever caused the
> > container to die affects the filesystem within the container, it
> > would also affect the data on the host.
> >
> > Running HBase in the manner you've described is not typical in
> > anything resembling a production environment, but if you explain
> > more about your use case, we could provide more advice. That said,
> > how you'd handle data locality and, in particular, multi-host
> > deployments of HBase in this manner is more of a concern for me than
> > volume data corruption. What kind of scale do you need to support?
> > What kind of performance do you expect?
> >
> > -Dima
> >
> > On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ah...@gmail.com>
> > wrote:
> >
> > > Hi Udbhav,
> > > Great work on HBase Docker deployment was done in
> > > https://issues.apache.org/jira/browse/HBASE-12721; you may start
> > > your journey from there. As for the rest of your questions, maybe
> > > some folks here have done similar testing and can give you more
> > > info.
> > >
> > > Regards
> > > Samir
> > >
> > > On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal
> > > <udbhav.agarwal@syncoms.com> wrote:
> > >
> > > > Hi All,
> > > > I need to run HBase 0.98 backed by HDFS in a Docker container and
> > > > want to prevent data loss if the container restarts.
> > > >                As per my understanding, Docker containers work in
> > > > such a way that if a container is stopped or killed, all
> > > > information inside it is lost. This implies that if I am running
> > > > HBase in a container and have stored data in some tables, and the
> > > > container is then stopped, that data will be lost. I need a way to
> > > > prevent this data loss.
> > > >                I have gone through the concept of volumes in Docker.
> > > > Is it possible to prevent this data loss with that approach? What
> > > > if the volume gets corrupted? Is there an instance of the volume
> > > > running that could be stopped and cause data loss?
> > > >                Is there a possibility that I can use HDFS
> > > > running on some external host outside Docker, with my HBase
> > > > running inside Docker? Is such a scenario possible? If yes, how?
> > > >                Thank you in advance.
> > > >
> > > >
> > > > Thanks,
> > > > Udbhav Agarwal
> > > >
> > > >
> > >
> > --
> > -Dima
> >
> --
> -Dima
>
--
-Dima
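Dima's explanation of volumes above can be demonstrated with a minimal sketch (standard Docker CLI; the image and paths are arbitrary choices, not from this thread):

```shell
# Write into a named volume from one container, then let that
# container exit and be removed (--rm).
docker volume create demo-data
docker run --rm -v demo-data:/data alpine sh -c 'echo persisted > /data/marker'

# A brand-new container mounting the same volume still sees the file,
# because the volume lives on the host, outside any container's
# union filesystem.
docker run --rm -v demo-data:/data alpine cat /data/marker   # prints: persisted
```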


> want to know what are the options for this. In how many possible ways 
> I can achieve this ? If I use volumes of container, will they be able 
> to hold such amount of data (TBs & PBs) ? How will I setup up hdfs inside volumes ?
> how can I use the power of distributed file system there? Is this the 
> best way ?
>
>
> Thanks,
> Udbhav
> -----Original Message-----
> From: Dima Spivak [mailto:dimaspivak@apache.org<mailto:
> dimaspivak@apache.org>]
> Sent: Friday, July 14, 2017 3:44 AM
> To: hbase-user <us...@hbase.apache.org>>
> Subject: Re: Hbase on docker container with persistent storage
>
> Udbhav,
>
> Volumes are Docker's way of having folders or files from the host 
> machine bypass the union filesystem used within a Docker container. As 
> such, if a container with a volume is killed, the data from that 
> volume should remain there. That said, if whatever caused the 
> container to die affects the filesystem within the container, it would also affect the data on the host.
>
> Running HBase in the manner you've described is not typical in 
> anything resembling a production environment, but if you explain more 
> about your use case, we could provide more advice. That said, how 
> you'd handle data locality and, in particular, multi-host deployments 
> of HBase in this manner is more of a concern for me than volume data 
> corruption. What kind of scale do you need to support? What kind of performance do you expect?
>
> -Dima
>
> On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ahmic.samir@gmail.com 
> <ma...@gmail.com>> wrote:
>
> > Hi Udbhav,
> > Great work on hbase docker deployment was done in
> > https://issues.apache.org/jira/browse/HBASE-12721 you may start your 
> > journey from there.  As for rest of your questions maybe there are 
> > some folks here that were doing similar testing and may give you 
> > more
> info.
> >
> > Regards
> > Samir
> >
> > On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal < 
> > udbhav.agarwal@syncoms.com<ma...@syncoms.com>>
> > wrote:
> >
> > > Hi All,
> > > I need to run hbase 0.98 backed by hdfs on docker container and 
> > > want to stop the data lost if the container restarts.
> > >                As per my understanding of docker containers, they 
> > > work in a way that if any of the container is stopped/killed , 
> > > every information related to it gets killed. It implies if I am 
> > > running hbase in a
> > container
> > > and I have stored some data in some tables and consequently if the 
> > > container is stopped then the data will be lost. I need a way in 
> > > which I can stop this data loss.
> > >                I have gone through concept of volume in docker. Is 
> > > it possible to stop this data loss with this approach? What if 
> > > volume gets corrupted? Is there any instance of volume running 
> > > there which can be stopped and can cause data loss ?
> > >                Is there a possibility that I can use hdfs running 
> > > at some external host outside the docker and my hbase running 
> > > inside docker ? Is such scenario possible ? If yes, How ?
> > >                Thank you in advance.
> > >
> > >
> > > Thanks,
> > > Udbhav Agarwal
> > >
> > >
> >
> --
> -Dima
>
--
-Dima

Re: Hbase on docker container with persistent storage

Posted by Dima Spivak <di...@apache.org>.
Hi Udbhav,

How have you containerized HDFS to run on Docker across 80 hosts? The
answer to that would guide how you might add HBase into the mix.
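
As a rough illustration, a single-host sketch of that kind of setup could use Docker Compose, with host bind mounts so the data survives container restarts. The image names, paths, and ports below are hypothetical placeholders, not official images:

```yaml
# Hypothetical sketch: HDFS NameNode/DataNode plus HBase in containers.
# All image names and paths are placeholders for illustration only.
version: "2"
services:
  namenode:
    image: example/hadoop-namenode:2.6    # placeholder image
    volumes:
      - /data/hdfs/name:/hadoop/dfs/name  # NameNode metadata on the host
  datanode:
    image: example/hadoop-datanode:2.6    # placeholder image
    volumes:
      - /data/hdfs/data:/hadoop/dfs/data  # block data on the host
  hbase:
    image: example/hbase:0.98             # placeholder image
    environment:
      - HBASE_ROOTDIR=hdfs://namenode:8020/hbase
    depends_on:
      - namenode
      - datanode
```

Scaling a sketch like this to 70-80 hosts would need an orchestrator (e.g. Swarm or Kubernetes) and careful handling of hostnames, since both HDFS and HBase embed hostnames in their metadata.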

On Mon, Jul 17, 2017 at 5:33 AM Udbhav Agarwal <ud...@syncoms.com>
wrote:

> Hi Dima,
> Hope you are doing well.
> Using hbase on a single host is performant because now I am not dealing
> with Terabytes of data. For now data size is very less.(around 1 gb). This
> setup I am using to test my application.
>                As a next step I have to grow the data as well as storage
> and check performance. So I will need to use hbase deployed on 70-80
> servers.
>                Now can you please let me know how can I containerize hbase
> so as to be able to use hbase backed by hdfs using 70-80 host machines and
> not loose data if the container itself dies due to some reason?
>
> Thanks,
> Udbhav
>
> From: Dima Spivak [mailto:dimaspivak@apache.org]
> Sent: Friday, July 14, 2017 10:11 PM
> To: Udbhav Agarwal <ud...@syncoms.com>; user@hbase.apache.org
> Cc: dimaspivak@apache.org
> Subject: Re: Hbase on docker container with persistent storage
>
> If running HBase on a single host is performant enough for you, why use
> HBase at all? How are you currently storing your data?
>
> On Fri, Jul 14, 2017 at 6:07 AM Udbhav Agarwal <udbhav.agarwal@syncoms.com
> <ma...@syncoms.com>> wrote:
> Additionally, can you please provide me some links which can guide me to
> setup up such system with volumes ? Thank you.
>
> Udbhav
> -----Original Message-----
> From: Udbhav Agarwal [mailto:udbhav.agarwal@syncoms.com<mailto:
> udbhav.agarwal@syncoms.com>]
> Sent: Friday, July 14, 2017 6:31 PM
> To: user@hbase.apache.org<ma...@hbase.apache.org>
> Cc: dimaspivak@apache.org<ma...@apache.org>
> Subject: RE: Hbase on docker container with persistent storage
>
> Thank you Dima for the response.
>         Let me reiterate what I want to achieve in my case. I am using
> hbase to persist my bigdata(Terabytes and petabytes) coming from various
> sources through spark streaming and kafka.  Spark streaming and kafka are
> running as separate microservices inside different and excusive containers.
> These containers are communicating with http service protocol. Currently I
> am using hbase setup on 4 VMs on a single host machine. I have a
> microservice inside a container to connect to this hbase. This whole setup
> is functional and I am able to persist data into as well as get data from
> hbase into spark streaming. My use case is of real time ingestion into
> hbase as well as real time query from hbase.
>         Now I am planning to deploy hbase itself inside container. I want
> to know what are the options for this. In how many possible ways I can
> achieve this ? If I use volumes of container, will they be able to hold
> such amount of data (TBs & PBs) ? How will I setup up hdfs inside volumes ?
> how can I use the power of distributed file system there? Is this the best
> way ?
>
>
> Thanks,
> Udbhav
> -----Original Message-----
> From: Dima Spivak [mailto:dimaspivak@apache.org<mailto:
> dimaspivak@apache.org>]
> Sent: Friday, July 14, 2017 3:44 AM
> To: hbase-user <us...@hbase.apache.org>>
> Subject: Re: Hbase on docker container with persistent storage
>
> Udbhav,
>
> Volumes are Docker's way of having folders or files from the host machine
> bypass the union filesystem used within a Docker container. As such, if a
> container with a volume is killed, the data from that volume should remain
> there. That said, if whatever caused the container to die affects the
> filesystem within the container, it would also affect the data on the host.
>
> Running HBase in the manner you've described is not typical in anything
> resembling a production environment, but if you explain more about your use
> case, we could provide more advice. That said, how you'd handle data
> locality and, in particular, multi-host deployments of HBase in this manner
> is more of a concern for me than volume data corruption. What kind of scale
> do you need to support? What kind of performance do you expect?
>
> -Dima
>
> On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ahmic.samir@gmail.com
> <ma...@gmail.com>> wrote:
>
> > Hi Udbhav,
> > Great work on hbase docker deployment was done in
> > https://issues.apache.org/jira/browse/HBASE-12721 you may start your
> > journey from there.  As for rest of your questions maybe there are
> > some folks here that were doing similar testing and may give you more
> info.
> >
> > Regards
> > Samir
> >
> > On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal <
> > udbhav.agarwal@syncoms.com<ma...@syncoms.com>>
> > wrote:
> >
> > > Hi All,
> > > I need to run hbase 0.98 backed by hdfs on docker container and want
> > > to stop the data lost if the container restarts.
> > >                As per my understanding of docker containers, they
> > > work in a way that if any of the container is stopped/killed , every
> > > information related to it gets killed. It implies if I am running
> > > hbase in a
> > container
> > > and I have stored some data in some tables and consequently if the
> > > container is stopped then the data will be lost. I need a way in
> > > which I can stop this data loss.
> > >                I have gone through concept of volume in docker. Is
> > > it possible to stop this data loss with this approach? What if
> > > volume gets corrupted? Is there any instance of volume running there
> > > which can be stopped and can cause data loss ?
> > >                Is there a possibility that I can use hdfs running at
> > > some external host outside the docker and my hbase running inside
> > > docker ? Is such scenario possible ? If yes, How ?
> > >                Thank you in advance.
> > >
> > >
> > > Thanks,
> > > Udbhav Agarwal
> > >
> > >
> >
> --
> -Dima
>
-- 
-Dima

RE: Hbase on docker container with persistent storage

Posted by Udbhav Agarwal <ud...@syncoms.com>.
Hi Dima,
Hope you are doing well.
Using hbase on a single host is performant enough because I am not dealing with terabytes of data yet. For now the data size is very small (around 1 GB). I am using this setup to test my application.
               As a next step I have to grow the data as well as the storage and check performance, so I will need hbase deployed on 70-80 servers.
               Now can you please let me know how I can containerize hbase so as to be able to use hbase backed by hdfs across 70-80 host machines, and not lose data if the container itself dies for some reason?

Thanks,
Udbhav

From: Dima Spivak [mailto:dimaspivak@apache.org]
Sent: Friday, July 14, 2017 10:11 PM
To: Udbhav Agarwal <ud...@syncoms.com>; user@hbase.apache.org
Cc: dimaspivak@apache.org
Subject: Re: Hbase on docker container with persistent storage

If running HBase on a single host is performant enough for you, why use HBase at all? How are you currently storing your data?

On Fri, Jul 14, 2017 at 6:07 AM Udbhav Agarwal <ud...@syncoms.com>> wrote:
Additionally, can you please provide me some links which can guide me to setup up such system with volumes ? Thank you.

Udbhav
-----Original Message-----
From: Udbhav Agarwal [mailto:udbhav.agarwal@syncoms.com<ma...@syncoms.com>]
Sent: Friday, July 14, 2017 6:31 PM
To: user@hbase.apache.org<ma...@hbase.apache.org>
Cc: dimaspivak@apache.org<ma...@apache.org>
Subject: RE: Hbase on docker container with persistent storage

Thank you Dima for the response.
        Let me reiterate what I want to achieve in my case. I am using hbase to persist my bigdata(Terabytes and petabytes) coming from various sources through spark streaming and kafka.  Spark streaming and kafka are running as separate microservices inside different and excusive containers. These containers are communicating with http service protocol. Currently I am using hbase setup on 4 VMs on a single host machine. I have a microservice inside a container to connect to this hbase. This whole setup is functional and I am able to persist data into as well as get data from hbase into spark streaming. My use case is of real time ingestion into hbase as well as real time query from hbase.
        Now I am planning to deploy hbase itself inside container. I want to know what are the options for this. In how many possible ways I can achieve this ? If I use volumes of container, will they be able to hold such amount of data (TBs & PBs) ? How will I setup up hdfs inside volumes ? how can I use the power of distributed file system there? Is this the best way ?


Thanks,
Udbhav
-----Original Message-----
From: Dima Spivak [mailto:dimaspivak@apache.org<ma...@apache.org>]
Sent: Friday, July 14, 2017 3:44 AM
To: hbase-user <us...@hbase.apache.org>>
Subject: Re: Hbase on docker container with persistent storage

Udbhav,

Volumes are Docker's way of having folders or files from the host machine bypass the union filesystem used within a Docker container. As such, if a container with a volume is killed, the data from that volume should remain there. That said, if whatever caused the container to die affects the filesystem within the container, it would also affect the data on the host.

Running HBase in the manner you've described is not typical in anything resembling a production environment, but if you explain more about your use case, we could provide more advice. That said, how you'd handle data locality and, in particular, multi-host deployments of HBase in this manner is more of a concern for me than volume data corruption. What kind of scale do you need to support? What kind of performance do you expect?

-Dima

On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ah...@gmail.com>> wrote:

> Hi Udbhav,
> Great work on hbase docker deployment was done in
> https://issues.apache.org/jira/browse/HBASE-12721 you may start your
> journey from there.  As for rest of your questions maybe there are
> some folks here that were doing similar testing and may give you more info.
>
> Regards
> Samir
>
> On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal <
> udbhav.agarwal@syncoms.com<ma...@syncoms.com>>
> wrote:
>
> > Hi All,
> > I need to run hbase 0.98 backed by hdfs on docker container and want
> > to stop the data lost if the container restarts.
> >                As per my understanding of docker containers, they
> > work in a way that if any of the container is stopped/killed , every
> > information related to it gets killed. It implies if I am running
> > hbase in a
> container
> > and I have stored some data in some tables and consequently if the
> > container is stopped then the data will be lost. I need a way in
> > which I can stop this data loss.
> >                I have gone through concept of volume in docker. Is
> > it possible to stop this data loss with this approach? What if
> > volume gets corrupted? Is there any instance of volume running there
> > which can be stopped and can cause data loss ?
> >                Is there a possibility that I can use hdfs running at
> > some external host outside the docker and my hbase running inside
> > docker ? Is such scenario possible ? If yes, How ?
> >                Thank you in advance.
> >
> >
> > Thanks,
> > Udbhav Agarwal
> >
> >
>
--
-Dima

Re: Hbase on docker container with persistent storage

Posted by Dima Spivak <di...@apache.org>.
If running HBase on a single host is performant enough for you, why use
HBase at all? How are you currently storing your data?

On Fri, Jul 14, 2017 at 6:07 AM Udbhav Agarwal <ud...@syncoms.com>
wrote:

> Additionally, can you please provide me some links which can guide me to
> setup up such system with volumes ? Thank you.
>
> Udbhav
> -----Original Message-----
> From: Udbhav Agarwal [mailto:udbhav.agarwal@syncoms.com]
> Sent: Friday, July 14, 2017 6:31 PM
> To: user@hbase.apache.org
> Cc: dimaspivak@apache.org
> Subject: RE: Hbase on docker container with persistent storage
>
> Thank you Dima for the response.
>         Let me reiterate what I want to achieve in my case. I am using
> hbase to persist my bigdata(Terabytes and petabytes) coming from various
> sources through spark streaming and kafka.  Spark streaming and kafka are
> running as separate microservices inside different and excusive containers.
> These containers are communicating with http service protocol. Currently I
> am using hbase setup on 4 VMs on a single host machine. I have a
> microservice inside a container to connect to this hbase. This whole setup
> is functional and I am able to persist data into as well as get data from
> hbase into spark streaming. My use case is of real time ingestion into
> hbase as well as real time query from hbase.
>         Now I am planning to deploy hbase itself inside container. I want
> to know what are the options for this. In how many possible ways I can
> achieve this ? If I use volumes of container, will they be able to hold
> such amount of data (TBs & PBs) ? How will I setup up hdfs inside volumes ?
> how can I use the power of distributed file system there? Is this the best
> way ?
>
>
> Thanks,
> Udbhav
> -----Original Message-----
> From: Dima Spivak [mailto:dimaspivak@apache.org]
> Sent: Friday, July 14, 2017 3:44 AM
> To: hbase-user <us...@hbase.apache.org>
> Subject: Re: Hbase on docker container with persistent storage
>
> Udbhav,
>
> Volumes are Docker's way of having folders or files from the host machine
> bypass the union filesystem used within a Docker container. As such, if a
> container with a volume is killed, the data from that volume should remain
> there. That said, if whatever caused the container to die affects the
> filesystem within the container, it would also affect the data on the host.
>
> Running HBase in the manner you've described is not typical in anything
> resembling a production environment, but if you explain more about your use
> case, we could provide more advice. That said, how you'd handle data
> locality and, in particular, multi-host deployments of HBase in this manner
> is more of a concern for me than volume data corruption. What kind of scale
> do you need to support? What kind of performance do you expect?
>
> -Dima
>
> On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ah...@gmail.com>
> wrote:
>
> > Hi Udbhav,
> > Great work on hbase docker deployment was done in
> > https://issues.apache.org/jira/browse/HBASE-12721 you may start your
> > journey from there.  As for rest of your questions maybe there are
> > some folks here that were doing similar testing and may give you more
> info.
> >
> > Regards
> > Samir
> >
> > On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal <
> > udbhav.agarwal@syncoms.com>
> > wrote:
> >
> > > Hi All,
> > > I need to run hbase 0.98 backed by hdfs on docker container and want
> > > to stop the data lost if the container restarts.
> > >                As per my understanding of docker containers, they
> > > work in a way that if any of the container is stopped/killed , every
> > > information related to it gets killed. It implies if I am running
> > > hbase in a
> > container
> > > and I have stored some data in some tables and consequently if the
> > > container is stopped then the data will be lost. I need a way in
> > > which I can stop this data loss.
> > >                I have gone through concept of volume in docker. Is
> > > it possible to stop this data loss with this approach? What if
> > > volume gets corrupted? Is there any instance of volume running there
> > > which can be stopped and can cause data loss ?
> > >                Is there a possibility that I can use hdfs running at
> > > some external host outside the docker and my hbase running inside
> > > docker ? Is such scenario possible ? If yes, How ?
> > >                Thank you in advance.
> > >
> > >
> > > Thanks,
> > > Udbhav Agarwal
> > >
> > >
> >
>
-- 
-Dima

RE: Hbase on docker container with persistent storage

Posted by Udbhav Agarwal <ud...@syncoms.com>.
Additionally, can you please provide me some links which can guide me to set up such a system with volumes? Thank you.

Udbhav
-----Original Message-----
From: Udbhav Agarwal [mailto:udbhav.agarwal@syncoms.com] 
Sent: Friday, July 14, 2017 6:31 PM
To: user@hbase.apache.org
Cc: dimaspivak@apache.org
Subject: RE: Hbase on docker container with persistent storage

Thank you Dima for the response.
	Let me reiterate what I want to achieve in my case. I am using hbase to persist my bigdata(Terabytes and petabytes) coming from various sources through spark streaming and kafka.  Spark streaming and kafka are running as separate microservices inside different and excusive containers. These containers are communicating with http service protocol. Currently I am using hbase setup on 4 VMs on a single host machine. I have a microservice inside a container to connect to this hbase. This whole setup is functional and I am able to persist data into as well as get data from hbase into spark streaming. My use case is of real time ingestion into hbase as well as real time query from hbase.
	Now I am planning to deploy hbase itself inside container. I want to know what are the options for this. In how many possible ways I can achieve this ? If I use volumes of container, will they be able to hold such amount of data (TBs & PBs) ? How will I setup up hdfs inside volumes ? how can I use the power of distributed file system there? Is this the best way ? 


Thanks,
Udbhav
-----Original Message-----
From: Dima Spivak [mailto:dimaspivak@apache.org]
Sent: Friday, July 14, 2017 3:44 AM
To: hbase-user <us...@hbase.apache.org>
Subject: Re: Hbase on docker container with persistent storage

Udbhav,

Volumes are Docker's way of having folders or files from the host machine bypass the union filesystem used within a Docker container. As such, if a container with a volume is killed, the data from that volume should remain there. That said, if whatever caused the container to die affects the filesystem within the container, it would also affect the data on the host.

Running HBase in the manner you've described is not typical in anything resembling a production environment, but if you explain more about your use case, we could provide more advice. That said, how you'd handle data locality and, in particular, multi-host deployments of HBase in this manner is more of a concern for me than volume data corruption. What kind of scale do you need to support? What kind of performance do you expect?

-Dima

On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ah...@gmail.com> wrote:

> Hi Udbhav,
> Great work on hbase docker deployment was done in
> https://issues.apache.org/jira/browse/HBASE-12721 you may start your 
> journey from there.  As for rest of your questions maybe there are 
> some folks here that were doing similar testing and may give you more info.
>
> Regards
> Samir
>
> On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal < 
> udbhav.agarwal@syncoms.com>
> wrote:
>
> > Hi All,
> > I need to run hbase 0.98 backed by hdfs on docker container and want 
> > to stop the data lost if the container restarts.
> >                As per my understanding of docker containers, they 
> > work in a way that if any of the container is stopped/killed , every 
> > information related to it gets killed. It implies if I am running 
> > hbase in a
> container
> > and I have stored some data in some tables and consequently if the 
> > container is stopped then the data will be lost. I need a way in 
> > which I can stop this data loss.
> >                I have gone through concept of volume in docker. Is 
> > it possible to stop this data loss with this approach? What if 
> > volume gets corrupted? Is there any instance of volume running there 
> > which can be stopped and can cause data loss ?
> >                Is there a possibility that I can use hdfs running at 
> > some external host outside the docker and my hbase running inside 
> > docker ? Is such scenario possible ? If yes, How ?
> >                Thank you in advance.
> >
> >
> > Thanks,
> > Udbhav Agarwal
> >
> >
>

RE: Hbase on docker container with persistent storage

Posted by Udbhav Agarwal <ud...@syncoms.com>.
Thank you Dima for the response.
	Let me reiterate what I want to achieve in my case. I am using hbase to persist my big data (terabytes and petabytes) coming from various sources through Spark Streaming and Kafka. Spark Streaming and Kafka are running as separate microservices inside different, exclusive containers. These containers communicate over HTTP. Currently I am using an hbase setup on 4 VMs on a single host machine. I have a microservice inside a container to connect to this hbase. This whole setup is functional and I am able to persist data into hbase as well as get data from hbase into Spark Streaming. My use case is real-time ingestion into hbase as well as real-time query from hbase.
	Now I am planning to deploy hbase itself inside a container. I want to know what the options are for this. In how many possible ways can I achieve this? If I use container volumes, will they be able to hold such an amount of data (TBs & PBs)? How will I set up hdfs inside volumes? How can I use the power of a distributed file system there? Is this the best way?


Thanks,
Udbhav
-----Original Message-----
From: Dima Spivak [mailto:dimaspivak@apache.org] 
Sent: Friday, July 14, 2017 3:44 AM
To: hbase-user <us...@hbase.apache.org>
Subject: Re: Hbase on docker container with persistent storage

Udbhav,

Volumes are Docker's way of having folders or files from the host machine bypass the union filesystem used within a Docker container. As such, if a container with a volume is killed, the data from that volume should remain there. That said, if whatever caused the container to die affects the filesystem within the container, it would also affect the data on the host.

Running HBase in the manner you've described is not typical in anything resembling a production environment, but if you explain more about your use case, we could provide more advice. That said, how you'd handle data locality and, in particular, multi-host deployments of HBase in this manner is more of a concern for me than volume data corruption. What kind of scale do you need to support? What kind of performance do you expect?

-Dima

On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ah...@gmail.com> wrote:

> Hi Udbhav,
> Great work on hbase docker deployment was done in
> https://issues.apache.org/jira/browse/HBASE-12721 you may start your 
> journey from there.  As for rest of your questions maybe there are 
> some folks here that were doing similar testing and may give you more info.
>
> Regards
> Samir
>
> On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal < 
> udbhav.agarwal@syncoms.com>
> wrote:
>
> > Hi All,
> > I need to run hbase 0.98 backed by hdfs on docker container and want 
> > to stop the data lost if the container restarts.
> >                As per my understanding of docker containers, they 
> > work in a way that if any of the container is stopped/killed , every 
> > information related to it gets killed. It implies if I am running 
> > hbase in a
> container
> > and I have stored some data in some tables and consequently if the 
> > container is stopped then the data will be lost. I need a way in 
> > which I can stop this data loss.
> >                I have gone through concept of volume in docker. Is 
> > it possible to stop this data loss with this approach? What if 
> > volume gets corrupted? Is there any instance of volume running there 
> > which can be stopped and can cause data loss ?
> >                Is there a possibility that I can use hdfs running at 
> > some external host outside the docker and my hbase running inside 
> > docker ? Is such scenario possible ? If yes, How ?
> >                Thank you in advance.
> >
> >
> > Thanks,
> > Udbhav Agarwal
> >
> >
>

Re: Hbase on docker container with persistent storage

Posted by Dima Spivak <di...@apache.org>.
Udbhav,

Volumes are Docker's way of having folders or files from the host machine
bypass the union filesystem used within a Docker container. As such, if a
container with a volume is killed, the data from that volume should remain
there. That said, if whatever caused the container to die affects the
filesystem within the container, it would also affect the data on the host.
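
As a concrete sketch, mounting a host directory over the container's HBase data path might look like the following. The image name and both paths are hypothetical placeholders, not an official image:

```
# Bind-mount a host directory so HBase data outlives the container.
# "example/hbase:0.98" and the paths are placeholders for illustration.
docker run -d --name hbase \
  -v /data/hbase:/opt/hbase/data \
  example/hbase:0.98

# If the container is stopped or removed, /data/hbase on the host
# still holds the files; a new container can re-mount the same path.
```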

Running HBase in the manner you've described is not typical in anything
resembling a production environment, but if you explain more about your use
case, we could provide more advice. That said, how you'd handle data
locality and, in particular, multi-host deployments of HBase in this manner
is more of a concern for me than volume data corruption. What kind of scale
do you need to support? What kind of performance do you expect?

-Dima

On Thu, Jul 13, 2017 at 12:18 AM, Samir Ahmic <ah...@gmail.com> wrote:

> Hi Udbhav,
> Great work on hbase docker deployment was done in
> https://issues.apache.org/jira/browse/HBASE-12721 you may start your
> journey from there.  As for rest of your questions maybe there are some
> folks here that were doing similar testing and may give you more info.
>
> Regards
> Samir
>
> On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal <
> udbhav.agarwal@syncoms.com>
> wrote:
>
> > Hi All,
> > I need to run hbase 0.98 backed by hdfs on docker container and want to
> > stop the data lost if the container restarts.
> >                As per my understanding of docker containers, they work in
> > a way that if any of the container is stopped/killed , every information
> > related to it gets killed. It implies if I am running hbase in a
> container
> > and I have stored some data in some tables and consequently if the
> > container is stopped then the data will be lost. I need a way in which I
> > can stop this data loss.
> >                I have gone through concept of volume in docker. Is it
> > possible to stop this data loss with this approach? What if volume gets
> > corrupted? Is there any instance of volume running there which can be
> > stopped and can cause data loss ?
> >                Is there a possibility that I can use hdfs running at some
> > external host outside the docker and my hbase running inside docker ? Is
> > such scenario possible ? If yes, How ?
> >                Thank you in advance.
> >
> >
> > Thanks,
> > Udbhav Agarwal
> >
> >
>

Re: Hbase on docker container with persistent storage

Posted by Samir Ahmic <ah...@gmail.com>.
Hi Udbhav,
Great work on hbase docker deployment was done in
https://issues.apache.org/jira/browse/HBASE-12721; you may start your
journey from there. As for the rest of your questions, maybe there are some
folks here who have done similar testing and can give you more info.
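
On the question of hbase inside Docker with HDFS on external hosts: this is possible in principle, since HBase only needs a reachable hbase.rootdir. A hedged sketch of the relevant hbase-site.xml fragment, where the hostnames and ports are placeholders:

```xml
<!-- Point a containerized HBase at an external HDFS cluster.
     "namenode.example.com", the ZooKeeper hosts, and the port
     are placeholders for illustration. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>
```

The container then needs network access to the NameNode, DataNodes, and ZooKeeper, and the RegionServers must advertise hostnames that those external hosts can resolve back to the containers.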

Regards
Samir

On Thu, Jul 13, 2017 at 7:57 AM, Udbhav Agarwal <ud...@syncoms.com>
wrote:

> Hi All,
> I need to run hbase 0.98 backed by hdfs on docker container and want to
> stop the data lost if the container restarts.
>                As per my understanding of docker containers, they work in
> a way that if any of the container is stopped/killed , every information
> related to it gets killed. It implies if I am running hbase in a container
> and I have stored some data in some tables and consequently if the
> container is stopped then the data will be lost. I need a way in which I
> can stop this data loss.
>                I have gone through concept of volume in docker. Is it
> possible to stop this data loss with this approach? What if volume gets
> corrupted? Is there any instance of volume running there which can be
> stopped and can cause data loss ?
>                Is there a possibility that I can use hdfs running at some
> external host outside the docker and my hbase running inside docker ? Is
> such scenario possible ? If yes, How ?
>                Thank you in advance.
>
>
> Thanks,
> Udbhav Agarwal
>
>