Posted to user@hbase.apache.org by Manjeet Singh <ma...@gmail.com> on 2018/07/11 15:16:27 UTC

Query for OldWals and use of WAl for Hbase indexer

Hi all,

I have a question about HBase replication and the oldWALs directory.

HBase version: 1.2.1

To enable HBase indexing, we run the following command on the table:

alter '<NameOfTable>', {NAME => 'CF1', REPLICATION_SCOPE => 1}

This enables replication, which hbase-indexer requires. As I understand
it, the indexer uses the HBase WAL (please correct me if I am wrong).

My questions are: How does HBase synchronize with the Solr indexer? What
is the role of replication? What optimizations can we apply to reduce the
data size?

Our oldWALs directory keeps growing. If the HMaster is supposed to clean
it up, why has it reached 7.2 TB? If I delete it, will that affect Solr
indexing?

7.2 K   21.5 K  /hbase/.hbase-snapshot
0       0       /hbase/.tmp
0       0       /hbase/MasterProcWALs
18.3 G  60.2 G  /hbase/WALs
28.7 G  86.1 G  /hbase/archive
0       0       /hbase/corrupt
1.7 T   5.2 T   /hbase/data
42      126     /hbase/hbase.id
7       21      /hbase/hbase.version
7.2 T   21.6 T  /hbase/oldWALs
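
As a rough check on the listing above, the following Python sketch parses the two size columns and works out how much of /hbase the oldWALs directory accounts for. It assumes the listing came from something like `hdfs dfs -du -h /hbase`, that the first column is the raw size and the second is the space consumed including HDFS replication, and that the units are binary (1 K = 1024 bytes); none of this is stated in the post. The names `parse_du`, `UNITS` and `DU_OUTPUT` are illustrative only.

```python
import re

# Binary unit multipliers (assumption: K/M/G/T/P are powers of 1024).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

# Two sizes, each with an optional unit letter, followed by a path.
LINE = re.compile(
    r"^\s*([\d.]+)(?:\s+([KMGTP]))?\s+([\d.]+)(?:\s+([KMGTP]))?\s+(\S+)\s*$"
)


def parse_du(text):
    """Map each path to a (raw_bytes, consumed_bytes) tuple."""
    usage = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip blank or unrecognized lines
        raw, raw_unit, used, used_unit, path = match.groups()
        usage[path] = (
            float(raw) * UNITS[raw_unit or ""],
            float(used) * UNITS[used_unit or ""],
        )
    return usage


# The listing from the post, copied as-is.
DU_OUTPUT = """
7.2 K   21.5 K  /hbase/.hbase-snapshot
0       0       /hbase/.tmp
0       0       /hbase/MasterProcWALs
18.3 G  60.2 G  /hbase/WALs
28.7 G  86.1 G  /hbase/archive
0       0       /hbase/corrupt
1.7 T   5.2 T   /hbase/data
42      126     /hbase/hbase.id
7       21      /hbase/hbase.version
7.2 T   21.6 T  /hbase/oldWALs
"""

usage = parse_du(DU_OUTPUT)
total_raw = sum(raw for raw, _ in usage.values())
old_raw, old_used = usage["/hbase/oldWALs"]

share = old_raw / total_raw   # fraction of /hbase held by oldWALs
factor = old_used / old_raw   # consumed / raw, i.e. apparent replication factor

print(f"oldWALs share of /hbase: {share:.1%}")
print(f"apparent HDFS replication factor: {factor:.2f}")
```

Under those assumptions, oldWALs is several times larger than the table data itself, and every retained WAL byte costs roughly three bytes of disk.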




Thanks
Manjeet Singh

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Sean Busbey <bu...@apache.org>.

On Wed, Jul 11, 2018 at 12:49 PM, Manjeet Singh
<ma...@gmail.com> wrote:
> Thanks Sean for your reply
>
> I still have some question un answered like
> Q1: How Hbase syncronized with Hbase indexer.

The Lily Indexer for HBase (sometimes referred to as the "Lily HBase
Indexer") is an independent project, linked to in my previous email.
They would be best positioned to answer questions you have about how
it works. I suggest you talk with that project.

My understanding is that the Lily Indexer leverages the HBase
replication system to observe edits as they come into the system
and then applies a corresponding change to the index it maintains.
That means it has all the same configurable options and warnings that
come with any HBase replication setup:

http://hbase.apache.org/1.2/book.html#_cluster_replication

> Q2 What optimization I can apply.

You need to dig into the logs to find out if there is a problem
talking to the indexers or if they are just lagging. If they're
lagging, then I believe you can add more indexer nodes to scale up
effective throughput. The Lily Indexer for HBase project would be
better suited to answer that though.

> Q3 As it's clear from my stats, data in OldWals is quite huge so it's not
> getting clear my HMaster., how can I improve my HDFS space issue due to
> this?

The only way to safely decrease the size of the retained WALs is to
make it so they are no longer needed. That means either getting the
Lily Indexer for HBase to catch up or resetting things and using a
batch indexing method to fill in the gap.

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Reid Chan <re...@outlook.com>.

Please check the comments I left on the JIRA issue you filed.

________________________________________
From: Manjeet Singh <ma...@gmail.com>
Sent: 12 July 2018 14:55:40
To: user@hbase.apache.org
Subject: Re: Query for OldWals and use of WAl for Hbase indexer

Hi

Reid
You suggest me to directly delete OldWals by using Hdfs command "hdfs -rm"
 Question is : It's really safe and I will not lose any data and Indexing
from the system.

Thanks
Manjeet Singh

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Manjeet Singh <ma...@gmail.com>.

Hi Reid,

You suggest deleting oldWALs directly with the HDFS command "hdfs dfs -rm".
My question is: is that really safe? Will I lose any data or indexing
from the system?

Thanks
Manjeet Singh

On Thu, Jul 12, 2018 at 12:09 PM, Manjeet Singh <ma...@gmail.com>
wrote:

> Hi
>
> I have created HBASE-20877
> <https://issues.apache.org/jira/browse/HBASE-20877> for the same, request
> you to please move it into active sprint.
>
> Thanks
> Manjeet Singh



-- 
luv all

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Manjeet Singh <ma...@gmail.com>.

Hi,

I have created HBASE-20877
<https://issues.apache.org/jira/browse/HBASE-20877> for this. Please
move it into the active sprint.

Thanks
Manjeet Singh

On Thu, Jul 12, 2018 at 7:42 AM, Reid Chan <re...@outlook.com> wrote:

> oldWals are supposed to be cleaned in master background chore, I also
> doubt they are needed.
>
> HBASE-20352(for 1.x version) is to speed up cleaning oldWals, it may
> address your concern "OldWals is quite huge"
>
>
> R.C



-- 
luv all

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Reid Chan <re...@outlook.com>.

oldWALs are supposed to be cleaned up by a background chore on the master. I also doubt they are needed.

HBASE-20352 (for 1.x versions) speeds up the cleaning of oldWALs; it may address your concern that "OldWals is quite huge".


R.C



________________________________________
From: Manjeet Singh <ma...@gmail.com>
Sent: 12 July 2018 08:19:21
To: user@hbase.apache.org
Subject: Re: Query for OldWals and use of WAl for Hbase indexer

I have one more question

If solr is having its own data mean its maintaining data in their shards
and hbase is maintaining in data folder... Why still oldWals need?

Thanks
Manjeet singh

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Sean Busbey <bu...@apache.org>.

On Wed, Jul 11, 2018 at 7:19 PM, Manjeet Singh
<ma...@gmail.com> wrote:
> I have one more question
>
> If solr is having its own data mean its maintaining data in their shards
> and hbase is maintaining in data folder... Why still oldWals need?

Once the HBase replication system is active and there's a peer (even a
disabled peer) it starts retaining WALs until they have been
successfully acknowledged by all peer clusters. Since the Lily Indexer
for HBase makes use of the replication system to get a stream of
edits, the WALs will stick around until the indexer has said it saw
the edits in them.
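
The retention rule described above can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not HBase's actual cleaner code: `deletable_wals`, the WAL names, and the peer name `indexer_peer` are hypothetical, and the dictionaries stand in for the per-peer replication queues that HBase tracks.

```python
def deletable_wals(old_wals, peer_queues):
    """Return the archived WALs that no peer queue still references.

    old_wals    -- list of WAL file names sitting in oldWALs
    peer_queues -- dict mapping a peer id to the WALs it has not yet
                   acknowledged (a disabled peer still has a queue)
    """
    still_needed = set()
    for queued in peer_queues.values():
        still_needed.update(queued)
    return [wal for wal in old_wals if wal not in still_needed]


old_wals = ["wal.1", "wal.2", "wal.3"]

# A peer that has caught up only holds on to the newest WAL.
caught_up = {"indexer_peer": ["wal.3"]}

# A lagging (or disabled) peer still references everything.
lagging = {"indexer_peer": ["wal.1", "wal.2", "wal.3"]}

print(deletable_wals(old_wals, caught_up))
print(deletable_wals(old_wals, lagging))
```

In this model a single lagging or disabled peer keeps every WAL pinned, which matches the picture of an oldWALs directory that only grows.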

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Manjeet Singh <ma...@gmail.com>.

I have one more question.

If Solr has its own data, meaning it maintains data in its shards, and
HBase maintains its data in the data folder, why are the oldWALs still
needed?

Thanks
Manjeet singh

On Wed, 11 Jul 2018, 23:19 Manjeet Singh, <ma...@gmail.com>
wrote:

> Thanks Sean for your reply
>
> I still have some question un answered like
> Q1: How Hbase syncronized with Hbase indexer.
> Q2 What optimization I can apply.
> Q3 As it's clear from my stats, data in OldWals is quite huge so it's not
> getting clear my HMaster., how can I improve my HDFS space issue due to
> this?
>
> Thanks
> Manjeet Singh

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Manjeet Singh <ma...@gmail.com>.

Thanks, Sean, for your reply.

I still have some unanswered questions:
Q1: How does HBase synchronize with the HBase indexer?
Q2: What optimizations can I apply?
Q3: As my stats show, the data in oldWALs is huge, so the HMaster is
evidently not clearing it. How can I resolve the HDFS space issue this
causes?

Thanks
Manjeet Singh

On Wed, Jul 11, 2018 at 9:33 PM, Sean Busbey <bu...@apache.org> wrote:

> Presuming you're using the Lily indexer[1], yes it relies on hbase's
> built in cross-cluster replication.
>
> The replication system stores WALs until it can successfully send them
> for replication. If you look in ZK you should be able to see which
> regionserver(s) are waiting to send those WALs over. The easiest way
> to do this is probably to look at the "zk dump" web page on the
> Master's web ui[2].
>
> Once you have the particular region server(s), take a look at their
> logs for messages about difficulty sending edits to the replication
> peer you have set up for the destination solr collection.
>
> If you remove the WALs then the solr collection will have a hole in
> it. Depending on how far behind you are, it might be quicker to 1)
> remove the replication peer, 2) wait for old wals to clear, 3)
> reenable replication, 4) use a batch indexing tool to index data
> already in the table.
>
> [1]:
>
> http://ngdata.github.io/hbase-indexer/
>
> [2]:
>
> The specifics will vary depending on your installation, but the page
> is essentially at a URL like
> https://active-master-host.example.com:22002/zk.jsp
>
> the link is on the master UI landing page, near the bottom, in the
> description of the "ZooKeeper Quorum" row. it's the end of "Addresses
> of all registered ZK servers. For more, see zk dump."



-- 
luv all

Re: Query for OldWals and use of WAl for Hbase indexer

Posted by Sean Busbey <bu...@apache.org>.

Presuming you're using the Lily indexer[1], yes, it relies on HBase's
built-in cross-cluster replication.

The replication system stores WALs until it can successfully send them
for replication. If you look in ZK you should be able to see which
regionserver(s) are waiting to send those WALs over. The easiest way
to do this is probably to look at the "zk dump" web page on the
Master's web UI[2].

Once you have the particular region server(s), take a look at their
logs for messages about difficulty sending edits to the replication
peer you have set up for the destination Solr collection.

If you remove the WALs then the Solr collection will have a hole in
it. Depending on how far behind you are, it might be quicker to 1)
remove the replication peer, 2) wait for old WALs to clear, 3)
re-enable replication, 4) use a batch indexing tool to index data
already in the table.

[1]: http://ngdata.github.io/hbase-indexer/

[2]: The specifics will vary depending on your installation, but the page
is essentially at a URL like
https://active-master-host.example.com:22002/zk.jsp

The link is on the master UI landing page, near the bottom, in the
description of the "ZooKeeper Quorum" row. It's the end of "Addresses
of all registered ZK servers. For more, see zk dump."
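
Once the text of the "zk dump" page has been saved, a short Python sketch like the one below could tally how many WALs each region server still has queued per peer. It rests on assumptions that should be checked against a real dump: that the page lists one znode per line as `path: value`, that replication queues live under `/hbase/replication/rs/<regionserver>/<peer id>/<WAL name>`, and that the znode parent is `/hbase`. The function name `queued_wals`, the sample text, the host names, and the peer id `indexer_peer` are all made up for illustration.

```python
from collections import Counter


def queued_wals(zk_dump_text, rs_root="/hbase/replication/rs"):
    """Count queued WAL znodes per (regionserver, peer queue) pair."""
    counts = Counter()
    prefix = rs_root.rstrip("/") + "/"
    for line in zk_dump_text.splitlines():
        # Keep only the znode path, dropping any ": value" suffix.
        path = line.strip().split(":", 1)[0]
        if not path.startswith(prefix):
            continue
        parts = path[len(prefix):].split("/")
        if len(parts) == 3:  # regionserver / peer queue / WAL file
            regionserver, peer_queue, _wal = parts
            counts[(regionserver, peer_queue)] += 1
    return counts


# Hypothetical excerpt; a real dump will have different names and values.
SAMPLE_DUMP = """\
/hbase/replication/rs:
/hbase/replication/rs/rs1.example.com,16020,1531300000000:
/hbase/replication/rs/rs1.example.com,16020,1531300000000/indexer_peer:
/hbase/replication/rs/rs1.example.com,16020,1531300000000/indexer_peer/wal.1531300001000: 4096
/hbase/replication/rs/rs1.example.com,16020,1531300000000/indexer_peer/wal.1531300002000:
/hbase/replication/rs/rs2.example.com,16020,1531300000000:
/hbase/replication/rs/rs2.example.com,16020,1531300000000/indexer_peer:
/hbase/replication/rs/rs2.example.com,16020,1531300000000/indexer_peer/wal.1531300003000: 128
"""

# Region servers with the longest queues come first.
for (regionserver, peer_queue), count in queued_wals(SAMPLE_DUMP).most_common():
    print(regionserver, peer_queue, count)
```

The region servers at the top of that list are the ones whose logs are worth reading first.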

On Wed, Jul 11, 2018 at 10:16 AM, Manjeet Singh
<ma...@gmail.com> wrote:
> Hi All
>
> I have a query regarding Hbase replication and OldWals
>
> Hbase version 1.2.1
>
> To enable Hbase indexing we use below command on table
>
> alter '<NameOfTable>', {NAME => 'CF1', REPLICATION_SCOPE => 1}
>
> By Doing this actually replication get enabled as hbase-indexer required
> it, as per my understanding indexer use hbase WAL (Please correct me if I
> am wrong).
>
> so question is How Hbase syncronize with Solr Indexer? What is the role of
> replication? what optimization we can apply in order to reduce data size?
>
>
> I can see that our OldWals are getting filled , if Hmaster it self taking
> care why it's reached to 7.2 TB? what if I delete it, does it impact solr
> indexing?
>
> 7.2 K   21.5 K  /hbase/.hbase-snapshot
> 0       0       /hbase/.tmp
> 0       0       /hbase/MasterProcWALs
> 18.3 G  60.2 G  /hbase/WALs
> 28.7 G  86.1 G  /hbase/archive
> 0       0       /hbase/corrupt
> 1.7 T   5.2 T   /hbase/data
> 42      126     /hbase/hbase.id
> 7       21      /hbase/hbase.version
> 7.2 T   21.6 T  /hbase/oldWALs
>
>
>
>
> Thanks
> Manjeet Singh