Posted to user@hbase.apache.org by Tianying Chang <ty...@gmail.com> on 2016/05/27 21:32:37 UTC

Major compaction cannot remove deleted rows until the region is split. Strange!

Hi,

We saw a very strange case in one of our production clusters. A couple of
regions cannot get their deleted rows or delete markers removed even after
major compaction. However, when a region triggered a split (we set 100G for
auto split), the deletion worked: the 100G region became two 10G daughter
regions, and all the delete markers were gone.

Also, the same region in the slave cluster (populated through replication)
has a normal size of about 20+G.

BTW, the delete markers in the regions are mostly DeleteFamily, if that
matters.

This is really weird. Does anyone have a clue about this strange behavior?

Thanks
Tian-Ying

A snippet of the HFile generated by the major compaction:

K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
V:
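
(For reference, a key dump like the one above can be produced with the HFile
tool that ships with 0.94-era HBase; the path below is a placeholder, -p
prints the key/values, and -m should additionally print file metadata such as
EARLIEST_PUT_TS:

    hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f hdfs:///hbase/<table>/<region>/d/<hfile>
)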

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Tianying Chang <ty...@gmail.com>.
Yes, I tried manually triggering a major compaction on the problematic
region, and the result was the same: no deleted rows were removed, so the
size stayed at 90+G. And I verified that after a split, major compaction
works as expected, i.e. the region compacts into two 10+G regions.

Thanks
Tian-Ying

On Fri, May 27, 2016 at 2:54 PM, Frank Luo <jl...@merkleinc.com> wrote:

> What if you manually trigger major-compact on that particular region? Does
> it run and the delete markers removed?
>
> -----Original Message-----
> From: Tianying Chang [mailto:tychang@gmail.com]
> Sent: Friday, May 27, 2016 4:33 PM
> To: user@hbase.apache.org
> Subject: Major compaction cannot remove deleted rows until the region is
> split. Strange!
>
> Hi,
>
> We saw a very strange case in one of our production cluster. A couple
> regions cannot get their deleted rows or delete marker removed even after
> major compaction. However when the region triggered split (we set 100G for
> auto split), the deletion worked. The 100G region becomes two 10G daughter
> regions, and all the delete marker are gone.
>
> Also, the same region in the slave cluster (through replication) have
> normal size at about 20+G.
>
> BTW, the delete marker in the regions are mostly deleteFamily if it
> matters.
>
> This is really weird. Anyone has any clue for this strange behavior?
>
> Thanks
> Tian-Ying
>
> A snippet of the HFile generated by the major compaction:
>
> : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
> V:

RE: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Frank Luo <jl...@merkleinc.com>.
What if you manually trigger a major compaction on that particular region? Does it run, and are the delete markers removed?
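
(For reference, a manual major compaction can be kicked off from the HBase
shell; the name below is a placeholder for a table name or region name:

    hbase> major_compact 'table_name_or_region_name'
)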

-----Original Message-----
From: Tianying Chang [mailto:tychang@gmail.com]
Sent: Friday, May 27, 2016 4:33 PM
To: user@hbase.apache.org
Subject: Major compaction cannot remove deleted rows until the region is split. Strange!

Hi,

We saw a very strange case in one of our production cluster. A couple regions cannot get their deleted rows or delete marker removed even after major compaction. However when the region triggered split (we set 100G for auto split), the deletion worked. The 100G region becomes two 10G daughter regions, and all the delete marker are gone.

Also, the same region in the slave cluster (through replication) have normal size at about 20+G.

BTW, the delete marker in the regions are mostly deleteFamily if it matters.

This is really weird. Anyone has any clue for this strange behavior?

Thanks
Tian-Ying

A snippet of the HFile generated by the major compaction:

: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
V:
K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@\x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
V:

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Guanghao Zhang <zg...@gmail.com>.
We added some logging in our production cluster:
2016-06-06,21:37:28,443 INFO org.apache.hadoop.hbase.regionserver.HRegion:
scannerReadPoints size is 0 and smallest read point is 4037995
2016-06-06,21:37:51,429 ERROR
org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner:
java.io.IOException: Could not seek StoreFileScanner
2016-06-06,21:38:02,103 INFO org.apache.hadoop.hbase.regionserver.HRegion:
scannerReadPoints size is 1 and smallest read point is 4038037
2016-06-13,17:17:10,516 INFO org.apache.hadoop.hbase.regionserver.HRegion:
scannerReadPoints size is 1 and smallest read point is 4038037

When a new RegionScanner is created, it adds a scanner read point to
scannerReadPoints. But if we get an exception after the read point is added,
the read point stays on the region server, and deletes newer than this mvcc
number will never be compacted away.
I created an issue about this: https://issues.apache.org/jira/browse/HBASE-16012
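
Roughly how a leaked read point pins compaction (a simplified sketch modeled
on the 0.94-era HRegion; the names are illustrative, not the exact source):

    // The read point a compaction must respect is the minimum over the
    // memstore read point and every registered scanner read point. A
    // single leaked scannerReadPoints entry keeps this minimum small
    // forever, so newer delete markers can never be dropped.
    public long getSmallestReadPoint() {
      synchronized (scannerReadPoints) {
        long minimum = mvcc.memstoreReadPoint();
        for (Long readPoint : scannerReadPoints.values()) {
          minimum = Math.min(minimum, readPoint);
        }
        return minimum;
      }
    }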

2016-06-10 3:08 GMT+08:00 Stack <st...@duboce.net>:

> Tian-ying:
>
> Can you try the suggestion below to see if it helps?
>
> Do you see the lease expired logs?
>
> St.Ack
>
>
> On Thu, Jun 2, 2016 at 7:03 PM, hao zhang <zg...@gmail.com> wrote:
>
> > Hi, Stack
> >
> >   We found this in our production cluster, too.  I take a look about the
> > code and found one case which will make the major compact not work.
> >
> >   1. put some rows
> >   2. scan
> >   3. delete
> >   4. scanner didn't close rightly, it will keep the  the read point in
> > region's scannerReadPoints.
> >   5. major compact. It can't work because we have a scanner which have
> > small read point than the delete.
> >
> >   But if move the region to new rs, the scannerReadPoints will update to
> > the biggest memstoreTs which form sotrefile. So major compact will work.
> >   I thought the try...catch module in Scanner.leaseExpired() method
> needs a
> > finally module to close the region scanner rightly.
> >
> >     public void leaseExpired() {
> >       RegionScannerHolder rsh = scanners.remove(this.scannerName);
> >       if (rsh != null) {
> >         RegionScanner s = rsh.s;
> >         LOG.info("Scanner " + this.scannerName + " lease expired on
> region
> > "
> >           + s.getRegionInfo().getRegionNameAsString());
> >         try {
> >           Region region =
> > regionServer.getRegion(s.getRegionInfo().getRegionName());
> >           if (region != null && region.getCoprocessorHost() != null) {
> >             region.getCoprocessorHost().preScannerClose(s);
> >           }
> >           s.close();
> >           if (region != null && region.getCoprocessorHost() != null) {
> >             region.getCoprocessorHost().postScannerClose(s);
> >           }
> >         } catch (IOException e) {
> >           LOG.error("Closing scanner for "
> >             + s.getRegionInfo().getRegionNameAsString(), e);
> >         }
> >       } else {
> >         LOG.warn("Scanner " + this.scannerName + " lease expired, but no
> > related" +
> >           " scanner found, hence no chance to close that related
> > scanner!");
> >       }
> >     }
> >
> >
> > 2016-06-02 2:50 GMT+08:00 Stack <st...@duboce.net>:
> >
> > > On Wed, Jun 1, 2016 at 10:56 AM, Tianying Chang <ty...@gmail.com>
> > wrote:
> > >
> > > > Hi, Stack
> > > >
> > > > After moving the region and issue a major compact on that region, its
> > > size
> > > > shrink from 99G down to 24G. So it looks like the region is in a bad
> > > state
> > > > that cannot recover, close/open it fixed the issue. And from the
> region
> > > > size metric graph, we can see major compaction stop working  since
> > March
> > > > 31, so some bug that caused region enter into bad state...
> > Unfortunately,
> > > > we don't have DEBUG enabled and that is the last region that has the
> > > issue,
> > > > it is hard to figure out what is the bug that caused the bad state...
> > > >
> > > >
> > > Interesting. So moving it to another RS make it major-compactable? That
> > > would seem to indicate some state kept in the RS memory is preventing
> the
> > > major compaction running. Is moving the region a workaround for you
> until
> > > we figure what it is Tian-Ying?
> > >
> > > St.
> > >
> > >
> > >
> > > > Thanks
> > > > Tian-Ying
> > > >
> > > > On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <ty...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi, Stack
> > > > >
> > > > > Based on the log, the major compaction was run, and it took 5+
> hours.
> > > > And
> > > > > I also manually run major_compact from hbase shell explicitly to
> > > verify.
> > > > >
> > > > > I just moved the region to a different RS and issued a
> major_compact
> > on
> > > > > that region again, let me see if the major compaction can succeed
> and
> > > > will
> > > > > report back.
> > > > >
> > > > > Thanks
> > > > > Tian-Ying
> > > > >
> > > > > On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:
> > > > >
> > > > >> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <
> tychang@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Yes, it is 94.26.  By a quick glance, I didn't  see any put that
> > is
> > > > >> older
> > > > >> > than the delete marker's TS, which could go as far as about
> couple
> > > > weeks
> > > > >> > ago since major compaction on it for long time seems.
> > > > >> >
> > > > >> Also it is really strange that if the region is split, then seems
> > > > >> > everything is working as expected. Also we noticed, the same
> > region
> > > > >> > replicated at the slave side is totally normal, i.e. at 20+G....
> > > > >> >
> > > > >> >
> > > > >> If you move the region to another server, does that work?
> > > > >>
> > > > >> Looking in 0.94 codebase, I see this in Compactor#compact
> > > > >>
> > > > >>
> > > > >>       // For major compactions calculate the earliest put
> timestamp
> > > > >>
> > > > >>       // of all involved storefiles. This is used to remove
> > > > >>
> > > > >>       // family delete marker during the compaction.
> > > > >>
> > > > >>       if (majorCompaction) {
> > > > >>
> > > > >>         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
> > > > >>
> > > > >>         if (tmp == null) {
> > > > >>
> > > > >>           // There's a file with no information, must be an old
> one
> > > > >>
> > > > >>           // assume we have very old puts
> > > > >>
> > > > >>           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
> > > > >>
> > > > >>         } else {
> > > > >>
> > > > >>           earliestPutTs = Math.min(earliestPutTs,
> > Bytes.toLong(tmp));
> > > > >>
> > > > >>         }
> > > > >>
> > > > >>       }
> > > > >>
> > > > >>
> > > > >> The above is followed by this log line:
> > > > >>
> > > > >>
> > > > >>       if (LOG.isDebugEnabled()) {
> > > > >>
> > > > >>         LOG.debug("Compacting " + file +
> > > > >>
> > > > >>           ", keycount=" + keyCount +
> > > > >>
> > > > >>           ", bloomtype=" + r.getBloomFilterType().toString() +
> > > > >>
> > > > >>           ", size=" + StringUtils.humanReadableInt(r.length()) +
> > > > >>
> > > > >>           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
> > > > >>
> > > > >>           (majorCompaction? ", earliestPutTs=" + earliestPutTs:
> > ""));
> > > > >>
> > > > >>       }
> > > > >>
> > > > >> This prints out earliestPutTs. You see that in the logs?  You
> > running
> > > > with
> > > > >> DEBUG? Does the earliest put ts preclude our dropping delete
> family?
> > > > >>
> > > > >>
> > > > >> Looking more in code, we retain deletes in following
> circumstances:
> > > > >>
> > > > >>
> > > > >>     this.retainDeletesInOutput = scanType ==
> ScanType.MINOR_COMPACT
> > ||
> > > > >> scan
> > > > >> .isRaw();
> > > > >>
> > > > >>
> > > > >> So, for sure we are running major compaction?
> > > > >>
> > > > >> Otherwise, have to dig in a bit more here.. This stuff is a little
> > > > >> involved.
> > > > >> St.Ack
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net>
> wrote:
> > > > >> >
> > > > >> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <
> > > tychang@gmail.com>
> > > > >> > wrote:
> > > > >> > >
> > > > >> > > > Hi,
> > > > >> > > >
> > > > >> > > > We saw a very strange case in one of our production
> cluster. A
> > > > >> couple
> > > > >> > > > regions cannot get their deleted rows or delete marker
> removed
> > > > even
> > > > >> > after
> > > > >> > > > major compaction. However when the region triggered split
> (we
> > > set
> > > > >> 100G
> > > > >> > > for
> > > > >> > > > auto split), the deletion worked. The 100G region becomes
> two
> > > 10G
> > > > >> > > daughter
> > > > >> > > > regions, and all the delete marker are gone.
> > > > >> > > >
> > > > >> > > > Also, the same region in the slave cluster (through
> > replication)
> > > > >> have
> > > > >> > > > normal size at about 20+G.
> > > > >> > > >
> > > > >> > > > BTW, the delete marker in the regions are mostly
> deleteFamily
> > if
> > > > it
> > > > >> > > > matters.
> > > > >> > > >
> > > > >> > > > This is really weird. Anyone has any clue for this strange
> > > > behavior?
> > > > >> > > >
> > > > >> > > > Thanks
> > > > >> > > > Tian-Ying
> > > > >> > > >
> > > > >> > > > These 0.94 Tian-Ying?
> > > > >> > >
> > > > >> > > It looks like the DeleteFamily is retained only; do you see
> > > > incidence
> > > > >> > where
> > > > >> > > there may have been versions older than the DeleteFamily that
> > are
> > > > also
> > > > >> > > retained post-major-compaction?
> > > > >> > >
> > > > >> > > St.Ack
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > > A snippet of the HFile generated by the major compaction:
> > > > >> > > >
> > > > >> > > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=
> > 2292870047
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=
> > 2292869794
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > >
> \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > >
> \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > >
> \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > >
> \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=
> > 2289446916
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=
> > 2288670451
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=
> > 2287911443
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=
> > 2287213792
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=
> > 2286488738
> > > > >> > > > V:
> > > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > >> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=
> > 2285778927
> > > > >> > > > V:
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Stack <st...@duboce.net>.
Tian-ying:

Can you try the suggestion below to see if it helps?

Do you see the lease expired logs?

St.Ack


On Thu, Jun 2, 2016 at 7:03 PM, hao zhang <zg...@gmail.com> wrote:

> Hi, Stack
>
>   We found this in our production cluster, too.  I take a look about the
> code and found one case which will make the major compact not work.
>
>   1. put some rows
>   2. scan
>   3. delete
>   4. scanner didn't close rightly, it will keep the  the read point in
> region's scannerReadPoints.
>   5. major compact. It can't work because we have a scanner which have
> small read point than the delete.
>
>   But if move the region to new rs, the scannerReadPoints will update to
> the biggest memstoreTs which form sotrefile. So major compact will work.
>   I thought the try...catch module in Scanner.leaseExpired() method needs a
> finally module to close the region scanner rightly.
>
>     public void leaseExpired() {
>       RegionScannerHolder rsh = scanners.remove(this.scannerName);
>       if (rsh != null) {
>         RegionScanner s = rsh.s;
>         LOG.info("Scanner " + this.scannerName + " lease expired on region
> "
>           + s.getRegionInfo().getRegionNameAsString());
>         try {
>           Region region =
> regionServer.getRegion(s.getRegionInfo().getRegionName());
>           if (region != null && region.getCoprocessorHost() != null) {
>             region.getCoprocessorHost().preScannerClose(s);
>           }
>           s.close();
>           if (region != null && region.getCoprocessorHost() != null) {
>             region.getCoprocessorHost().postScannerClose(s);
>           }
>         } catch (IOException e) {
>           LOG.error("Closing scanner for "
>             + s.getRegionInfo().getRegionNameAsString(), e);
>         }
>       } else {
>         LOG.warn("Scanner " + this.scannerName + " lease expired, but no
> related" +
>           " scanner found, hence no chance to close that related
> scanner!");
>       }
>     }
>
>
> 2016-06-02 2:50 GMT+08:00 Stack <st...@duboce.net>:
>
> > On Wed, Jun 1, 2016 at 10:56 AM, Tianying Chang <ty...@gmail.com>
> wrote:
> >
> > > Hi, Stack
> > >
> > > After moving the region and issue a major compact on that region, its
> > size
> > > shrink from 99G down to 24G. So it looks like the region is in a bad
> > state
> > > that cannot recover, close/open it fixed the issue. And from the region
> > > size metric graph, we can see major compaction stop working  since
> March
> > > 31, so some bug that caused region enter into bad state...
> Unfortunately,
> > > we don't have DEBUG enabled and that is the last region that has the
> > issue,
> > > it is hard to figure out what is the bug that caused the bad state...
> > >
> > >
> > Interesting. So moving it to another RS make it major-compactable? That
> > would seem to indicate some state kept in the RS memory is preventing the
> > major compaction running. Is moving the region a workaround for you until
> > we figure what it is Tian-Ying?
> >
> > St.
> >
> >
> >
> > > Thanks
> > > Tian-Ying
> > >
> > > On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <ty...@gmail.com>
> > wrote:
> > >
> > > > Hi, Stack
> > > >
> > > > Based on the log, the major compaction was run, and it took 5+ hours.
> > > And
> > > > I also manually run major_compact from hbase shell explicitly to
> > verify.
> > > >
> > > > I just moved the region to a different RS and issued a major_compact
> on
> > > > that region again, let me see if the major compaction can succeed and
> > > will
> > > > report back.
> > > >
> > > > Thanks
> > > > Tian-Ying
> > > >
> > > > On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > >> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <ty...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Yes, it is 94.26.  By a quick glance, I didn't  see any put that
> is
> > > >> older
> > > >> > than the delete marker's TS, which could go as far as about couple
> > > weeks
> > > >> > ago since major compaction on it for long time seems.
> > > >> >
> > > >> Also it is really strange that if the region is split, then seems
> > > >> > everything is working as expected. Also we noticed, the same
> region
> > > >> > replicated at the slave side is totally normal, i.e. at 20+G....
> > > >> >
> > > >> >
> > > >> If you move the region to another server, does that work?
> > > >>
> > > >> Looking in 0.94 codebase, I see this in Compactor#compact
> > > >>
> > > >>
> > > >>       // For major compactions calculate the earliest put timestamp
> > > >>
> > > >>       // of all involved storefiles. This is used to remove
> > > >>
> > > >>       // family delete marker during the compaction.
> > > >>
> > > >>       if (majorCompaction) {
> > > >>
> > > >>         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
> > > >>
> > > >>         if (tmp == null) {
> > > >>
> > > >>           // There's a file with no information, must be an old one
> > > >>
> > > >>           // assume we have very old puts
> > > >>
> > > >>           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
> > > >>
> > > >>         } else {
> > > >>
> > > >>           earliestPutTs = Math.min(earliestPutTs,
> Bytes.toLong(tmp));
> > > >>
> > > >>         }
> > > >>
> > > >>       }
> > > >>
> > > >>
> > > >> The above is followed by this log line:
> > > >>
> > > >>
> > > >>       if (LOG.isDebugEnabled()) {
> > > >>
> > > >>         LOG.debug("Compacting " + file +
> > > >>
> > > >>           ", keycount=" + keyCount +
> > > >>
> > > >>           ", bloomtype=" + r.getBloomFilterType().toString() +
> > > >>
> > > >>           ", size=" + StringUtils.humanReadableInt(r.length()) +
> > > >>
> > > >>           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
> > > >>
> > > >>           (majorCompaction? ", earliestPutTs=" + earliestPutTs:
> ""));
> > > >>
> > > >>       }
> > > >>
> > > >> This prints out earliestPutTs. You see that in the logs?  You
> running
> > > with
> > > >> DEBUG? Does the earliest put ts preclude our dropping delete family?
> > > >>
> > > >>
> > > >> Looking more in code, we retain deletes in following circumstances:
> > > >>
> > > >>
> > > >>     this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT
> ||
> > > >> scan
> > > >> .isRaw();
> > > >>
> > > >>
> > > >> So, for sure we are running major compaction?
> > > >>
> > > >> Otherwise, have to dig in a bit more here.. This stuff is a little
> > > >> involved.
> > > >> St.Ack
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
> > > >> >
> > > >> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <
> > tychang@gmail.com>
> > > >> > wrote:
> > > >> > >
> > > >> > > > Hi,
> > > >> > > >
> > > >> > > > We saw a very strange case in one of our production cluster. A
> > > >> couple
> > > >> > > > regions cannot get their deleted rows or delete marker removed
> > > even
> > > >> > after
> > > >> > > > major compaction. However when the region triggered split (we
> > set
> > > >> 100G
> > > >> > > for
> > > >> > > > auto split), the deletion worked. The 100G region becomes two
> > 10G
> > > >> > > daughter
> > > >> > > > regions, and all the delete marker are gone.
> > > >> > > >
> > > >> > > > Also, the same region in the slave cluster (through
> replication)
> > > >> have
> > > >> > > > normal size at about 20+G.
> > > >> > > >
> > > >> > > > BTW, the delete marker in the regions are mostly deleteFamily
> if
> > > it
> > > >> > > > matters.
> > > >> > > >
> > > >> > > > This is really weird. Anyone has any clue for this strange
> > > behavior?
> > > >> > > >
> > > >> > > > Thanks
> > > >> > > > Tian-Ying
> > > >> > > >
> > > >> > > > These 0.94 Tian-Ying?
> > > >> > >
> > > >> > > It looks like the DeleteFamily is retained only; do you see
> > > incidence
> > > >> > where
> > > >> > > there may have been versions older than the DeleteFamily that
> are
> > > also
> > > >> > > retained post-major-compaction?
> > > >> > >
> > > >> > > St.Ack
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > > A snippet of the HFile generated by the major compaction:
> > > >> > > >
> > > >> > > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=
> 2292870047
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=
> 2292869794
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=
> 2289446916
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=
> 2288670451
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=
> 2287911443
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=
> 2287213792
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=
> 2286488738
> > > >> > > > V:
> > > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > >> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=
> 2285778927
> > > >> > > > V:
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Stack <st...@duboce.net>.
Thanks for filing HBASE-16012, Guanghao Zhang...
St.Ack

On Thu, Jun 2, 2016 at 10:06 PM, Anoop John <an...@gmail.com> wrote:

> So u feel the call  s.close() would have created some Exception and
> the removal of entry from scannerReadPoints would not have happened.
> hmm.. sounds possible
>
> public synchronized void close() {
>       if (storeHeap != null) {
>         storeHeap.close();
>         storeHeap = null;
>       }
>       if (joinedHeap != null) {
>         joinedHeap.close();
>         joinedHeap = null;
>       }
>       // no need to synchronize here.
>       scannerReadPoints.remove(this);
>       this.filterClosed = true;
>     }
>
> We may need to change this close impl and make sure the removal from
> scannerReadPoints happen in a finally block.
>
> -Anoop-
>
>
> On Fri, Jun 3, 2016 at 7:33 AM, hao zhang <zg...@gmail.com> wrote:
> > Hi, Stack
> >
> >   We found this in our production cluster, too.  I take a look about the
> > code and found one case which will make the major compact not work.
> >
> >   1. put some rows
> >   2. scan
> >   3. delete
> >   4. scanner didn't close rightly, it will keep the  the read point in
> > region's scannerReadPoints.
> >   5. major compact. It can't work because we have a scanner which have
> > small read point than the delete.
> >
> >   But if move the region to new rs, the scannerReadPoints will update to
> > the biggest memstoreTs which form sotrefile. So major compact will work.
> >   I thought the try...catch module in Scanner.leaseExpired() method
> needs a
> > finally module to close the region scanner rightly.
> >
> >     public void leaseExpired() {
> >       RegionScannerHolder rsh = scanners.remove(this.scannerName);
> >       if (rsh != null) {
> >         RegionScanner s = rsh.s;
> >         LOG.info("Scanner " + this.scannerName + " lease expired on
> region "
> >           + s.getRegionInfo().getRegionNameAsString());
> >         try {
> >           Region region =
> > regionServer.getRegion(s.getRegionInfo().getRegionName());
> >           if (region != null && region.getCoprocessorHost() != null) {
> >             region.getCoprocessorHost().preScannerClose(s);
> >           }
> >           s.close();
> >           if (region != null && region.getCoprocessorHost() != null) {
> >             region.getCoprocessorHost().postScannerClose(s);
> >           }
> >         } catch (IOException e) {
> >           LOG.error("Closing scanner for "
> >             + s.getRegionInfo().getRegionNameAsString(), e);
> >         }
> >       } else {
> >         LOG.warn("Scanner " + this.scannerName + " lease expired, but no
> > related" +
> >           " scanner found, hence no chance to close that related
> scanner!");
> >       }
> >     }
> >
> >
> > 2016-06-02 2:50 GMT+08:00 Stack <st...@duboce.net>:
> >
> >> On Wed, Jun 1, 2016 at 10:56 AM, Tianying Chang <ty...@gmail.com>
> wrote:
> >>
> >> > Hi, Stack
> >> >
> >> > After moving the region and issue a major compact on that region, its
> >> size
> >> > shrink from 99G down to 24G. So it looks like the region is in a bad
> >> state
> >> > that cannot recover, close/open it fixed the issue. And from the
> region
> >> > size metric graph, we can see major compaction stop working  since
> March
> >> > 31, so some bug that caused region enter into bad state...
> Unfortunately,
> >> > we don't have DEBUG enabled and that is the last region that has the
> >> issue,
> >> > it is hard to figure out what is the bug that caused the bad state...
> >> >
> >> >
> >> Interesting. So moving it to another RS make it major-compactable? That
> >> would seem to indicate some state kept in the RS memory is preventing
> the
> >> major compaction running. Is moving the region a workaround for you
> until
> >> we figure what it is Tian-Ying?
> >>
> >> St.
> >>
> >>
> >>
> >> > Thanks
> >> > Tian-Ying
> >> >
> >> > On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <ty...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi, Stack
> >> > >
> >> > > Based on the log, the major compaction was run, and it took 5+
> hours.
> >> > And
> >> > > I also manually run major_compact from hbase shell explicitly to
> >> verify.
> >> > >
> >> > > I just moved the region to a different RS and issued a
> major_compact on
> >> > > that region again, let me see if the major compaction can succeed
> and
> >> > will
> >> > > report back.
> >> > >
> >> > > Thanks
> >> > > Tian-Ying
> >> > >
> >> > > On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:
> >> > >
> >> > >> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <tychang@gmail.com
> >
> >> > >> wrote:
> >> > >>
> >> > >> > Yes, it is 94.26.  By a quick glance, I didn't  see any put that
> is
> >> > >> older
> >> > >> > than the delete marker's TS, which could go as far as about
> couple
> >> > weeks
> >> > >> > ago since major compaction on it for long time seems.
> >> > >> >
> >> > >> Also it is really strange that if the region is split, then seems
> >> > >> > everything is working as expected. Also we noticed, the same
> region
> >> > >> > replicated at the slave side is totally normal, i.e. at 20+G....
> >> > >> >
> >> > >> >
> >> > >> If you move the region to another server, does that work?
> >> > >>
> >> > >> Looking in 0.94 codebase, I see this in Compactor#compact
> >> > >>
> >> > >>
> >> > >>       // For major compactions calculate the earliest put timestamp
> >> > >>
> >> > >>       // of all involved storefiles. This is used to remove
> >> > >>
> >> > >>       // family delete marker during the compaction.
> >> > >>
> >> > >>       if (majorCompaction) {
> >> > >>
> >> > >>         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
> >> > >>
> >> > >>         if (tmp == null) {
> >> > >>
> >> > >>           // There's a file with no information, must be an old one
> >> > >>
> >> > >>           // assume we have very old puts
> >> > >>
> >> > >>           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
> >> > >>
> >> > >>         } else {
> >> > >>
> >> > >>           earliestPutTs = Math.min(earliestPutTs,
> Bytes.toLong(tmp));
> >> > >>
> >> > >>         }
> >> > >>
> >> > >>       }
> >> > >>
> >> > >>
> >> > >> The above is followed by this log line:
> >> > >>
> >> > >>
> >> > >>       if (LOG.isDebugEnabled()) {
> >> > >>
> >> > >>         LOG.debug("Compacting " + file +
> >> > >>
> >> > >>           ", keycount=" + keyCount +
> >> > >>
> >> > >>           ", bloomtype=" + r.getBloomFilterType().toString() +
> >> > >>
> >> > >>           ", size=" + StringUtils.humanReadableInt(r.length()) +
> >> > >>
> >> > >>           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
> >> > >>
> >> > >>           (majorCompaction? ", earliestPutTs=" + earliestPutTs:
> ""));
> >> > >>
> >> > >>       }
> >> > >>
> >> > >> This prints out earliestPutTs. You see that in the logs?  You
> running
> >> > with
> >> > >> DEBUG? Does the earliest put ts preclude our dropping delete
> family?
> >> > >>
> >> > >>
> >> > >> Looking more in code, we retain deletes in following circumstances:
> >> > >>
> >> > >>
> >> > >>     this.retainDeletesInOutput = scanType ==
> ScanType.MINOR_COMPACT ||
> >> > >> scan
> >> > >> .isRaw();
> >> > >>
> >> > >>
> >> > >> So, for sure we are running major compaction?
> >> > >>
> >> > >> Otherwise, have to dig in a bit more here.. This stuff is a little
> >> > >> involved.
> >> > >> St.Ack
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
> >> > >> >
> >> > >> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <
> >> tychang@gmail.com>
> >> > >> > wrote:
> >> > >> > >
> >> > >> > > > Hi,
> >> > >> > > >
> >> > >> > > > We saw a very strange case in one of our production cluster.
> A
> >> > >> couple
> >> > >> > > > regions cannot get their deleted rows or delete marker
> removed
> >> > even
> >> > >> > after
> >> > >> > > > major compaction. However when the region triggered split (we
> >> set
> >> > >> 100G
> >> > >> > > for
> >> > >> > > > auto split), the deletion worked. The 100G region becomes two
> >> 10G
> >> > >> > > daughter
> >> > >> > > > regions, and all the delete marker are gone.
> >> > >> > > >
> >> > >> > > > Also, the same region in the slave cluster (through
> replication)
> >> > >> have
> >> > >> > > > normal size at about 20+G.
> >> > >> > > >
> >> > >> > > > BTW, the delete marker in the regions are mostly
> deleteFamily if
> >> > it
> >> > >> > > > matters.
> >> > >> > > >
> >> > >> > > > This is really weird. Anyone has any clue for this strange
> >> > behavior?
> >> > >> > > >
> >> > >> > > > Thanks
> >> > >> > > > Tian-Ying
> >> > >> > > >
> >> > >> > > > These 0.94 Tian-Ying?
> >> > >> > >
> >> > >> > > It looks like the DeleteFamily is retained only; do you see
> >> > incidence
> >> > >> > where
> >> > >> > > there may have been versions older than the DeleteFamily that
> are
> >> > also
> >> > >> > > retained post-major-compaction?
> >> > >> > >
> >> > >> > > St.Ack
> >> > >> > >
> >> > >> > >
> >> > >> > >
> >> > >> > > > A snippet of the HFile generated by the major compaction:
> >> > >> > > >
> >> > >> > > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=
> 2292870047
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=
> 2292869794
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > >
> \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > >
> \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > >
> \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > >
> \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=
> 2289446916
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=
> 2288670451
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=
> 2287911443
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=
> 2287213792
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=
> 2286488738
> >> > >> > > > V:
> >> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > >> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=
> 2285778927
> >> > >> > > > V:
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Anoop John <an...@gmail.com>.
So you feel the call to s.close() could have thrown some exception, and then
the removal of the entry from scannerReadPoints would not have happened.
Hmm.. sounds possible

public synchronized void close() {
      if (storeHeap != null) {
        // if either heap close() throws (even an unchecked exception),
        // the scannerReadPoints.remove(this) below never executes
        storeHeap.close();
        storeHeap = null;
      }
      if (joinedHeap != null) {
        joinedHeap.close();
        joinedHeap = null;
      }
      // no need to synchronize here.
      scannerReadPoints.remove(this);
      this.filterClosed = true;
    }

We may need to change this close() implementation to make sure the removal
from scannerReadPoints happens in a finally block.
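
A minimal sketch of that change (illustrative only, not a committed patch):

    public synchronized void close() {
      try {
        if (storeHeap != null) {
          storeHeap.close();
          storeHeap = null;
        }
        if (joinedHeap != null) {
          joinedHeap.close();
          joinedHeap = null;
        }
      } finally {
        // always release this scanner's read point, even if closing a
        // heap throws, so compactions are not pinned by a stale entry
        scannerReadPoints.remove(this);
        this.filterClosed = true;
      }
    }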

-Anoop-


On Fri, Jun 3, 2016 at 7:33 AM, hao zhang <zg...@gmail.com> wrote:
> Hi, Stack
>
>   We found this in our production cluster, too.  I take a look about the
> code and found one case which will make the major compact not work.
>
>   1. put some rows
>   2. scan
>   3. delete
>   4. scanner didn't close rightly, it will keep the  the read point in
> region's scannerReadPoints.
>   5. major compact. It can't work because we have a scanner which have
> small read point than the delete.
>
>   But if move the region to new rs, the scannerReadPoints will update to
> the biggest memstoreTs which form sotrefile. So major compact will work.
>   I thought the try...catch module in Scanner.leaseExpired() method needs a
> finally module to close the region scanner rightly.
>
>     public void leaseExpired() {
>       RegionScannerHolder rsh = scanners.remove(this.scannerName);
>       if (rsh != null) {
>         RegionScanner s = rsh.s;
>         LOG.info("Scanner " + this.scannerName + " lease expired on region "
>           + s.getRegionInfo().getRegionNameAsString());
>         try {
>           Region region =
> regionServer.getRegion(s.getRegionInfo().getRegionName());
>           if (region != null && region.getCoprocessorHost() != null) {
>             region.getCoprocessorHost().preScannerClose(s);
>           }
>           s.close();
>           if (region != null && region.getCoprocessorHost() != null) {
>             region.getCoprocessorHost().postScannerClose(s);
>           }
>         } catch (IOException e) {
>           LOG.error("Closing scanner for "
>             + s.getRegionInfo().getRegionNameAsString(), e);
>         }
>       } else {
>         LOG.warn("Scanner " + this.scannerName + " lease expired, but no
> related" +
>           " scanner found, hence no chance to close that related scanner!");
>       }
>     }
>
>
> 2016-06-02 2:50 GMT+08:00 Stack <st...@duboce.net>:
>
>> On Wed, Jun 1, 2016 at 10:56 AM, Tianying Chang <ty...@gmail.com> wrote:
>>
>> > Hi, Stack
>> >
>> > After moving the region and issue a major compact on that region, its
>> size
>> > shrink from 99G down to 24G. So it looks like the region is in a bad
>> state
>> > that cannot recover, close/open it fixed the issue. And from the region
>> > size metric graph, we can see major compaction stop working  since March
>> > 31, so some bug that caused region enter into bad state... Unfortunately,
>> > we don't have DEBUG enabled and that is the last region that has the
>> issue,
>> > it is hard to figure out what is the bug that caused the bad state...
>> >
>> >
>> Interesting. So moving it to another RS make it major-compactable? That
>> would seem to indicate some state kept in the RS memory is preventing the
>> major compaction running. Is moving the region a workaround for you until
>> we figure what it is Tian-Ying?
>>
>> St.
>>
>>
>>
>> > Thanks
>> > Tian-Ying
>> >
>> > On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <ty...@gmail.com>
>> wrote:
>> >
>> > > Hi, Stack
>> > >
>> > > Based on the log, the major compaction was run, and it took 5+ hours.
>> > And
>> > > I also manually run major_compact from hbase shell explicitly to
>> verify.
>> > >
>> > > I just moved the region to a different RS and issued a major_compact on
>> > > that region again, let me see if the major compaction can succeed and
>> > will
>> > > report back.
>> > >
>> > > Thanks
>> > > Tian-Ying
>> > >
>> > > On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:
>> > >
>> > >> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <ty...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Yes, it is 94.26.  By a quick glance, I didn't  see any put that is
>> > >> older
>> > >> > than the delete marker's TS, which could go as far as about couple
>> > weeks
>> > >> > ago since major compaction on it for long time seems.
>> > >> >
>> > >> Also it is really strange that if the region is split, then seems
>> > >> > everything is working as expected. Also we noticed, the same region
>> > >> > replicated at the slave side is totally normal, i.e. at 20+G....
>> > >> >
>> > >> >
>> > >> If you move the region to another server, does that work?
>> > >>
>> > >> Looking in 0.94 codebase, I see this in Compactor#compact
>> > >>
>> > >>
>> > >>       // For major compactions calculate the earliest put timestamp
>> > >>
>> > >>       // of all involved storefiles. This is used to remove
>> > >>
>> > >>       // family delete marker during the compaction.
>> > >>
>> > >>       if (majorCompaction) {
>> > >>
>> > >>         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
>> > >>
>> > >>         if (tmp == null) {
>> > >>
>> > >>           // There's a file with no information, must be an old one
>> > >>
>> > >>           // assume we have very old puts
>> > >>
>> > >>           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
>> > >>
>> > >>         } else {
>> > >>
>> > >>           earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
>> > >>
>> > >>         }
>> > >>
>> > >>       }
>> > >>
>> > >>
>> > >> The above is followed by this log line:
>> > >>
>> > >>
>> > >>       if (LOG.isDebugEnabled()) {
>> > >>
>> > >>         LOG.debug("Compacting " + file +
>> > >>
>> > >>           ", keycount=" + keyCount +
>> > >>
>> > >>           ", bloomtype=" + r.getBloomFilterType().toString() +
>> > >>
>> > >>           ", size=" + StringUtils.humanReadableInt(r.length()) +
>> > >>
>> > >>           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
>> > >>
>> > >>           (majorCompaction? ", earliestPutTs=" + earliestPutTs: ""));
>> > >>
>> > >>       }
>> > >>
>> > >> This prints out earliestPutTs. You see that in the logs?  You running
>> > with
>> > >> DEBUG? Does the earliest put ts preclude our dropping delete family?
>> > >>
>> > >>
>> > >> Looking more in code, we retain deletes in following circumstances:
>> > >>
>> > >>
>> > >>     this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT ||
>> > >> scan
>> > >> .isRaw();
>> > >>
>> > >>
>> > >> So, for sure we are running major compaction?
>> > >>
>> > >> Otherwise, have to dig in a bit more here.. This stuff is a little
>> > >> involved.
>> > >> St.Ack
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
>> > >> >
>> > >> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <
>> tychang@gmail.com>
>> > >> > wrote:
>> > >> > >
>> > >> > > > Hi,
>> > >> > > >
>> > >> > > > We saw a very strange case in one of our production cluster. A
>> > >> couple
>> > >> > > > regions cannot get their deleted rows or delete marker removed
>> > even
>> > >> > after
>> > >> > > > major compaction. However when the region triggered split (we
>> set
>> > >> 100G
>> > >> > > for
>> > >> > > > auto split), the deletion worked. The 100G region becomes two
>> 10G
>> > >> > > daughter
>> > >> > > > regions, and all the delete marker are gone.
>> > >> > > >
>> > >> > > > Also, the same region in the slave cluster (through replication)
>> > >> have
>> > >> > > > normal size at about 20+G.
>> > >> > > >
>> > >> > > > BTW, the delete marker in the regions are mostly deleteFamily if
>> > it
>> > >> > > > matters.
>> > >> > > >
>> > >> > > > This is really weird. Anyone has any clue for this strange
>> > behavior?
>> > >> > > >
>> > >> > > > Thanks
>> > >> > > > Tian-Ying
>> > >> > > >
>> > >> > > > These 0.94 Tian-Ying?
>> > >> > >
>> > >> > > It looks like the DeleteFamily is retained only; do you see
>> > incidence
>> > >> > where
>> > >> > > there may have been versions older than the DeleteFamily that are
>> > also
>> > >> > > retained post-major-compaction?
>> > >> > >
>> > >> > > St.Ack
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > > A snippet of the HFile generated by the major compaction:
>> > >> > > >
>> > >> > > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
>> > >> > > > V:
>> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > >> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
>> > >> > > > V:
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by hao zhang <zg...@gmail.com>.
Hi, Stack

  We found this in our production cluster, too.  I took a look at the code
and found one case which makes major compaction not work.

  1. put some rows
  2. scan
  3. delete
  4. the scanner isn't closed properly, so its read point stays in the
region's scannerReadPoints.
  5. major compact. It can't drop the delete because a scanner with a
smaller read point than the delete is still registered (see the sketch
right after this list).
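
A hypothetical repro sketch against the 0.94-era client API (table, family,
and row names are made up; assumes the usual org.apache.hadoop.hbase client
imports, inside a method that throws Exception):

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t1");
    // 1. put some rows
    Put put = new Put(Bytes.toBytes("r1"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(put);
    // 2. scan, which registers a read point on the region server
    ResultScanner scanner = table.getScanner(new Scan());
    scanner.next();
    // 3. delete (the delete gets a newer mvcc number than the read point)
    table.delete(new Delete(Bytes.toBytes("r1")));
    // 4. never call scanner.close(); the server-side lease expires, and an
    //    exception during the server's cleanup can leak the read point
    // 5. major compaction then cannot drop the delete marker
    new HBaseAdmin(conf).majorCompact("t1");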

  But if we move the region to a new RS, the scannerReadPoints are rebuilt
from the biggest memstoreTs of the store files, so major compaction works
again.
  I think the try...catch block in the Scanner.leaseExpired() method needs a
finally block to close the region scanner properly.

    public void leaseExpired() {
      RegionScannerHolder rsh = scanners.remove(this.scannerName);
      if (rsh != null) {
        RegionScanner s = rsh.s;
        LOG.info("Scanner " + this.scannerName + " lease expired on region "
          + s.getRegionInfo().getRegionNameAsString());
        try {
          Region region =
            regionServer.getRegion(s.getRegionInfo().getRegionName());
          if (region != null && region.getCoprocessorHost() != null) {
            region.getCoprocessorHost().preScannerClose(s);
          }
          s.close();
          if (region != null && region.getCoprocessorHost() != null) {
            region.getCoprocessorHost().postScannerClose(s);
          }
        } catch (IOException e) {
          // if preScannerClose() or s.close() throws, we only log here;
          // the scanner is never closed and its read point stays behind
          LOG.error("Closing scanner for "
            + s.getRegionInfo().getRegionNameAsString(), e);
        }
      } else {
        LOG.warn("Scanner " + this.scannerName + " lease expired, but no"
          + " related scanner found, hence no chance to close that related"
          + " scanner!");
      }
    }
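
A minimal sketch of the shape of that fix (illustrative, not a committed
patch): run the pre-close hook in the try, and guarantee the close in a
finally so the scanner's read point is always released:

    public void leaseExpired() {
      RegionScannerHolder rsh = scanners.remove(this.scannerName);
      if (rsh == null) {
        LOG.warn("Scanner " + this.scannerName + " lease expired, but no"
          + " related scanner found, hence no chance to close it!");
        return;
      }
      RegionScanner s = rsh.s;
      LOG.info("Scanner " + this.scannerName + " lease expired on region "
        + s.getRegionInfo().getRegionNameAsString());
      Region region = null;
      try {
        region = regionServer.getRegion(s.getRegionInfo().getRegionName());
        if (region != null && region.getCoprocessorHost() != null) {
          region.getCoprocessorHost().preScannerClose(s);
        }
      } catch (IOException e) {
        LOG.error("Pre-close hook failed for "
          + s.getRegionInfo().getRegionNameAsString(), e);
      } finally {
        try {
          // the close must happen no matter what, so the scanner's read
          // point is removed from the region's scannerReadPoints
          s.close();
          if (region != null && region.getCoprocessorHost() != null) {
            region.getCoprocessorHost().postScannerClose(s);
          }
        } catch (IOException e) {
          LOG.error("Closing scanner for "
            + s.getRegionInfo().getRegionNameAsString(), e);
        }
      }
    }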


2016-06-02 2:50 GMT+08:00 Stack <st...@duboce.net>:

> On Wed, Jun 1, 2016 at 10:56 AM, Tianying Chang <ty...@gmail.com> wrote:
>
> > Hi, Stack
> >
> > After moving the region and issue a major compact on that region, its
> size
> > shrink from 99G down to 24G. So it looks like the region is in a bad
> state
> > that cannot recover, close/open it fixed the issue. And from the region
> > size metric graph, we can see major compaction stop working  since March
> > 31, so some bug that caused region enter into bad state... Unfortunately,
> > we don't have DEBUG enabled and that is the last region that has the
> issue,
> > it is hard to figure out what is the bug that caused the bad state...
> >
> >
> Interesting. So moving it to another RS make it major-compactable? That
> would seem to indicate some state kept in the RS memory is preventing the
> major compaction running. Is moving the region a workaround for you until
> we figure what it is Tian-Ying?
>
> St.
>
>
>
> > Thanks
> > Tian-Ying
> >
> > On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <ty...@gmail.com>
> wrote:
> >
> > > Hi, Stack
> > >
> > > Based on the log, the major compaction was run, and it took 5+ hours.
> > And
> > > I also manually run major_compact from hbase shell explicitly to
> verify.
> > >
> > > I just moved the region to a different RS and issued a major_compact on
> > > that region again, let me see if the major compaction can succeed and
> > will
> > > report back.
> > >
> > > Thanks
> > > Tian-Ying
> > >
> > > On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:
> > >
> > >> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <ty...@gmail.com>
> > >> wrote:
> > >>
> > >> > Yes, it is 94.26.  By a quick glance, I didn't  see any put that is
> > >> older
> > >> > than the delete marker's TS, which could go as far as about couple
> > weeks
> > >> > ago since major compaction on it for long time seems.
> > >> >
> > >> Also it is really strange that if the region is split, then seems
> > >> > everything is working as expected. Also we noticed, the same region
> > >> > replicated at the slave side is totally normal, i.e. at 20+G....
> > >> >
> > >> >
> > >> If you move the region to another server, does that work?
> > >>
> > >> Looking in 0.94 codebase, I see this in Compactor#compact
> > >>
> > >>
> > >>       // For major compactions calculate the earliest put timestamp
> > >>
> > >>       // of all involved storefiles. This is used to remove
> > >>
> > >>       // family delete marker during the compaction.
> > >>
> > >>       if (majorCompaction) {
> > >>
> > >>         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
> > >>
> > >>         if (tmp == null) {
> > >>
> > >>           // There's a file with no information, must be an old one
> > >>
> > >>           // assume we have very old puts
> > >>
> > >>           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
> > >>
> > >>         } else {
> > >>
> > >>           earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
> > >>
> > >>         }
> > >>
> > >>       }
> > >>
> > >>
> > >> The above is followed by this log line:
> > >>
> > >>
> > >>       if (LOG.isDebugEnabled()) {
> > >>
> > >>         LOG.debug("Compacting " + file +
> > >>
> > >>           ", keycount=" + keyCount +
> > >>
> > >>           ", bloomtype=" + r.getBloomFilterType().toString() +
> > >>
> > >>           ", size=" + StringUtils.humanReadableInt(r.length()) +
> > >>
> > >>           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
> > >>
> > >>           (majorCompaction? ", earliestPutTs=" + earliestPutTs: ""));
> > >>
> > >>       }
> > >>
> > >> This prints out earliestPutTs. You see that in the logs?  You running
> > with
> > >> DEBUG? Does the earliest put ts preclude our dropping delete family?
> > >>
> > >>
> > >> Looking more in code, we retain deletes in following circumstances:
> > >>
> > >>
> > >>     this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT ||
> > >> scan
> > >> .isRaw();
> > >>
> > >>
> > >> So, for sure we are running major compaction?
> > >>
> > >> Otherwise, have to dig in a bit more here.. This stuff is a little
> > >> involved.
> > >> St.Ack
> > >>
> > >>
> > >>
> > >>
> > >> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
> > >> >
> > >> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <
> tychang@gmail.com>
> > >> > wrote:
> > >> > >
> > >> > > > Hi,
> > >> > > >
> > >> > > > We saw a very strange case in one of our production cluster. A
> > >> couple
> > >> > > > regions cannot get their deleted rows or delete marker removed
> > even
> > >> > after
> > >> > > > major compaction. However when the region triggered split (we
> set
> > >> 100G
> > >> > > for
> > >> > > > auto split), the deletion worked. The 100G region becomes two
> 10G
> > >> > > daughter
> > >> > > > regions, and all the delete marker are gone.
> > >> > > >
> > >> > > > Also, the same region in the slave cluster (through replication)
> > >> have
> > >> > > > normal size at about 20+G.
> > >> > > >
> > >> > > > BTW, the delete marker in the regions are mostly deleteFamily if
> > it
> > >> > > > matters.
> > >> > > >
> > >> > > > This is really weird. Anyone has any clue for this strange
> > behavior?
> > >> > > >
> > >> > > > Thanks
> > >> > > > Tian-Ying
> > >> > > >
> > >> > > > These 0.94 Tian-Ying?
> > >> > >
> > >> > > It looks like the DeleteFamily is retained only; do you see
> > incidence
> > >> > where
> > >> > > there may have been versions older than the DeleteFamily that are
> > also
> > >> > > retained post-major-compaction?
> > >> > >
> > >> > > St.Ack
> > >> > >
> > >> > >
> > >> > >
> > >> > > > A snippet of the HFile generated by the major compaction:
> > >> > > >
> > >> > > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
> > >> > > > V:
> > >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > >> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
> > >> > > > V:
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Stack <st...@duboce.net>.
On Wed, Jun 1, 2016 at 10:56 AM, Tianying Chang <ty...@gmail.com> wrote:

> Hi, Stack
>
> After moving the region and issue a major compact on that region, its size
> shrink from 99G down to 24G. So it looks like the region is in a bad state
> that cannot recover, close/open it fixed the issue. And from the region
> size metric graph, we can see major compaction stop working  since March
> 31, so some bug that caused region enter into bad state... Unfortunately,
> we don't have DEBUG enabled and that is the last region that has the issue,
> it is hard to figure out what is the bug that caused the bad state...
>
>
Interesting. So moving it to another RS makes it major-compactable? That
would seem to indicate that some state kept in RS memory is preventing the
major compaction from running. Is moving the region a workaround for you
until we figure out what it is, Tian-Ying?

St.



> Thanks
> Tian-Ying
>
> On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <ty...@gmail.com> wrote:
>
> > Hi, Stack
> >
> > Based on the log, the major compaction was run, and it took 5+ hours.
> And
> > I also manually run major_compact from hbase shell explicitly to verify.
> >
> > I just moved the region to a different RS and issued a major_compact on
> > that region again, let me see if the major compaction can succeed and
> will
> > report back.
> >
> > Thanks
> > Tian-Ying
> >
> > On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:
> >
> >> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <ty...@gmail.com>
> >> wrote:
> >>
> >> > Yes, it is 94.26.  By a quick glance, I didn't  see any put that is
> >> older
> >> > than the delete marker's TS, which could go as far as about couple
> weeks
> >> > ago since major compaction on it for long time seems.
> >> >
> >> Also it is really strange that if the region is split, then seems
> >> > everything is working as expected. Also we noticed, the same region
> >> > replicated at the slave side is totally normal, i.e. at 20+G....
> >> >
> >> >
> >> If you move the region to another server, does that work?
> >>
> >> Looking in 0.94 codebase, I see this in Compactor#compact
> >>
> >>
> >>       // For major compactions calculate the earliest put timestamp
> >>
> >>       // of all involved storefiles. This is used to remove
> >>
> >>       // family delete marker during the compaction.
> >>
> >>       if (majorCompaction) {
> >>
> >>         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
> >>
> >>         if (tmp == null) {
> >>
> >>           // There's a file with no information, must be an old one
> >>
> >>           // assume we have very old puts
> >>
> >>           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
> >>
> >>         } else {
> >>
> >>           earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
> >>
> >>         }
> >>
> >>       }
> >>
> >>
> >> The above is followed by this log line:
> >>
> >>
> >>       if (LOG.isDebugEnabled()) {
> >>
> >>         LOG.debug("Compacting " + file +
> >>
> >>           ", keycount=" + keyCount +
> >>
> >>           ", bloomtype=" + r.getBloomFilterType().toString() +
> >>
> >>           ", size=" + StringUtils.humanReadableInt(r.length()) +
> >>
> >>           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
> >>
> >>           (majorCompaction? ", earliestPutTs=" + earliestPutTs: ""));
> >>
> >>       }
> >>
> >> This prints out earliestPutTs. You see that in the logs?  You running
> with
> >> DEBUG? Does the earliest put ts preclude our dropping delete family?
> >>
> >>
> >> Looking more in code, we retain deletes in following circumstances:
> >>
> >>
> >>     this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT ||
> >> scan
> >> .isRaw();
> >>
> >>
> >> So, for sure we are running major compaction?
> >>
> >> Otherwise, have to dig in a bit more here.. This stuff is a little
> >> involved.
> >> St.Ack
> >>
> >>
> >>
> >>
> >> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
> >> >
> >> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <ty...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > We saw a very strange case in one of our production cluster. A
> >> couple
> >> > > > regions cannot get their deleted rows or delete marker removed
> even
> >> > after
> >> > > > major compaction. However when the region triggered split (we set
> >> 100G
> >> > > for
> >> > > > auto split), the deletion worked. The 100G region becomes two 10G
> >> > > daughter
> >> > > > regions, and all the delete marker are gone.
> >> > > >
> >> > > > Also, the same region in the slave cluster (through replication)
> >> have
> >> > > > normal size at about 20+G.
> >> > > >
> >> > > > BTW, the delete marker in the regions are mostly deleteFamily if
> it
> >> > > > matters.
> >> > > >
> >> > > > This is really weird. Anyone has any clue for this strange
> behavior?
> >> > > >
> >> > > > Thanks
> >> > > > Tian-Ying
> >> > > >
> >> > > > These 0.94 Tian-Ying?
> >> > >
> >> > > It looks like the DeleteFamily is retained only; do you see
> incidence
> >> > where
> >> > > there may have been versions older than the DeleteFamily that are
> also
> >> > > retained post-major-compaction?
> >> > >
> >> > > St.Ack
> >> > >
> >> > >
> >> > >
> >> > > > A snippet of the HFile generated by the major compaction:
> >> > > >
> >> > > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
> >> > > > V:
> >> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> >> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
> >> > > > V:
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Tianying Chang <ty...@gmail.com>.
Hi, Stack

After moving the region and issuing a major compact on it, its size shrank
from 99G down to 24G. So it looks like the region was stuck in a bad state
it could not recover from, and a close/open fixed the issue. From the
region size metric graph, we can see that major compaction stopped working
on March 31, so some bug caused the region to enter the bad state...
Unfortunately, we don't have DEBUG enabled, and that was the last region
with the issue, so it is hard to figure out which bug caused the bad state...
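
For anyone else hitting this, the workaround boils down to bouncing the
region and then re-compacting. From the hbase shell it is roughly as below
(the region names are placeholders):

  hbase> move 'ENCODED_REGION_NAME'     # close/re-open the region on another RS
  hbase> major_compact 'REGION_NAME'    # the delete markers now get dropped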

Thanks
Tian-Ying

On Tue, May 31, 2016 at 3:43 PM, Tianying Chang <ty...@gmail.com> wrote:

> Hi, Stack
>
> Based on the log, the major compaction was run, and it took 5+ hours.  And
> I also manually run major_compact from hbase shell explicitly to verify.
>
> I just moved the region to a different RS and issued a major_compact on
> that region again, let me see if the major compaction can succeed and will
> report back.
>
> Thanks
> Tian-Ying
>
> On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:
>
>> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <ty...@gmail.com>
>> wrote:
>>
>> > Yes, it is 94.26.  By a quick glance, I didn't  see any put that is
>> older
>> > than the delete marker's TS, which could go as far as about couple weeks
>> > ago since major compaction on it for long time seems.
>> >
>> Also it is really strange that if the region is split, then seems
>> > everything is working as expected. Also we noticed, the same region
>> > replicated at the slave side is totally normal, i.e. at 20+G....
>> >
>> >
>> If you move the region to another server, does that work?
>>
>> Looking in 0.94 codebase, I see this in Compactor#compact
>>
>>
>>       // For major compactions calculate the earliest put timestamp
>>
>>       // of all involved storefiles. This is used to remove
>>
>>       // family delete marker during the compaction.
>>
>>       if (majorCompaction) {
>>
>>         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
>>
>>         if (tmp == null) {
>>
>>           // There's a file with no information, must be an old one
>>
>>           // assume we have very old puts
>>
>>           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
>>
>>         } else {
>>
>>           earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
>>
>>         }
>>
>>       }
>>
>>
>> The above is followed by this log line:
>>
>>
>>       if (LOG.isDebugEnabled()) {
>>
>>         LOG.debug("Compacting " + file +
>>
>>           ", keycount=" + keyCount +
>>
>>           ", bloomtype=" + r.getBloomFilterType().toString() +
>>
>>           ", size=" + StringUtils.humanReadableInt(r.length()) +
>>
>>           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
>>
>>           (majorCompaction? ", earliestPutTs=" + earliestPutTs: ""));
>>
>>       }
>>
>> This prints out earliestPutTs. You see that in the logs?  You running with
>> DEBUG? Does the earliest put ts preclude our dropping delete family?
>>
>>
>> Looking more in code, we retain deletes in following circumstances:
>>
>>
>>     this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT ||
>> scan
>> .isRaw();
>>
>>
>> So, for sure we are running major compaction?
>>
>> Otherwise, have to dig in a bit more here.. This stuff is a little
>> involved.
>> St.Ack
>>
>>
>>
>>
>> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
>> >
>> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <ty...@gmail.com>
>> > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > We saw a very strange case in one of our production cluster. A
>> couple
>> > > > regions cannot get their deleted rows or delete marker removed even
>> > after
>> > > > major compaction. However when the region triggered split (we set
>> 100G
>> > > for
>> > > > auto split), the deletion worked. The 100G region becomes two 10G
>> > > daughter
>> > > > regions, and all the delete marker are gone.
>> > > >
>> > > > Also, the same region in the slave cluster (through replication)
>> have
>> > > > normal size at about 20+G.
>> > > >
>> > > > BTW, the delete marker in the regions are mostly deleteFamily if it
>> > > > matters.
>> > > >
>> > > > This is really weird. Anyone has any clue for this strange behavior?
>> > > >
>> > > > Thanks
>> > > > Tian-Ying
>> > > >
>> > > > These 0.94 Tian-Ying?
>> > >
>> > > It looks like the DeleteFamily is retained only; do you see incidence
>> > where
>> > > there may have been versions older than the DeleteFamily that are also
>> > > retained post-major-compaction?
>> > >
>> > > St.Ack
>> > >
>> > >
>> > >
>> > > > A snippet of the HFile generated by the major compaction:
>> > > >
>> > > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
>> > > > V:
>> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
>> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
>> > > > V:
>> > > >
>> > >
>> >
>>
>
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Tianying Chang <ty...@gmail.com>.
Hi, Stack

Based on the log, the major compaction did run, and it took 5+ hours. And I
also manually ran major_compact from the hbase shell explicitly to verify.

I just moved the region to a different RS and issued a major_compact on
that region again; let me see if the major compaction succeeds, and I will
report back.

Thanks
Tian-Ying

On Sun, May 29, 2016 at 4:35 PM, Stack <st...@duboce.net> wrote:

> On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <ty...@gmail.com> wrote:
>
> > Yes, it is 94.26.  By a quick glance, I didn't  see any put that is older
> > than the delete marker's TS, which could go as far as about couple weeks
> > ago since major compaction on it for long time seems.
> >
> Also it is really strange that if the region is split, then seems
> > everything is working as expected. Also we noticed, the same region
> > replicated at the slave side is totally normal, i.e. at 20+G....
> >
> >
> If you move the region to another server, does that work?
>
> Looking in 0.94 codebase, I see this in Compactor#compact
>
>
>       // For major compactions calculate the earliest put timestamp
>
>       // of all involved storefiles. This is used to remove
>
>       // family delete marker during the compaction.
>
>       if (majorCompaction) {
>
>         tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
>
>         if (tmp == null) {
>
>           // There's a file with no information, must be an old one
>
>           // assume we have very old puts
>
>           earliestPutTs = HConstants.OLDEST_TIMESTAMP;
>
>         } else {
>
>           earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
>
>         }
>
>       }
>
>
> The above is followed by this log line:
>
>
>       if (LOG.isDebugEnabled()) {
>
>         LOG.debug("Compacting " + file +
>
>           ", keycount=" + keyCount +
>
>           ", bloomtype=" + r.getBloomFilterType().toString() +
>
>           ", size=" + StringUtils.humanReadableInt(r.length()) +
>
>           ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
>
>           (majorCompaction? ", earliestPutTs=" + earliestPutTs: ""));
>
>       }
>
> This prints out earliestPutTs. You see that in the logs?  You running with
> DEBUG? Does the earliest put ts preclude our dropping delete family?
>
>
> Looking more in code, we retain deletes in following circumstances:
>
>
>     this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT || scan
> .isRaw();
>
>
> So, for sure we are running major compaction?
>
> Otherwise, have to dig in a bit more here.. This stuff is a little
> involved.
> St.Ack
>
>
>
>
> > On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <ty...@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > We saw a very strange case in one of our production cluster. A couple
> > > > regions cannot get their deleted rows or delete marker removed even
> > after
> > > > major compaction. However when the region triggered split (we set
> 100G
> > > for
> > > > auto split), the deletion worked. The 100G region becomes two 10G
> > > daughter
> > > > regions, and all the delete marker are gone.
> > > >
> > > > Also, the same region in the slave cluster (through replication) have
> > > > normal size at about 20+G.
> > > >
> > > > BTW, the delete marker in the regions are mostly deleteFamily if it
> > > > matters.
> > > >
> > > > This is really weird. Anyone has any clue for this strange behavior?
> > > >
> > > > Thanks
> > > > Tian-Ying
> > > >
> > > > These 0.94 Tian-Ying?
> > >
> > > It looks like the DeleteFamily is retained only; do you see incidence
> > where
> > > there may have been versions older than the DeleteFamily that are also
> > > retained post-major-compaction?
> > >
> > > St.Ack
> > >
> > >
> > >
> > > > A snippet of the HFile generated by the major compaction:
> > > >
> > > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
> > > > V:
> > > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
> > > > V:
> > > >
> > >
> >
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Stack <st...@duboce.net>.
On Fri, May 27, 2016 at 3:17 PM, Tianying Chang <ty...@gmail.com> wrote:

> Yes, it is 94.26.  By a quick glance, I didn't  see any put that is older
> than the delete marker's TS, which could go as far as about couple weeks
> ago since major compaction on it for long time seems.
>
Also it is really strange that if the region is split, then seems
> everything is working as expected. Also we noticed, the same region
> replicated at the slave side is totally normal, i.e. at 20+G....
>
>
If you move the region to another server, does that work?

Looking in the 0.94 codebase, I see this in Compactor#compact:


      // For major compactions calculate the earliest put timestamp
      // of all involved storefiles. This is used to remove
      // family delete marker during the compaction.
      if (majorCompaction) {
        tmp = fileInfo.get(StoreFile.EARLIEST_PUT_TS);
        if (tmp == null) {
          // There's a file with no information, must be an old one
          // assume we have very old puts
          earliestPutTs = HConstants.OLDEST_TIMESTAMP;
        } else {
          earliestPutTs = Math.min(earliestPutTs, Bytes.toLong(tmp));
        }
      }


The above is followed by this log line:


      if (LOG.isDebugEnabled()) {
        LOG.debug("Compacting " + file +
          ", keycount=" + keyCount +
          ", bloomtype=" + r.getBloomFilterType().toString() +
          ", size=" + StringUtils.humanReadableInt(r.length()) +
          ", encoding=" + r.getHFileReader().getEncodingOnDisk() +
          (majorCompaction? ", earliestPutTs=" + earliestPutTs: ""));
      }

This prints out earliestPutTs. Do you see that in the logs? Are you running
with DEBUG? Does the earliest put TS preclude our dropping the DeleteFamily
markers?
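
(A back-of-the-envelope check, my arithmetic rather than anything from the
logs: the markers in the snippet carry timestamps like 1459808114380 ms,
which is roughly 2016-04-04, so they are weeks old. If every store file
reports an EARLIEST_PUT_TS newer than that, the markers mask no puts and
the shortcut above can drop them; a file missing the EARLIEST_PUT_TS
metadata drives earliestPutTs to HConstants.OLDEST_TIMESTAMP, i.e. the code
conservatively assumes very old puts and the shortcut never applies.)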


Looking more into the code, we retain deletes in the following circumstances:


    this.retainDeletesInOutput = scanType == ScanType.MINOR_COMPACT || scan.isRaw();


So, are we sure we are running a major compaction?

Otherwise, I'll have to dig in a bit more here... This stuff is a little
involved.
St.Ack




> On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:
>
> > On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <ty...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > We saw a very strange case in one of our production cluster. A couple
> > > regions cannot get their deleted rows or delete marker removed even
> after
> > > major compaction. However when the region triggered split (we set 100G
> > for
> > > auto split), the deletion worked. The 100G region becomes two 10G
> > daughter
> > > regions, and all the delete marker are gone.
> > >
> > > Also, the same region in the slave cluster (through replication) have
> > > normal size at about 20+G.
> > >
> > > BTW, the delete marker in the regions are mostly deleteFamily if it
> > > matters.
> > >
> > > This is really weird. Anyone has any clue for this strange behavior?
> > >
> > > Thanks
> > > Tian-Ying
> > >
> > > These 0.94 Tian-Ying?
> >
> > It looks like the DeleteFamily is retained only; do you see incidence
> where
> > there may have been versions older than the DeleteFamily that are also
> > retained post-major-compaction?
> >
> > St.Ack
> >
> >
> >
> > > A snippet of the HFile generated by the major compaction:
> > >
> > > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
> > > V:
> > > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
> > > V:
> > >
> >
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Tianying Chang <ty...@gmail.com>.
Yes, it is 0.94.26. From a quick glance, I didn't see any put that is older
than the delete markers' TS, which could go back as far as a couple of
weeks, since it seems major compaction has not worked on this region for a
long time.

Also, it is really strange that once the region is split, everything works
as expected. And we noticed that the same region replicated on the slave
side is totally normal, i.e. at 20+G.

On Fri, May 27, 2016 at 3:13 PM, Stack <st...@duboce.net> wrote:

> On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <ty...@gmail.com> wrote:
>
> > Hi,
> >
> > We saw a very strange case in one of our production cluster. A couple
> > regions cannot get their deleted rows or delete marker removed even after
> > major compaction. However when the region triggered split (we set 100G
> for
> > auto split), the deletion worked. The 100G region becomes two 10G
> daughter
> > regions, and all the delete marker are gone.
> >
> > Also, the same region in the slave cluster (through replication) have
> > normal size at about 20+G.
> >
> > BTW, the delete marker in the regions are mostly deleteFamily if it
> > matters.
> >
> > This is really weird. Anyone has any clue for this strange behavior?
> >
> > Thanks
> > Tian-Ying
> >
> > These 0.94 Tian-Ying?
>
> It looks like the DeleteFamily is retained only; do you see incidence where
> there may have been versions older than the DeleteFamily that are also
> retained post-major-compaction?
>
> St.Ack
>
>
>
> > A snippet of the HFile generated by the major compaction:
> >
> > : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
> > V:
> > K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> > \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
> > V:
> >
>

Re: Major compaction cannot remove deleted rows until the region is split. Strange!

Posted by Stack <st...@duboce.net>.
On Fri, May 27, 2016 at 2:32 PM, Tianying Chang <ty...@gmail.com> wrote:

> Hi,
>
> We saw a very strange case in one of our production cluster. A couple
> regions cannot get their deleted rows or delete marker removed even after
> major compaction. However when the region triggered split (we set 100G for
> auto split), the deletion worked. The 100G region becomes two 10G daughter
> regions, and all the delete marker are gone.
>
> Also, the same region in the slave cluster (through replication) have
> normal size at about 20+G.
>
> BTW, the delete marker in the regions are mostly deleteFamily if it
> matters.
>
> This is really weird. Anyone has any clue for this strange behavior?
>
> Thanks
> Tian-Ying
>
Is this 0.94, Tian-Ying?

It looks like only the DeleteFamily markers are retained; do you see
instances where versions older than the DeleteFamily were also retained
post-major-compaction?
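
(Side note, in case it helps with checking: the snippet below looks like
output from the HFile pretty-printer, i.e. something along the lines of
"hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f <path-to-hfile>", with
the path a placeholder; that is a handy way to see what a compaction
actually wrote.)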

St.Ack



> A snippet of the HFile generated by the major compaction:
>
> : \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459808114380/DeleteFamily/vlen=0/ts=2292870047
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459808114011/DeleteFamily/vlen=0/ts=2292869794
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459805381104/DeleteFamily/vlen=0/ts=2291072240
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459805380673/DeleteFamily/vlen=0/ts=2291071997
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459802643449/DeleteFamily/vlen=0/ts=2290248886
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459802643246/DeleteFamily/vlen=0/ts=2290248786
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459799913003/DeleteFamily/vlen=0/ts=2289446916
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459797181831/DeleteFamily/vlen=0/ts=2288670451
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459794447388/DeleteFamily/vlen=0/ts=2287911443
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459791708803/DeleteFamily/vlen=0/ts=2287213792
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459788978387/DeleteFamily/vlen=0/ts=2286488738
> V:
> K: \xA0\x00\x00L\x1A@\x1CBe\x00\x00\x08m\x03\x1A@
> \x10\x00?PF/d:/1459786243642/DeleteFamily/vlen=0/ts=2285778927
> V:
>