Posted to user@hbase.apache.org by Amit Sela <am...@infolinks.com> on 2013/04/17 10:30:11 UTC

RegionServer shutdown with ScanWildcardColumnTracker exception

Hi all,

I had a regionserver crash during counter increments. Looking at the
regionserver log I saw:

org.apache.hadoop.hbase.DroppedSnapshotException: region: TABLE_NAME, ROW_KEY...
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1472)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1351)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1292)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: ScanWildcardColumnTracker.checkColumn ran into a column actually smaller than the previous column: *QUALIFIER*
        at org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkColumn(ScanWildcardColumnTracker.java:104)
        at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:354)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:362)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
        at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:738)
        at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:673)
        at org.apache.hadoop.hbase.regionserver.Store.access$400(Store.java:108)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2276)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1447)

The strange thing is that the *QUALIFIER* name, as it appears in the log, is
misspelled: there is no such qualifier name, and there never was.

Thanks,

Amit.
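
For readers unfamiliar with the check named in the trace: the error message
indicates that the column tracker expects the qualifiers it sees to arrive in
ascending byte order, and fails when one sorts before its predecessor. The
following is a minimal, self-contained Java sketch of that kind of check, not
the HBase source. The class QualifierOrderCheck, its methods, and the col_*
qualifier names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for an ordering check; not HBase's actual implementation. */
final class QualifierOrderCheck {
    private byte[] previous; // last qualifier accepted, null before the first one

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Accepts qualifiers in non-decreasing order; throws if one sorts before its predecessor. */
    void check(String qualifier) throws IOException {
        byte[] current = qualifier.getBytes(StandardCharsets.UTF_8);
        if (previous != null && compareBytes(current, previous) < 0) {
            throw new IOException("column smaller than the previous column: " + qualifier);
        }
        previous = current;
    }

    public static void main(String[] args) throws IOException {
        QualifierOrderCheck tracker = new QualifierOrderCheck();
        tracker.check("col_a"); // hypothetical qualifiers, already sorted
        tracker.check("col_b");
        try {
            tracker.check("col_a"); // sorts before "col_b", so it is rejected
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a sorted sequence the check passes silently; a single qualifier that
sorts before the one seen just before it is enough to raise the exception.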

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by Amit Sela <am...@infolinks.com>.
No. It happened in our production environment after running counter
increments every 5 minutes for a few weeks. I could try to reproduce it in a
test cluster, but that would mean running for weeks as well. I will keep
digging and let you know if it happens again or if I find more information
or insights on the issue.

Thanks.


On Wed, Apr 17, 2013 at 8:18 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Are there any test cases that try to reproduce your issue?
>
> Regards
> Ram

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Are there any test cases that try to reproduce your issue?

Regards
Ram


On Wed, Apr 17, 2013 at 9:47 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> There is a hint mechanism available when scanning happens.  But I don't
> think there should be much difference between a scan that happens during
> a flush and a normal scan.
>
> I will look through the code and come back on this.
>
> Regards
> RAm

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by ramkrishna vasudevan <ra...@gmail.com>.
There is a hint mechanism available when scanning happens.  But I don't
think there should be much difference between a scan that happens during
a flush and a normal scan.

I will look through the code and come back on this.

Regards
Ram


On Wed, Apr 17, 2013 at 9:40 PM, Amit Sela <am...@infolinks.com> wrote:

> No, no encoding.
>
>
> On Wed, Apr 17, 2013 at 6:56 PM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > @Lars
> > Do you have any suggestions on this?
> >
> > @Amit
> > Do you have any encoder enabled, such as prefix encoding?
> > There was one optimization added recently, but it is not in 0.94.2.
> >
> > Regards
> > Ram
> >
> >
> > On Wed, Apr 17, 2013 at 5:17 PM, Amit Sela <am...@infolinks.com> wrote:
> >
> > > I scanned over this counter with and without column specification and
> > > all looks OK now.
> > > I have no CPs in this table.
> > > Is there some kind of a hint mechanism in HBase's internal scan? I ask
> > > because it's weird that ScanWildcardColumnTracker.checkColumn says that
> > > the column is smaller than the previous column:
> > > *imprersions_ALL_2013041617*. There is no imprersions, only impressions,
> > > and r is indeed smaller than s. Could it be some kind of hint bug? I
> > > don't think I know enough of HBase internals to fully understand that...
> > >
> > >
> > >
> > > On Wed, Apr 17, 2013 at 1:42 PM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > Hi Amit
> > > >
> > > > Checking the code, this is possible when the qualifiers are not sorted.
> > > > Do you have any CPs (coprocessors) in your path that try to play with
> > > > the KVs?
> > > >
> > > > Seems to be a very weird thing.
> > > > Can you try doing a scan on the KV just before this happens?  That will
> > > > tell you the existing KVs that are present.
> > > >
> > > > Even now, if you still have the cluster available, you can try scanning
> > > > the region for which the flush happened.  That will give us some more info.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > >
> > > > On Wed, Apr 17, 2013 at 2:36 PM, Amit Sela <am...@infolinks.com> wrote:
> > > >
> > > > > The cluster runs Hadoop 1.0.4 and HBase 0.94.2
> > > > >
> > > > > I have three families in this table: weekly, daily, hourly. Each family
> > > > > has the following qualifiers:
> > > > > Weekly - impressions_{countrycode}_{week#} - country code is 0, 1 or ALL
> > > > > (aggregation of both 0 and 1)
> > > > > Daily and hourly are the same but with yyyyMMdd and yyyyMMddhh
> > > > > respectively.
> > > > >
> > > > > Just before the exception the regionserver StoreFile executes the
> > > > > following:
> > > > >
> > > > > 2013-04-16 17:56:06,769 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://hadoop-master.infolinks.com:8000/hbase/URL_COUNTERS/af2760e4d04a9e3025d1fb53bdba8acf/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc: CompoundBloomFilterWriter
> > > > > 2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://hbase-master-address:8000/hbase/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc)
> > > > > 2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=210517246, memsize=39.3m, into tmp file hdfs://hbase-master:8000/hbase/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc
> > > > > 2013-04-16 17:56:07,357 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://hbase-master:8000/hbase/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*/.tmp/3fa7993dcb294be1bca5e4d7357f4003: CompoundBloomFilterWriter
> > > > > 2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] INFO org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO DeleteFamily was added to HFile (hdfs://hbase-master:8000/hbase/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*/.tmp/3fa7993dcb294be1bca5e4d7357f4003)
> > > > > 2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server region-server-address,8041,1364993168088: Replay of HLog required. Forcing server shutdown
> > > > > DroppedSnapshotException: region: TABLE,ROWKEY,1364317591568.*af2760e4d04a9e3025d1fb53bdba8acf*.
> > > > > ....
> > > > > ....
> > > > > ...
> > > > >
> > > > >
> > > > > On Wed, Apr 17, 2013 at 11:47 AM, ramkrishna vasudevan <
> > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >
> > > > > > Seems interesting.  Can you tell us what the families and the
> > > > > > qualifiers available in your schema are?
> > > > > >
> > > > > > Any other interesting logs that you can see before this?
> > > > > >
> > > > > > BTW, the version of HBase is also needed.  If we can track it down, we
> > > > > > can then file a JIRA if it is a bug.
> > > > > >
> > > > > > Regards
> > > > > > RAm
> > > > > >
> > > > > >
> > > > > > On Wed, Apr 17, 2013 at 2:00 PM, Amit Sela <am...@infolinks.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I had a regionserver crushed during counters increment. Looking
> > at
> > > > the
> > > > > > > regionserver log I saw:
> > > > > > >
> > > > > > > org.apache.hadoop.hbase.DroppedSnapshotException: region:
> > > TABLE_NAME,
> > > > > > > ROW_KEY...at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1472)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1351)
> > > > > > >         at
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1292)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:406)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:380)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:243)
> > > > > > >         at java.lang.Thread.run(Thread.java:722)
> > > > > > > Caused by: java.io.IOException:
> > > ScanWildcardColumnTracker.checkColumn
> > > > > ran
> > > > > > > into a column actually smaller than the previous column:
> > > *QUALIFIER*
> > > > > > > at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.ScanWildcardColumnTracker.checkColumn(ScanWildcardColumnTracker.java:104)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:354)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:362)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:311)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:738)
> > > > > > >         at
> > > > > > >
> > > org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:673)
> > > > > > >         at
> > > > > > >
> > > org.apache.hadoop.hbase.regionserver.Store.access$400(Store.java:108)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2276)
> > > > > > >         at
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1447)
> > > > > > >
> > > > > > > The strange thing is that the *QUALIFER* name as it appears in
> > the
> > > > log
> > > > > is
> > > > > > > misspelled.... there is no, and never was such qualifier name.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Amit.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
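
The quoted messages above describe the hourly qualifier layout and report the
misspelled name imprersions_ALL_2013041617. As a rough illustration, the Java
sketch below builds the expected hourly qualifier and locates the byte where
the logged name differs. The class QualifierDiff and its methods are
hypothetical, and the 24-hour pattern "HH" is an assumption (the thread writes
yyyyMMddhh, but the logged name ends in hour 17). It uses java.time for
brevity and does not touch any HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for comparing an expected qualifier with a logged one. */
final class QualifierDiff {
    // Assumption: 24-hour "HH", because the logged qualifier ends in hour 17.
    private static final DateTimeFormatter HOURLY = DateTimeFormatter.ofPattern("yyyyMMddHH");

    /** Builds impressions_{countrycode}_{yyyyMMddHH} as described in the thread. */
    static String hourlyQualifier(String countryCode, LocalDateTime time) {
        return "impressions_" + countryCode + "_" + HOURLY.format(time);
    }

    /** Index of the first differing byte, or -1 if the arrays are identical. */
    static int firstMismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    public static void main(String[] args) {
        String expected = hourlyQualifier("ALL", LocalDateTime.of(2013, 4, 16, 17, 0));
        String logged = "imprersions_ALL_2013041617"; // name reported in the thread
        byte[] e = expected.getBytes(StandardCharsets.UTF_8);
        byte[] l = logged.getBytes(StandardCharsets.UTF_8);
        int i = firstMismatch(e, l);
        System.out.printf("expected=%s logged=%s%n", expected, logged);
        System.out.printf("first mismatch at index %d: 0x%02x vs 0x%02x, xor=0x%02x%n",
                i, e[i] & 0xff, l[i] & 0xff, (e[i] ^ l[i]) & 0xff);
    }
}
```

The two names differ only at index 5, where 's' (0x73) became 'r' (0x72), a
difference of one bit. That observation narrows what to look for, but it does
not by itself establish where the bad byte came from; the thread leaves the
cause open.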

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by Amit Sela <am...@infolinks.com>.
No, no encoding.


On Wed, Apr 17, 2013 at 6:56 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> @Lars
> Do you have any suggestions on this?
>
> @Amit
> Do you have any encoder enabled, such as prefix encoding?
> There was one optimization added recently, but it is not in 0.94.2.
>
> Regards
> Ram

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by ramkrishna vasudevan <ra...@gmail.com>.
@Lars
Do you have any suggestions on this?

@Amit
Do you have any encoder enabled, such as prefix encoding?
One optimization was added recently, but it is not in 0.94.2.

Regards
Ram


On Wed, Apr 17, 2013 at 5:17 PM, Amit Sela <am...@infolinks.com> wrote:

> I scanned over this counter with and without a column specification, and
> all looks OK now.
> I have no CPs (coprocessors) on this table.
> Is there some kind of hint mechanism in HBase's internal scan? It's weird
> that ScanWildcardColumnTracker.checkColumn says the column is smaller than
> the previous column: *imprersions_ALL_2013041617*. There is no
> "imprersions", only "impressions", and 'r' is indeed smaller than 's'.
> Could it be some kind of hint bug? I don't think I know enough of HBase
> internals to fully understand that...

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by Amit Sela <am...@infolinks.com>.
I scanned over this counter with and without a column specification, and
all looks OK now.
I have no CPs (coprocessors) on this table.
Is there some kind of hint mechanism in HBase's internal scan? It's weird
that ScanWildcardColumnTracker.checkColumn says the column is smaller than
the previous column: *imprersions_ALL_2013041617*. There is no
"imprersions", only "impressions", and 'r' is indeed smaller than 's'.
Could it be some kind of hint bug? I don't think I know enough of HBase
internals to fully understand that...
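
To make the "r is smaller than s" point concrete, here is a minimal sketch in plain Java that compares the logged qualifier with the expected one byte by byte. The class and its helpers (QualifierCompare, compareBytes, firstDifference) are hypothetical names written for this illustration; they are not HBase code, and the sketch only assumes that qualifiers are ordered as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;

final class QualifierCompare {

    // Unsigned, byte-by-byte lexicographic comparison. This is a hypothetical
    // stand-in written for this example, not code taken from HBase.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Index of the first byte at which the two arrays differ, or -1 if the
    // shorter array is a prefix of the longer one (or they are equal).
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Qualifier as it appeared in the log, and the spelling that exists in the table.
        byte[] logged = "imprersions_ALL_2013041617".getBytes(StandardCharsets.UTF_8);
        byte[] expected = "impressions_ALL_2013041617".getBytes(StandardCharsets.UTF_8);

        System.out.println("first differing index: " + firstDifference(logged, expected));
        System.out.println("logged sorts before expected: "
                + (compareBytes(logged, expected) < 0));
    }
}
```

Under that ordering the misspelled name sorts before every correctly spelled "impressions_..." qualifier, which is consistent with the tracker reporting a column smaller than the previous one.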



On Wed, Apr 17, 2013 at 1:42 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi Amit
>
> Checking the code, this is possible when the qualifiers are not sorted. Do
> you have any CPs (coprocessors) in your path that try to modify the KVs?
>
> This seems very weird.
> Can you try doing a scan on the KVs just before this happens? That will
> tell you which KVs are present.
>
> If you still have the cluster, you can also try scanning the region
> for which the flush happened. That will give us some more info.
>
> Regards
> Ram

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Hi Amit

Checking the code, this is possible when the qualifiers are not sorted. Do
you have any CPs (coprocessors) in your path that try to modify the KVs?

This seems very weird.
Can you try doing a scan on the KVs just before this happens? That will
tell you which KVs are present.

If you still have the cluster, you can also try scanning the region
for which the flush happened. That will give us some more info.

Regards
Ram
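
The "qualifiers are not sorted" condition can be pictured with a small sketch. This is only an illustration of the kind of ordering check being described, assuming qualifiers are compared as unsigned bytes; QualifierOrderCheck and its methods are hypothetical names and not the actual ScanWildcardColumnTracker implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class QualifierOrderCheck {

    // Unsigned, byte-by-byte lexicographic comparison (hypothetical helper).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Returns the index of the first qualifier that sorts strictly before its
    // predecessor, or -1 if the whole sequence is in non-decreasing order.
    static int firstOutOfOrder(List<String> qualifiers) {
        byte[] previous = null;
        for (int i = 0; i < qualifiers.size(); i++) {
            byte[] current = qualifiers.get(i).getBytes(StandardCharsets.UTF_8);
            if (previous != null && compareBytes(current, previous) < 0) {
                return i;
            }
            previous = current;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Illustrative data only: qualifier names follow the pattern from this thread.
        List<String> sorted = Arrays.asList(
                "impressions_0_2013041617",
                "impressions_1_2013041617",
                "impressions_ALL_2013041617");
        List<String> unsorted = Arrays.asList(
                "impressions_ALL_2013041617",
                "imprersions_ALL_2013041617");

        System.out.println("sorted input, first out-of-order index: " + firstOutOfOrder(sorted));
        System.out.println("unsorted input, first out-of-order index: " + firstOutOfOrder(unsorted));
    }
}
```

A scan over a row is expected to hand over qualifiers in sorted order, so a check like this should never fire unless something altered a qualifier after it was sorted.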


On Wed, Apr 17, 2013 at 2:36 PM, Amit Sela <am...@infolinks.com> wrote:

> The cluster runs Hadoop 1.0.4 and HBase 0.94.2.
>
> I have three families in this table: weekly, daily, and hourly. Each family
> has the following qualifiers:
> Weekly - impressions_{countrycode}_{week#} - country code is 0, 1 or ALL
> (aggregation of both 0 and 1)
> Daily and hourly are the same but with yyyyMMdd and yyyyMMddhh
> respectively.
>
> Just before the exception, the regionserver log shows the following:
>
> 2013-04-16 17:56:06,769 [regionserver8041.cacheFlusher] INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter
> type for hdfs://
> hadoop-master.infolinks.com:8000/hbase/URL_COUNTERS/af2760e
> 4d04a9e3025d1fb53bdba8acf/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc:
> CompoundBloomFilterWriter
> 2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO
> DeleteFamily was added to HFile (hdfs://hbase-master-address:8000/hbase
> /URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*
> /.tmp/dc4ce516887f4e0bbaf6201d69ba90bc)
> 2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO
> org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=210517246,
> memsize=39.3m, into tmp file hdfs://hbase-master:8000/hbase
> /URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*
> /.tmp/dc4ce516887f4e0bbaf6201d69ba90bc
> 2013-04-16 17:56:07,357 [regionserver8041.cacheFlusher] INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter
> type for hdfs://hbase-master:8000/hbase/URL_COUNTERS/*af2760e*
> *4d04a9e3025d1fb53bdba8acf*/.tmp/3fa7993dcb294be1bca5e4d7357f4003:
> CompoundBloomFilterWriter
> 2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO
> DeleteFamily was added to HFile (hdfs://hbase-master:8000/hbase
> /URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*
> /.tmp/3fa7993dcb294be1bca5e4d7357f4003)
> 2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
> region-server-address,8041,1364993168088: Replay of HLog required
> . Forcing server shutdown
> DroppedSnapshotException: region: TABLE,ROWKEY,1364317591568.*
> af2760e4d04a9e3025d1fb53bdba8acf*.
> ....
> ....
> ...

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by Amit Sela <am...@infolinks.com>.
The cluster runs Hadoop 1.0.4 and HBase 0.94.2.

I have three families in this table: weekly, daily, and hourly. Each family
has the following qualifiers:
Weekly - impressions_{countrycode}_{week#} - country code is 0, 1 or ALL
(aggregation of both 0 and 1)
Daily and hourly are the same but with yyyyMMdd and yyyyMMddhh
respectively.
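
For readers who want to see the naming scheme spelled out, here is a minimal sketch that builds the daily and hourly qualifier strings. CounterQualifiers and its methods are hypothetical names for this example; it assumes the hour is on a 24-hour clock (as in impressions_ALL_2013041617), and it leaves out the weekly form because the exact week# format is not given in this thread.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class CounterQualifiers {

    // yyyyMMdd for the daily family; yyyyMMddHH (24-hour clock) for the hourly family.
    private static final DateTimeFormatter DAILY = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter HOURLY = DateTimeFormatter.ofPattern("yyyyMMddHH");

    // countryCode is expected to be "0", "1" or "ALL".
    static String daily(String countryCode, LocalDateTime time) {
        return "impressions_" + countryCode + "_" + DAILY.format(time);
    }

    static String hourly(String countryCode, LocalDateTime time) {
        return "impressions_" + countryCode + "_" + HOURLY.format(time);
    }

    public static void main(String[] args) {
        LocalDateTime time = LocalDateTime.of(2013, 4, 16, 17, 0);
        System.out.println(daily("0", time));
        System.out.println(hourly("ALL", time));
    }
}
```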

Just before the exception, the regionserver log shows the following:

2013-04-16 17:56:06,769 [regionserver8041.cacheFlusher] INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter
type for hdfs://hadoop-master.infolinks.com:8000/hbase/URL_COUNTERS/af2760e
4d04a9e3025d1fb53bdba8acf/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc:
CompoundBloomFilterWriter
2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO
org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO
DeleteFamily was added to HFile (hdfs://hbase-master-address:8000/hbase
/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*
/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc)
2013-04-16 17:56:07,331 [regionserver8041.cacheFlusher] INFO
org.apache.hadoop.hbase.regionserver.Store: Flushed , sequenceid=210517246,
memsize=39.3m, into tmp file hdfs://hbase-master:8000/hbase
/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*
/.tmp/dc4ce516887f4e0bbaf6201d69ba90bc
2013-04-16 17:56:07,357 [regionserver8041.cacheFlusher] INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter
type for hdfs://hbase-master:8000/hbase/URL_COUNTERS/*af2760e*
*4d04a9e3025d1fb53bdba8acf*/.tmp/3fa7993dcb294be1bca5e4d7357f4003:
CompoundBloomFilterWriter
2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] INFO
org.apache.hadoop.hbase.regionserver.StoreFile: NO General Bloom and NO
DeleteFamily was added to HFile (hdfs://hbase-master:8000/hbase
/URL_COUNTERS/*af2760e4d04a9e3025d1fb53bdba8acf*
/.tmp/3fa7993dcb294be1bca5e4d7357f4003)
2013-04-16 17:56:07,608 [regionserver8041.cacheFlusher] FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server
region-server-address,8041,1364993168088: Replay of HLog required
. Forcing server shutdown
DroppedSnapshotException: region: TABLE,ROWKEY,1364317591568.*
af2760e4d04a9e3025d1fb53bdba8acf*.
....
....
...


On Wed, Apr 17, 2013 at 11:47 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Seems interesting. Can you tell us what families and qualifiers are in
> your schema?
>
> Are there any other interesting logs before this?
>
> BTW, which version of HBase is this? If we can track it down, we can
> then file a JIRA if it is a bug.
>
> Regards
> Ram

Re: RegionServer shutdown with ScanWildcardColumnTracker exception

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Seems interesting. Can you tell us which column families and qualifiers
are defined in your schema?

Are there any other interesting log entries before this one?

BTW, which version of HBase are you running? If we can track this down, we
can then file a JIRA if it is a bug.

Regards
RAm

