Posted to user@hbase.apache.org by "Huangmao (Homer) Quan" <lu...@gmail.com> on 2013/07/24 02:23:07 UTC

Data missing during bulk data import

Hi hbase users,

We hit an issue when importing data into HBase via Thrift (Perl).

We found that the amount of data in the table is less than expected.
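
For reference, each record goes in through the HBase Thrift gateway roughly like this. This is a hypothetical Python/happybase analogue of our Perl client (load_records() and the field names are illustrative; the table and column family are the ones that appear in the error below):

import happybase

conn = happybase.Connection('localhost')   # HBase Thrift server, default port 9090
table = conn.table('skg')                  # table name as it appears in the error below

for record in load_records():              # illustrative stand-in for our data source
    # row key such as "George McGovern-nm0569566", value stored under wrapstar:data
    table.put(record['key'].encode(),
              {b'wrapstar:data': record['data'].encode()})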

When scanning the table, we get:
ERROR: java.lang.RuntimeException:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=7, exceptions:
Tue Jul 23 23:01:41 UTC 2013,
org.apache.hadoop.hbase.client.ScannerCallable@180f9720,
java.io.IOException: java.io.IOException: Could not iterate
StoreFileScanner[HFileScanner for reader
reader=file:/tmp/hbase-hbase/hbase/skg/d13644aae91d7ee9a8fdde461e8ec217/wrapstar/51a2e5871b7a4af8a2d9d17ed0c14031,
compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=false]
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false]
[cacheBloomsOnWrite=false] [cacheEvictOnClose=false]
[cacheCompressed=false], firstKey="Laughing"Larry
Berger-nm5619461/wrapstar:data/1374615644669/Put, lastKey=Jordan-Patrick
Marcantonio-nm0545093/wrapstar:data/1374616499993/Put, avgKeyLen=47,
avgValueLen=652, entries=156586, length=111099401, cur=George
McGovern-nm0569566/wrapstar:data/1374616538067/Put/vlen=17162/ts=0]


Even weirder, when monitoring the row count during the import, I found that at
times the count decreased sharply (lots of data missing):

hbase(main):003:0> count 'skgtwo'
.............
*134453 row(s)* in 7.5510 seconds

hbase(main):004:0> count 'skgtwo'
...................
*88970 row(s)* in 7.5380 seconds
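
The counts above are from the hbase shell; the monitoring itself is just a periodic full-table count. For anyone reproducing this over Thrift, a minimal, hypothetical Python/happybase sketch of such a monitor:

import happybase
import time

table = happybase.Connection('localhost').table('skgtwo')
while True:
    # full-scan row count, equivalent to the shell's 'count' (slow on large tables)
    print(sum(1 for _ in table.scan()))
    time.sleep(60)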

Any suggestion is appreciated.

Cheers

Huangmao (Homer) Quan
Email: lujooo@gmail.com
Google Voice: +1 (530) 903-8125
Facebook: http://www.facebook.com/homerquan
LinkedIn: http://www.linkedin.com/in/homerquan <http://www.linkedin.com/in/earthisflat>

Re: Data missing during bulk data import

Posted by yonghu <yo...@gmail.com>.
In my view, there are a few ways you could be losing data:

1. Some tuples share the same row key + column family + column. When you load
them into HBase, they all land in the same cell, so duplicates collapse into a
single row and values beyond the predefined maximum number of versions are
discarded (see the sketch below).

2. As Ted mentioned, you may be importing some Deletes. Do you generate
tombstones in your bulk load? (Also sketched below.)
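
To make both points concrete, here is a hypothetical Python/happybase sketch against the Thrift gateway (the table and column names are made up, not taken from your data):

import happybase

conn = happybase.Connection('localhost')
conn.create_table('demo', {'cf': dict(max_versions=1)})   # keep only one version
table = conn.table('demo')

# (1) Two tuples sharing the same row key + cf + column land in one cell:
# 'count' then sees a single row, and with max_versions=1 only the newest
# value remains readable.
table.put(b'dup-key', {b'cf:col': b'first value'})
table.put(b'dup-key', {b'cf:col': b'second value'})
print(list(table.scan()))   # [(b'dup-key', {b'cf:col': b'second value'})]

# (2) A Delete writes a tombstone that masks the cell, so the row also
# disappears from scans and counts:
table.delete(b'dup-key', columns=[b'cf:col'])
print(list(table.scan()))   # []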

By the way, can you show us the schema of your imported data? For example,
does it contain duplicates, and how is your row key designed?

Regards,

Yong


On Wed, Jul 24, 2013 at 3:55 AM, Ted Yu <yu...@gmail.com> wrote:

> Which HBase release are you using?
>
> Was it possible that the import included Deletes?
>
> Cheers

Re: Data missing during bulk data import

Posted by Ted Yu <yu...@gmail.com>.
Which HBase release are you using?

Was it possible that the import included Deletes?

Cheers
