Posted to user@hbase.apache.org by "Magana-zook, Steven Alan" <ma...@llnl.gov> on 2014/08/22 18:26:29 UTC

Is it possible for HTable.put(…) to not make it into the table and silently fail?

Hello,

I have written a program in Java that is supposed to update rows in an HBase table that do not yet have a value in a certain column (blob values of between 5k and 50k). The program keeps track of how many puts have been added to the table and how long it has been running; these two pieces of information are used to calculate an ingestion speed (records per second). After running the program for multiple days, the result from the RowCounter program is far fewer records than the reported average speed would predict. The essential parts of the code are shown below (error handling and other presumably unimportant code omitted), along with the command I use to see how many rows have been updated.

Is it possible that the put method call on HTable does not actually put the record in the database while also not throwing an exception?
Could the output of RowCounter be incorrect?
Am I doing something below that is obviously incorrect?

Row counter command (it frequently reports OutOfOrderScannerNextException during execution): hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable cf:BLOBDATACOLUMN

Code that is essentially what I am doing in my program:
...
Scan scan = new Scan();
scan.setCaching(200);

HTable targetTable = new HTable(hbaseConfiguration, Bytes.toBytes(tblTarget));
ResultScanner resultScanner = targetTable.getScanner(scan);

int batchSize = 10;
Date startTime = new Date();
numFilesSent = 0;

Result[] rows = resultScanner.next(batchSize);
while (rows != null) {
    for (Result row : rows) {
        byte[] rowKey = row.getRow();
        byte[] byteArrayBlobData = getFileContentsForRow(rowKey);

        Put put = new Put(rowKey);
        put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
        targetTable.put(put); // Auto-flush is on by default
        numFilesSent++;
        float elapsedSeconds = (new Date().getTime() - startTime.getTime()) / 1000.0f;
        float speed = numFilesSent / elapsedSeconds;
        System.out.println("Speed(rows/sec): " + speed); // routinely says from 80 to 200+
    }
    rows = resultScanner.next(batchSize);
}
...

Thanks,
Steven

Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by "Magana-zook, Steven Alan" <ma...@llnl.gov>.
Hi Anoop,

I am using HBase 0.98.0.2.1.2.0-402-hadoop2 without the coprocessor
modification you mentioned. I only raised the idea of a silent failure
because I do catch and report Exception and Throwable on the client side,
and I see no reported errors (apart from the occasional "Region too busy")
that would account for the missing rows.
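
In case it helps, the handling around each put is roughly this (simplified):

try {
    targetTable.put(put);
} catch (Throwable t) {
    // Nothing like this shows up, apart from the occasional
    // RegionTooBusyException.
    System.err.println("Put failed for row "
            + Bytes.toStringBinary(rowKey) + ": " + t);
}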

Thanks,
Steven



On 8/22/14 10:08 AM, "Anoop John" <an...@gmail.com> wrote:

>>Is it possible that the put method call on HTable does not actually put
>the record in the database while also not throwing an exception?
>
>It can. Implement a region coprocessor (implementing RegionObserver) and
>override prePut(). In it you can bypass the operation using
>ObserverContext#bypass(), so the core will neither throw an exception nor
>add the data.
>
>-Anoop-

Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by Anoop John <an...@gmail.com>.
>Is it possible that the put method call on HTable does not actually put
the record in the database while also not throwing an exception?

It can. Implement a region coprocessor (implementing RegionObserver) and
override prePut(). In it you can bypass the operation using
ObserverContext#bypass(), so the core will neither throw an exception nor
add the data.
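
A rough sketch of such an observer (class name is mine; 0.98-style API, untested):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class SilentDropObserver extends BaseRegionObserver {
    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
            Put put, WALEdit edit, Durability durability) throws IOException {
        // Skip the rest of the put pipeline: the cell is never written,
        // and the client still gets a normal (successful) response.
        ctx.bypass();
    }
}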

-Anoop-

On Fri, Aug 22, 2014 at 10:23 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. the result from the RowCounter program is far fewer records than I
> expected.
>
> Can you give more detailed information about the gap ?
>
> Which hbase release are you running ?
>
> Cheers

Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by Ted Yu <yu...@gmail.com>.
The exception was due to 120 being the max number of counters.

Consider increasing the setting for "mapreduce.job.counters.max"
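
For example, the HBase MapReduce tools parse generic -D options, so something
like this should pick up a larger limit (500 is an arbitrary value):

hbase org.apache.hadoop.hbase.mapreduce.CellCounter \
    -Dmapreduce.job.counters.max=500 mytable /user/samz/mytablecellcounter

Or set it cluster-wide in mapred-site.xml:

<property>
  <name>mapreduce.job.counters.max</name>
  <value>500</value>
</property>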

Cheers



Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by "Magana-zook, Steven Alan" <ma...@llnl.gov>.
St.Ack,


Running: hbase org.apache.hadoop.hbase.mapreduce.CellCounter mytable
/user/samz/mytablecellcounter


Prints a lot of these:

2014-08-22 13:40:51,037 INFO  [main] mapreduce.Job: Task Id : attempt_1406587748887_0063_m_000114_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
    at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:103)
    at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:110)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounterImpl(AbstractCounterGroup.java:123)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:113)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:130)
    at org.apache.hadoop.mapred.Counters$Group.findCounter(Counters.java:369)
    at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:314)
    at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:479)
    at org.apache.hadoop.mapred.Task$TaskReporter.getCounter(Task.java:658)
    at org.apache.hadoop.mapred.Task$TaskReporter.getCounter(Task.java:602)
    at org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl.getCounter(TaskAttemptContextImpl.java:76)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCounter(WrappedMapper.java:101)
    at org.apache.hadoop.hbase.mapreduce.CellCounter$CellCounterMapper.map(CellCounter.java:138)
    at org.apache.hadoop.hbase.mapreduce.CellCounter$CellCounterMapper.map(CellCounter.java:84)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143



Thanks,
Steven




On 8/22/14 12:27 PM, "Stack" <st...@duboce.net> wrote:

>What does CellCounter return?
>St.Ack


Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by Stack <st...@duboce.net>.
What does CellCounter return?
St.Ack



Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by "Magana-zook, Steven Alan" <ma...@llnl.gov>.
Yes, the idea is to add a byte array to the column cf:BLOBDATACOLUMN for
each row key.

This *should* only ever happen once per row. I do not modify the row keys
returned by HBase (Result#getRow) in any way, so the row key in each Put
should be unique during a single run. Subsequent runs of the same program
do not attempt to reload rows that already contain data; this is done by
calling Result#getValue and checking for null. I know this approach is
inefficient, but I have not found a good Filter strategy for bringing back
rows that *do not* have a certain column. Two Filters I have tried with
the Scan object are listed below:

This one (incorrectly) returns rows that already have data loaded. The idea
was to create a comparison that no row would satisfy, so that with
filterIfMissing=false I would get back only the rows that did not have the
column being tested:

final SingleColumnValueFilter fltrOnlyColsNoData2
        = new SingleColumnValueFilter(
                COLUMN_FAMILY, BLOBDATACOLUMN,
                CompareFilter.CompareOp.LESS, Bytes.toBytes(0));


I used this one before; I can't remember what problem I had with it, and
will try it again now that I have switched from next(batchSize) to an
iterator from ResultScanner#iterator:

final SingleColumnValueExcludeFilter fltrOnlyColsNoData
        = new SingleColumnValueExcludeFilter(
                COLUMN_FAMILY, BLOBDATACOLUMN,
                CompareFilter.CompareOp.EQUAL,
                new NullComparator());
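
A variant I have not tried yet would make the missing-column behavior
explicit on a plain SingleColumnValueFilter (sketch only, untested):

final SingleColumnValueFilter noBlobYet
        = new SingleColumnValueFilter(
                COLUMN_FAMILY, BLOBDATACOLUMN,
                CompareFilter.CompareOp.EQUAL,
                new NullComparator());
// Rows where the column exists should fail the EQUAL-to-null comparison
// and be filtered out; rows missing the column should be kept because
// filterIfMissing is false (the default, stated here for clarity).
noBlobYet.setFilterIfMissing(false);
scan.setFilter(noBlobYet);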



Thanks




On 8/22/14 10:26 AM, "Ted Yu" <yu...@gmail.com> wrote:

>For a given rowkey, would there be only one record written per
>cf:BLOBDATACOLUMN column?
>
>Cheers


Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by Ted Yu <yu...@gmail.com>.
For a given rowkey, would there be only one record written per
cf:BLOBDATACOLUMN column?

Cheers



Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by "Magana-zook, Steven Alan" <ma...@llnl.gov>.
Hi Ted,

For example, if the program reports an average speed of 88 records a
second, and I let the program run for 24 hours, then I would expect the
RowCounter program to report a number around 88 (rows/sec) * 24 (hours) *
60 (min/hour) * 60 (sec/min) = 7,603,200 rows.

In actuality, RowCounter returns:
	org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
	ROWS=1356588


The vast difference between ~7 million rows and ~1 million rows has me
confused about what happened to the other rows that should have been in
the table.

Thanks for your reply,
Steven






On 8/22/14 9:53 AM, "Ted Yu" <yu...@gmail.com> wrote:

>bq. the result from the RowCounter program is far fewer records than I
>expected.
>
>Can you give more detailed information about the gap ?
>
>Which hbase release are you running ?
>
>Cheers


Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?

Posted by Ted Yu <yu...@gmail.com>.
bq. the result from the RowCounter program is far fewer records than I
expected.

Can you give more detailed information about the gap ?

Which hbase release are you running ?

Cheers

