Posted to user@hbase.apache.org by Vishal Kapoor <vi...@gmail.com> on 2011/04/06 14:12:44 UTC

does Scan guarantee min to max rows?

I am getting shuffled rows from a Scan. Is there a problem at my end
somewhere? We did some manual splits of tables.
I have scoreboard-style code for staged processing of a table that
depends on the row order, and it is breaking.

thanks,
Vishal

Re: does Scan guarantee min to max rows?

Posted by Ted Yu <yu...@gmail.com>.
Have you read the thread entitled 'min, max'?

On Tue, Apr 12, 2011 at 7:33 AM, Vishal Kapoor
<vi...@gmail.com> wrote:

> Here is the problem.

Re: does Scan guarantee min to max rows?

Posted by Vishal Kapoor <vi...@gmail.com>.
Here is the problem.

My row IDs start with a reversed timestamp, followed by "/" and some
more values.

9223370735421724555/TimeStamp1/TimeStamp2/CustomerId/MacIdSystem1/MacIdSystem2/RowType

The row ID is designed to make sure the latest row comes up first in the Scan.

The reversed time is calculated as below:

long reverseTimeStampForRightNow = Long.MAX_VALUE - System.currentTimeMillis();
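
To illustrate the key scheme above, here is a minimal, self-contained
sketch with no HBase dependency. The class name ReverseKeys, the rowKey
helper, and the zero-padding to 19 digits are assumptions made for
illustration and are not part of the original code. The padding only
matters if a reversed timestamp could ever have fewer than 19 digits,
which is not the case for current clock values. For ASCII digits, Java
string order agrees with the byte-wise order in which HBase sorts row keys.

```java
// Illustrative sketch only: ReverseKeys and its methods are hypothetical names.
final class ReverseKeys {
    /** Reversed timestamp as in the post: newer times give smaller values. */
    static long reverseTimestamp(long epochMillis) {
        return Long.MAX_VALUE - epochMillis;
    }

    /** Zero-pads to 19 digits (an assumption) so string order equals numeric order. */
    static String rowKey(long epochMillis, String rest) {
        return String.format("%019d/%s", reverseTimestamp(epochMillis), rest);
    }

    public static void main(String[] args) {
        String older = rowKey(1301000000000L, "CustomerA");
        String newer = rowKey(1302000000000L, "CustomerB");
        // A negative result means the newer row sorts before the older one.
        System.out.println(newer.compareTo(older));
    }
}
```
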

I only need to process newly arrived rows, so I ended up keeping a
ScoreBoard table that records what I processed in each iteration.

I pass a start row and a stop row to the Scan to define its scope.

The start row is obtained as below.

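    Scan scan = new Scan();
    scan.setCacheBlocks(false);
    scan.setFilter(new FirstKeyOnlyFilter());
    ResultScanner rsc = table.getScanner(scan);
    try {
        // The first Result of an unbounded Scan is the smallest row key.
        Result firstRow = rsc.next();
        if (firstRow != null) {
            startRow = firstRow.getRow();
        }
    } finally {
        // Release the scanner once the first row has been read.
        rsc.close();
    }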

The stop row for the very first run is calculated like this.

    // "9999999999" sorts after every key, so this returns the last row.
    Result lastRow = table.getRowOrBefore(Bytes.toBytes("9999999999"),
            someFamilyOfThisTableWhichAlwaysExist);
    if (lastRow != null) {
        stopRow = lastRow.getRow();
    }

Once a batch is processed, its first row becomes the stop row for the
next iteration.
Since the stop row is excluded from the Scan, this should work to my
advantage.
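
The iteration described above can be sketched without HBase by letting
a TreeSet stand in for the sorted table. Everything here (the class
ScoreboardSketch, the scan helper, the sample keys) is hypothetical and
only mirrors the inclusive-start, exclusive-stop behaviour of a Scan.
It is a sketch of the idea, not the actual processing code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative sketch only: a TreeSet stands in for the sorted HBase table.
final class ScoreboardSketch {
    /** Rows in [startRow, stopRow): start inclusive, stop exclusive, like a Scan. */
    static List<String> scan(TreeSet<String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subSet(startRow, true, stopRow, false));
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>();
        table.add("9223370735421724555/older");
        table.add("9223370735421724111/newer");

        // First run: start at the first row, stop at the last row at or
        // before "9999999999" (the role getRowOrBefore plays in the post).
        String startRow = table.first();
        String stopRow = table.floor("9999999999");
        System.out.println(scan(table, startRow, stopRow));

        // A new row arrives; its smaller reversed timestamp sorts it first.
        table.add("9223370735421723000/newest");

        // Next run: the previous start row becomes the exclusive stop row.
        String previousStart = startRow;
        startRow = table.first();
        System.out.println(scan(table, startRow, previousStart));
    }
}
```

Two things stand out in this model. On the first run the stop row is
the last row itself, so the oldest row is never returned. On later
runs, any row written with a key that sorts after the saved stop row
(for example, a late write carrying an older timestamp) falls outside
the scanned range.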

Conceptually, I expect this to work as long as the processing code and
the code writing new records do not step on each other.
But there are instances when the Scan does not return the topmost
record in the table.

I have no idea where I am going wrong.
Any pointers on improving this, or on switching to a design that is
proven to work for this kind of problem, would help.

thanks,
Vishal Kapoor

On Wed, Apr 6, 2011 at 12:56 PM, Stack <st...@duboce.net> wrote:
> Vishal, you'll have to do better than the above describing your
> problem if you are looking for some help from the list.
> St.Ack
>

Re: does Scan guarantee min to max rows?

Posted by Stack <st...@duboce.net>.
On Wed, Apr 6, 2011 at 5:12 AM, Vishal Kapoor
<vi...@gmail.com> wrote:
> I am getting shuffled rows? is there a problem at my end somewhere? we
> did some manual split of tables.
> have a scoreboard kind of code for staged processing of table based on
> it, which is going for a toss.
>

Vishal, you'll have to do better than the above describing your
problem if you are looking for some help from the list.
St.Ack