Posted to dev@hbase.apache.org by Xi Yang <al...@gmail.com> on 2016/10/17 05:34:09 UTC

Why HBase scan HFile first, before scan memstore?

I found this code in HStore.java:

    List<StoreFileScanner> sfScanners = StoreFileScanner.getScannersForStoreFiles(files,
        cacheBlocks, usePread, isCompaction, false, matcher, readPt, isPrimaryReplicaStore());
    List<KeyValueScanner> scanners =
        new ArrayList<KeyValueScanner>(sfScanners.size() + 1);
    scanners.addAll(sfScanners);
    // Then the memstore scanners
    if (memStoreScanners != null) {
      scanners.addAll(memStoreScanners);
    }

Does this mean that HBase scans the HFiles first, before it scans the memstore?
Why not scan the memstore first, since memory is always faster than hard disk?


Thanks,
Alex

Re: Why HBase scan HFile first, before scan memstore?

Posted by Xi Yang <al...@gmail.com>.
Great explanation. And thank you for your patience!

Thanks.
Alex

2016-10-16 23:34 GMT-07:00 ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com>:

> Yes, you are right. Look at the code that runs after the list of scanners
> is formed: they are all collected in a KeyValueHeap.
> Please note that the memstore is not a cache; it is only a data structure
> where data is first written before it is flushed to files. So the data you
> read may or may not reside in the memstore. It is therefore always
> necessary to scan both the memstore and the files and to keep returning
> keys in lexicographically sorted order, which is where the heap comes
> into play.
>
> Regards
> Ram

Re: Why HBase scan HFile first, before scan memstore?

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Yes, you are right. Look at the code that runs after the list of scanners is
formed: they are all collected in a KeyValueHeap.
Please note that the memstore is not a cache; it is only a data structure
where data is first written before it is flushed to files. So the data you
read may or may not reside in the memstore. It is therefore always necessary
to scan both the memstore and the files and to keep returning keys in
lexicographically sorted order, which is where the heap comes into play.
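
To illustrate why the memstore cannot be treated as a cache, here is a
minimal Java sketch of a store with a write buffer and flushed files. It is
a toy model, not HBase code: ToyStore and all of its methods are hypothetical
names, versions and timestamps are ignored, and the merge is done naively by
copying into a sorted map rather than with a KeyValueHeap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of one store: a sorted in-memory write buffer ("memstore") plus
// immutable flushed "files". Hypothetical names; this is not the HBase API.
final class ToyStore {
    private TreeMap<String, String> memstore = new TreeMap<>();
    private final List<TreeMap<String, String>> files = new ArrayList<>();

    // Every write lands in the memstore first.
    void put(String row, String value) {
        memstore.put(row, value);
    }

    // A flush turns the current buffer into a new file and starts an empty buffer.
    void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        files.add(memstore);
        memstore = new TreeMap<>();
    }

    // What a reader would see if it consulted only the memstore.
    List<String> scanMemstoreOnly() {
        return new ArrayList<>(memstore.keySet());
    }

    // Full scan: apply the oldest file first and the memstore last, so that
    // the most recent write for a row wins (a simplification of versioning).
    TreeMap<String, String> scan() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : files) {
            merged.putAll(file);
        }
        merged.putAll(memstore);
        return merged;
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("row1", "a");
        store.put("row3", "c");
        store.flush();               // row1 and row3 now live only in a file
        store.put("row2", "b");
        store.put("row3", "c2");     // newer value for row3, still in memory

        System.out.println(store.scanMemstoreOnly()); // [row2, row3]
        System.out.println(store.scan());             // {row1=a, row2=b, row3=c2}
    }
}
```

In this sketch row1 is visible only through the flushed file, so a reader that
looked at the memstore alone would miss it, no matter how fast memory is.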

Regards
Ram

On Mon, Oct 17, 2016 at 11:54 AM, Xi Yang <al...@gmail.com> wrote:

> Got it. So you mean that the result HBase returns to the user actually
> comes from the heap, and the scanners' job is to feed data into that heap.
> So the order in which the HFile scanners and the memstore scanners are
> arranged is not a big deal?
>
> Thanks,
> Alex

Re: Why HBase scan HFile first, before scan memstore?

Posted by Xi Yang <al...@gmail.com>.
Got it. So you mean that the result HBase returns to the user actually comes
from the heap, and the scanners' job is to feed data into that heap. So the
order in which the HFile scanners and the memstore scanners are arranged is
not a big deal?

Thanks,
Alex

2016-10-16 23:04 GMT-07:00 Anoop John <an...@gmail.com>:

> We create a heap over all of these scanners (see StoreScanner, where the
> KeyValueHeap is built). Cells come out of this heap in key order. That is,
> we open and seek all of the scanners and take the current cell from each;
> the cells then emerge from the heap according to the comparator's result.
> So it is not the case that we scan the HFile scanners first and then scan
> the memstore. Does that make sense?
>
> -Anoop-

Re: Why HBase scan HFile first, before scan memstore?

Posted by Anoop John <an...@gmail.com>.
We create a heap over all of these scanners (see StoreScanner, where the
KeyValueHeap is built). Cells come out of this heap in key order. That is,
we open and seek all of the scanners and take the current cell from each;
the cells then emerge from the heap according to the comparator's result.
So it is not the case that we scan the HFile scanners first and then scan
the memstore. Does that make sense?
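
The heap-based merge described above can be sketched in plain Java roughly
as follows. This is only an illustration under simplifying assumptions:
ToyScanner and HeapMergeSketch are hypothetical names standing in for
KeyValueScanner and KeyValueHeap, keys are plain strings compared
lexicographically, and no key appears in more than one source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Toy stand-in for a scanner: a cursor over one already-sorted source.
// Hypothetical class; this is not the HBase KeyValueScanner API.
final class ToyScanner {
    private final Iterator<String> it;
    private String current;

    ToyScanner(List<String> sortedKeys) {
        this.it = sortedKeys.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    // The key this scanner is currently positioned on, or null if exhausted.
    String peek() {
        return current;
    }

    // Return the current key and advance to the following one.
    String next() {
        String out = current;
        current = it.hasNext() ? it.next() : null;
        return out;
    }
}

final class HeapMergeSketch {
    // Merge any number of sorted sources into a single sorted list.
    static List<String> mergedScan(List<List<String>> sources) {
        // The heap orders scanners by the key each one currently points at.
        Comparator<ToyScanner> byCurrentKey = Comparator.comparing(ToyScanner::peek);
        PriorityQueue<ToyScanner> heap = new PriorityQueue<>(byCurrentKey);
        for (List<String> source : sources) {
            ToyScanner scanner = new ToyScanner(source);
            if (scanner.peek() != null) {
                heap.add(scanner);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            ToyScanner top = heap.poll();   // scanner holding the smallest key
            out.add(top.next());            // emit that key and advance
            if (top.peek() != null) {
                heap.add(top);              // re-insert under its new current key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> hfile1 = Arrays.asList("row1", "row4");
        List<String> hfile2 = Arrays.asList("row2", "row6");
        List<String> memstore = Arrays.asList("row3", "row5");

        // Files first, then memstore: the order used in the HStore snippet.
        System.out.println(mergedScan(Arrays.asList(hfile1, hfile2, memstore)));
        // Memstore first: the output is the same.
        System.out.println(mergedScan(Arrays.asList(memstore, hfile2, hfile1)));
    }
}
```

Both calls print [row1, row2, row3, row4, row5, row6]: the position of a
scanner in the list only affects the order in which it is added to the heap,
not the order in which keys come out.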

-Anoop-

On Mon, Oct 17, 2016 at 11:04 AM, Xi Yang <al...@gmail.com> wrote:
> I found this code in HStore.java:
>
>     List<StoreFileScanner> sfScanners = StoreFileScanner.getScannersForStoreFiles(files,
>         cacheBlocks, usePread, isCompaction, false, matcher, readPt, isPrimaryReplicaStore());
>     List<KeyValueScanner> scanners =
>         new ArrayList<KeyValueScanner>(sfScanners.size() + 1);
>     scanners.addAll(sfScanners);
>     // Then the memstore scanners
>     if (memStoreScanners != null) {
>       scanners.addAll(memStoreScanners);
>     }
>
> Does this mean that HBase scans the HFiles first, before it scans the memstore?
> Why not scan the memstore first, since memory is always faster than hard disk?
>
>
> Thanks,
> Alex