Posted to user@hbase.apache.org by yonghu <yo...@gmail.com> on 2013/02/07 18:23:56 UTC

Is it possible to indicate the column scan order when scanning table?

Dear all,

I wonder whether it is possible to specify the column scan order when
scanning a table. For example, suppose I have two column families, cf1
and cf2, and I create a Scan object. Is the scanning order of
scan.addFamily(cf1) followed by scan.addFamily(cf2) the same as that of
scan.addFamily(cf2) followed by scan.addFamily(cf1)? If the order is the
same, is there a way to specify the scanning order of the table?

regards!

Yong

Re: Is it possible to indicate the column scan order when scanning table?

Posted by yonghu <yo...@gmail.com>.
Thanks for your response. I will take a look.

yong
On Thu, Feb 7, 2013 at 10:11 PM, Ted Yu <yu...@gmail.com> wrote:
> Yonghu:
> You may want to take a look at HBASE-5416: Improve performance of scans
> with some kind of filters.
> It will be in the upcoming 0.94.5 release.
>
> You can designate an essential column family. The other column families
> are then scanned for a row only when the result from the essential
> column family calls for it.
>
> Cheers

Re: Is it possible to indicate the column scan order when scanning table?

Posted by Ted Yu <yu...@gmail.com>.
Yonghu:
You may want to take a look at HBASE-5416: Improve performance of scans
with some kind of filters.
It will be in the upcoming 0.94.5 release.

You can designate an essential column family. The other column families
are then scanned for a row only when the result from the essential
column family calls for it.

Cheers
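
The idea of an essential column family can be sketched as below. This is a
stdlib-only Java model, not HBase API code: EssentialFamilySketch, scan, the
extraLoads counter and the two in-memory maps are all hypothetical names
chosen for illustration. It only shows the principle that the second family
is read for a row when the filter accepts that row's essential value; how
this is switched on in a given HBase release should be checked against
HBASE-5416 and the release notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

final class EssentialFamilySketch {
    // Counts reads against the non-essential family, to make the saving visible.
    static int extraLoads = 0;

    // essential and extra map row key -> value; both are sorted by row key.
    // Assumption of this model: every row of interest has an essential cell.
    static List<String> scan(TreeMap<String, String> essential,
                             TreeMap<String, String> extra,
                             Predicate<String> filter) {
        extraLoads = 0;
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> cell : essential.entrySet()) {
            if (!filter.test(cell.getValue())) {
                continue; // row rejected: the extra family is never touched
            }
            extraLoads++;
            String extraValue = extra.get(cell.getKey());
            String row = cell.getKey() + " essential=" + cell.getValue();
            if (extraValue != null) {
                row += " extra=" + extraValue;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> essential = new TreeMap<>(
                Map.of("r1", "keep", "r2", "drop", "r3", "keep"));
        TreeMap<String, String> extra = new TreeMap<>(
                Map.of("r1", "A", "r2", "B", "r3", "C"));
        System.out.println(scan(essential, extra, v -> v.equals("keep")));
        System.out.println("extra family reads: " + extraLoads);
    }
}
```

In this model the filter is evaluated on the essential family alone, so the
cost of reading the other family is paid only for rows that survive it.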

On Thu, Feb 7, 2013 at 1:07 PM, Sergey Shelukhin <se...@hortonworks.com> wrote:

> CFs are scanned in parallel in HBase, and each row is built as the scan
> advances; scanning an entire CF and then building rows by scanning another
> entire CF wouldn't scale very well.
> Do you filter data on the TTL column family?

Re: Is it possible to indicate the column scan order when scanning table?

Posted by Sergey Shelukhin <se...@hortonworks.com>.
CFs are scanned in parallel in HBase, and each row is built as the scan
advances; scanning an entire CF and then building rows by scanning another
entire CF wouldn't scale very well.
Do you filter data on the TTL column family?
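
The row-by-row behaviour described above can be sketched as a small merge.
This is a stdlib-only Java model, not HBase code: MergedScanSketch, scan,
sampleTable and the in-memory layout are hypothetical. Each family is a map
sorted by row key, one cursor is kept per requested family, and each step
emits the smallest pending row key with the cells every family holds for it.
The cursors are kept in a map sorted by family name, on the assumption that
this mirrors how Scan stores the families it is given, so the order of the
addFamily calls does not affect the result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MergedScanSketch {
    // table: family name -> (row key -> value), each family sorted by row key.
    static List<String> scan(Map<String, TreeMap<String, String>> table,
                             String... families) {
        // One cursor and one pending cell per family, sorted by family name.
        TreeMap<String, Iterator<Map.Entry<String, String>>> cursors = new TreeMap<>();
        TreeMap<String, Map.Entry<String, String>> heads = new TreeMap<>();
        for (String family : families) {
            TreeMap<String, String> store = table.get(family);
            if (store == null) continue;
            Iterator<Map.Entry<String, String>> it = store.entrySet().iterator();
            cursors.put(family, it);
            if (it.hasNext()) heads.put(family, it.next());
        }

        List<String> rows = new ArrayList<>();
        while (!heads.isEmpty()) {
            // The next row is the smallest row key among the pending cells.
            String rowKey = null;
            for (Map.Entry<String, String> head : heads.values()) {
                if (rowKey == null || head.getKey().compareTo(rowKey) < 0) {
                    rowKey = head.getKey();
                }
            }
            // Build that row from every family positioned on it, then advance.
            StringBuilder row = new StringBuilder(rowKey);
            for (String family : new ArrayList<>(heads.keySet())) {
                Map.Entry<String, String> head = heads.get(family);
                if (!head.getKey().equals(rowKey)) continue;
                row.append(' ').append(family).append('=').append(head.getValue());
                Iterator<Map.Entry<String, String>> it = cursors.get(family);
                if (it.hasNext()) heads.put(family, it.next());
                else heads.remove(family);
            }
            rows.add(row.toString());
        }
        return rows;
    }

    static Map<String, TreeMap<String, String>> sampleTable() {
        TreeMap<String, String> cf1 = new TreeMap<>();
        cf1.put("r1", "a");
        cf1.put("r3", "c");
        TreeMap<String, String> cf2 = new TreeMap<>();
        cf2.put("r1", "x");
        cf2.put("r2", "y");
        Map<String, TreeMap<String, String>> table = new TreeMap<>();
        table.put("cf1", cf1);
        table.put("cf2", cf2);
        return table;
    }

    public static void main(String[] args) {
        // Both calls print the same rows; only the row key drives the order.
        System.out.println(scan(sampleTable(), "cf1", "cf2"));
        System.out.println(scan(sampleTable(), "cf2", "cf1"));
    }
}
```

No family is read to the end before another one starts, which is why a
"TTL family first, static family later" order has no place in this model.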

On Thu, Feb 7, 2013 at 12:01 PM, yonghu <yo...@gmail.com> wrote:

> For example, a table can contain data with a TTL alongside static data
> that has no TTL. I want to scan the columns that have TTL restrictions
> first and the static columns afterwards. My goal is to reduce the amount
> of data missed because its TTL expires during the scan.
>
> regards!
>
> Yong

Re: Is it possible to indicate the column scan order when scanning table?

Posted by yonghu <yo...@gmail.com>.
For example, a table can contain data with a TTL alongside static data
that has no TTL. I want to scan the columns that have TTL restrictions
first and the static columns afterwards. My goal is to reduce the amount
of data missed because its TTL expires during the scan.

regards!

Yong

On Thu, Feb 7, 2013 at 6:29 PM, Ted Yu <yu...@gmail.com> wrote:
> Can you give us a use case where the scanning order is significant?
>
> Thanks

Re: Is it possible to indicate the column scan order when scanning table?

Posted by Ted Yu <yu...@gmail.com>.
Can you give us a use case where the scanning order is significant?

Thanks

On Thu, Feb 7, 2013 at 9:23 AM, yonghu <yo...@gmail.com> wrote:

> Dear all,
>
> I wonder whether it is possible to specify the column scan order when
> scanning a table. For example, suppose I have two column families, cf1
> and cf2, and I create a Scan object. Is the scanning order of
> scan.addFamily(cf1) followed by scan.addFamily(cf2) the same as that of
> scan.addFamily(cf2) followed by scan.addFamily(cf1)? If the order is the
> same, is there a way to specify the scanning order of the table?
>
> regards!
>
> Yong
>