Posted to user@hbase.apache.org by Ramasubramanian <ra...@gmail.com> on 2012/11/28 05:40:58 UTC

Regarding rework in changing column family

Hi,

I have created a table in HBase with one column family and plan to release it for development (in Pentaho).

Suppose that later, after data profiling in production, I find that 200 of the 600 columns are not going to be used frequently. In that case I plan to group those columns into another column family.

If I change the column family at a later point, I expect there will be a lot of rework to do (whether we use Java or Pentaho). Is my understanding correct? Is there any alternative that avoids this?

Regards,
Rams

RE: Regarding rework in changing column family

Posted by Anoop Sam John <an...@huawei.com>.

Also, what about the current data in the table? Right now all of it is under the single CF. Modifying the table to add a new CF will not move data to the new family!
Remember that HBase only deals with CFs at the table schema level. There are no qualifiers in the schema as such; a qualifier is specified only when data is inserted or retrieved.
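
To make that concrete, here is a toy in-memory model in plain Java. It is not the HBase API: ToyTable and its methods are made-up names, and the storage is just a sorted map. It only illustrates that the schema lists families, that qualifiers show up only on individual cells, and that registering a new family leaves existing cells where they were.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of one table (illustration only, not HBase code). */
final class ToyTable {
    // The "schema": family names only, no qualifiers.
    private final Set<String> families = new LinkedHashSet<>();
    // Cells keyed by "row/family:qualifier"; qualifiers exist only here.
    private final Map<String, String> cells = new TreeMap<>();

    ToyTable(String firstFamily) {
        families.add(firstFamily);
    }

    /** Schema change: registers the family name and touches no stored cell. */
    void addFamily(String family) {
        families.add(family);
    }

    void put(String row, String family, String qualifier, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("unknown family: " + family);
        }
        cells.put(row + "/" + family + ":" + qualifier, value);
    }

    /** Returns the stored value, or null if no such cell exists. */
    String get(String row, String family, String qualifier) {
        return cells.get(row + "/" + family + ":" + qualifier);
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable("cf1");
        t.put("row1", "cf1", "q3", "old");
        t.addFamily("cf2");
        System.out.println(t.get("row1", "cf1", "q3")); // the cell is still under cf1
        System.out.println(t.get("row1", "cf2", "q3")); // null: nothing was moved to cf2
    }
}
```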

-Anoop-

Re: Regarding rework in changing column family

Posted by Harsh J <ha...@cloudera.com>.

Ram and Anoop have already said yes to that (that the programs will
have to be changed).

If you are crazy enough: I think you could possibly use a
coprocessor, as far as writes go, via RegionObserver#prePut, to
mutate the Put so that it also duplicates the needed columns into the
other family (via Put.add(…)), and perhaps run another periodic job,
until the upgrade is complete, to prune or move the older/duplicate
values out via Deletes.
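
A rough sketch of just the duplication step, written as plain Java rather than against the coprocessor API: a Put is modelled here as a map of family -> qualifier -> value, and PutMirror, mirror and movedQualifiers are hypothetical names. A real prePut hook would apply the same idea to the actual Put object.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of the duplication logic (illustration only). */
final class PutMirror {
    private PutMirror() {}

    /**
     * Copies every cell of oldFamily whose qualifier is in movedQualifiers
     * into newFamily, leaving the original cell in place.
     */
    static void mirror(Map<String, Map<String, String>> put, String oldFamily,
                       String newFamily, Set<String> movedQualifiers) {
        Map<String, String> oldCells = put.get(oldFamily);
        if (oldCells == null) {
            return; // this Put does not touch the old family
        }
        // Collect the copies first so the map being read is not modified mid-iteration.
        Map<String, String> copies = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : oldCells.entrySet()) {
            if (movedQualifiers.contains(cell.getKey())) {
                copies.put(cell.getKey(), cell.getValue());
            }
        }
        if (!copies.isEmpty()) {
            put.computeIfAbsent(newFamily, f -> new LinkedHashMap<>()).putAll(copies);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> put = new LinkedHashMap<>();
        Map<String, String> cf1 = new LinkedHashMap<>();
        cf1.put("q1", "a");
        cf1.put("q3", "c");
        put.put("cf1", cf1);
        mirror(put, "cf1", "cf2", Set.of("q3"));
        System.out.println(put); // {cf1={q1=a, q3=c}, cf2={q3=c}}
    }
}
```

The old cell is deliberately left in place, which is why a separate cleanup job would still be needed afterwards.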

However, I can't imagine how you'd be able to handle reads gracefully
if your reads already rely on a ColFam name (instead of iterating over
the whole result). The scenario would then be similar to a SQL query
breaking because one of its tables had a column renamed.
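
For comparison, a read that does not hard-code one family could look roughly like the following. This is again plain Java over a modelled row (family -> qualifier -> value), not the HBase Result API, and FamilyAgnosticRead, find and familyOrder are hypothetical names. The family order decides which copy wins while duplicates exist in both families.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Looks a qualifier up across families instead of assuming one family (illustration only). */
final class FamilyAgnosticRead {
    private FamilyAgnosticRead() {}

    /** Returns the first value found for qualifier, checking families in the given order. */
    static Optional<String> find(Map<String, Map<String, String>> row,
                                 List<String> familyOrder, String qualifier) {
        for (String family : familyOrder) {
            Map<String, String> cells = row.get(family);
            if (cells == null) {
                continue; // this row has no cells in that family
            }
            String value = cells.get(qualifier);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> row = Map.of(
                "cf1", Map.of("q1", "a", "q3", "stale"),
                "cf2", Map.of("q3", "fresh"));
        // Prefer the new family, fall back to the old one.
        System.out.println(find(row, List.of("cf2", "cf1"), "q3").orElse("missing")); // fresh
        System.out.println(find(row, List.of("cf2", "cf1"), "q1").orElse("missing")); // a
    }
}
```

Readers that were written against a fixed family name would still have to be changed to go through something like this.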


-- 
Harsh J

Re: Regarding rework in changing column family

Posted by ramkrishna vasudevan <ra...@gmail.com>.

As I understand from your previous question,
-> You want to add another col family and move some of the cols to the
new col family.
So say I had CF1 to begin with...

I could have inserted CF1:q1, CF1:q2, CF1:q3, etc.

Now I have introduced CF2.

So what you need to do is this:
CF1:q1, CF1:q2, CF1:q3 and CF2:q3 can all be added in one Put.

Note that you can have the same qualifier name in different col families.

So your addRecord API should be able to accept the new col family as well.

Regards
Ram

Re: Regarding rework in changing column family

Posted by Ramasubramanian Narayanan <ra...@gmail.com>.

Ram,

In our Java code we use the following method, which takes the column family as
one of the parameters of the "Add Record" function:

*addRecord(String tableName, String rowKey, String family, String
qualifier, String value);*


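    // Uses the HBase client API as posted (HTable, Put.add). `conf` is a
    // Configuration field defined elsewhere in the class.
    public static void addRecord(String tableName, String rowKey, String family,
            String qualifier, String value) throws Exception {
        HTable table = new HTable(conf, tableName);
        try {
            Put put = new Put(Bytes.toBytes(rowKey));
            // The column family is part of every cell written by this Put.
            put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier),
                    Bytes.toBytes(value));
            table.put(put);
            System.out.println("inserted record " + rowKey + " into table "
                    + tableName + " ok.");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            table.close(); // release the table's resources after each call
        }
    }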


So if we change the column family, do we need to change all the Java
programs that use the old column family for a field?

regards,
Rams

Re: Regarding rework in changing column family

Posted by ramkrishna vasudevan <ra...@gmail.com>.

I am afraid it has to be changed, because for your puts to go to the
specified col family, the col family name has to appear in the Puts that are
created by the client.
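
Since the family name has to come from the client, one way to keep future rework small is to decide it in a single shared place instead of in every program. The sketch below is only an assumption about how that could look: FamilyRouter, assign and familyFor are hypothetical names, not part of HBase, and the programs still need a one-time change to go through it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: one shared place that decides which family a qualifier lives in. */
final class FamilyRouter {
    private final String defaultFamily;
    private final Map<String, String> overrides = new HashMap<>();

    FamilyRouter(String defaultFamily) {
        this.defaultFamily = defaultFamily;
    }

    /** Records that a qualifier now belongs to a different family. */
    FamilyRouter assign(String qualifier, String family) {
        overrides.put(qualifier, family);
        return this;
    }

    /** Family for the qualifier: an explicit override if present, else the default. */
    String familyFor(String qualifier) {
        return overrides.getOrDefault(qualifier, defaultFamily);
    }

    public static void main(String[] args) {
        FamilyRouter router = new FamilyRouter("cf1").assign("q3", "cf2");
        System.out.println(router.familyFor("q1")); // cf1
        System.out.println(router.familyFor("q3")); // cf2
    }
}
```

A caller would then pass router.familyFor(qualifier) as the family argument of its write method, so a later regrouping of columns only changes the router's mapping.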

Regards
Ram

Re: Regarding rework in changing column family

Posted by Ramasubramanian Narayanan <ra...@gmail.com>.

Thanks Ram!!!

My question is this:

Suppose I have created a table with 100 columns in a single column family,
'cf1'.

In production there are now billions of records in that table, and
there are multiple programs feeding into it (let us say
some 50 programs).

In this scenario, if I change the column families so that the first 40 columns
stay in 'cf1' and the last 60 columns move to a new column family
'cf2', *do we need to change all 50 programs that are
inserting into that table with 'cf1' for all columns?*

regards,
Rams

Re: Regarding rework in changing column family

Posted by ramkrishna vasudevan <ra...@gmail.com>.

As far as I can see, altering the table to add the new column family should be
the easier part:
-> disable the table
-> issue the modify table command with the new col family
-> run a compaction
After this, when you start doing your puts, they should be in line
with the new schema defined for the table. One thing you may have to check
is how much your put rate is affected, because now both of your
CFs will flush whenever a memstore flush happens.

Hope this helps.

Regards
Ram
