Posted to user@cassandra.apache.org by Dinusha Dilrukshi <sd...@gmail.com> on 2013/02/10 04:15:46 UTC
Issues with writing data to Cassandra column family using a Hive script
Hi All,
The data was originally stored in a column family called "test_cf". The
definition of the column family is as follows:
CREATE COLUMN FAMILY test_cf
WITH COMPARATOR = 'IntegerType'
AND key_validation_class = UTF8Type
AND default_validation_class = FloatType;
The following is a sample of the data in "test_cf":
cqlsh:temp_ks> select * from test_cf;
 key            | column1    | value
----------------+------------+-------
 localhost:8282 | 1350468600 |    76
 localhost:8282 | 1350468601 |    76
The Hive script (shown at the end of this mail) is used to read the data from
the column family "test_cf" above and insert it into a new column family
called "cpu_avg_5min_new7". The definition of "cpu_avg_5min_new7" is the same
as that of "test_cf". The issue is that, after executing the Hive script, the
data written to "cpu_avg_5min_new7" looks as follows. It is not in the same
format as the data in the original column family "test_cf". Any explanation
would be highly appreciated.
cqlsh:temp_ks> select * from cpu_avg_5min_new7;
 key            | column1                  | value
----------------+--------------------------+----------
 localhost:8282 | 232340574229062170849328 | 1.09e-05
 localhost:8282 | 232340574229062170849329 | 1.09e-05
Hive script:
----------------
drop table cpu_avg_5min_new7_hive;
CREATE EXTERNAL TABLE IF NOT EXISTS cpu_avg_5min_new7_hive (src_id STRING,
start_time INT, cpu_avg FLOAT) STORED BY
'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH
SERDEPROPERTIES (
"cassandra.host" = "127.0.0.1" , "cassandra.port" = "9160" ,
"cassandra.ks.name" = "temp_ks" ,
"cassandra.ks.username" = "xxx" , "cassandra.ks.password" = "xxx" ,
"cassandra.columns.mapping" = ":key,:column,:value" ,
"cassandra.cf.name" = "cpu_avg_5min_new7" );
drop table xxx;
CREATE EXTERNAL TABLE IF NOT EXISTS xxx (src_id STRING, start_time INT,
cpu_avg FLOAT) STORED BY
'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH
SERDEPROPERTIES (
"cassandra.host" = "127.0.0.1" , "cassandra.port" = "9160" ,
"cassandra.ks.name" = "temp_ks" ,
"cassandra.ks.username" = "xxx" , "cassandra.ks.password" = "xxx" ,
"cassandra.columns.mapping" = ":key,:column,:value" ,
"cassandra.cf.name" = "test_cf" );
insert overwrite table cpu_avg_5min_new7_hive
select src_id, start_time, cpu_avg from xxx;
Regards,
Dinusha.
Re: Issues with writing data to Cassandra column family using a Hive script
Posted by Dinusha Dilrukshi <sd...@gmail.com>.
Hi Aaron,
Thanks for the reply. I'll try out your suggestion.
Regards,
Dinusha.
On Mon, Feb 11, 2013 at 1:55 AM, aaron morton <aa...@thelastpickle.com> wrote:
> Don't use the variable length Cassandra integer, use the Int32Type. It
> also sounds like you want to use a DoubleType rather than FloatType.
>
> http://www.datastax.com/docs/datastax_enterprise2.2/solutions/about_hive#hive-to-cassandra-table-mapping
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
Re: Issues with writing data to Cassandra column family using a Hive script
Posted by aaron morton <aa...@thelastpickle.com>.
Don't use the variable-length Cassandra integer (IntegerType); use Int32Type instead. It also sounds like you want to use DoubleType rather than FloatType.
http://www.datastax.com/docs/datastax_enterprise2.2/solutions/about_hive#hive-to-cassandra-table-mapping
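To illustrate the difference between the types named above, here is a rough
Python sketch of their byte layouts: Int32Type is always 4 bytes, IntegerType
uses the shortest two's-complement form, FloatType is 4 bytes and DoubleType
is 8 bytes. The function names are hypothetical and this is not the storage
handler's code; it only shows what each type expects on disk.

```python
import struct


def encode_int32_type(n: int) -> bytes:
    """Fixed-width layout: always 4 bytes, big-endian, signed."""
    return struct.pack(">i", n)


def encode_integer_type(n: int) -> bytes:
    """Variable-length layout: shortest big-endian two's-complement form."""
    bits = n.bit_length() if n >= 0 else (~n).bit_length()
    return n.to_bytes(bits // 8 + 1, byteorder="big", signed=True)


def encode_float_type(x: float) -> bytes:
    """4-byte big-endian IEEE 754 single precision."""
    return struct.pack(">f", x)


def encode_double_type(x: float) -> bytes:
    """8-byte big-endian IEEE 754 double precision."""
    return struct.pack(">d", x)


# Compare the two integer layouts for a few sample values.
for n in (1, 300, 1350468600):
    print(n, encode_int32_type(n).hex(), encode_integer_type(n).hex())

# Compare the two floating-point layouts for the sample value 76.
print(encode_float_type(76.0).hex(), encode_double_type(76.0).hex())
```

With a fixed-width type, the writer and the reader agree on the length of
every value, which leaves less room for a mismatch like the one in the
question.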
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com