Posted to user@cassandra.apache.org by Akshay Ballarpure <ak...@tcs.com> on 2014/07/31 08:45:45 UTC

Cassandra - Pig integration

Hello,
I am trying to integrate Cassandra with Hadoop and Pig, and to load a
CSV file into Cassandra using a Pig script. Can someone help?

root@hadoop-1:/home/hduser/apache-cassandra-2.0.9/examples/pig# cat pigCasandra.pig
data = LOAD 'example.csv' using PigStorage(',') AS (row_id: chararray, value1: chararray, value2: int);
data_to_insert = FOREACH data GENERATE TOTUPLE( TOTUPLE('row_id',row_id) ), TOTUPLE(value1, value2);
STORE data_to_insert INTO 'cql://myschema/example?output_query=update example set value1 @ #,value2 @ #' USING CqlStorage();



root@hadoop-1:/home/hduser/apache-cassandra-2.0.9/examples/pig# /home/hduser/pig/pig-0.13.0/bin/pig pigCasandra.pig
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-07-31 17:38:00,078 [main] INFO  org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-07-31 17:38:00,078 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/yarn/hadoop-2.4.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/apache-cassandra-2.0.9/lib/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hduser/yarn/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2014-07-31 17:38:00,255 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-07-31 17:38:00,398 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-07-31 17:38:00,484 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-31 17:38:00,484 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:00,484 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000
2014-07-31 17:38:01,431 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:01,557 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:01,609 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/datastax/driver/core/policies/LoadBalancingPolicy
Details at logfile: /home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log

Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarpure@tcs.com
Website: http://www.tcs.com



Re: Cassandra - Pig integration

Posted by Kevin Burton <bu...@spinn3r.com>.
I think you'll need to send us that details file before we can give you
more information:

 /home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log

Pig writes its runtime exceptions to that details file, which has the guts
of the problem.

Also, try posting to the Pig user list.

I'll tell you this, though: it's very difficult to work with Pig +
Cassandra. I think that's mostly due to Pig's use of tuples and having to
force Cassandra's columns into tuples to work with Pig.

It can be rather confusing.
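The tuple point is easier to follow with the record shape written out. The Python sketch below only models what the script's FOREACH ... GENERATE line builds from each CSV line; it is not Pig or CqlStorage code, and the function names are hypothetical. It also assumes (from the script itself, not from CqlStorage documentation) that CqlStorage reads the first tuple as (column name, value) pairs for the key and the second tuple as the values bound, in order, to the '#' placeholders in output_query.

```python
from typing import List, Tuple

# Hypothetical aliases for one output record, mirroring the script's GENERATE:
#   ( ( ('row_id', <row_id>), ), (<value1>, <value2>) )
KeyTuple = Tuple[Tuple[str, str], ...]
Record = Tuple[KeyTuple, Tuple[str, int]]


def parse_line(line: str) -> Tuple[str, str, int]:
    """Rough stand-in for PigStorage(',') with the schema
    (row_id: chararray, value1: chararray, value2: int).

    Assumes exactly three comma-separated fields. Unlike Pig, which would
    give a null for a non-numeric value2, this sketch raises ValueError.
    """
    row_id, value1, value2 = line.rstrip("\n").split(",")
    return row_id, value1, int(value2)


def to_cql_record(row_id: str, value1: str, value2: int) -> Record:
    """Mirror of: GENERATE TOTUPLE(TOTUPLE('row_id', row_id)), TOTUPLE(value1, value2)."""
    # Outer TOTUPLE with a single argument: a 1-tuple holding one (name, value) pair.
    key_tuple: KeyTuple = (("row_id", row_id),)
    # Second TOTUPLE: the values, in the same order as the '#' placeholders
    # in output_query (value1 first, then value2).
    value_tuple = (value1, value2)
    return key_tuple, value_tuple


def records(lines: List[str]) -> List[Record]:
    """Convert CSV lines to records, skipping blank lines (a simplification)."""
    return [to_cql_record(*parse_line(line)) for line in lines if line.strip()]


if __name__ == "__main__":
    # Prints: [((('row_id', 'r1'),), ('foo', 42))]
    print(records(["r1,foo,42"]))
```

The nesting is the confusing part: even a single key column is wrapped twice, once as a (name, value) pair and once more in the outer key tuple.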

Also, google for pig-with-cassandra.

There should probably be a dedicated forum for running Cassandra with Pig,
as there are so many moving components.


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>