Posted to user@cassandra.apache.org by Akshay Ballarpure <ak...@tcs.com> on 2014/07/31 08:45:45 UTC
Cassandra - Pig integration
Hello,
I am trying to integrate Cassandra with Hadoop and Pig, and to load a CSV file into Cassandra using a Pig script. Can someone help?
root@hadoop-1:/home/hduser/apache-cassandra-2.0.9/examples/pig# cat pigCasandra.pig
data = LOAD 'example.csv' using PigStorage(',') AS (row_id: chararray, value1: chararray, value2: int);
data_to_insert = FOREACH data GENERATE TOTUPLE( TOTUPLE('row_id',row_id) ), TOTUPLE(value1, value2);
STORE data_to_insert INTO 'cql://myschema/example?output_query=update example set value1 @ #,value2 @ #' USING CqlStorage();
root@hadoop-1:/home/hduser/apache-cassandra-2.0.9/examples/pig# /home/hduser/pig/pig-0.13.0/bin/pig pigCasandra.pig
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/07/31 17:38:00 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-07-31 17:38:00,078 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-07-31 17:38:00,078 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hduser/yarn/hadoop-2.4.1/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hduser/apache-cassandra-2.0.9/lib/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hduser/yarn/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2014-07-31 17:38:00,255 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-07-31 17:38:00,398 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found
2014-07-31 17:38:00,484 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-07-31 17:38:00,484 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:00,484 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000
2014-07-31 17:38:01,431 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:01,557 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-31 17:38:01,609 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/datastax/driver/core/policies/LoadBalancingPolicy
Details at logfile: /home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log
Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarpure@tcs.com
Website: http://www.tcs.com
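The ERROR 2998 line names com/datastax/driver/core/policies/LoadBalancingPolicy in the slash-separated form the JVM uses when it cannot load a class, and LoadBalancingPolicy belongs to the DataStax Java driver. So a reasonable first check is whether any jar Pig can see actually contains that class. Below is a minimal sketch, assuming the relevant jars sit under lib directories of the Cassandra and Pig installs shown in the log (those lib paths and the helper name find_class_in_jars are assumptions, not anything Pig or Cassandra provides). It relies on jar files being zip archives, which store entry names uncompressed, so a fixed-string grep over the raw file is enough to spot them:

```shell
# Print every .jar under the given directories whose raw bytes contain
# the entry name "<class path>.class". Zip/jar archives keep entry names
# uncompressed, so no unpacking is needed for this quick check.
# find_class_in_jars is a hypothetical helper, not part of Pig or Cassandra.
find_class_in_jars() {
    class_entry="$1.class"
    shift
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        find "$dir" -type f -name '*.jar' | while IFS= read -r jar; do
            if grep -qF -- "$class_entry" "$jar"; then
                printf '%s\n' "$jar"
            fi
        done
    done
}

# Example run; the lib directories are assumed, adjust to your layout.
find_class_in_jars com/datastax/driver/core/policies/LoadBalancingPolicy \
    /home/hduser/apache-cassandra-2.0.9/lib \
    /home/hduser/pig/pig-0.13.0/lib
```

If nothing is printed, the driver jar is probably not on Pig's classpath. The PIG_CLASSPATH environment variable or a REGISTER statement in the script are the usual places to start, though which driver jar and version you need depends on the Cassandra release.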
Re: Cassandra - Pig integration
Posted by Kevin Burton <bu...@spinn3r.com>.
I think you need to send that details file before we can give you more information:
/home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log
Pig stores its runtime exceptions in that details file, and it has the guts of the problem.
Also, try posting to the Pig user list.
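When sending the details file (pig_1406808480077.log in the original message), it can help to pull out just the error and stack-trace lines first. A minimal sketch, assuming the file contains ordinary Java stack-trace text such as "Caused by:" and indented "at ..." frames; the helper name pig_log_excerpt and the default cap of 40 lines are hypothetical choices, not Pig features:

```shell
# Print the error and stack-trace lines of a Pig details log, with line
# numbers, capped at a maximum count (default 40) so the excerpt stays
# mail-sized. pig_log_excerpt is a hypothetical helper name.
pig_log_excerpt() {
    log="$1"
    max="${2:-40}"
    # ERROR/Exception/Error: catch the headline messages; the last pattern
    # catches indented stack frames such as "	at org.apache...".
    grep -nE 'ERROR|Exception|Error:|Caused by|^[[:space:]]+at ' "$log" | head -n "$max"
}

# Example, using the log file named in the original error message:
# pig_log_excerpt /home/hduser/apache-cassandra-2.0.9/examples/pig/pig_1406808480077.log
```

The full file is still the better thing to attach; the excerpt just makes the relevant part easy to quote inline.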
I'll tell you this, though: it's very difficult to work with Pig + Cassandra… I think mostly due to Pig's use of tuples, and having to force Cassandra to make its columns tuples to work with Pig.
It can be rather confusing.
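To make the tuple point concrete: in the original script, each CSV row becomes a record of two tuples, the first wrapping a ('row_id', value) name/value pair (presumably the key) and the second holding the values to write. The rough sketch below prints that nested shape in the parenthesized text form Pig uses when it dumps tuples. It assumes unquoted fields without embedded commas, only approximates Pig's output, and the helper name show_cql_tuples is hypothetical:

```shell
# Mimic the record shape built by
#   FOREACH data GENERATE TOTUPLE( TOTUPLE('row_id',row_id) ), TOTUPLE(value1, value2);
# for simple rows of the form row_id,value1,value2. Pig's own type handling
# (value2 as int) and CSV edge cases are not reproduced here.
# show_cql_tuples is a hypothetical helper name.
show_cql_tuples() {
    # Rows with fewer than three fields are skipped here for simplicity.
    awk -F',' 'NF >= 3 { printf "(((row_id,%s)),(%s,%s))\n", $1, $2, $3 }' "$@"
}

printf 'k1,foo,1\nk2,bar,2\n' | show_cql_tuples
# prints:
# (((row_id,k1)),(foo,1))
# (((row_id,k2)),(bar,2))
```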
Also, google for pig-with-cassandra.
There should probably be a dedicated forum for running Cassandra with Pig, as there are so many moving components.
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>