Posted to user@pig.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/05/20 22:49:03 UTC
CassandraStorage loader generating 2x as many records?
(accidentally cross posted this to the cassandra list… when I meant to post
it here)
Either this is a bug or I'm insane.
Here's my table in Cassandra:
CREATE TABLE test_source (
    id int,
    PRIMARY KEY (id)
);
INSERT INTO test_source (ID) VALUES(1);
INSERT INTO test_source (ID) VALUES(2);
INSERT INTO test_source (ID) VALUES(3);
INSERT INTO test_source (ID) VALUES(4);
cqlsh:blogindex> select * from test_source;
id
----
1
2
4
3
(4 rows)
… now I load that into pig and run:
test_source = LOAD 'cassandra://blogindex/test_source' USING
CassandraStorage() AS (source, target: bag {T: tuple(name, value)});
dump test_source;
(4,{((),)})
(1,{((),)})
(2,{((),)})
(4,{((),)})
(1,{((),)})
(3,{((),)})
(3,{((),)})
(2,{((),)})
… now it COULD be a bug with 'dump' … but even then that's a bug.
I suspect that Cassandra might be getting confused and handing too many rows
to Pig, maybe because it's duplicating input splits?
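One way to put a number on the duplication is to group the loaded rows by key and count them. This is a minimal sketch that reuses the LOAD statement and schema from above; the alias names (keys, by_key, key_counts, dupes) are made up for illustration, and it assumes CassandraStorage is already defined in the session the same way it was for the original script.

```pig
-- Same LOAD as in the original script.
test_source = LOAD 'cassandra://blogindex/test_source' USING
    CassandraStorage() AS (source, target: bag {T: tuple(name, value)});

-- Keep only the row key, then count how many times each key was emitted.
keys       = FOREACH test_source GENERATE source;
by_key     = GROUP keys BY source;
key_counts = FOREACH by_key GENERATE group AS source, COUNT(keys) AS n;

-- Any key that shows up more than once was delivered to Pig more than once.
dupes = FILTER key_counts BY n > 1;
DUMP dupes;
```

If the loader really is emitting every row twice, each of the four ids should show up here with a count of 2. An empty result would point at 'dump' rather than at the loader.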
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.