Posted to commits@cassandra.apache.org by "Silvère Lestang (JIRA)" <ji...@apache.org> on 2011/06/22 16:50:53 UTC

[jira] [Created] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

RuntimeException in Pig when using "dump" command on column name
----------------------------------------------------------------

                 Key: CASSANDRA-2810
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.8.1
         Environment: Ubuntu 10.10, 32 bits
java version "1.6.0_24"
Brisk beta-2 installed from Debian packages
            Reporter: Silvère Lestang


This bug was previously reported on the [Brisk bug tracker|https://datastax.jira.com/browse/BRISK-232].

In cassandra-cli:
{code}
[default@unknown] create keyspace Test
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = [{replication_factor:1}];

[default@unknown] use Test;
Authenticated to keyspace: Test

[default@Test] create column family test;

[default@Test] set test[ascii('row1')][long(1)]=integer(35);
set test[ascii('row1')][long(2)]=integer(36);
set test[ascii('row1')][long(3)]=integer(38);
set test[ascii('row2')][long(1)]=integer(45);
set test[ascii('row2')][long(2)]=integer(42);
set test[ascii('row2')][long(3)]=integer(33);

[default@Test] list test;
Using default limit of 100
-------------------
RowKey: 726f7731
=> (column=0000000000000001, value=35, timestamp=1308744931122000)
=> (column=0000000000000002, value=36, timestamp=1308744931124000)
=> (column=0000000000000003, value=38, timestamp=1308744931125000)
-------------------
RowKey: 726f7732
=> (column=0000000000000001, value=45, timestamp=1308744931127000)
=> (column=0000000000000002, value=42, timestamp=1308744931128000)
=> (column=0000000000000003, value=33, timestamp=1308744932722000)

2 Rows Returned.

[default@Test] describe keyspace;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
    ColumnFamily: test
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: false
      Built indexes: []
{code}
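The hex strings in the listing are simply the raw bytes of each key and column name: the CF uses the default BytesType for both the key validator and the comparator, so the CLI prints them as hex. A minimal Java sketch reproduces them. It assumes `ascii('row1')` stores the ASCII bytes of the string and `long(1)` stores an 8-byte big-endian long; the class and method names are hypothetical, for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reproduces the hex shown by `list test` when the
// CF uses BytesType for both row keys and column names.
public class ListingHex {

    // Render raw bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Assumption: ascii('row1') stores the ASCII bytes of the string.
    static byte[] asciiKey(String key) {
        return key.getBytes(StandardCharsets.US_ASCII);
    }

    // Assumption: long(n) stores n as an 8-byte big-endian long
    // (ByteBuffer's default byte order is big-endian).
    static byte[] longName(long n) {
        return ByteBuffer.allocate(8).putLong(n).array();
    }

    public static void main(String[] args) {
        System.out.println("RowKey: " + toHex(asciiKey("row1"))); // RowKey: 726f7731
        System.out.println("column=" + toHex(longName(1)));       // column=0000000000000001
    }
}
```

Since nothing in the BytesType schema says the names are longs, anything reading the CF sees only these opaque bytes.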
In Pig command line:
{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.name, columns.value;

grunt> dump value_test;
{code}
In /var/log/cassandra/system.log, this exception appears several times:
{code}
INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 TaskInProgress.java (line 551) Error from attempt_201106210955_0051_m_000000_3: java.lang.RuntimeException: Unexpected data type -1 found in stream.
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
	at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
	at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)
{code}
and the request failed.

{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.value;

grunt> dump value_test;
{code}

This time, without the column name, it works (but the values are displayed as characters instead of integers). Result:
{code}
(row1,{(#),($),(&)})
(row2,{(-),(*),(!)})
{code}
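The characters are consistent with each value being stored as a single byte and then printed as text: 35 is 0x23 ('#'), 36 is '$', 38 is '&', 45 is '-', 42 is '*' and 33 is '!'. The sketch below assumes the CLI's `integer()` function encodes values the way IntegerType does, i.e. as `BigInteger.toByteArray()` (minimal two's-complement, so small numbers take one byte); the class name is hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of why dumped values look like punctuation.
public class ValueAsChars {

    // Assumption: integer(n) in cassandra-cli is encoded like IntegerType,
    // i.e. as the minimal two's-complement bytes of a BigInteger.
    static byte[] integerValue(long n) {
        return BigInteger.valueOf(n).toByteArray();
    }

    // Interpret the raw value bytes as ASCII text, as a char dump would.
    static String asChars(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        long[] row1 = {35, 36, 38};
        long[] row2 = {45, 42, 33};
        StringBuilder out = new StringBuilder();
        for (long v : row1) out.append(asChars(integerValue(v)));
        out.append(' ');
        for (long v : row2) out.append(asChars(integerValue(v)));
        System.out.println(out); // #$& -*!
    }
}
```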

Now we run the same test, but with a comparator set on the CF:
{code}
[default@Test] create column family test with comparator = 'LongType';

[default@Test] set test[ascii('row1')][long(1)]=integer(35);
set test[ascii('row1')][long(2)]=integer(36);
set test[ascii('row1')][long(3)]=integer(38);
set test[ascii('row2')][long(1)]=integer(45);
set test[ascii('row2')][long(2)]=integer(42);
set test[ascii('row2')][long(3)]=integer(33);

[default@Test] list test;
Using default limit of 100
-------------------
RowKey: 726f7731
=> (column=1, value=35, timestamp=1308748643506000)
=> (column=2, value=36, timestamp=1308748643508000)
=> (column=3, value=38, timestamp=1308748643509000)
-------------------
RowKey: 726f7732
=> (column=1, value=45, timestamp=1308748643510000)
=> (column=2, value=42, timestamp=1308748643512000)
=> (column=3, value=33, timestamp=1308748645138000)

2 Rows Returned.

[default@Test] describe keyspace;
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:1]
  Column Families:
    ColumnFamily: test
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.LongType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: false
      Built indexes: []
{code}
{code}
grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});

grunt> value_test = foreach test generate rowkey, columns.name, columns.value;

grunt> dump value_test;
{code}
This time it works as expected (apart from the values being displayed as characters). Result:
{code}
(row1,{(1),(2),(3)},{(#),($),(&)})
(row2,{(1),(2),(3)},{(-),(*),(!)})
{code}
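With `comparator = 'LongType'`, the same 8-byte column names can be decoded into Java `long` values, which Pig handles natively; that fits the successful `columns.name` output above. A minimal sketch of the decoding step, assuming the names are 8-byte big-endian longs as in the first listing (the class and method names are hypothetical, not CassandraStorage's actual code):

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: turning a LongType column name into a long.
public class LongNameDecode {

    // Decode an 8-byte big-endian column name into a long,
    // rejecting buffers of the wrong size.
    static long decodeLongName(byte[] raw) {
        if (raw.length != 8) {
            throw new IllegalArgumentException("expected 8 bytes, got " + raw.length);
        }
        return ByteBuffer.wrap(raw).getLong();
    }

    public static void main(String[] args) {
        byte[] name = {0, 0, 0, 0, 0, 0, 0, 3}; // hex 0000000000000003
        System.out.println(decodeLongName(name)); // 3
    }
}
```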


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2810:
----------------------------------------

    Attachment: 2810-v2.txt

It looks like the final problem here is that IntegerType always returns a BigInteger, which Pig does not support. This is unfortunate, since IntegerType can't easily be subclassed and overridden to return ints.

v2 instead adds a setTupleValue method that is always used to add values to tuples. It houses all the special-casing currently needed and provides a place for more in the future, rather than proliferating custom type converters, since I'm sure IntegerType won't be alone here.
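The setTupleValue idea can be sketched in self-contained Java. This is a minimal illustration, not the committed patch: the tuple is modeled as an `Object[]`, `byte[]` stands in for Pig's DataByteArray, and `typeTag` is a simplified stand-in for the serializer's type lookup (only the -1 code comes from the stack trace above; the other codes are illustrative). Narrowing a BigInteger to Integer or Long by bit length, and the ByteBuffer branch, are assumptions.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical sketch of a single choke point for tuple values.
public class TupleValues {

    // Simplified stand-in for the serializer's type lookup: classes it does
    // not recognize get -1, the code reported in the stack trace.
    static byte typeTag(Object o) {
        if (o == null) return 1;
        if (o instanceof Integer) return 10;
        if (o instanceof Long) return 15;
        if (o instanceof byte[]) return 50;
        if (o instanceof String) return 55;
        return -1;
    }

    // Hypothetical setTupleValue: every value added to a tuple goes through
    // here, so all type special-casing lives in one place.
    static void setTupleValue(Object[] tuple, int position, Object value) {
        if (value instanceof BigInteger) {
            BigInteger big = (BigInteger) value;
            if (big.bitLength() <= 31) {
                tuple[position] = big.intValue();   // fits in an int
            } else if (big.bitLength() <= 63) {
                tuple[position] = big.longValue();  // fits in a long
            } else {
                throw new IllegalArgumentException("integer too large: " + big);
            }
        } else if (value instanceof ByteBuffer) {
            // Copy the remaining bytes without moving the caller's position.
            ByteBuffer buf = ((ByteBuffer) value).duplicate();
            byte[] copy = new byte[buf.remaining()];
            buf.get(copy);
            tuple[position] = copy;
        } else {
            tuple[position] = value;                // assumed already supported
        }
    }

    public static void main(String[] args) {
        Object raw = BigInteger.valueOf(35);        // what IntegerType hands back
        System.out.println(typeTag(raw));           // -1: would fail to serialize
        Object[] tuple = new Object[2];
        setTupleValue(tuple, 0, 1L);
        setTupleValue(tuple, 1, raw);
        System.out.println(typeTag(tuple[1]));      // 10: now an Integer
    }
}
```

Because every value passes through one method, supporting another awkward Cassandra type later means adding one more branch rather than another custom converter.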

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810-v2.txt, 2810.txt


       

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116856#comment-13116856 ] 

Hudson commented on CASSANDRA-2810:
-----------------------------------

Integrated in Cassandra-0.8 #348 (See [https://builds.apache.org/job/Cassandra-0.8/348/])
    Fix handling of integer types in pig.
Patch by brandonwilliams, reviewed by Jeremy Hanna for CASSANDRA-2810

brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1177084
Files : 
* /cassandra/branches/cassandra-0.8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java

                
> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>             Fix For: 0.8.7
>
>         Attachments: 2810-v2.txt, 2810-v3.txt, 2810.txt


       

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Silvère Lestang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063906#comment-13063906 ] 

Silvère Lestang commented on CASSANDRA-2810:
--------------------------------------------

No, my tests lead to the opposite conclusion: [#CASSANDRA-2777] seems to work fine (Pig gets the correct type for my column family), but the bug is still present despite the two patches.

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810.txt
>
>
> This bug was previously report on [Brisk bug tracker|https://datastax.jira.com/browse/BRISK-232].
> In cassandra-cli:
> {code}
> [default@unknown] create keyspace Test
>     with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
>     and strategy_options = [{replication_factor:1}];
> [default@unknown] use Test;
> Authenticated to keyspace: Test
> [default@Test] create column family test;
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> -------------------
> RowKey: 726f7731
> => (column=0000000000000001, value=35, timestamp=1308744931122000)
> => (column=0000000000000002, value=36, timestamp=1308744931124000)
> => (column=0000000000000003, value=38, timestamp=1308744931125000)
> -------------------
> RowKey: 726f7732
> => (column=0000000000000001, value=45, timestamp=1308744931127000)
> => (column=0000000000000002, value=42, timestamp=1308744931128000)
> => (column=0000000000000003, value=33, timestamp=1308744932722000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:1]
>   Column Families:
>     ColumnFamily: test
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: false
>       Built indexes: []
> {code}
> In Pig command line:
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> In /var/log/cassandra/system.log, I have severals time this exception:
> {code}
> INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 TaskInProgress.java (line 551) Error from attempt_201106210955_0051_m_000000_3: java.lang.RuntimeException: Unexpected data type -1 found in stream.
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
> 	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
> 	at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
> 	at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> {code}
> and the request failed.
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.value;
> grunt> dump value_test;
> {code}
> This time, without the column name, it works (but the values are displayed as chars instead of integers). Result:
> {code}
> (row1,{(#),($),(&)})
> (row2,{(-),(*),(!)})
> {code}
> Now we run the same test, but with a comparator set on the CF.
> {code}
> [default@Test] create column family test with comparator = 'LongType';
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> -------------------
> RowKey: 726f7731
> => (column=1, value=35, timestamp=1308748643506000)
> => (column=2, value=36, timestamp=1308748643508000)
> => (column=3, value=38, timestamp=1308748643509000)
> -------------------
> RowKey: 726f7732
> => (column=1, value=45, timestamp=1308748643510000)
> => (column=2, value=42, timestamp=1308748643512000)
> => (column=3, value=33, timestamp=1308748645138000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:1]
>   Column Families:
>     ColumnFamily: test
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.LongType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: false
>       Built indexes: []
> {code}
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> This time it works as expected (apart from the values being displayed as chars). Result:
> {code}
> (row1,{(1),(2),(3)},{(#),($),(&)})
> (row2,{(1),(2),(3)},{(-),(*),(!)})
> {code}
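Two details in the report above can be reproduced without Cassandra or Pig. Under the default BytesType comparator, cassandra-cli lists each `long(n)` column name as 16 hex digits (`0000000000000001`), while under LongType it lists `1`. The values Pig prints as `#`, `$`, `&`, `-`, `*`, `!` are exactly the ASCII characters for 35, 36, 38, 45, 42 and 33, which suggests the raw value bytes are being rendered as text. The Java sketch below assumes that `long(...)` writes an 8-byte big-endian value and that `integer(...)` writes the variable-length encoding produced by `java.math.BigInteger.toByteArray()`. Both assumptions are consistent with the output shown, but they were not checked against the cassandra-cli source, and the class and method names are illustrative only.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch. Assumes long(n) -> 8-byte big-endian and
// integer(n) -> BigInteger.toByteArray() encoding (an assumption,
// not verified against the cassandra-cli source).
public class RawBytesDemo {
    // Column name as written by long(n) (assumed encoding).
    static byte[] encodeLongName(long n) {
        return ByteBuffer.allocate(8).putLong(n).array();
    }

    // Value as written by integer(n) (assumed encoding).
    static byte[] encodeIntegerValue(long n) {
        return BigInteger.valueOf(n).toByteArray();
    }

    // Raw bytes rendered as hex, as cassandra-cli lists names under BytesType.
    static String asHex(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Name decoded as a long, as it is listed under LongType.
    static long asLong(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    // Value bytes treated as text, which matches the chars Pig printed.
    static String asText(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Value bytes decoded as an integer, the result the `value:int` schema asked for.
    static int asInt(byte[] raw) {
        return new BigInteger(raw).intValue();
    }

    public static void main(String[] args) {
        byte[] name = encodeLongName(1);
        System.out.println(asHex(name) + " -> " + asLong(name));
        for (long v : new long[] {35, 36, 38, 45, 42, 33}) {
            byte[] raw = encodeIntegerValue(v);
            System.out.println(v + " -> \"" + asText(raw) + "\" / " + asInt(raw));
        }
    }
}
```

Running `main` prints `0000000000000001 -> 1`, followed by lines such as `35 -> "#" / 35`. The same bytes produce either form depending only on how they are interpreted.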

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2810:
--------------------------------------

    Fix Version/s:     (was: 1.0.1)
    
> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>             Fix For: 0.8.7
>
>         Attachments: 2810-v2.txt, 2810-v3.txt, 2810.txt
>
>

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054494#comment-13054494 ] 

Brandon Williams commented on CASSANDRA-2810:
---------------------------------------------

Yes, basically a byte array, but it's the pig type.
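Brandon's reply appears to refer to Pig's byte-array type (Pig ships one as `org.apache.pig.data.DataByteArray`), as opposed to an arbitrary Java object. The "Unexpected data type -1" failure is consistent with Pig's serializer encountering an object class it does not recognise. That reading fits the report: names work under LongType, where they come back as numbers, and fail under BytesType. The following is a minimal plain-Java sketch of that comparator-dependent conversion. It is not the attached 2810 patch and does not use Pig's API. `toLoaderValue` and its boolean flag are hypothetical, and in a real loader the `byte[]` branch would presumably be wrapped in Pig's byte-array type.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of comparator-dependent conversion of a column name.
// Not the CASSANDRA-2810 patch and not Pig's API.
public class ColumnNameConversion {
    // comparatorIsLongType is a hypothetical stand-in for inspecting the
    // column family's comparator class.
    static Object toLoaderValue(ByteBuffer name, boolean comparatorIsLongType) {
        ByteBuffer view = name.duplicate(); // leave the caller's position untouched
        if (comparatorIsLongType) {
            return view.getLong(); // 8-byte big-endian name, boxed to java.lang.Long
        }
        // Otherwise copy the bytes out instead of passing the raw ByteBuffer on.
        byte[] copy = new byte[view.remaining()];
        view.get(copy);
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer name = ByteBuffer.allocate(8).putLong(0, 1L);
        System.out.println(toLoaderValue(name, true));
        System.out.println(Arrays.toString((byte[]) toLoaderValue(name, false)));
    }
}
```

The `duplicate()` call is a design choice that keeps the conversion free of side effects on the caller's buffer, so the same column can be read more than once.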

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810.txt
>
>

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Jeremy Hanna (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116825#comment-13116825 ] 

Jeremy Hanna commented on CASSANDRA-2810:
-----------------------------------------

+1 - if we find any issues with it in production, we'll submit bug reports.
                
> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810-v2.txt, 2810-v3.txt, 2810.txt
>
>

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062128#comment-13062128 ] 

Brandon Williams commented on CASSANDRA-2810:
---------------------------------------------

So is the conclusion that this patch by itself works fine, but there is a problem with CASSANDRA-2777?

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810.txt
>
>
> This bug was previously report on [Brisk bug tracker|https://datastax.jira.com/browse/BRISK-232].
> In cassandra-cli:
> {code}
> [default@unknown] create keyspace Test
>     with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
>     and strategy_options = [{replication_factor:1}];
> [default@unknown] use Test;
> Authenticated to keyspace: Test
> [default@Test] create column family test;
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> -------------------
> RowKey: 726f7731
> => (column=0000000000000001, value=35, timestamp=1308744931122000)
> => (column=0000000000000002, value=36, timestamp=1308744931124000)
> => (column=0000000000000003, value=38, timestamp=1308744931125000)
> -------------------
> RowKey: 726f7732
> => (column=0000000000000001, value=45, timestamp=1308744931127000)
> => (column=0000000000000002, value=42, timestamp=1308744931128000)
> => (column=0000000000000003, value=33, timestamp=1308744932722000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:1]
>   Column Families:
>     ColumnFamily: test
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: false
>       Built indexes: []
> {code}
> In Pig command line:
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> In /var/log/cassandra/system.log, I see this exception several times:
> {code}
> INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 TaskInProgress.java (line 551) Error from attempt_201106210955_0051_m_000000_3: java.lang.RuntimeException: Unexpected data type -1 found in stream.
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
> 	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
> 	at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
> 	at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> {code}
> and the request failed.
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.value;
> grunt> dump value_test;
> {code}
> This time, without the column name, it works (but the values are displayed as characters instead of integers). Result:
> {code}
> (row1,{(#),($),(&)})
> (row2,{(-),(*),(!)})
> {code}
> Now we do the same test but we set a comparator to the CF.
> {code}
> [default@Test] create column family test with comparator = 'LongType';
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> -------------------
> RowKey: 726f7731
> => (column=1, value=35, timestamp=1308748643506000)
> => (column=2, value=36, timestamp=1308748643508000)
> => (column=3, value=38, timestamp=1308748643509000)
> -------------------
> RowKey: 726f7732
> => (column=1, value=45, timestamp=1308748643510000)
> => (column=2, value=42, timestamp=1308748643512000)
> => (column=3, value=33, timestamp=1308748645138000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:1]
>   Column Families:
>     ColumnFamily: test
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.LongType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: false
>       Built indexes: []
> {code}
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> This time it works as expected (apart from the values being displayed as characters). Result:
> {code}
> (row1,{(1),(2),(3)},{(#),($),(&)})
> (row2,{(1),(2),(3)},{(-),(*),(!)})
> {code}
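The characters in these results are the stored integers printed as text. In ASCII, `#` is 35, `$` is 36, `&` is 38, `-` is 45, `*` is 42 and `!` is 33, which are exactly the values written with `integer(...)`.

The minimal Java sketch below reproduces that reading. It assumes, as the one-character output suggests, that each value is stored as a single two's-complement byte and reaches Pig as raw bytes under the `BytesType` default validator. The class and method names are illustrative only.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows why a one-byte integer value prints as a character
// when it reaches Pig as raw bytes instead of as an int.
final class ValueDecoding {
    // Values observed in the `list test` output above.
    static final int[] STORED = {35, 36, 38, 45, 42, 33};

    // Assumed encoding of integer(n): minimal two's-complement bytes.
    static byte[] encode(int v) {
        return BigInteger.valueOf(v).toByteArray();
    }

    // Reading the raw bytes as ASCII text (what the dump effectively showed).
    static String asText(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Reading the raw bytes as a signed big-endian integer (what was intended).
    static int asInt(byte[] raw) {
        return new BigInteger(raw).intValue();
    }

    public static void main(String[] args) {
        for (int v : STORED) {
            byte[] raw = encode(v);
            System.out.println(v + " -> bytes=" + raw.length
                    + " text=" + asText(raw) + " int=" + asInt(raw));
        }
    }
}
```

Running `main` prints one line per value, for example `35 -> bytes=1 text=# int=35`. The data itself is intact; only its interpretation differs.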

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Silvère Lestang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055585#comment-13055585 ] 

Silvère Lestang commented on CASSANDRA-2810:
--------------------------------------------

After more testing (with both patches), patch [^2810.txt] doesn't seem to solve the bug.
Here is a new test case:
Create a _Test_ keyspace and a _test_ column family with key_validation_class = 'AsciiType', comparator = 'LongType' and default_validation_class = 'IntegerType' (don't use the CLI for this, because of [#CASSANDRA-2831]).
Insert some data:
{code}
set test[ascii('row1')][long(1)]=integer(35);
set test[ascii('row1')][long(2)]=integer(36);
set test[ascii('row1')][long(3)]=integer(38);
set test[ascii('row2')][long(1)]=integer(45);
set test[ascii('row2')][long(2)]=integer(42);
set test[ascii('row2')][long(3)]=integer(33);
{code}
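These statements rely on the CLI's `long()` and `integer()` conversions. The earlier `list test` output, made under a `BytesType` comparator, shows column `long(1)` as `0000000000000001`. That suggests `long()` writes a fixed eight-byte big-endian value. This sketch assumes `integer()` writes variable-length two's-complement bytes.

The hypothetical Java helpers below reproduce both layouts. They also decode the column name back to a `long`, which is what a `LongType` comparator (or a `name:long` Pig schema) expects to find.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical helpers mirroring how the CLI's long() and integer() values
// appear to be laid out (see the hex column names in `list test`).
final class CliEncodings {
    // long(n): fixed 8-byte big-endian encoding.
    static byte[] encodeLong(long n) {
        return ByteBuffer.allocate(8).putLong(n).array();
    }

    // integer(n): variable-length two's-complement encoding (assumed).
    static byte[] encodeInteger(long n) {
        return BigInteger.valueOf(n).toByteArray();
    }

    // Read an 8-byte big-endian column name back as a long.
    static long decodeLong(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    // Lower-case hex, matching the CLI's display of BytesType names.
    static String hex(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] name = encodeLong(1);
        System.out.println("long(1)     -> " + hex(name));              // 0000000000000001
        System.out.println("decoded     -> " + decodeLong(name));       // 1
        System.out.println("integer(35) -> " + hex(encodeInteger(35))); // 23
    }
}
```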

In the Pig CLI:
{code}
test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS ();
dump test;
{code}
The same exception as before is raised:
{code}
 INFO [IPC Server handler 4 on 8012] 2011-06-27 16:40:28,562 TaskInProgress.java (line 551) Error from attempt_201106271436_0012_m_000000_1: java.lang.RuntimeException: Unexpected data type -1 found in stream.
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
	at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
	at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:224)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.Child.main(Child.java:253)

{code}
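For context, the code `-1` in Pig's `DataType` is `ERROR`. As far as I can tell, that is what Pig's type lookup reports for a Java class it does not recognize.

The trace runs `writeDatum`, `writeTuple`, `writeBag`, `writeTuple`, `writeDatum`. So it suggests that a field of a tuple inside a bag held an unsupported object instead of a Pig type such as `DataByteArray`, `Long` or `Integer`. A raw `java.nio.ByteBuffer` is one plausible candidate.

The self-contained Java sketch below only illustrates that idea. It is not code from Pig or from any of the attached patches:
- `isSupported` is a hypothetical, much simplified stand-in for Pig's lookup, with `byte[]` standing in for `DataByteArray`.
- `sanitize` shows one way a loader could copy a `ByteBuffer` into a plain array before emitting it.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative sketch only, not Pig's or Cassandra's code.
final class PigTypeCheck {
    // Hypothetical, simplified stand-in for Pig's type lookup: any class not
    // listed here would be reported as ERROR (-1) by the real lookup.
    static boolean isSupported(Object o) {
        return o == null
            || o instanceof Integer
            || o instanceof Long
            || o instanceof String
            || o instanceof byte[];   // stands in for Pig's DataByteArray
    }

    // Copy the readable bytes of a ByteBuffer into a standalone array,
    // leaving the original buffer's position untouched.
    static Object sanitize(Object o) {
        if (o instanceof ByteBuffer) {
            ByteBuffer dup = ((ByteBuffer) o).duplicate();
            byte[] copy = new byte[dup.remaining()];
            dup.get(copy);
            return copy;
        }
        return o;
    }

    public static void main(String[] args) {
        Object name = ByteBuffer.wrap(new byte[] {0, 0, 0, 0, 0, 0, 0, 1});
        System.out.println("raw ByteBuffer supported? " + isSupported(name));  // false
        Object fixed = sanitize(name);
        System.out.println("sanitized supported?      " + isSupported(fixed)); // true
        System.out.println("bytes: " + Arrays.toString((byte[]) fixed));      // [0, 0, 0, 0, 0, 0, 0, 1]
    }
}
```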

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810.txt


       

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Silvère Lestang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055521#comment-13055521 ] 

Silvère Lestang commented on CASSANDRA-2810:
--------------------------------------------

I tried again after applying [^2810.txt] and the patch from bug [CASSANDRA-2777], and the bug is still there.
With the patch, you need to replace
{code}
test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
{code}
by
{code}
test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS ();
{code}
because CassandraStorage takes care of the schema.

I tried:
{code}
grunt> describe test;
test: {key: chararray,columns: {(name: long,value: int)}}
{code}
so we can see that the patch from bug 2777 works correctly (I also tested with different types for the value).
But when I dump test, I still get the same exception.
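The schema reported by `describe test` follows from the column family definition given in the test case (`key_validation_class = 'AsciiType'`, `comparator = 'LongType'`, `default_validation_class = 'IntegerType'`).

The Java lookup below is a hypothetical sketch of that mapping. It covers only the three validator-to-Pig-type pairs visible in this output. The `bytearray` fallback for other validators is an assumption, not necessarily what the CASSANDRA-2777 patch does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup reproducing only the mappings visible in `describe test`.
final class SchemaMapping {
    private static final Map<String, String> PIG_TYPES = new HashMap<>();
    static {
        PIG_TYPES.put("org.apache.cassandra.db.marshal.AsciiType", "chararray");
        PIG_TYPES.put("org.apache.cassandra.db.marshal.LongType", "long");
        PIG_TYPES.put("org.apache.cassandra.db.marshal.IntegerType", "int");
    }

    // Any validator not listed falls back to raw bytes (an assumption).
    static String pigType(String validatorClass) {
        return PIG_TYPES.getOrDefault(validatorClass, "bytearray");
    }

    // Build a schema string in the same shape as the grunt output.
    static String schema(String keyValidator, String comparator, String defaultValidator) {
        return "{key: " + pigType(keyValidator)
             + ",columns: {(name: " + pigType(comparator)
             + ",value: " + pigType(defaultValidator) + ")}}";
    }

    public static void main(String[] args) {
        String p = "org.apache.cassandra.db.marshal.";
        System.out.println(schema(p + "AsciiType", p + "LongType", p + "IntegerType"));
        // {key: chararray,columns: {(name: long,value: int)}}
    }
}
```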

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810.txt


       

[jira] [Updated] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2810:
----------------------------------------

    Attachment: 2810-v3.txt

v3 also stops decomposing the values before inserting them; instead it forces them into a ByteBuffer with objToBB, since we don't actually care about the type. (Why did we ever change this?)

This means that a UDF that doesn't preserve the schema and hands us back DataByteArrays when we fed it specific types can't make us fail anymore.
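A minimal Java sketch of this contrast, under stated assumptions:
- `toByteBuffer` is a hypothetical, type-agnostic writer in the spirit of `objToBB`. It is not the method from 2810-v3.txt, and here it only accepts `byte[]` (standing in for `DataByteArray`) or `String`.
- `decomposeAsLong` models a type-specific path that trusts the declared schema.

A UDF that hands back bytes where a `long` was declared makes the typed path throw, while the agnostic path simply stores the bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the v3 approach; not the code in 2810-v3.txt.
final class WritePath {
    // Type-agnostic: just wrap the bytes we were handed.
    static ByteBuffer toByteBuffer(Object o) {
        if (o == null) {
            return null;
        }
        if (o instanceof byte[]) {          // stands in for Pig's DataByteArray
            return ByteBuffer.wrap((byte[]) o);
        }
        if (o instanceof String) {
            return ByteBuffer.wrap(((String) o).getBytes(StandardCharsets.UTF_8));
        }
        throw new IllegalArgumentException("unsupported type: " + o.getClass().getName());
    }

    // Type-specific: assumes the value is a Long because the schema said so.
    static ByteBuffer decomposeAsLong(Object o) {
        return ByteBuffer.allocate(8).putLong(0, (Long) o); // ClassCastException if not a Long
    }

    public static void main(String[] args) {
        Object fromUdf = new byte[] {0, 0, 0, 0, 0, 0, 0, 1}; // UDF returned bytes, not a Long
        System.out.println("agnostic path: " + toByteBuffer(fromUdf).remaining() + " bytes");
        try {
            decomposeAsLong(fromUdf);
        } catch (ClassCastException e) {
            System.out.println("typed path failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Running `main` prints `agnostic path: 8 bytes` followed by `typed path failed: ClassCastException`.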

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810-v2.txt, 2810-v3.txt, 2810.txt
>
>
> This bug was previously reported on [Brisk bug tracker|https://datastax.jira.com/browse/BRISK-232].
> In cassandra-cli:
> {code}
> [default@unknown] create keyspace Test
>     with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
>     and strategy_options = [{replication_factor:1}];
> [default@unknown] use Test;
> Authenticated to keyspace: Test
> [default@Test] create column family test;
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> -------------------
> RowKey: 726f7731
> => (column=0000000000000001, value=35, timestamp=1308744931122000)
> => (column=0000000000000002, value=36, timestamp=1308744931124000)
> => (column=0000000000000003, value=38, timestamp=1308744931125000)
> -------------------
> RowKey: 726f7732
> => (column=0000000000000001, value=45, timestamp=1308744931127000)
> => (column=0000000000000002, value=42, timestamp=1308744931128000)
> => (column=0000000000000003, value=33, timestamp=1308744932722000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:1]
>   Column Families:
>     ColumnFamily: test
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: false
>       Built indexes: []
> {code}
> In Pig command line:
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> In /var/log/cassandra/system.log, I see this exception several times:
> {code}
> INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 TaskInProgress.java (line 551) Error from attempt_201106210955_0051_m_000000_3: java.lang.RuntimeException: Unexpected data type -1 found in stream.
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
> 	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
> 	at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
> 	at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> {code}
> and the request failed.
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.value;
> grunt> dump value_test;
> {code}
> This time, without the column name, it works (but the values are displayed as characters instead of integers). Result:
> {code}
> (row1,{(#),($),(&)})
> (row2,{(-),(*),(!)})
> {code}
> Now we run the same test, but with a comparator set on the CF.
> {code}
> [default@Test] create column family test with comparator = 'LongType';
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> -------------------
> RowKey: 726f7731
> => (column=1, value=35, timestamp=1308748643506000)
> => (column=2, value=36, timestamp=1308748643508000)
> => (column=3, value=38, timestamp=1308748643509000)
> -------------------
> RowKey: 726f7732
> => (column=1, value=45, timestamp=1308748643510000)
> => (column=2, value=42, timestamp=1308748643512000)
> => (column=3, value=33, timestamp=1308748645138000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:1]
>   Column Families:
>     ColumnFamily: test
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.LongType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: false
>       Built indexes: []
> {code}
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> This time it works as expected (apart from the values being displayed as characters). Result:
> {code}
> (row1,{(1),(2),(3)},{(#),($),(&)})
> (row2,{(1),(2),(3)},{(-),(*),(!)})
> {code}
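These results suggest that Pig is receiving undecoded bytes. The values print as single characters, and the names only print as numbers once the LongType comparator is set. The minimal Java sketch below decodes those bytes by hand. It assumes that cassandra-cli's `integer()` writes Cassandra's variable-length big-endian IntegerType encoding and that `long()` writes an 8-byte big-endian long. The `0000000000000001` column names in the first listing are consistent with the second assumption. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

/** Hypothetical helper: decode raw column bytes the way a typed comparator/validator would. */
public class DecodeColumnBytes {

    // Values: big-endian two's-complement bytes. BigInteger accepts both a minimal
    // varint-style encoding ({0x23}) and a padded 4-byte one ({0, 0, 0, 0x23}).
    static int decodeValue(byte[] raw) {
        return new BigInteger(raw).intValue();
    }

    // Names written with long(): 8 bytes, big-endian (ByteBuffer's default byte order).
    static long decodeName(byte[] raw) {
        return ByteBuffer.wrap(raw).getLong();
    }

    public static void main(String[] args) {
        byte[] value = {0x23};                     // assumed bytes of integer(35)
        byte[] name = {0, 0, 0, 0, 0, 0, 0, 0x01}; // long(1), listed by the CLI as 0000000000000001

        System.out.println((char) value[0]);    // prints #  (the raw byte read as a character)
        System.out.println(decodeValue(value)); // prints 35
        System.out.println(decodeName(name));   // prints 1
    }
}
```

Each printed character is the ASCII reading of the value byte: 35 is `#`, 36 is `$`, 38 is `&`, 45 is `-`, 42 is `*` and 33 is `!`. This matches both dumps. Decoding through `BigInteger` rather than `getInt()` also handles a padded 4-byte encoding, in case the assumption about `integer()` is wrong.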

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Steeve Morin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116818#comment-13116818 ] 

Steeve Morin commented on CASSANDRA-2810:
-----------------------------------------

Fixed it for me on Pig 0.9 and Cassandra 0.8.6 (Brisk).
                
> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810-v2.txt, 2810-v3.txt, 2810.txt


       

[jira] [Commented] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054485#comment-13054485 ] 

Jonathan Ellis commented on CASSANDRA-2810:
-------------------------------------------

DataByteArray is some kind of Pig thing?

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810.txt


       

[jira] [Updated] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2810:
----------------------------------------

    Reviewer: jeromatron

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>         Attachments: 2810-v2.txt, 2810-v3.txt, 2810.txt


       

[jira] [Updated] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-2810:
----------------------------------------

    Attachment: 2810.txt

Patch to use a custom AbstractType in place of BytesType to nip this in the bud, rather than having a bunch of one-off checks. Also fixes a bug where the supercolumn name is never set.
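The "Unexpected data type -1" in the stack trace is consistent with Pig's `BinInterSedes` encountering an object class it has no type code for. In Pig, `-1` is the value of `DataType.ERROR`. With the default BytesType comparator, the column name probably reaches Pig as a raw `java.nio.ByteBuffer`.

The sketch below is a hypothetical illustration of the conversion such a fix needs. It is not the attached patch, which swaps in a custom AbstractType instead. It copies a buffer's readable bytes into a `byte[]` without disturbing the buffer. In a real loader, that array would be wrapped with Pig's `new DataByteArray(bytes)`; that step is left out so the sketch has no Pig dependency.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch: turn a raw ByteBuffer, which Pig cannot serialize, into a plain byte[]. */
public class ByteBufferCopy {

    /** Copies the readable bytes of buf without changing its position or limit. */
    static byte[] toBytes(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // shares content, has its own position/limit
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    /** Replaces ByteBuffer values with byte[] copies; any other value passes through unchanged. */
    static Object normalize(Object value) {
        if (value instanceof ByteBuffer) {
            return toBytes((ByteBuffer) value);
        }
        return value;
    }

    public static void main(String[] args) {
        // An 8-byte column name as written by long(1), left undecoded by a BytesType comparator.
        ByteBuffer name = ByteBuffer.allocate(8).putLong(0, 1L);
        byte[] bytes = (byte[]) normalize(name);
        System.out.println(Arrays.toString(bytes)); // [0, 0, 0, 0, 0, 0, 0, 1]
        System.out.println(name.remaining());       // 8: the original buffer is untouched
    }
}
```

Working on a `duplicate()` leaves the caller's position intact, so the same buffer can still be read elsewhere after the copy.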

> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>         Attachments: 2810.txt
>
>
> This bug was previously report on [Brisk bug tracker|https://datastax.jira.com/browse/BRISK-232].
> In cassandra-cli:
> {code}
> [default@unknown] create keyspace Test
>     with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
>     and strategy_options = [{replication_factor:1}];
> [default@unknown] use Test;
> Authenticated to keyspace: Test
> [default@Test] create column family test;
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> -------------------
> RowKey: 726f7731
> => (column=0000000000000001, value=35, timestamp=1308744931122000)
> => (column=0000000000000002, value=36, timestamp=1308744931124000)
> => (column=0000000000000003, value=38, timestamp=1308744931125000)
> -------------------
> RowKey: 726f7732
> => (column=0000000000000001, value=45, timestamp=1308744931127000)
> => (column=0000000000000002, value=42, timestamp=1308744931128000)
> => (column=0000000000000003, value=33, timestamp=1308744932722000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:1]
>   Column Families:
>     ColumnFamily: test
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: false
>       Built indexes: []
> {code}
> In Pig command line:
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> In /var/log/cassandra/system.log, I have severals time this exception:
> {code}
> INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 TaskInProgress.java (line 551) Error from attempt_201106210955_0051_m_000000_3: java.lang.RuntimeException: Unexpected data type -1 found in stream.
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
> 	at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
> 	at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
> 	at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:239)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> {code}
> and the request failed.
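The message comes from Pig's binary serializer (`BinInterSedes`), which writes a type byte before each datum. The value -1 is Pig's `DataType.ERROR`, the code used when an object's Java class is not one Pig recognizes. In the trace the failing datum sits inside a tuple, inside a bag, inside the output tuple. That is consistent with a column name inside the bag produced by `columns.name`. The Python sketch below is a simplified model, not Pig's code. The type codes mirror `org.apache.pig.data.DataType`, and Python types stand in for Java classes. `memoryview` stands in for an object Pig cannot type, for example a raw `java.nio.ByteBuffer`. Whether CassandraStorage actually returned such an object here is an assumption that the report does not confirm.

```python
# Simplified model of Pig's type-tagged serialization (not Pig's actual code).
# Type codes mirror org.apache.pig.data.DataType.
LONG, CHARARRAY, BYTEARRAY, TUPLE, BAG, ERROR = 15, 55, 50, 110, 120, -1

# Python classes standing in for the Java classes Pig recognizes.
_PIG_TYPE_BY_CLASS = {
    int: LONG,         # java.lang.Long
    str: CHARARRAY,    # java.lang.String
    bytes: BYTEARRAY,  # org.apache.pig.data.DataByteArray
    tuple: TUPLE,      # org.apache.pig.data.Tuple
    list: BAG,         # org.apache.pig.data.DataBag
}

def find_type(obj) -> int:
    """Unrecognized classes map to ERROR (-1)."""
    return _PIG_TYPE_BY_CLASS.get(type(obj), ERROR)

def write_datum(out: list, obj) -> None:
    """Append type tags depth-first; raise on an unrecognized datum."""
    t = find_type(obj)
    if t == ERROR:
        raise RuntimeError(f"Unexpected data type {t} found in stream.")
    out.append(t)
    if t in (TUPLE, BAG):
        for item in obj:
            write_datum(out, item)

tags = []
write_datum(tags, ("row1", [(1,), (2,)]))  # names already typed: serializes fine
try:
    # memoryview plays the role of a column name Pig cannot type
    write_datum([], ("row1", [(memoryview(b"\x00" * 7 + b"\x01"),)]))
except RuntimeError as err:
    print(err)  # Unexpected data type -1 found in stream.
```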
> Running the same query without selecting columns.name:
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.value;
> grunt> dump value_test;
> {code}
> This time, without the column name, it works (but the values are displayed as chars instead of integers). Result:
> {code}
> (row1,{(#),($),(&)})
> (row2,{(-),(*),(!)})
> {code}
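The characters are the value bytes rendered as text. For example, 35, 36 and 38 are the ASCII codes of `#`, `$` and `&`. The Python sketch below reproduces the dump under one assumption: each value is stored as its minimal big-endian two's-complement encoding, which is a single byte for these small numbers. The report does not show the raw bytes. The sketch also shows that decoding the same bytes as a big-endian integer gives the expected numbers. The helper names are hypothetical.

```python
def encode_small_int(n: int) -> bytes:
    """Minimal big-endian two's-complement bytes (assumed storage format)."""
    return n.to_bytes((n.bit_length() + 8) // 8, "big", signed=True)

def decode_int(raw: bytes) -> int:
    """Interpret value bytes as a big-endian signed integer."""
    return int.from_bytes(raw, "big", signed=True)

def dump_as_text(rowkey: str, values) -> str:
    """Mimic dump when each value's bytes are rendered as characters."""
    cells = ",".join(f"({encode_small_int(v).decode('ascii')})" for v in values)
    return f"({rowkey},{{{cells}}})"

print(dump_as_text("row1", [35, 36, 38]))  # (row1,{(#),($),(&)})
print(dump_as_text("row2", [45, 42, 33]))  # (row2,{(-),(*),(!)})
print(decode_int(encode_small_int(35)))    # 35
```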
> Now we run the same test, but with a LongType comparator set on the CF.
> {code}
> [default@Test] create column family test with comparator = 'LongType';
> [default@Test] set test[ascii('row1')][long(1)]=integer(35);
> set test[ascii('row1')][long(2)]=integer(36);
> set test[ascii('row1')][long(3)]=integer(38);
> set test[ascii('row2')][long(1)]=integer(45);
> set test[ascii('row2')][long(2)]=integer(42);
> set test[ascii('row2')][long(3)]=integer(33);
> [default@Test] list test;
> Using default limit of 100
> -------------------
> RowKey: 726f7731
> => (column=1, value=35, timestamp=1308748643506000)
> => (column=2, value=36, timestamp=1308748643508000)
> => (column=3, value=38, timestamp=1308748643509000)
> -------------------
> RowKey: 726f7732
> => (column=1, value=45, timestamp=1308748643510000)
> => (column=2, value=42, timestamp=1308748643512000)
> => (column=3, value=33, timestamp=1308748645138000)
> 2 Rows Returned.
> [default@Test] describe keyspace;
> Keyspace: Test:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:1]
>   Column Families:
>     ColumnFamily: test
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.LongType
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: false
>       Built indexes: []
> {code}
> {code}
> grunt> test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS (rowkey:chararray, columns: bag {T: (name:long, value:int)});
> grunt> value_test = foreach test generate rowkey, columns.name, columns.value;
> grunt> dump value_test;
> {code}
> This time it works as expected (apart from the values being displayed as chars). Result:
> {code}
> (row1,{(1),(2),(3)},{(#),($),(&)})
> (row2,{(1),(2),(3)},{(-),(*),(!)})
> {code}
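The two runs differ only in the comparator, which tells a loader how to interpret column-name bytes. The helper below is hypothetical, not CassandraStorage's actual code. It sketches one way to convert a raw name according to the comparator. LongType names decode to a long. BytesType names are copied into an immutable byte string, the analogue of a Pig bytearray, rather than passed through as a raw buffer.

```python
import struct

def name_for_pig(comparator: str, raw):
    """Hypothetical loader step: convert raw column-name bytes using the CF comparator."""
    if comparator == "LongType":
        (n,) = struct.unpack(">q", raw)  # 8-byte big-endian signed long
        return n                         # a value Pig can tag as long
    if comparator == "BytesType":
        return bytes(raw)                # copy into bytes, the analogue of DataByteArray
    raise ValueError(f"comparator not modeled in this sketch: {comparator}")

raw_name = memoryview(struct.pack(">q", 2))  # name written with long(2)
print(name_for_pig("LongType", raw_name))    # 2
print(name_for_pig("BytesType", raw_name))   # b'\x00\x00\x00\x00\x00\x00\x00\x02'
```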

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (CASSANDRA-2810) RuntimeException in Pig when using "dump" command on column name

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2810:
--------------------------------------

    Fix Version/s: 1.0.1
    
> RuntimeException in Pig when using "dump" command on column name
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-2810
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: Ubuntu 10.10, 32 bits
> java version "1.6.0_24"
> Brisk beta-2 installed from Debian packages
>            Reporter: Silvère Lestang
>            Assignee: Brandon Williams
>             Fix For: 0.8.7
>
>         Attachments: 2810-v2.txt, 2810-v3.txt, 2810.txt
>
>
