You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Ralph Romanos (JIRA)" <ji...@apache.org> on 2012/10/16 10:59:03 UTC

[jira] [Created] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Ralph Romanos created CASSANDRA-4813:
----------------------------------------

             Summary: Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
                 Key: CASSANDRA-4813
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 1.1.5, 1.1.3
         Environment: I am using SLES 10 SP3, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
            Reporter: Ralph Romanos


The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4813:
--------------------------------------

    Attachment:     (was: 4813.txt)
    
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477398#comment-13477398 ] 

Brandon Williams commented on CASSANDRA-4813:
---------------------------------------------

Can you post a stacktrace?
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496406#comment-13496406 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

[~mkjellman] OK, I did test on distributed hadoop cluster with multiple reducers and job succeeded.
At first I was getting the same error as yours (java.io.IOException: Broken pipe), but I figured out it was due to the mismatch in partitioner(My cluster was configured as RandomPartitioner and I used Murmur3Partitioner for BOF).
Can you check that also?
If that's not your case, can you upload Cassandra system.log?
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482389#comment-13482389 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

Session counter is per JVM, so when you spawn two or more stream session on different JVM on same host, you will have chance to get same session ID, say (10.x.x.x, 1) on both JVM.
This is also true with Cassandra on one JVM and reducer on the other JVM on the same machine(but session collision in this set up is less likely than two reducers spawned simultaneously on the same node).
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Ralph Romanos (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487901#comment-13487901 ] 

Ralph Romanos commented on CASSANDRA-4813:
------------------------------------------

Me too, I will let u know as soon as I test it. Thank you for this quick patch Yuki.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480351#comment-13480351 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

just reproduced this again with one reducer.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Ralph Romanos (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477678#comment-13477678 ] 

Ralph Romanos edited comment on CASSANDRA-4813 at 10/17/12 1:50 PM:
--------------------------------------------------------------------

I get the following error in the tasktracker's logs when SSTables 
are streamed into the Cassandra cluster:

Exception in thread "Streaming to /172.16.110.79:1" java.lang.RuntimeException: java.io.EOFException
	at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
	at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181)
	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	... 3 more
Exception in thread "Streaming to /172.16.110.92:1" java.lang.RuntimeException: java.io.EOFException
	at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
	at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181)
	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	... 3 more
                
      was (Author: ralph.romanos):
    I get the following error in the tasktracker's logs when they 
are streamed into the Cassandra cluster:

Exception in thread "Streaming to /172.16.110.79:1" java.lang.RuntimeException: java.io.EOFException
	at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
	at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181)
	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	... 3 more
Exception in thread "Streaming to /172.16.110.92:1" java.lang.RuntimeException: java.io.EOFException
	at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
	at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181)
	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	... 3 more
                  
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489106#comment-13489106 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

Michael,

I think it is not related to this issue.
Do you upgrade from which version? '-b' in system-local-KeyCache-b.db indicates it's from version 1.2.
Also, it possible that the saved cache file is really corrupted for some reason(shutting down in the middle of cache saving?), so C* was just WARNing you about it.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488774#comment-13488774 ] 

Michael Kjellman edited comment on CASSANDRA-4813 at 11/1/12 3:53 PM:
----------------------------------------------------------------------

[~yukim] not sure if this will be an issue for "normal" upgraders or if this was because i went between patch versions but just wanted you to be aware of the EOF i saw on startup. Node only has the system schema at this point.

WARN 08:40:29,821 error reading saved cache /ssd/saved_caches/system-local-KeyCache-b.db
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:349)
	at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:378)
	at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:144)
	at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:278)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:393)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:365)
	at org.apache.cassandra.db.Table.initCf(Table.java:334)
	at org.apache.cassandra.db.Table.<init>(Table.java:272)
	at org.apache.cassandra.db.Table.open(Table.java:102)
	at org.apache.cassandra.db.Table.open(Table.java:80)
	at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:320)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:203)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:395)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:438)

Edit: actually thinking this is a totally unrelated bug?
                
      was (Author: mkjellman):
    [~yukim] not sure if this will be an issue for "normal" upgraders or if this was because i went between patch versions but just wanted you to be aware of the EOF i saw on startup. Node only has the system schema at this point.

WARN 08:40:29,821 error reading saved cache /ssd/saved_caches/system-local-KeyCache-b.db
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:349)
	at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:378)
	at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:144)
	at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:278)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:393)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:365)
	at org.apache.cassandra.db.Table.initCf(Table.java:334)
	at org.apache.cassandra.db.Table.<init>(Table.java:272)
	at org.apache.cassandra.db.Table.open(Table.java:102)
	at org.apache.cassandra.db.Table.open(Table.java:80)
	at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:320)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:203)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:395)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:438)
                  
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491032#comment-13491032 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Exception in thread "Streaming to /10.25.9.5:1" java.lang.RuntimeException: java.net.SocketException: Already bound
        at com.google.common.base.Throwables.propagate(Throwables.java:156)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.SocketException: Already bound
        at sun.nio.ch.Net.translateToSocketException(Net.java:109)
        at sun.nio.ch.Net.translateException(Net.java:141)
        at sun.nio.ch.Net.translateException(Net.java:147)
        at sun.nio.ch.SocketAdaptor.bind(SocketAdaptor.java:147)
        at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:128)
        at org.apache.cassandra.streaming.FileStreamTask.connectAttempt(FileStreamTask.java:236)
        at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:88)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        ... 3 more
Caused by: java.nio.channels.AlreadyBoundException
        at sun.nio.ch.SocketChannelImpl.bind(SocketChannelImpl.java:556)
        at sun.nio.ch.SocketAdaptor.bind(SocketAdaptor.java:145)
        ... 7 more

is it intended that we fail that reducer? this looks like just a more elegant collision, no? seeing the same failure on every node.


                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4813:
--------------------------------------

             Priority: Minor  (was: Major)
    Affects Version/s:     (was: 1.1.5)
                           (was: 1.1.3)
                       1.1.0
        Fix Version/s: 1.2.0

If we need to change streaming protocol to fix this then we should target 1.2.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reassigned CASSANDRA-4813:
-------------------------------------------

    Assignee: Yuki Morishita
    
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Ralph Romanos (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ralph Romanos updated CASSANDRA-4813:
-------------------------------------

    Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.  (was: I am using SLES 10 SP3, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.)
    
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491678#comment-13491678 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Also, while we don't throw the EOF anymore further down the stack as far as I can tell from my testing, the net result is an IOException being thrown in the Reducer.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4813:
--------------------------------------

    Attachment: 4813.txt

Attaching patch to change streaming session ID from (host, counter) pair to Time UUID. This should hugely drop probability of session ID collision. I haven't tested with BOF, but I spawned 3 sstableloader simultaneously on the same node with C* and could finish streaming without getting errors.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490732#comment-13490732 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Sounds good Yuki. I've been trying to get a secondary cluster setup since the 31st..keep getting pulled away. I promise i'll get multiple reducers tested today :)
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481673#comment-13481673 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

[~nakagawayk] Yes - the single reducer was running on top of a Cassandra node. Why would having the reducer+the cassandra node cause a session ID collision if there is only one reducer?
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494317#comment-13494317 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

yes, actually the first time i didn't and then i realized i hadn't rebuilt the Cassandra nodes and only replaced the compiled jar in my hadoop job. I've tested a few times now just to make sure i'm not missing anything. Seems to die as soon as it tries to stream the first sstable to the nodes. Never progresses past 0% on the streaming and then throws the exception.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487867#comment-13487867 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Yuki- Thanks for your work on the patch. I'll test BOF with 1.1.6 today.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495517#comment-13495517 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

[~yukim] I can produce it in local mode (standalone) and distributed mode with the current revision of the patch. I haven't ran it in psuedo-cluster mode.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489127#comment-13489127 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Brand new cluster running 1.2 with your original patch, nodetool drain, put new jar with new patch in place, started Cassandra and got error. had to delete the cache file as subsequent initializations threw the EOF as well. Happened on all three nodes in the 3 node test cluster.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496504#comment-13496504 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

[~yukim] Looks like you are right. Was Murmur Partitioner and my job was configured for Random. On a first run looks like this patch is good to go. Sorry for the wild goose chase there.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494393#comment-13494393 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

[~yukim] I take it back. Ship It! Dependency issues on my end with the last patch, sorry for the false alarm.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487876#comment-13487876 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Good to know - I'll get a 1.2 cluster setup on trunk then and test.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478368#comment-13478368 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Yes, in my reducer. Didn't restart the Cassandra cluster -- what would that accomplish? Limited the reducer in my job configuration.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4813:
----------------------------------------

    Comment: was deleted

(was: [~yukim] I take it back. Ship It! Dependency issues on my end with the last patch, sorry for the false alarm.)
    
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478302#comment-13478302 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Same issue with 1.1.6 and Hadoop 1.0.3

I have the following from the Cassandra logs as well

ERROR 12:46:06,256 Exception in thread Thread[Thread-1224,5,main]
java.lang.AssertionError: We shouldn't fail acquiring a reference on a sstable that has just been transferred
	at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:188)
	at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
	at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:182)
	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478352#comment-13478352 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Just confirmed that limiting the reducer to 1 does not change the behavior in my environment. Also noticed that BulkRecordWriter will always throw an IOException (as mentioned in the original bug) if mapreduce.output.bulkoutputformat.maxfailedhosts is ever > 0 (assuming defaults).

In my case future.getFailedHosts() always returns every node in my cluster when the condition occurs. I'm doing about 50 million insertions into 50 million rows and the EOFExceptions seem to crop up after a good number of the sstables have already been successfully sent.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493597#comment-13493597 ] 

Michael Kjellman edited comment on CASSANDRA-4813 at 11/8/12 11:28 PM:
-----------------------------------------------------------------------

new problem now with this patch.


attempt_201211061048_0010_r_000002_0: Exception in thread "Streaming to /10.25.9.5:1" java.lang.RuntimeException: java.io.IOException: Broken pipe
attempt_201211061048_0010_r_000002_0: 	at com.google.common.base.Throwables.propagate(Throwables.java:156)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
attempt_201211061048_0010_r_000002_0: 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
attempt_201211061048_0010_r_000002_0: 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
attempt_201211061048_0010_r_000002_0: 	at java.lang.Thread.run(Thread.java:722)
attempt_201211061048_0010_r_000002_0: Caused by: java.io.IOException: Broken pipe
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.IOUtil.write(IOUtil.java:60)
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
attempt_201211061048_0010_r_000002_0: 	at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
attempt_201211061048_0010_r_000002_0: 	at java.nio.channels.Channels.writeFully(Channels.java:98)
attempt_201211061048_0010_r_000002_0: 	at java.nio.channels.Channels.access$000(Channels.java:61)
attempt_201211061048_0010_r_000002_0: 	at java.nio.channels.Channels$1.write(Channels.java:174)
attempt_201211061048_0010_r_000002_0: 	at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
attempt_201211061048_0010_r_000002_0: 	at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
attempt_201211061048_0010_r_000002_0: 	at com.ning.compress.lzf.LZFOutputStream.write(LZFOutputStream.java:97)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.streaming.FileStreamTask.write(FileStreamTask.java:218)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:164)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
attempt_201211061048_0010_r_000002_0: 	... 3 more
                
      was (Author: mkjellman):
    same behavior on the first run unfortunately. all nodes marked as down at the same time due to trying to rebind to the socket.
                  
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494317#comment-13494317 ] 

Michael Kjellman edited comment on CASSANDRA-4813 at 11/9/12 11:13 PM:
-----------------------------------------------------------------------

yes, actually the first time I tested I did forget to ensure both the Hadoop jar and Cassandra nodes had the newest patched version. I've tested a few times now just to make sure i'm not missing anything. Seems to die as soon as it tries to stream the first sstable to the nodes. Never progresses past 0% on the streaming and then throws the exception.

applied patch to trunk when it was at commit f09a89f4cd13af2087fcc92f09f6cf1ee4785feb. i rebuilt the entire cluster, and ensured my maven dependencies were all set. Still reproduced the problem unfortunately (i actually thought it had been resolved but i just reproduced the java.io.IOException: Broken pipe again).

MD5 (build/apache-cassandra-1.2.0-beta2-SNAPSHOT.jar) = 92d8ffacb3963116dd153a2c8c83fbe9
                
      was (Author: mkjellman):
    yes, actually the first time i didn't and then i realized i hadn't rebuilt the Cassandra nodes and only replaced the compiled jar in my hadoop job. I've tested a few times now just to make sure i'm not missing anything. Seems to die as soon as it tries to stream the first sstable to the nodes. Never progresses past 0% on the streaming and then throws the exception.
                  
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Ralph Romanos (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477678#comment-13477678 ] 

Ralph Romanos commented on CASSANDRA-4813:
------------------------------------------

I get the following error in the tasktracker's logs when they 
are streamed into the Cassandra cluster:

Exception in thread "Streaming to /172.16.110.79:1" java.lang.RuntimeException: java.io.EOFException
	at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
	at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181)
	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	... 3 more
Exception in thread "Streaming to /172.16.110.92:1" java.lang.RuntimeException: java.io.EOFException
	at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
	at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181)
	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
	... 3 more
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Ralph Romanos (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482174#comment-13482174 ] 

Ralph Romanos commented on CASSANDRA-4813:
------------------------------------------

I never get this exception when I run my reducer on a single node (either on top of a cassandra node or on a Hadoop only node). In addition, when I run multiple reducers, I create a java process for each reducer; therefore my reducers should be running on different JVMs right? If that is the case we shouldn't get counter conflicts when running multiple reducers.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493597#comment-13493597 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

same behavior on the first run unfortunately. all nodes marked as down at the same time due to trying to rebind to the socket.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4813:
--------------------------------------

    Attachment:     (was: 4813.txt)
    
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Ralph Romanos (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478380#comment-13478380 ] 

Ralph Romanos commented on CASSANDRA-4813:
------------------------------------------

I had the same error (IOException) when I did not restart Cassandra. It is caused by the EOFException and because the streaming failed in your previous job. Restarting Cassandra should allow you to make it work with 1 reducer.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494315#comment-13494315 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

[~mkjellman] Just checking, did you use patched version for both Cassandra and hadoop job?
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481658#comment-13481658 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

This is limitation of BulkOutputFormat right now. Currently, streaming session uses (IP, counter) for its ID. Since counter is per JVM, running two or more reducers on same node streaming to one cassandra node likely cause session conflict, and I think that is causing the issue here.
To resolve this, we need to change the way to distinguish each session(possibly by changing to use UUID for session ID).

[~mkjellman] Do you run your reducer on top of cassandra node? If that is the case, session conflict I described above may be the cause. If not, there is another issue in your one reducer case I think.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Ralph Romanos (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478362#comment-13478362 ] 

Ralph Romanos commented on CASSANDRA-4813:
------------------------------------------

Do you stream the SSTables in your reducer? And when you tried to limit the reducer to 1, did you restart the cassandra cluster?
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495461#comment-13495461 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

[~mkjellman] Hmm, I tried standalone and psuedo-cluster hadoop on my machine and haven't seen that error. I will try in fully distributed mode.
By the way, what kind of error did you see on cassandra side? Can you post stacktrace?
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488774#comment-13488774 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

[~yukim] not sure if this will be an issue for "normal" upgraders or if this was because i went between patch versions but just wanted you to be aware of the EOF i saw on startup. Node only has the system schema at this point.

WARN 08:40:29,821 error reading saved cache /ssd/saved_caches/system-local-KeyCache-b.db
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:349)
	at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:378)
	at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:144)
	at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:278)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:393)
	at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:365)
	at org.apache.cassandra.db.Table.initCf(Table.java:334)
	at org.apache.cassandra.db.Table.<init>(Table.java:272)
	at org.apache.cassandra.db.Table.open(Table.java:102)
	at org.apache.cassandra.db.Table.open(Table.java:80)
	at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:320)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:203)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:395)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:438)
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495517#comment-13495517 ] 

Michael Kjellman edited comment on CASSANDRA-4813 at 11/12/12 7:35 PM:
-----------------------------------------------------------------------

[~yukim] I can produce it in local mode (standalone) and distributed mode with the current revision of the patch. I haven't ran it in psuedo-cluster mode. Also should mention I have reproduced it even when I limit to 1 reducer.
                
      was (Author: mkjellman):
    [~yukim] I can produce it in local mode (standalone) and distributed mode with the current revision of the patch. I haven't ran it in psuedo-cluster mode.
                  
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496587#comment-13496587 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

Just tested a few more times. Looks good. Ship it!
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490728#comment-13490728 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

Michael,

I couldn't reproduce your error, and I believe that is not related to this issue.
So if you see that error constantly, please open another issue.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4813:
--------------------------------------

    Attachment: 4813.txt

Attaching newer version. Since with this patch, we only distinguish streaming session by UUID, we don't need to carry around broadcast address(CASSANDRA-3503), so I removed it from StreamHeader.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Ralph Romanos (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497063#comment-13497063 ] 

Ralph Romanos commented on CASSANDRA-4813:
------------------------------------------

Hello, 
I just tested the patch on 1.2 with 2 reducers per node and 7 nodes. Everything works fine now.

Thank you Yuki!
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487869#comment-13487869 ] 

Yuki Morishita commented on CASSANDRA-4813:
-------------------------------------------

Michael,

Patch is for trunk(version 1.2). The change breaks messaging compatibility in 1.1, so this won't be ported for next 1.1.7 release.

                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478536#comment-13478536 ] 

Michael Kjellman commented on CASSANDRA-4813:
---------------------------------------------

you're quite right. no issues after a rolling restart and only one reducer.
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.3, 1.1.5
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>              Labels: Bulkoutputformat, Hadoop, SSTables
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4813:
--------------------------------------

    Attachment: 4813.txt

ok, it looks like we have CASSANDRA-3839 for BOF.
Updated patch to avoid socket re-binding.

[~mkjellman] How about this one?
                
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (CASSANDRA-4813) Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493597#comment-13493597 ] 

Michael Kjellman edited comment on CASSANDRA-4813 at 11/9/12 2:03 AM:
----------------------------------------------------------------------

new problem now with current patch.

2012-11-08 15:29:05,323 ERROR org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor: Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Broken pipe
        at com.google.common.base.Throwables.propagate(Throwables.java:156)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
        at sun.nio.ch.IOUtil.write(IOUtil.java:60)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
        at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
        at java.nio.channels.Channels.writeFully(Channels.java:98)
        at java.nio.channels.Channels.access$000(Channels.java:61)
        at java.nio.channels.Channels$1.write(Channels.java:174)
        at com.ning.compress.lzf.LZFChunk.writeCompressedHeader(LZFChunk.java:77)
        at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:132)
        at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
        at com.ning.compress.lzf.LZFOutputStream.write(LZFOutputStream.java:97)
        at org.apache.cassandra.streaming.FileStreamTask.write(FileStreamTask.java:218)
        at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:164)
        at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        ... 3 more
                
      was (Author: mkjellman):
    new problem now with this patch.


attempt_201211061048_0010_r_000002_0: Exception in thread "Streaming to /10.25.9.5:1" java.lang.RuntimeException: java.io.IOException: Broken pipe
attempt_201211061048_0010_r_000002_0: 	at com.google.common.base.Throwables.propagate(Throwables.java:156)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
attempt_201211061048_0010_r_000002_0: 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
attempt_201211061048_0010_r_000002_0: 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
attempt_201211061048_0010_r_000002_0: 	at java.lang.Thread.run(Thread.java:722)
attempt_201211061048_0010_r_000002_0: Caused by: java.io.IOException: Broken pipe
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:89)
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.IOUtil.write(IOUtil.java:60)
attempt_201211061048_0010_r_000002_0: 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
attempt_201211061048_0010_r_000002_0: 	at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
attempt_201211061048_0010_r_000002_0: 	at java.nio.channels.Channels.writeFully(Channels.java:98)
attempt_201211061048_0010_r_000002_0: 	at java.nio.channels.Channels.access$000(Channels.java:61)
attempt_201211061048_0010_r_000002_0: 	at java.nio.channels.Channels$1.write(Channels.java:174)
attempt_201211061048_0010_r_000002_0: 	at com.ning.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:133)
attempt_201211061048_0010_r_000002_0: 	at com.ning.compress.lzf.LZFOutputStream.writeCompressedBlock(LZFOutputStream.java:203)
attempt_201211061048_0010_r_000002_0: 	at com.ning.compress.lzf.LZFOutputStream.write(LZFOutputStream.java:97)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.streaming.FileStreamTask.write(FileStreamTask.java:218)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:164)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
attempt_201211061048_0010_r_000002_0: 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
attempt_201211061048_0010_r_000002_0: 	... 3 more
                  
> Problem using BulkOutputFormat while streaming several SSTables simultaneously from a given node.
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4813
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4813
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>         Environment: I am using SLES 10 SP3, Java 6, 4 Cassandra + Hadoop nodes, 3 Hadoop only nodes (datanodes/tasktrackers), 1 namenode/jobtracker. The machines used are Six-Core AMD Opteron(tm) Processor 8431, 24 cores and 33 GB of RAM. I get the issue on both cassandra 1.1.3, 1.1.5 and I am using Hadoop 0.20.2.
>            Reporter: Ralph Romanos
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: Bulkoutputformat, Hadoop, SSTables
>             Fix For: 1.2.0 rc1
>
>         Attachments: 4813.txt
>
>
> The issue occurs when streaming simultaneously SSTables from the same node to a cassandra cluster using SSTableloader. It seems to me that Cassandra cannot handle receiving simultaneously SSTables from the same node. However, when it receives simultaneously SSTables from two different nodes, everything works fine. As a consequence, when using BulkOutputFormat to generate SSTables and stream them to a cassandra cluster, I cannot use more than one reducer per node otherwise I get a java.io.EOFException in the tasktracker's logs and a java.io.IOException: Broken pipe in the Cassandra logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira