You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Created) (JIRA)" <ji...@apache.org> on 2011/11/15 17:42:52 UTC

[jira] [Created] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Streaming is mono-threaded (the bulk loader too by extension)
-------------------------------------------------------------

                 Key: CASSANDRA-3494
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8.0
            Reporter: Sylvain Lebresne
            Priority: Minor


The streamExecutor is define as:
{noformat}
streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
{noformat}
In the meantime, in DebuggableThreadPoolExecutor.java:
{noformat}
public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
{
   this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
}
{noformat}
In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.

Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161228#comment-13161228 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

A complication is that merely upping the concurrency works, but given that you may have a lot of files destined for one particular host, you actually need to allow a *lot* of streams in order to ensure that you're streaming to all nodes that want files. At some point you're gonna die seek death from the amount of readers.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161051#comment-13161051 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

(Part of the reason the destinations were bottlenecking were that the sstable rebuilds + minor compactions happening during streaming are expensive and eat disk throughput. So you'd have a bunch of machines just idling waiting for the sender to send them some data, while the sender is blocking on a write to a TCP conenction, where on the other end some receiver is blocking on a write to disk because that node happened to be lucky and got some good sources and got more streams to it than it could handle in terms of throughput.)
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3494:
--------------------------------------

    Reviewer: yukim
    
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller reassigned CASSANDRA-3494:
-----------------------------------------

    Assignee: Peter Schuller
    
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3494:
----------------------------------------

    Attachment: 0001-cleanup-DTPE.patch

oh right, fixed.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-cleanup-DTPE.patch, CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150599#comment-13150599 ] 

Jonathan Ellis commented on CASSANDRA-3494:
-------------------------------------------

This was intentional to start with (it pre-dates the stream throttling).  And if we can hit the throttle cap w/ a single thread (we can) then I'm not sure that part needs to be multithreaded.

The sstable building pre-stream seems like a more profitable area to multithread for the bulkloader scenario.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163312#comment-13163312 ] 

Hudson commented on CASSANDRA-3494:
-----------------------------------

Integrated in Cassandra #1238 (See [https://builds.apache.org/job/Cassandra/1238/])
    multithreaded streaming
patch by Peter Schuller; reviewed by yukim for CASSANDRA-3494

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1210748
Files : 
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java
* /cassandra/trunk/src/java/org/apache/cassandra/net/MessagingService.java
* /cassandra/trunk/src/java/org/apache/cassandra/streaming/FileStreamTask.java

                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162706#comment-13162706 ] 

Sylvain Lebresne commented on CASSANDRA-3494:
---------------------------------------------

I'm pretty sure we don't want to commit this in 0.8. This is clearly an optimization (not a bug or regression fix), so there is no point in taking any risks with 0.8. To be honest, I wouldn't be opposed to putting this in trunk only.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162791#comment-13162791 ] 

Jonathan Ellis commented on CASSANDRA-3494:
-------------------------------------------

+1 trunk-only
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181358#comment-13181358 ] 

Jonathan Ellis commented on CASSANDRA-3494:
-------------------------------------------

is there still a meaningful distinction b/t createWithFixedPoolSize and createWithMaximumPoolSize?
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-cleanup-DTPE.patch, CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178533#comment-13178533 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

+1. I no longer remember why I wanted 0, in light of re-reading the javadocs. Unless I observed empirically that threads were retained. I remember there was *some* reason that caused me to start looking at the JDK source. So if changing to 1 I'd suggest just making sure that threads do indeed die off properly.

                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161058#comment-13161058 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

Just had a thought (will have to check later): I hope we're not *also* waiting for the entire streaming session to complete before the next streaming session is allowed to run, because that would imply also waiting on the SSTable rebuild on the destination (though no longer in 1.0).
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161134#comment-13161134 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

I can submit a patch to make this a tunable, defaulting to something semi-reasonable like 10. Do we however believe there are concurrency issues here? A very very brief sifting of the code looked to me like we're alright.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181384#comment-13181384 ] 

Sylvain Lebresne commented on CASSANDRA-3494:
---------------------------------------------

As much as there was before my remove of the allowCoreThreadTimeOut line :)
Basically createWithFixed doesn't timeout it's thread, while createWithMaximum does it. Not claiming the names are perfect, though I'm not too unhappy with them.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-cleanup-DTPE.patch, CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176731#comment-13176731 ] 

Jonathan Ellis commented on CASSANDRA-3494:
-------------------------------------------

Does this actually work?  My reading of ThreadPoolExecutor suggests that an unbounded queue will result in no non-core threads ever being created.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178417#comment-13178417 ] 

Jonathan Ellis commented on CASSANDRA-3494:
-------------------------------------------

bq. Thus, no more than corePoolSize threads will ever be created

Right, that was the source of my confusion.

bq. I believe the correct way to create an executor with a max number of threads and where all threads timeout if unused with ThreadPoolExecutor is to set corePoolSize == maxPoolSize == whateverTheMaxShouldBe (1 for that patch) and to use ThreadPoolExecutor.allowCoreTheadTimeout().

+1
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162630#comment-13162630 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

If you want to apply this to 0.8 I can submit an 0.8 patch which is properly rebased against the 0.8 branch instead of our internal one. I was assuming it would only be going into 1.x. Want one?
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180982#comment-13180982 ] 

Jonathan Ellis commented on CASSANDRA-3494:
-------------------------------------------

        allowCoreThreadTimeOut(true) is default behavior in DTPE

                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-cleanup-DTPE.patch, CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3494:
----------------------------------------

    Attachment:     (was: 0001-cleanup-DTPE.patch)
    
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-cleanup-DTPE.patch, CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162627#comment-13162627 ] 

Yuki Morishita commented on CASSANDRA-3494:
-------------------------------------------

I like Peter's idea about having one executor per destination.
+1 on patch against 1.0. (Patch for 0.8 needs rebase but the change is essentially the same.)
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161045#comment-13161045 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

We had it set very high. Throttling is not the problem here. The problem is that each node was sending data to only one bootstrapping node at a time, and that process was bottlenecking on the destination node's write capacity (see CASSANDRA-3549 too).

Another effect is that nodes weren't evenly bootstrapping across the cluster; so some nodes were at 200 gigs whiles other were at 500, because of the pseudo-random ordering restriction imposed by this limitation (multiple destinations dogpiling on a source -> some have to just wait for a long time before they even begin receiving data). In addition to the aggregate bandwidth limit, this caused even more delay to get to the state where all nodes were bootstrapped.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162983#comment-13162983 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

The patch applies cleanly to trunk.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161248#comment-13161248 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

I suggest having a single executor per target node, and creating the executors on-demand (under a simple lock, initiating streaming is not a critical code path). Opinions? That would mean one active stream per destination node.

                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Stu Hood (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152399#comment-13152399 ] 

Stu Hood commented on CASSANDRA-3494:
-------------------------------------

In cases like decommission, or the bootstrapping of multiple nodes, multithreading would result in distribution of the throttled bandwidth across more of your cluster, rather than across one switch at a time, for example.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3494:
----------------------------------------

    Attachment: 0001-cleanup-DTPE.patch

Attaching patch implementing the change described above.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-cleanup-DTPE.patch, CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162699#comment-13162699 ] 

Yuki Morishita commented on CASSANDRA-3494:
-------------------------------------------

Sure. Why not?
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176775#comment-13176775 ] 

Peter Schuller commented on CASSANDRA-3494:
-------------------------------------------

I did test it, yes. Streaming does work, and it does indeed not retain an idle thread per destination beyond end of streaming and timeout (verified with jstack). I'd have to go back to the code again (I also looked at ThreadPoolExecutor) to address your concern specifically. What problem are you seeing?

                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Chris Goffinet (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160844#comment-13160844 ] 

Chris Goffinet commented on CASSANDRA-3494:
-------------------------------------------

We actually just ran into this tonight. We had the network bandwidth, but because it was monothreaded, we couldnt fully saturate the NIC with 1 thread. It hurt us pretty hard since we needed to transfer 18TB in under 6 hours. Please increase this.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178349#comment-13178349 ] 

Sylvain Lebresne commented on CASSANDRA-3494:
---------------------------------------------

This work because of some details of ThreadPoolExecutor. Namely, in the sentence "When a new task is submitted in method execute(java.lang.Runnable), and fewer than corePoolSize threads are running, a new thread is created to handle the request, ...", the "fewer" is meant strictly. So with a 0 for corePoolSize, it will create a thread for the first task. It will also terminate that thread afterward because the doc says "If the pool currently has more than corePoolSize threads, excess threads will be terminated if they have been idle for more than the keepAliveTime".

However, this is dodgy code as for instance if maxCorePoolSize was 10, the thread pool would *not* create up to 10 threads, it would create one and then queue up all other tasks.

In fact, the documentation is actually inconsistent, as for unbounded queues it states that "Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.)". But this is not true if corePoolSize is 0.

I believe the correct way to create an executor with a max number of threads and where all threads timeout if unused with ThreadPoolExecutor is to set corePoolSize == maxPoolSize == whateverTheMaxShouldBe (1 for that patch) and to use ThreadPoolExecutor.allowCoreTheadTimeout().
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller updated CASSANDRA-3494:
--------------------------------------

    Attachment: CASSANDRA-3494-1.0.txt

Attaching version rebased for 1.0 (and tested).
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266677#comment-13266677 ] 

Hudson commented on CASSANDRA-3494:
-----------------------------------

Integrated in Cassandra #1304 (See [https://builds.apache.org/job/Cassandra/1304/])
    Cleanup DTPE usage after CASSANDRA-3494 (Revision 263f192b65db18e3cfb5126e86358f374a0be5fa)

     Result = FAILURE
sylvain : 
Files : 
* src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java
* src/java/org/apache/cassandra/io/sstable/SSTableReader.java
* src/java/org/apache/cassandra/net/MessagingService.java

                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>             Fix For: 1.1.0
>
>         Attachments: 0001-cleanup-DTPE.patch, CASSANDRA-3494-0.8-prelim.txt, CASSANDRA-3494-1.0.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Peter Schuller (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller updated CASSANDRA-3494:
--------------------------------------

    Attachment: CASSANDRA-3494-0.8-prelim.txt

Attaching a "principle" diff against our internal 0.8 for cursory examination. I want to submit another one for inclusion for 1.0 later (it's blocking on some 1.0 work that I have to do first).

The idea is this: Now we have one executor per destination host. In order to avoid complex synchronization we never bother removing an executor once created; but this is fine because we make sure threads time out so the cost is just the executor instance and not a thread, for destinations that are not being streamed to.

Make the tracking of active streams for throttling purposes be explicit, to avoid iterating over the O(n) map.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Peter Schuller
>            Priority: Minor
>         Attachments: CASSANDRA-3494-0.8-prelim.txt
>
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3494) Streaming is mono-threaded (the bulk loader too by extension)

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160963#comment-13160963 ] 

Brandon Williams commented on CASSANDRA-3494:
---------------------------------------------

Just to confirm, you had stream_throughput_outbound_megabits_per_sec set to zero?  It defaults to 400Mbps when not set.
                
> Streaming is mono-threaded (the bulk loader too by extension)
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3494
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> The streamExecutor is define as:
> {noformat}
> streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", Thread.MIN_PRIORITY);
> {noformat}
> In the meantime, in DebuggableThreadPoolExecutor.java:
> {noformat}
> public DebuggableThreadPoolExecutor(String threadPoolName, int priority)
> {
>    this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
> }
> {noformat}
> In other word, since the core pool size is 1 and the queue unbounded, tasks will always queued and the executor is essentially mono-threaded.
> This is clearly not necessary since we already have stream throttling nowadays. And it could be a limiting factor in the case of the bulk loader.
> Besides, I would venture that this maybe was not the intention, because putting the max core size to MAX_VALUE would suggest that the intention was to spawn threads on demand. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira