Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2012/05/30 19:52:23 UTC

[jira] [Created] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Yuki Morishita created CASSANDRA-4297:
-----------------------------------------

             Summary: Use java NIO as much as possible when streaming compressed SSTables
                 Key: CASSANDRA-4297
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4297
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Yuki Morishita
            Assignee: Yuki Morishita
            Priority: Minor
             Fix For: 1.2


Back in 0.8, streaming used Java NIO (FileChannel#transferTo/transferFrom) to perform zero-copy file transfer between nodes. Since 1.0, in order to add new features like SSTable compression and internode encryption, we had to switch to java.io Input/OutputStreams. To transfer a compressed SSTable today, the source node 1) decompresses each chunk of the SSTable and 2) compresses the data with LZF for the network; the destination node then 3) decompresses the LZF stream as it reads from the socket and 4) compresses the data again for the SSTable on disk.

Now that 1.1 ships with SSTable compression turned on by default, it is reasonable to transfer the compressed file as is using NIO, instead of decompressing and recompressing it on the source node.
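A minimal sketch of the zero-copy path described above, assuming the byte ranges ("sections") to send are already known as [start, end) offsets into the on-disk compressed file. `ZeroCopySender` and `sendSections` are hypothetical names, not Cassandra classes. `FileChannel.transferTo` may move fewer bytes than requested, so each range is sent in a loop.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class ZeroCopySender {
    /**
     * Sends each [start, end) range of the file to the target channel as is,
     * without decompressing it. Assumes a blocking target channel.
     * Returns the total number of bytes sent.
     */
    static long sendSections(Path file, List<long[]> sections, WritableByteChannel target) throws IOException {
        long total = 0;
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            for (long[] section : sections) {
                long position = section[0];
                long end = section[1];
                while (position < end) {
                    // The OS may copy straight from the page cache to the target.
                    long sent = source.transferTo(position, end - position, target);
                    if (sent <= 0)
                        throw new IOException("transferTo made no progress at " + position);
                    position += sent;
                    total += sent;
                }
            }
        }
        return total;
    }
}
```

On the receiving side, the matching call would be `FileChannel#transferFrom`, but the destination still has to decompress the data (see the discussion below), so the zero-copy gain is mainly on the source node.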

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4297:
--------------------------------------

    Attachment: 4297.txt

Attaching patch for review (also updated https://github.com/yukim/cassandra/tree/4297).

* removed unnecessary changes from trunk
* corrected progress reporting on dest node
* added some comments
                
> Use java NIO as much as possible when streaming compressed SSTables
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4297
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4297
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: streaming
>             Fix For: 1.2
>
>         Attachments: 4297.txt
>
>
> Back in 0.8, streaming used Java NIO (FileChannel#transferTo/transferFrom) to perform zero-copy file transfer between nodes. Since 1.0, in order to add new features like SSTable compression and internode encryption, we had to switch to java.io Input/OutputStreams. To transfer a compressed SSTable today, the source node 1) decompresses each chunk of the SSTable and 2) compresses the data with LZF for the network; the destination node then 3) decompresses the LZF stream as it reads from the socket and 4) compresses the data again for the SSTable on disk.
> Now that 1.1 ships with SSTable compression turned on by default, it is reasonable to transfer the compressed file as is using NIO, instead of decompressing and recompressing it on the source node.


        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287370#comment-13287370 ] 

Sylvain Lebresne commented on CASSANDRA-4297:
---------------------------------------------

bq. It looks like the only reason to decompress is to compare crc32... is that right?

No. We decompress because we need to build secondary indexes, compute the SSTable stats, clean counter deltas, etc.
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293900#comment-13293900 ] 

Jonathan Ellis commented on CASSANDRA-4297:
-------------------------------------------

LGTM, +1.

bq. The reason I use a Set here is to eliminate duplicate chunks, since two different file sections can map to the same chunk.

Can you expand the "since sections are not guaranteed to be sorted" comment to elaborate on that? (It might still be a bit cleaner to just use new ArrayList(set) instead of manually copying into an array; the performance difference would be negligible.)
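The duplicate-chunk point can be shown with a small sketch. It assumes fixed-size uncompressed chunks and sections given as [start, end) uncompressed offsets, and uses a hypothetical `ChunkMapper` class. Two sections that fall into the same chunk produce that chunk only once. The result is copied with the `ArrayList` constructor and sorted, following the suggestion in the comment. `Long::compare` plays the role of Guava's `Longs.compare`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ChunkMapper {
    /**
     * Returns the sorted, de-duplicated indexes of the fixed-size chunks that
     * cover the given uncompressed [start, end) sections.
     */
    static List<Long> chunksFor(List<long[]> sections, int chunkLength) {
        Set<Long> unique = new HashSet<>();
        for (long[] section : sections) {
            if (section[1] <= section[0])
                continue; // empty section covers no chunk
            long first = section[0] / chunkLength;
            long last = (section[1] - 1) / chunkLength; // end is exclusive
            for (long chunk = first; chunk <= last; chunk++)
                unique.add(chunk); // a chunk shared by two sections is kept once
        }
        // Section order (and HashSet iteration order) is not guaranteed, so sort.
        List<Long> sorted = new ArrayList<>(unique);
        sorted.sort(Long::compare);
        return sorted;
    }
}
```

For example, with 100-byte chunks the sections [150, 250) and [10, 120) both touch chunk 1, and it appears only once in the result.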
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285857#comment-13285857 ] 

Yuki Morishita commented on CASSANDRA-4297:
-------------------------------------------

I've pushed a working commit to https://github.com/yukim/cassandra/tree/4297.

When streaming compressed files, the source node appends compression info to the stream header, and the destination node uses that info to decompress data from the stream.
If inter-node encryption is turned on, zero-copy transfer cannot be performed, so in that case we fall back to the current way of streaming.
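Here is a sketch of that fallback decision. It assumes the caller passes either a raw channel (no encryption) or the encrypted socket's OutputStream. `SectionSender` and its parameters are hypothetical. SSL has to encrypt the bytes in user space, so with encryption on, the data is read into a buffer and written to the stream instead of being handed to `transferTo`.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SectionSender {
    /**
     * Sends file bytes [start, end). Uses zero copy when a plain channel is
     * available; otherwise copies through user space into encryptedOut.
     */
    static void send(Path file, long start, long end,
                     WritableByteChannel plainChannel, OutputStream encryptedOut) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            if (plainChannel != null) {
                // No encryption: let the OS move the bytes directly.
                for (long pos = start; pos < end; ) {
                    long n = source.transferTo(pos, end - pos, plainChannel);
                    if (n <= 0) throw new IOException("no progress at " + pos);
                    pos += n;
                }
                return;
            }
            // Encryption on: read into a heap buffer and write to the SSL stream.
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
            for (long pos = start; pos < end; ) {
                buffer.clear();
                buffer.limit((int) Math.min(buffer.capacity(), end - pos));
                int n = source.read(buffer, pos); // positional read, does not move the channel position
                if (n < 0) throw new EOFException("file ends before " + end);
                encryptedOut.write(buffer.array(), 0, n);
                pos += n;
            }
            encryptedOut.flush();
        }
    }
}
```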

I ran a simple bulk loading test which transfers several compressed SSTables between nodes. Although overall throughput and the time taken to complete streaming are about the same, the patched version reduced CPU usage on the source node (20% -> 2%). Most of the time, the source node was waiting for the destination node to decompress the data and write it to disk.

I still don't know if this is useful in production, so I would greatly appreciate it if someone could perform more realistic tests.
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288717#comment-13288717 ] 

Yuki Morishita commented on CASSANDRA-4297:
-------------------------------------------

CompressedDIS is a DataInputStream version of CompressedRandomAccessReader. It reads compressed chunks directly from the stream and provides decompressed data as it reads. A CRC check is also performed after each chunk is decompressed, based on the crc_check_chance setting in the compression options, which defaults to 1.0 (100%), as done in CRAR.
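Here is a rough sketch of the idea behind such a stream. It is not the patch's code. It assumes a hypothetical framing per chunk of [uncompressed length][compressed length][compressed bytes][CRC32 of the uncompressed bytes], and it uses the JDK's Deflate (`java.util.zip.Inflater`/`Deflater`) as a stand-in for Snappy. `crcCheckChance` mirrors the CRC check chance option described above, where 1.0 verifies every chunk. It extends `InputStream` only, so callers can wrap it in a `DataInputStream` when needed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Reads compressed chunks straight off a stream and returns decompressed bytes. */
class CompressedInputStream extends InputStream {
    private final DataInputStream in;
    private final double crcCheckChance; // 1.0 = verify every chunk
    private final Random random = new Random();
    private byte[] buffer = new byte[0];
    private int position = 0;

    CompressedInputStream(InputStream source, double crcCheckChance) {
        this.in = new DataInputStream(source);
        this.crcCheckChance = crcCheckChance;
    }

    @Override
    public int read() throws IOException {
        while (position == buffer.length) {
            if (!readChunk()) return -1; // clean end of stream
        }
        return buffer[position++] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    private boolean readChunk() throws IOException {
        int uncompressedLength;
        try {
            uncompressedLength = in.readInt();
        } catch (EOFException e) {
            return false;
        }
        byte[] compressed = new byte[in.readInt()];
        in.readFully(compressed);
        int expectedCrc = in.readInt();

        byte[] chunk = new byte[uncompressedLength];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            int filled = 0;
            while (filled < chunk.length) {
                int n = inflater.inflate(chunk, filled, chunk.length - filled);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary()))
                    throw new IOException("chunk is shorter than its declared length");
                filled += n;
            }
        } catch (DataFormatException e) {
            throw new IOException("corrupt compressed chunk", e);
        } finally {
            inflater.end();
        }

        // CRC is checked on the decompressed bytes, for a random fraction of chunks.
        if (crcCheckChance > random.nextDouble()) {
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, chunk.length);
            if ((int) crc.getValue() != expectedCrc)
                throw new IOException("CRC mismatch in compressed chunk");
        }
        buffer = chunk;
        position = 0;
        return true;
    }

    /** Test helper: writes one chunk in the hypothetical frame format read above. */
    static void writeChunk(DataOutputStream out, byte[] chunk) throws IOException {
        Deflater deflater = new Deflater();
        deflater.setInput(chunk);
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished())
            compressed.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        out.writeInt(chunk.length);
        out.writeInt(compressed.size());
        compressed.writeTo(out);
        out.writeInt((int) crc.getValue());
    }
}
```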
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286271#comment-13286271 ] 

Jonathan Ellis commented on CASSANDRA-4297:
-------------------------------------------

bq. Most of the time was spent on source node waiting for dest node to decompress and write to disk

It looks like the only reason to decompress is to compare CRC32s... is that right? Why did we CRC the uncompressed data instead of the compressed data? Should we introduce a new version of Snappy compression that CRCs the compressed data instead?
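To make the question concrete: if the checksum covered the compressed bytes, the receiver could validate a chunk without decompressing it. The sketch below uses a hypothetical `CompressedChunkCheck` helper built on the JDK's `CRC32`. Per the reply on this ticket, the destination node still has to decompress for other reasons (secondary indexes, SSTable stats, counters), so this alone would not remove the decompression step.

```java
import java.util.zip.CRC32;

class CompressedChunkCheck {
    /** CRC32 over a chunk's compressed bytes, as stored on disk or sent on the wire. */
    static int checksum(byte[] compressed, int offset, int length) {
        CRC32 crc = new CRC32();
        crc.update(compressed, offset, length);
        return (int) crc.getValue();
    }

    /** Receiver-side integrity check that needs no decompression. */
    static boolean verify(byte[] compressed, int offset, int length, int expected) {
        return checksum(compressed, offset, length) == expected;
    }
}
```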
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293947#comment-13293947 ] 

Yuki Morishita commented on CASSANDRA-4297:
-------------------------------------------

Why didn't I use SortedSet/TreeSet to eliminate dups and sort? :(
I will update the patch with a more detailed comment.
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294561#comment-13294561 ] 

Jonathan Ellis commented on CASSANDRA-4297:
-------------------------------------------

+1
                

        

[jira] [Updated] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4297:
--------------------------------------

    Attachment: 0001-Use-SortedSet-instead-of-Set-and-Arrays.sort.txt

I attached the part I modified since the v2 patch. Basically, I just switched to a TreeSet instead of a Set plus Arrays.sort.
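The sketch below shows that switch using a hypothetical `Chunk` that holds an offset in the compressed file and a compressed length. A `TreeSet` ordered by offset both removes duplicates and keeps chunks sorted, which replaces the separate `Set` plus `Arrays.sort` step.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

class Chunk {
    final long offset; // position of the chunk in the compressed file
    final int length;  // compressed length of the chunk

    Chunk(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

class ChunkSet {
    /** Chunks ordered by offset; adding a chunk with an existing offset is a no-op. */
    static SortedSet<Chunk> newChunkSet() {
        // comparingLong plays the role of Guava's Longs.compare here.
        return new TreeSet<>(Comparator.comparingLong((Chunk c) -> c.offset));
    }
}
```

Because `TreeSet` decides equality with its comparator, two `Chunk` instances with the same offset count as one even though `Chunk` does not override `equals`.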
                

        

[jira] [Updated] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4297:
--------------------------------------

    Attachment: 4297-v2.txt
    

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290398#comment-13290398 ] 

Jonathan Ellis commented on CASSANDRA-4297:
-------------------------------------------

Comments:

- Would prefer to have CDIS just implement IS, and let callers wrap in DIS when desired, similar to how we use SnappyInputStream in IncomingTcpConnection, or LZFInputStream in ISR
- Why the changes to OutboundTcpConnection?
- Re MS changes: when would header.file be null?
- Chunk[] sort can use Guava Longs.compare
- I suggest adding a comment to explain why sort is necessary (b/c ranges are from replication strategy, so may not be sorted?)
- Instead of using Set + copy into array, why not use an ArrayList + trimToSize()
- is the FST comment {{// TODO just use a raw RandomAccessFile since we're managing our own buffer here}} obsolete?
- is the CompressedRandomAccessReader path used at all in FST anymore?
- Nit: avoid double negation in if statements with else clauses, e.g. instead of
{code}
            if (remoteFile.compressionInfo != null)
                dis = new CompressedDataInputStream(socket.getInputStream(), remoteFile.compressionInfo);
            else
                dis = new DataInputStream(new LZFInputStream(socket.getInputStream()));
{code}
prefer
{code}
            if (remoteFile.compressionInfo == null)
                dis = new DataInputStream(new LZFInputStream(socket.getInputStream()));
            else
                dis = new CompressedDataInputStream(socket.getInputStream(), remoteFile.compressionInfo);
{code}
- Nit: suggest moving serialization code for Chunk and CompressionParameters into ChunkSerializer and ChunkParametersSerializer classes, respectively, just to make the code discoverable for re-use later

At a higher level,
- Should we make nio transfer the default for uncompressed sstables as well, and add an option to enable compression?  Alternatively, now that compression is the default for new sstables, I'd be okay with removing LZF stream compression entirely
- Does this over-transfer data on chunk boundaries?  Put another way, do we stream data that doesn't actually belong on the target node?  (I'm okay with this, just want to be clear about what's happening.)
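A back-of-the-envelope sketch of the over-transfer question follows. It assumes whole fixed-size chunks must be sent, because a compressed chunk can only be decompressed as a unit. `OverTransfer.extraBytes` is a hypothetical helper that counts, in uncompressed bytes, the data that is sent but lies outside the requested section. On the wire, the actual overhead is the compressed size of those bytes.

```java
class OverTransfer {
    /**
     * Uncompressed bytes sent outside [start, end) when whole chunks of
     * chunkLength bytes are transferred (the last chunk may be partial).
     */
    static long extraBytes(long start, long end, int chunkLength, long fileLength) {
        if (end <= start) return 0;
        long chunkStart = (start / chunkLength) * chunkLength;
        long chunkEnd = Math.min(((end - 1) / chunkLength + 1) * chunkLength, fileLength);
        return (chunkEnd - chunkStart) - (end - start);
    }
}
```

For example, with 64 KB chunks the section [1000, 70000) spans chunks 0 and 1, so 131072 - 69000 = 62072 extra uncompressed bytes are sent. The overhead is always less than one chunk at each end of a section.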
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288721#comment-13288721 ] 

Yuki Morishita commented on CASSANDRA-4297:
-------------------------------------------

I have to brush up my patch around progress reporting so that it updates correctly. Will post an updated version soon.
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288705#comment-13288705 ] 

Jonathan Ellis commented on CASSANDRA-4297:
-------------------------------------------

Can you break this down a bit for me?  CompressedFST looks straightforward, but what is CompressedDIS doing?
                

        

[jira] [Commented] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Yuki Morishita (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293852#comment-13293852 ] 

Yuki Morishita commented on CASSANDRA-4297:
-------------------------------------------


Attached v2, based on the review, plus some test-related changes.

bq. Would prefer to have CDIS just implement IS, and let callers wrap in DIS when desired, similar to how we use SnappyInputStream in IncomingTcpConnection, or LZFInputStream in ISR

CDIS now implements only InputStream and has been renamed to CompressedInputStream.
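To illustrate the design the reviewer asked for (a class that implements only InputStream, with callers wrapping it in DataInputStream when they need typed reads), here is a minimal sketch. It is not the CompressedInputStream from the patch: the class name ChunkedDecompressingInputStream, the [4-byte length][compressed bytes] framing, and the use of java.util.zip Deflate in place of the SSTable compressor are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-in for CompressedInputStream: a plain InputStream over
// length-prefixed compressed chunks. Deflate replaces the SSTable compressor,
// and the [4-byte length][compressed bytes] framing is an assumption.
class ChunkedDecompressingInputStream extends InputStream {
    private final DataInputStream source;
    private byte[] buffer = new byte[0]; // decompressed bytes of the current chunk
    private int position = 0;            // next unread index into buffer

    ChunkedDecompressingInputStream(InputStream source) {
        this.source = new DataInputStream(source);
    }

    @Override
    public int read() throws IOException {
        // Loop so that an empty chunk does not end the stream early.
        while (position == buffer.length) {
            if (!loadNextChunk()) {
                return -1;
            }
        }
        return buffer[position++] & 0xFF;
    }

    // Reads and inflates the next chunk; returns false at a clean end of stream.
    private boolean loadNextChunk() throws IOException {
        int length;
        try {
            length = source.readInt();
        } catch (EOFException e) {
            return false;
        }
        byte[] compressed = new byte[length];
        source.readFully(compressed);
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(tmp);
                out.write(tmp, 0, n);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IOException("corrupt or truncated chunk");
                }
            }
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inflater.end();
        }
        buffer = out.toByteArray();
        position = 0;
        return true;
    }

    // Demo helper: frames each raw chunk as [length][deflated bytes].
    static byte[] encodeChunks(byte[]... rawChunks) {
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        for (byte[] raw : rawChunks) {
            Deflater deflater = new Deflater();
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            byte[] tmp = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(tmp);
                compressed.write(tmp, 0, n);
            }
            deflater.end();
            byte[] prefix = ByteBuffer.allocate(4).putInt(compressed.size()).array();
            framed.write(prefix, 0, prefix.length);
            byte[] body = compressed.toByteArray();
            framed.write(body, 0, body.length);
        }
        return framed.toByteArray();
    }
}
```

Keeping the class a plain InputStream means typed reads come from the standard DataInputStream wrapper, and a readLong() that straddles two chunks is handled by the refill loop in read().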

bq. Why the changes to OutboundTcpConnection?

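The changes are needed to obtain a java.nio SocketChannel: the socket has to be created with SocketChannel.open(), because a Socket constructed directly has no associated channel and its getChannel() returns null.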
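The point about SocketChannel.open() can be checked directly against the JDK: per the java.net.Socket documentation, a socket has a channel only if it was created through SocketChannel.open or ServerSocketChannel.accept. The sketch below is an illustration, not the OutboundTcpConnection change itself; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.SocketChannel;

class ChannelBackedSocket {
    // Hypothetical helper: open the connection through NIO so the returned
    // Socket carries a channel usable as a FileChannel.transferTo target.
    static Socket connect(InetSocketAddress endpoint) throws IOException {
        SocketChannel channel = SocketChannel.open(endpoint); // blocking connect
        return channel.socket();
    }

    // A Socket constructed directly has no channel: getChannel() is null.
    static boolean plainSocketHasChannel() throws IOException {
        try (Socket plain = new Socket()) {
            return plain.getChannel() != null;
        }
    }

    // A Socket obtained from SocketChannel.open() reports that same channel.
    static boolean channelSocketHasChannel() throws IOException {
        try (SocketChannel channel = SocketChannel.open()) {
            return channel.socket().getChannel() == channel;
        }
    }
}
```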

bq. Re MS changes: when would header.file be null?

When a node requests a range but the target node doesn't have any corresponding data. I reverted the change in MS so that it still sends at least the stream header when header.file is null. It looks redundant, but for now it is necessary to terminate the stream session on the requesting node.

bq. Chunk[] sort can use Guava Longs.compare

done.

bq. I suggest adding a comment to explain why sort is necessary (b/c ranges are from replication strategy, so may not be sorted?) Instead of using Set + copy into array, why not use an ArrayList + trimToSize()

I use a Set here to eliminate duplicate chunks, since two different file sections can map to the same chunk.
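A sketch of the chunk selection discussed above, assuming a simplified metadata layout: a hypothetical chunkOffsets array giving the compressed start of each fixed-size uncompressed chunk. Each requested section is mapped to the chunks covering it, a HashSet drops chunks shared by two sections, and the result is sorted by compressed offset. The Chunk and ChunkSelector names are hypothetical, and the sketch ignores details of the real on-disk format.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value type: one compressed chunk of the data file.
final class Chunk {
    final long offset; // where the chunk starts in the compressed file
    final int length;  // compressed length in bytes

    Chunk(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Chunk)) return false;
        Chunk other = (Chunk) o;
        return offset == other.offset && length == other.length;
    }

    @Override
    public int hashCode() {
        return Objects.hash(offset, length);
    }
}

class ChunkSelector {
    // sections: uncompressed [start, end) pairs to stream.
    // chunkOffsets: compressed start offset of chunk i (assumed layout).
    // chunkLength: uncompressed bytes per chunk.
    static Chunk[] chunksFor(long[][] sections, long[] chunkOffsets,
                             long compressedFileLength, int chunkLength) {
        Set<Chunk> chunks = new HashSet<>(); // a chunk shared by two sections is kept once
        for (long[] section : sections) {
            long start = section[0], end = section[1];
            if (end <= start) continue; // skip empty sections
            int first = (int) (start / chunkLength);
            int last = (int) ((end - 1) / chunkLength);
            for (int i = first; i <= last; i++) {
                long next = i + 1 < chunkOffsets.length ? chunkOffsets[i + 1] : compressedFileLength;
                chunks.add(new Chunk(chunkOffsets[i], (int) (next - chunkOffsets[i])));
            }
        }
        Chunk[] result = chunks.toArray(new Chunk[0]);
        // Sections need not arrive in file order, so sort by compressed offset
        // (Long.compare here; Guava's Longs.compare gives the same ordering).
        Arrays.sort(result, (a, b) -> Long.compare(a.offset, b.offset));
        return result;
    }
}
```

For example, with 100-byte uncompressed chunks, the sections [150, 250) and [210, 230) both touch chunk 2, which ends up in the result only once.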

bq. is the FST comment // TODO just use a raw RandomAccessFile since we're managing our own buffer here obsolete? is the CompressedRandomAccessReader path used at all in FST anymore?

I removed CRAR from FST in v2. Even when NIO is not available (e.g. with inter-node SSL), streaming uses CompressedFileStreamTask and transfers the file directly through the socket's OutputStream.
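The send path described above can be sketched as follows. When the socket exposes a channel, FileChannel.transferTo hands the compressed bytes to the socket, which the OS may do without copying through user space. When only an OutputStream is available (as with inter-node SSL), the stream is wrapped with Channels.newChannel so the same loop still works, just without zero copy. The class name, method name and signature are hypothetical, and the loop assumes a blocking target.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

class ChunkTransfer {
    // Hypothetical helper: send file bytes [position, position + count) to the peer.
    // channel is the socket's channel when NIO is available, otherwise null,
    // in which case fallback (e.g. an SSL socket's OutputStream) is used.
    static long transfer(FileChannel file, long position, long count,
                         WritableByteChannel channel, OutputStream fallback) throws IOException {
        WritableByteChannel target = channel != null ? channel : Channels.newChannel(fallback);
        long sent = 0;
        while (sent < count) {
            // transferTo may send fewer bytes than requested, so keep going.
            long n = file.transferTo(position + sent, count - sent, target);
            if (n <= 0) {
                break; // reached end of file (assumes a blocking target)
            }
            sent += n;
        }
        return sent; // caller can compare with count to detect a short file
    }
}
```

The loop matters because transferTo is allowed to move fewer bytes than requested, so a single call is not guaranteed to send a whole chunk.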

{quote}
Nit: avoid double negation in if statements with else clauses
Nit: suggest moving serialization code for Chunk and CompressionParameters into ChunkSerializer and ChunkParametersSerializer classes, respectively, just to make the code discoverable for re-use later
{quote}

done.

bq. Should we make nio transfer the default for uncompressed sstables as well, and add an option to enable compression? Alternatively, now that compression is the default for new sstables, I'd be okay with removing LZF stream compression entirely

I haven't run any benchmarks, but I think always using LZF compression is fine when transferring uncompressed data.

bq. Does this over-transfer data on chunk boundaries? Put another way, do we stream data that doesn't actually belong on the target node? (I'm okay with this, just want to be clear about what's happening.)

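The source node can send data from unrelated ranges that happen to share a chunk with the requested data, but the receiving node ignores (skips) that part when reading from the socket, so none of it is stored on the target node. In that sense, the answer is no.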
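On the receiving side, the skip described above reduces to interval arithmetic: after decompressing a chunk, keep only the bytes that overlap the requested section. This hypothetical helper assumes the receiver knows the uncompressed offset at which the chunk starts; it is an illustration, not the patch's code.

```java
import java.util.Arrays;

class SectionFilter {
    // Hypothetical helper. chunk: decompressed bytes of one chunk.
    // chunkStart: uncompressed file offset where the chunk begins.
    // Returns only the bytes inside [sectionStart, sectionEnd); the rest of the
    // chunk belongs to ranges the receiver did not request and is dropped.
    static byte[] keepRequested(byte[] chunk, long chunkStart, long sectionStart, long sectionEnd) {
        long from = Math.max(chunkStart, sectionStart);
        long to = Math.min(chunkStart + chunk.length, sectionEnd);
        if (to <= from) {
            return new byte[0]; // chunk does not overlap the section
        }
        return Arrays.copyOfRange(chunk, (int) (from - chunkStart), (int) (to - chunkStart));
    }
}
```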

                
> Use java NIO as much as possible when streaming compressed SSTables
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4297
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4297
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: streaming
>             Fix For: 1.2
>
>         Attachments: 4297-v2.txt, 4297.txt
>


[jira] [Updated] (CASSANDRA-4297) Use java NIO as much as possible when streaming compressed SSTables

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4297:
--------------------------------------

    Reviewer: jbellis
    