Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2010/06/22 04:50:20 UTC

[Cassandra Wiki] Update of "Streaming_JA" by yutuki

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Streaming_JA" page has been changed by yutuki.
http://wiki.apache.org/cassandra/Streaming_JA

--------------------------------------------------

New page:
When data needs to be moved between the nodes that make up a Cassandra cluster, the transfer proceeds in the following steps.

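 1. The receiving node sends the sending node the range of data it needs.
 1. The sending node copies the required SSTable files for streaming, according to the range information it received. Because this is the reverse of "Compaction", which produces a single SSTable from multiple SSTables, this process is called "Anti-Compaction".
 1. The sending node first sends the receiving node a list of the data it is about to send, and then begins transferring the actual data.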
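
The three steps above can be pictured with a toy model. This is only an illustrative sketch, not Cassandra's implementation: an "SSTable" is reduced to a dict of token to value, the requested range is assumed to be `(lo, hi]`, and the function names `anti_compact` and `stream` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: an "SSTable" is just a mapping of token -> value.
SSTable = Dict[int, str]


def anti_compact(sstables: List[SSTable], lo: int, hi: int) -> List[SSTable]:
    """Step 2: copy out of each SSTable only the rows whose token is in (lo, hi]."""
    copies = []
    for table in sstables:
        subset = {tok: val for tok, val in table.items() if lo < tok <= hi}
        if subset:  # skip SSTables that hold nothing in the requested range
            copies.append(subset)
    return copies


def stream(sstables: List[SSTable], lo: int, hi: int) -> Tuple[List[int], List[SSTable]]:
    """Sender's view: range request in (step 1), then manifest and data out (step 3)."""
    to_send = anti_compact(sstables, lo, hi)
    manifest = [len(t) for t in to_send]  # the list of what will be sent goes first
    return manifest, to_send              # the actual data follows


source = [{10: "a", 50: "b"}, {60: "c", 90: "d"}, {5: "e"}]
manifest, data = stream(source, 40, 70)
print(manifest, data)
```

Note that, unlike compaction, the toy `anti_compact` never merges files: each source SSTable yields at most one filtered copy.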

The status of streaming on both source and destination nodes can be monitored (in 0.6) through the `org.apache.cassandra.streaming.StreamingService` MBean.  The `Status` attribute gives an easy indication of what a node is doing with respect to streaming.

Step 2 is what takes the most time on most systems. The destination will be idle during this stage; to monitor anti-compaction progress, check the `Compaction` MBean on the source.

Once step 3 begins actual data transfer, the sending node will report a status of `"Waiting for transfer to $some_node to complete."`  The receiving node will report `"Receiving stream"` while receiving stream data.  The `StreamDestinations` and `StreamSources` attributes each contain a list of hosts that the current node is either sending stream data to or receiving it from.
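
If you poll the `Status` attribute from a script, the two messages quoted above can be told apart with a small helper. The sketch below assumes the strings look exactly as quoted here; the function name `classify_status` and the "other" fallback are hypothetical, and a real node may report other status texts that this does not cover.

```python
import re
from typing import Optional, Tuple

# Matches the sender-side message quoted in the text; the host is whatever
# non-space token appears where "$some_node" is shown.
_SENDING = re.compile(r"^Waiting for transfer to (?P<host>\S+) to complete\.?$")


def classify_status(status: str) -> Tuple[str, Optional[str]]:
    """Return (role, host), where role is 'sending', 'receiving', or 'other'."""
    status = status.strip()
    match = _SENDING.match(status)
    if match:
        return ("sending", match.group("host"))
    if status.startswith("Receiving stream"):
        return ("receiving", None)
    return ("other", None)  # any status text not described in this page


print(classify_status("Waiting for transfer to 10.0.0.2 to complete."))
print(classify_status("Receiving stream"))
```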

The operations `getOutgoingFiles(host)` and `getIncomingFiles(host)` each return a list of strings describing the status of individual files being streamed to and from a given host.  Each string follows this format:  `[path to file] [bytes sent/received]/[file size]`

If you think that streaming is taking too long on your cluster, first check `StreamSources` or `StreamDestinations` to find out which hosts are streaming files.  Use those hosts as inputs to `getOutgoingFiles()` or `getIncomingFiles()` to check on the status of individual files from the problematic source and destination nodes.  Streaming is conducted in 32MB chunks, so refresh the file status after a few seconds to see whether the sent/received values change.  If they do not change, or change more slowly than you'd like, something is wrong.  Keep in mind that a source node can only stream a single file at a time, but a destination node can simultaneously receive several files.
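
The "refresh and compare" check can be automated once you have two snapshots of the file status strings. The following is a minimal sketch that assumes each string is exactly `path done/size` with a single space before the counters; how the snapshots are fetched over JMX is left out, and the names `parse_file_status` and `stalled_files` and the sample paths are hypothetical.

```python
from typing import List, NamedTuple


class FileProgress(NamedTuple):
    path: str
    done: int  # bytes sent or received so far
    size: int  # total file size in bytes


def parse_file_status(line: str) -> FileProgress:
    """Parse one '[path to file] [bytes sent/received]/[file size]' string."""
    # Split on the last space so that paths containing spaces still parse.
    path, counts = line.strip().rsplit(" ", 1)
    done, size = counts.split("/")
    return FileProgress(path, int(done), int(size))


def stalled_files(before: List[str], after: List[str]) -> List[str]:
    """Paths in both snapshots that are unfinished and did not advance."""
    earlier = {p.path: p.done for p in map(parse_file_status, before)}
    stalled = []
    for p in map(parse_file_status, after):
        if p.path in earlier and p.done <= earlier[p.path] and p.done < p.size:
            stalled.append(p.path)
    return stalled


# Made-up snapshots taken a few seconds apart.
first = ["/data/ks/A-1-Data.db 0/100", "/data/ks/B-2-Data.db 50/200"]
second = ["/data/ks/A-1-Data.db 32/100", "/data/ks/B-2-Data.db 50/200"]
print(stalled_files(first, second))
```

Because data moves in 32MB chunks, a file can legitimately show the same counter on two closely spaced reads, so leave a few seconds between snapshots before treating a file as stalled.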