Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2010/06/22 04:50:20 UTC
[Cassandra Wiki] Update of "Streaming_JA" by yutuki
The "Streaming_JA" page has been changed by yutuki.
http://wiki.apache.org/cassandra/Streaming_JA
--------------------------------------------------
New page:
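When data has to be moved between the nodes that make up a Cassandra cluster, the transfer proceeds through the following steps.
1. The receiving node sends the sending node the range of data it needs.
1. The sending node copies the required SSTable files for streaming, according to the range information it received. Because this is the reverse of "Compaction", which produces a single SSTable from multiple SSTables, this step is called "Anti-Compaction".
1. The sending node first sends the receiving node a list of the data it is about to send, and then begins transferring the actual data.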
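The three steps above can be sketched as a toy model. This is not Cassandra's code: the `SSTable` class, the integer tokens, the half-open range `(lo, hi]`, and the `-stream` file suffix are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    """Hypothetical stand-in for an on-disk SSTable: token -> value."""
    name: str
    rows: dict


def anti_compact(sstables, lo, hi):
    """Step 2: copy out of each SSTable only the rows inside the range.

    The (lo, hi] range semantics are an assumption of this sketch.
    """
    copies = []
    for table in sstables:
        kept = {tok: val for tok, val in table.rows.items() if lo < tok <= hi}
        if kept:  # tables with nothing in range produce no streaming file
            copies.append(SSTable(table.name + "-stream", kept))
    return copies


def stream(sender_sstables, requested_range):
    """Run steps 1-3 in order and return (manifest, received rows)."""
    lo, hi = requested_range                        # step 1: receiver's request
    files = anti_compact(sender_sstables, lo, hi)   # step 2: anti-compaction
    manifest = [f.name for f in files]              # step 3a: list of files
    received = {}
    for f in files:                                 # step 3b: actual data
        received.update(f.rows)
    return manifest, received
```

Note that one input SSTable yields at most one streaming file here, which mirrors the "reverse of compaction" description: the data is split out per source file rather than merged.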
In 0.6, the status of streaming on both source and destination nodes can be monitored through the `org.apache.cassandra.streaming.StreamingService` MBean. The `Status` attribute gives an easy indication of what a node is doing with respect to streaming.
Step 2 takes the most time on most systems. The destination will be idle during this stage; to monitor anti-compaction progress, check the `Compaction` MBean on the source.
Once step 3 begins actual data transfer, the sending node will report a status of `"Waiting for transfer to $some_node to complete."` The receiving node will report `"Receiving stream"` while receiving stream data. The `StreamDestinations` and `StreamSources` attributes each contain a list of hosts that the current node is either sending stream data to or receiving it from.
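As a rough illustration, a monitoring script could map the `Status` value to a phase. The sketch below assumes the status has already been read from the MBean as a plain string; the function name `streaming_phase` is hypothetical, and only the two messages quoted above are recognized. Anything else is reported as `"other"` rather than guessed at.

```python
import re

# Matches the sender-side message quoted in the text; the node part is
# whatever appears between "to" and "to complete".
_SENDING = re.compile(r"^Waiting for transfer to (?P<node>.+) to complete\.?$")


def streaming_phase(status):
    """Return (phase, node) for a Status string.

    phase is "sending", "receiving", or "other"; node is only set
    when the status names a transfer target.
    """
    text = status.strip()
    match = _SENDING.match(text)
    if match:
        return "sending", match.group("node")
    if text.startswith("Receiving stream"):
        return "receiving", None
    return "other", None
```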
The operations `getOutgoingFiles(host)` and `getIncomingFiles(host)` each return a list of strings describing the status of individual files being streamed to and from a given host. Each string follows this format: `[path to file] [bytes sent/received]/[file size]`
If you think that streaming is taking too long on your cluster, the first thing you should do is check `StreamSources` or `StreamDestinations` to figure out which hosts are streaming files. Use those hosts as inputs to `getOutgoingFiles()` or `getIncomingFiles()` to check on the status of individual files from the problematic source and destination nodes.
Streaming is conducted in 32MB chunks, so you should refresh the file status after a few seconds to see if the sent/received values change. If they do not change, or change more slowly than you'd like, something is wrong. Keep in mind that a source node can only stream a single file at a time, but a destination node can simultaneously receive several files.
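The refresh-and-compare check described above can be sketched as follows. This assumes the square brackets in the format are placeholders rather than literal characters, so a line looks like `path done/size`, and that the strings from two successive calls have already been collected into lists. `FileProgress`, `parse_file_status`, and `stalled_files` are hypothetical names.

```python
from typing import NamedTuple


class FileProgress(NamedTuple):
    path: str
    done: int   # bytes sent or received so far
    size: int   # total file size in bytes


def parse_file_status(line):
    """Parse one status string of the assumed form 'path done/size'."""
    # Split on the last space so the path itself may contain "/" freely.
    path, counts = line.strip().rsplit(" ", 1)
    done, size = counts.split("/")
    return FileProgress(path, int(done), int(size))


def stalled_files(before, after):
    """Return paths whose byte count did not advance between two polls.

    Files that are already complete are not reported as stalled.
    """
    earlier = {p.path: p.done for p in map(parse_file_status, before)}
    stalled = []
    for p in map(parse_file_status, after):
        if p.path in earlier and p.done == earlier[p.path] and p.done < p.size:
            stalled.append(p.path)
    return stalled
```

Because a source streams one file at a time, a file still sitting at zero bytes on the sending side may simply be queued, so a result from `stalled_files` is a prompt to look closer rather than proof of a fault.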