You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2020/01/07 03:07:15 UTC

[GitHub] [cassandra] dvohra opened a new pull request #409: Bulkloading

dvohra opened a new pull request #409: Bulkloading
URL: https://github.com/apache/cassandra/pull/409
 
 
   Added page on bulk loading. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] rustyrazorblade closed pull request #409: Bulkloading

Posted by GitBox <gi...@apache.org>.
rustyrazorblade closed pull request #409: Bulkloading
URL: https://github.com/apache/cassandra/pull/409
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] alexott commented on a change in pull request #409: Bulkloading

Posted by GitBox <gi...@apache.org>.
alexott commented on a change in pull request #409: Bulkloading
URL: https://github.com/apache/cassandra/pull/409#discussion_r363708109
 
 

 ##########
 File path: doc/source/operating/bulk_loading.rst
 ##########
 @@ -1,24 +1,693 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
-.. or more contributor license agreements.  See the NOTICE file
-.. distributed with this work for additional information
-.. regarding copyright ownership.  The ASF licenses this file
-.. to you under the Apache License, Version 2.0 (the
-.. "License"); you may not use this file except in compliance
-.. with the License.  You may obtain a copy of the License at
-..
-..     http://www.apache.org/licenses/LICENSE-2.0
-..
-.. Unless required by applicable law or agreed to in writing, software
-.. distributed under the License is distributed on an "AS IS" BASIS,
-.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-.. See the License for the specific language governing permissions and
-.. limitations under the License.
-
-.. highlight:: none
-
-.. _bulk-loading:
-
-Bulk Loading
-------------
-
-.. todo:: TODO
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+.. highlight:: none
+
+.. _bulk-loading:
+
+Bulk Loading  
+============== 
+
+Bulk loading of data in Apache Cassandra is supported by different tools. The data to be bulk loaded must be in the form of SSTables. Cassandra does not support loading data in any other format such as CSV, JSON, and XML directly. Bulk loading could be used to:
+
+- Restore incremental backups and snapshots. Backups and snapshots are already in the form of SSTables.
+- Load existing SSTables into another cluster, which could have a different number of nodes or replication strategy.
+- Load external data into a cluster
+
+Tools for Bulk Loading
+^^^^^^^^^^^^^^^^^^^^^^
+
+Cassandra provides two commands or tools for bulk loading data. These are:
+
+- Cassandra Bulk loader, also called ``sstableloader``
+- The ``nodetool import`` command
+
+The ``sstableloader`` and ``nodetool import`` are accessible if the Cassandra installation ``bin`` directory is in the ``PATH`` environment variable.  Or these may be accessed directly from the ``bin`` directory. We shall discuss each of these next. We shall use the example or sample keyspaces and tables created in the Backups section. 
+
+Using sstableloader
+^^^^^^^^^^^^^^^^^^^
+
+The ``sstableloader`` is the main tool for bulk uploading data. The ``sstableloader`` streams SSTable data files to a running cluster. The ``sstableloader`` loads data conforming to the replication strategy and replication factor. The table to upload data to does need not to be empty.  
+
+The only requirements to run ``sstableloader`` are:
+
+1. One or more comma separated initial hosts to connect to and get ring information.
+2. A directory path for the SSTables to load.
+
+Its usage is as follows.
+
+::
+
+ sstableloader [options] <dir_path>
+
+Sstableloader bulk loads the SSTables found in the directory ``<dir_path>`` to the configured cluster. The   ``<dir_path>`` is used as the target *keyspace/table* name. As an example, to load an SSTable named
+``Standard1-g-1-Data.db`` into ``Keyspace1/Standard1``, you will need to have the
+files ``Standard1-g-1-Data.db`` and ``Standard1-g-1-Index.db`` in a directory ``/path/to/Keyspace1/Standard1/``.
+
+Sstableloader Option to accept Target keyspace name
+**************************************************** 
+Often as part of a backup strategy some Cassandra DBAs store an entire data directory. When corruption in data is found then they would like to restore data in the same cluster (for large clusters 200 nodes) but with different keyspace name.
+
+Currently ``sstableloader`` derives keyspace name from the folder structure. As  an option to specify target keyspace name as part of ``sstableloader``, version 4.0 adds support for the ``--target-keyspace``  option (`CASSANDRA-13884
+<https://issues.apache.org/jira/browse/CASSANDRA-13884>`_).
+
+The supported options are as follows from which only ``-d,--nodes <initial hosts>``  is required.
+
+::
+
+ -alg,--ssl-alg <ALGORITHM>                                   Client SSL: algorithm
+                                                               
+ -ap,--auth-provider <auth provider>                          Custom
+                                                              AuthProvider class name for 
+                                                              cassandra authentication
+ -ciphers,--ssl-ciphers <CIPHER-SUITES>                       Client SSL:
+                                                              comma-separated list of 
+                                                              encryption suites to use
+ -cph,--connections-per-host <connectionsPerHost>             Number of
+                                                              concurrent connections-per-host.
+ -d,--nodes <initial hosts>                                   Required.
+                                                              Try to connect to these hosts (comma separated) initially for ring information
+                                                               
+ -f,--conf-path <path to config file>                         cassandra.yaml file path for streaming throughput and client/server SSL.
+                                                              
+ -h,--help                                                    Display this help message
+                                                             
+ -i,--ignore <NODES>                                          Don't stream to this (comma separated) list of nodes
+
+ -idct,--inter-dc-throttle <inter-dc-throttle>                Inter-datacenter throttle speed in Mbits (default unlimited)
+                                                               
+ -k,--target-keyspace <target keyspace name>                  Target
+                                                              keyspace name
+ -ks,--keystore <KEYSTORE>                                    Client SSL:
+                                                              full path to keystore
+ -kspw,--keystore-password <KEYSTORE-PASSWORD>                Client SSL:
+                                                              password of the keystore
+ --no-progress                                                Don't
+                                                              display progress
+ -p,--port <native transport port>                            Port used
+                                                              for native connection (default 9042)
+ -prtcl,--ssl-protocol <PROTOCOL>                             Client SSL:
+                                                              connections protocol to use (default: TLS)
+ -pw,--password <password>                                    Password for
+                                                              cassandra authentication
+ -sp,--storage-port <storage port>                            Port used
+                                                              for internode communication (default 7000)
+ -spd,--server-port-discovery <allow server port discovery>   Use ports
+                                                              published by server to decide how to connect. With SSL requires StartTLS
+                                                              to be used.
+ -ssp,--ssl-storage-port <ssl storage port>                   Port used
+                                                              for TLS internode communication (default 7001)
+ -st,--store-type <STORE-TYPE>                                Client SSL:
+                                                              type of store
+ -t,--throttle <throttle>                                     Throttle
+                                                              speed in Mbits (default unlimited)
+ -ts,--truststore <TRUSTSTORE>                                Client SSL:
+                                                              full path to truststore
+ -tspw,--truststore-password <TRUSTSTORE-PASSWORD>            Client SSL:
+                                                              Password of the truststore
+ -u,--username <username>                                     Username for
+                                                              cassandra authentication
+ -v,--verbose                                                 verbose
+                                                              output
+
+The ``cassandra.yaml`` file could be provided  on the command-line with ``-f`` option to set up streaming throughput, client and server encryption options. Only ``stream_throughput_outbound_megabits_per_sec``, ``server_encryption_options`` and ``client_encryption_options`` are read from yaml. You can override options read from ``cassandra.yaml`` with corresponding command line options.
+
+A sstableloader Demo
+********************
+We shall demonstrate using ``sstableloader`` by uploading incremental backup data for table ``catalogkeyspace.magazine``.  We shall also use a snapshot of the same table to bulk upload in a different run of  ``sstableloader``.  The backups and snapshots for the ``catalogkeyspace.magazine`` table are listed as follows.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ cd ./cassandra/data/data/catalogkeyspace/magazine- 
+ 446eae30c22a11e9b1350d927649052c
+ [ec2-user@ip-10-0-2-238 magazine-446eae30c22a11e9b1350d927649052c]$ ls -l
+ total 0
+ drwxrwxr-x. 2 ec2-user ec2-user 226 Aug 19 02:38 backups
+ drwxrwxr-x. 4 ec2-user ec2-user  40 Aug 19 02:45 snapshots
+
+The directory path structure of SSTables to be uploaded using ``sstableloader`` is used as the  target keyspace/table.  
+
+We could have directly uploaded from the ``backups`` and ``snapshots`` directories respectively if the directory structure were in the format used by ``sstableloader``. But the directory path of backups and snapshots for SSTables  is ``/catalogkeyspace/magazine-446eae30c22a11e9b1350d927649052c/backups`` and ``/catalogkeyspace/magazine-446eae30c22a11e9b1350d927649052c/snapshots`` respectively, which cannot be used to upload SSTables to ``catalogkeyspace.magazine`` table. The directory path structure must be ``/catalogkeyspace/magazine/`` to use ``sstableloader``. We need to create a new directory structure to upload SSTables with ``sstableloader`` which is typical when using ``sstableloader``. Create a directory structure ``/catalogkeyspace/magazine`` and set its permissions.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo mkdir -p /catalogkeyspace/magazine
+ [ec2-user@ip-10-0-2-238 ~]$ sudo chmod -R 777 /catalogkeyspace/magazine
+
+Bulk Loading from an Incremental Backup
++++++++++++++++++++++++++++++++++++++++
+An incremental backup does not include the DDL for a table. The table must already exist. If the table was dropped it may be created using the ``schema.cql`` generated with every snapshot of a table. As we shall be using ``sstableloader`` to load SSTables to the ``magazine`` table, the table must exist prior to running ``sstableloader``. The table does not need to be empty but we have used an empty table as indicated by a CQL query:
+
+::
+
+ cqlsh:catalogkeyspace> SELECT * FROM magazine;
+
+ id | name | publisher
+ ----+------+-----------
+
+ (0 rows)
+ 
+After the table to upload has been created copy the SSTable files from the ``backups`` directory to the ``/catalogkeyspace/magazine/`` directory that we created.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo cp ./cassandra/data/data/catalogkeyspace/magazine- 
+ 446eae30c22a11e9b1350d927649052c/backups/* /catalogkeyspace/magazine/
+
+Run the ``sstableloader`` to upload SSTables from the ``/catalogkeyspace/magazine/`` directory.
+
+::
+
+ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+
+The output from the ``sstableloader`` command should be similar to the listed: 
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+ Opening SSTables and calculating sections to stream
+ Streaming relevant part of /catalogkeyspace/magazine/na-1-big-Data.db 
+ /catalogkeyspace/magazine/na-2-big-Data.db  to [35.173.233.153:7000, 10.0.2.238:7000, 
+ 54.158.45.75:7000]
+ progress: [35.173.233.153:7000]0:1/2 88 % total: 88% 0.018KiB/s (avg: 0.018KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% total: 176% 33.807KiB/s (avg: 0.036KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% total: 176% 0.000KiB/s (avg: 0.029KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:1/2 39 % total: 81% 0.115KiB/s 
+ (avg: 0.024KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % total: 108% 
+ 97.683KiB/s (avg: 0.033KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:1/2 39 % total: 80% 0.233KiB/s (avg: 0.040KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 88.522KiB/s (avg: 0.049KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 0.000KiB/s (avg: 0.045KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 0.000KiB/s (avg: 0.044KiB/s)
+
+After the ``sstableloader`` has run query the ``magazine`` table and the loaded table should get listed when a query is run.
+
+::
+
+ cqlsh:catalogkeyspace> SELECT * FROM magazine;
+
+ id | name                      | publisher
+ ----+---------------------------+------------------
+  1 |        Couchbase Magazine |        Couchbase
+  0 | Apache Cassandra Magazine | Apache Cassandra
+
+ (2 rows)
+ cqlsh:catalogkeyspace> 
+
+Bulk Loading from a Snapshot
++++++++++++++++++++++++++++++ 
+In this section we shall demonstrate restoring a snapshot of the ``magazine`` table to the ``magazine`` table.  As we used the same table to restore data from a backup the directory structure required by ``sstableloader`` should already exist.  If the directory structure needed to load SSTables to ``catalogkeyspace.magazine`` does not exist create the directories and set their permissions.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo mkdir -p /catalogkeyspace/magazine
+ [ec2-user@ip-10-0-2-238 ~]$ sudo chmod -R 777 /catalogkeyspace/magazine
+
+As we shall be copying the snapshot  files to the directory remove any files that may be in the directory.
 
 Review comment:
   it's easier just to create symlinks, not copy the data. something like:
   
   ```sh
   mkdir keyspace_name
   ln -s _path_to_snapshot_folder keyspace_name/table_name
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] alexott commented on a change in pull request #409: Bulkloading

Posted by GitBox <gi...@apache.org>.
alexott commented on a change in pull request #409: Bulkloading
URL: https://github.com/apache/cassandra/pull/409#discussion_r363708385
 
 

 ##########
 File path: doc/source/operating/bulk_loading.rst
 ##########
 @@ -1,24 +1,693 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
-.. or more contributor license agreements.  See the NOTICE file
-.. distributed with this work for additional information
-.. regarding copyright ownership.  The ASF licenses this file
-.. to you under the Apache License, Version 2.0 (the
-.. "License"); you may not use this file except in compliance
-.. with the License.  You may obtain a copy of the License at
-..
-..     http://www.apache.org/licenses/LICENSE-2.0
-..
-.. Unless required by applicable law or agreed to in writing, software
-.. distributed under the License is distributed on an "AS IS" BASIS,
-.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-.. See the License for the specific language governing permissions and
-.. limitations under the License.
-
-.. highlight:: none
-
-.. _bulk-loading:
-
-Bulk Loading
-------------
-
-.. todo:: TODO
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+.. highlight:: none
+
+.. _bulk-loading:
+
+Bulk Loading  
+============== 
+
+Bulk loading of data in Apache Cassandra is supported by different tools. The data to be bulk loaded must be in the form of SSTables. Cassandra does not support loading data in any other format such as CSV, JSON, and XML directly. Bulk loading could be used to:
+
+- Restore incremental backups and snapshots. Backups and snapshots are already in the form of SSTables.
+- Load existing SSTables into another cluster, which could have a different number of nodes or replication strategy.
+- Load external data into a cluster
+
+Tools for Bulk Loading
+^^^^^^^^^^^^^^^^^^^^^^
+
+Cassandra provides two commands or tools for bulk loading data. These are:
+
+- Cassandra Bulk loader, also called ``sstableloader``
+- The ``nodetool import`` command
+
+The ``sstableloader`` and ``nodetool import`` are accessible if the Cassandra installation ``bin`` directory is in the ``PATH`` environment variable.  Or these may be accessed directly from the ``bin`` directory. We shall discuss each of these next. We shall use the example or sample keyspaces and tables created in the Backups section. 
+
+Using sstableloader
+^^^^^^^^^^^^^^^^^^^
+
+The ``sstableloader`` is the main tool for bulk uploading data. The ``sstableloader`` streams SSTable data files to a running cluster. The ``sstableloader`` loads data conforming to the replication strategy and replication factor. The table to upload data to does need not to be empty.  
+
+The only requirements to run ``sstableloader`` are:
+
+1. One or more comma separated initial hosts to connect to and get ring information.
+2. A directory path for the SSTables to load.
+
+Its usage is as follows.
+
+::
+
+ sstableloader [options] <dir_path>
+
+Sstableloader bulk loads the SSTables found in the directory ``<dir_path>`` to the configured cluster. The   ``<dir_path>`` is used as the target *keyspace/table* name. As an example, to load an SSTable named
+``Standard1-g-1-Data.db`` into ``Keyspace1/Standard1``, you will need to have the
+files ``Standard1-g-1-Data.db`` and ``Standard1-g-1-Index.db`` in a directory ``/path/to/Keyspace1/Standard1/``.
+
+Sstableloader Option to accept Target keyspace name
+**************************************************** 
+Often as part of a backup strategy some Cassandra DBAs store an entire data directory. When corruption in data is found then they would like to restore data in the same cluster (for large clusters 200 nodes) but with different keyspace name.
+
+Currently ``sstableloader`` derives keyspace name from the folder structure. As  an option to specify target keyspace name as part of ``sstableloader``, version 4.0 adds support for the ``--target-keyspace``  option (`CASSANDRA-13884
+<https://issues.apache.org/jira/browse/CASSANDRA-13884>`_).
+
+The supported options are as follows from which only ``-d,--nodes <initial hosts>``  is required.
+
+::
+
+ -alg,--ssl-alg <ALGORITHM>                                   Client SSL: algorithm
+                                                               
+ -ap,--auth-provider <auth provider>                          Custom
+                                                              AuthProvider class name for 
+                                                              cassandra authentication
+ -ciphers,--ssl-ciphers <CIPHER-SUITES>                       Client SSL:
+                                                              comma-separated list of 
+                                                              encryption suites to use
+ -cph,--connections-per-host <connectionsPerHost>             Number of
+                                                              concurrent connections-per-host.
+ -d,--nodes <initial hosts>                                   Required.
+                                                              Try to connect to these hosts (comma separated) initially for ring information
+                                                               
+ -f,--conf-path <path to config file>                         cassandra.yaml file path for streaming throughput and client/server SSL.
+                                                              
+ -h,--help                                                    Display this help message
+                                                             
+ -i,--ignore <NODES>                                          Don't stream to this (comma separated) list of nodes
+
+ -idct,--inter-dc-throttle <inter-dc-throttle>                Inter-datacenter throttle speed in Mbits (default unlimited)
+                                                               
+ -k,--target-keyspace <target keyspace name>                  Target
+                                                              keyspace name
+ -ks,--keystore <KEYSTORE>                                    Client SSL:
+                                                              full path to keystore
+ -kspw,--keystore-password <KEYSTORE-PASSWORD>                Client SSL:
+                                                              password of the keystore
+ --no-progress                                                Don't
+                                                              display progress
+ -p,--port <native transport port>                            Port used
+                                                              for native connection (default 9042)
+ -prtcl,--ssl-protocol <PROTOCOL>                             Client SSL:
+                                                              connections protocol to use (default: TLS)
+ -pw,--password <password>                                    Password for
+                                                              cassandra authentication
+ -sp,--storage-port <storage port>                            Port used
+                                                              for internode communication (default 7000)
+ -spd,--server-port-discovery <allow server port discovery>   Use ports
+                                                              published by server to decide how to connect. With SSL requires StartTLS
+                                                              to be used.
+ -ssp,--ssl-storage-port <ssl storage port>                   Port used
+                                                              for TLS internode communication (default 7001)
+ -st,--store-type <STORE-TYPE>                                Client SSL:
+                                                              type of store
+ -t,--throttle <throttle>                                     Throttle
+                                                              speed in Mbits (default unlimited)
+ -ts,--truststore <TRUSTSTORE>                                Client SSL:
+                                                              full path to truststore
+ -tspw,--truststore-password <TRUSTSTORE-PASSWORD>            Client SSL:
+                                                              Password of the truststore
+ -u,--username <username>                                     Username for
+                                                              cassandra authentication
+ -v,--verbose                                                 verbose
+                                                              output
+
+The ``cassandra.yaml`` file could be provided  on the command-line with ``-f`` option to set up streaming throughput, client and server encryption options. Only ``stream_throughput_outbound_megabits_per_sec``, ``server_encryption_options`` and ``client_encryption_options`` are read from yaml. You can override options read from ``cassandra.yaml`` with corresponding command line options.
+
+A sstableloader Demo
+********************
+We shall demonstrate using ``sstableloader`` by uploading incremental backup data for table ``catalogkeyspace.magazine``.  We shall also use a snapshot of the same table to bulk upload in a different run of  ``sstableloader``.  The backups and snapshots for the ``catalogkeyspace.magazine`` table are listed as follows.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ cd ./cassandra/data/data/catalogkeyspace/magazine- 
+ 446eae30c22a11e9b1350d927649052c
+ [ec2-user@ip-10-0-2-238 magazine-446eae30c22a11e9b1350d927649052c]$ ls -l
+ total 0
+ drwxrwxr-x. 2 ec2-user ec2-user 226 Aug 19 02:38 backups
+ drwxrwxr-x. 4 ec2-user ec2-user  40 Aug 19 02:45 snapshots
+
+The directory path structure of SSTables to be uploaded using ``sstableloader`` is used as the  target keyspace/table.  
+
+We could have directly uploaded from the ``backups`` and ``snapshots`` directories respectively if the directory structure were in the format used by ``sstableloader``. But the directory path of backups and snapshots for SSTables  is ``/catalogkeyspace/magazine-446eae30c22a11e9b1350d927649052c/backups`` and ``/catalogkeyspace/magazine-446eae30c22a11e9b1350d927649052c/snapshots`` respectively, which cannot be used to upload SSTables to ``catalogkeyspace.magazine`` table. The directory path structure must be ``/catalogkeyspace/magazine/`` to use ``sstableloader``. We need to create a new directory structure to upload SSTables with ``sstableloader`` which is typical when using ``sstableloader``. Create a directory structure ``/catalogkeyspace/magazine`` and set its permissions.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo mkdir -p /catalogkeyspace/magazine
+ [ec2-user@ip-10-0-2-238 ~]$ sudo chmod -R 777 /catalogkeyspace/magazine
+
+Bulk Loading from an Incremental Backup
++++++++++++++++++++++++++++++++++++++++
+An incremental backup does not include the DDL for a table. The table must already exist. If the table was dropped it may be created using the ``schema.cql`` generated with every snapshot of a table. As we shall be using ``sstableloader`` to load SSTables to the ``magazine`` table, the table must exist prior to running ``sstableloader``. The table does not need to be empty but we have used an empty table as indicated by a CQL query:
+
+::
+
+ cqlsh:catalogkeyspace> SELECT * FROM magazine;
+
+ id | name | publisher
+ ----+------+-----------
+
+ (0 rows)
+ 
+After the table to upload has been created copy the SSTable files from the ``backups`` directory to the ``/catalogkeyspace/magazine/`` directory that we created.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo cp ./cassandra/data/data/catalogkeyspace/magazine- 
+ 446eae30c22a11e9b1350d927649052c/backups/* /catalogkeyspace/magazine/
+
+Run the ``sstableloader`` to upload SSTables from the ``/catalogkeyspace/magazine/`` directory.
+
+::
+
+ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+
+The output from the ``sstableloader`` command should be similar to the listed: 
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+ Opening SSTables and calculating sections to stream
+ Streaming relevant part of /catalogkeyspace/magazine/na-1-big-Data.db 
+ /catalogkeyspace/magazine/na-2-big-Data.db  to [35.173.233.153:7000, 10.0.2.238:7000, 
+ 54.158.45.75:7000]
+ progress: [35.173.233.153:7000]0:1/2 88 % total: 88% 0.018KiB/s (avg: 0.018KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% total: 176% 33.807KiB/s (avg: 0.036KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% total: 176% 0.000KiB/s (avg: 0.029KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:1/2 39 % total: 81% 0.115KiB/s 
+ (avg: 0.024KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % total: 108% 
+ 97.683KiB/s (avg: 0.033KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:1/2 39 % total: 80% 0.233KiB/s (avg: 0.040KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 88.522KiB/s (avg: 0.049KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 0.000KiB/s (avg: 0.045KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 0.000KiB/s (avg: 0.044KiB/s)
+
+After the ``sstableloader`` has run query the ``magazine`` table and the loaded table should get listed when a query is run.
+
+::
+
+ cqlsh:catalogkeyspace> SELECT * FROM magazine;
+
+ id | name                      | publisher
+ ----+---------------------------+------------------
+  1 |        Couchbase Magazine |        Couchbase
+  0 | Apache Cassandra Magazine | Apache Cassandra
+
+ (2 rows)
+ cqlsh:catalogkeyspace> 
+
+Bulk Loading from a Snapshot
++++++++++++++++++++++++++++++ 
+In this section we shall demonstrate restoring a snapshot of the ``magazine`` table to the ``magazine`` table.  As we used the same table to restore data from a backup the directory structure required by ``sstableloader`` should already exist.  If the directory structure needed to load SSTables to ``catalogkeyspace.magazine`` does not exist create the directories and set their permissions.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo mkdir -p /catalogkeyspace/magazine
+ [ec2-user@ip-10-0-2-238 ~]$ sudo chmod -R 777 /catalogkeyspace/magazine
+
+As we shall be copying the snapshot  files to the directory remove any files that may be in the directory.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo rm /catalogkeyspace/magazine/*
+ [ec2-user@ip-10-0-2-238 ~]$ cd /catalogkeyspace/magazine/
+ [ec2-user@ip-10-0-2-238 magazine]$ ls -l
+ total 0
+
+
+Copy the snapshot files to the ``/catalogkeyspace/magazine`` directory. 
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo cp ./cassandra/data/data/catalogkeyspace/magazine- 
+ 446eae30c22a11e9b1350d927649052c/snapshots/magazine/* /catalogkeyspace/magazine
+
+List the files in the ``/catalogkeyspace/magazine`` directory and a ``schema.cql`` should also get listed.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ cd /catalogkeyspace/magazine
+ [ec2-user@ip-10-0-2-238 magazine]$ ls -l
+ total 44
+ -rw-r--r--. 1 root root   31 Aug 19 04:13 manifest.json
+ -rw-r--r--. 1 root root   47 Aug 19 04:13 na-1-big-CompressionInfo.db
+ -rw-r--r--. 1 root root   97 Aug 19 04:13 na-1-big-Data.db
+ -rw-r--r--. 1 root root   10 Aug 19 04:13 na-1-big-Digest.crc32
+ -rw-r--r--. 1 root root   16 Aug 19 04:13 na-1-big-Filter.db
+ -rw-r--r--. 1 root root   16 Aug 19 04:13 na-1-big-Index.db
+ -rw-r--r--. 1 root root 4687 Aug 19 04:13 na-1-big-Statistics.db
+ -rw-r--r--. 1 root root   56 Aug 19 04:13 na-1-big-Summary.db
+ -rw-r--r--. 1 root root   92 Aug 19 04:13 na-1-big-TOC.txt
+ -rw-r--r--. 1 root root  815 Aug 19 04:13 schema.cql
+
+If the ``magazine`` table was dropped run the DDL in the ``schema.cql`` to create the table.  Run the ``sstableloader`` with the following command.
+
+::
+
+ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+
+As the output from the command indicates SSTables get streamed to the cluster.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+ 
+ Established connection to initial hosts
+ Opening SSTables and calculating sections to stream
+ Streaming relevant part of /catalogkeyspace/magazine/na-1-big-Data.db  to 
+ [35.173.233.153:7000, 10.0.2.238:7000, 54.158.45.75:7000]
+ progress: [35.173.233.153:7000]0:1/1 176% total: 176% 0.017KiB/s (avg: 0.017KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% total: 176% 0.000KiB/s (avg: 0.014KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% [10.0.2.238:7000]0:1/1 78 % total: 108% 0.115KiB/s 
+ (avg: 0.017KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% [10.0.2.238:7000]0:1/1 78 % 
+ [54.158.45.75:7000]0:1/1 78 % total: 96% 0.232KiB/s (avg: 0.024KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% [10.0.2.238:7000]0:1/1 78 % 
+ [54.158.45.75:7000]0:1/1 78 % total: 96% 0.000KiB/s (avg: 0.022KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% [10.0.2.238:7000]0:1/1 78 % 
+ [54.158.45.75:7000]0:1/1 78 % total: 96% 0.000KiB/s (avg: 0.021KiB/s)
+ 
+Some other requirements of ``sstableloader`` that should be kept into consideration are:
+
+- The SSTables to be loaded must be compatible with  the Cassandra version being loaded into. 
+- Repairing tables that have been loaded into a different cluster does not repair the source tables.
+- Sstableloader makes use of port 7000 for internode communication.
+- Before restoring incremental backups run ``nodetool flush`` to backup any data in memtables
+
+Using nodetool import
+^^^^^^^^^^^^^^^^^^^^^
+In this section we shall import SSTables into a table using the ``nodetool import`` command. The ``nodetool refresh`` command is deprecated, and it is recommended to use ``nodetool import`` instead. The ``nodetool refresh`` does not have an option to load new SSTables from a separate directory which the ``nodetool import`` does.
+
+The command usage is as follows.
+
+::
+
+ nodetool <options> -- <keyspace> <table> <directory> ...
 
 Review comment:
   `nodetool <options> import` or `nodetool import <options>` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] dvohra commented on issue #409: Bulkloading

Posted by GitBox <gi...@apache.org>.
dvohra commented on issue #409: Bulkloading
URL: https://github.com/apache/cassandra/pull/409#issuecomment-571672509
 
 
   Alex, 
   Thanks for suggestions.
   Updated PR to add using symlinks as an alternative, and fix the _nodetool import_ syntax.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] alexott commented on a change in pull request #409: Bulkloading

Posted by GitBox <gi...@apache.org>.
alexott commented on a change in pull request #409: Bulkloading
URL: https://github.com/apache/cassandra/pull/409#discussion_r363708385
 
 

 ##########
 File path: doc/source/operating/bulk_loading.rst
 ##########
 @@ -1,24 +1,693 @@
-.. Licensed to the Apache Software Foundation (ASF) under one
-.. or more contributor license agreements.  See the NOTICE file
-.. distributed with this work for additional information
-.. regarding copyright ownership.  The ASF licenses this file
-.. to you under the Apache License, Version 2.0 (the
-.. "License"); you may not use this file except in compliance
-.. with the License.  You may obtain a copy of the License at
-..
-..     http://www.apache.org/licenses/LICENSE-2.0
-..
-.. Unless required by applicable law or agreed to in writing, software
-.. distributed under the License is distributed on an "AS IS" BASIS,
-.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-.. See the License for the specific language governing permissions and
-.. limitations under the License.
-
-.. highlight:: none
-
-.. _bulk-loading:
-
-Bulk Loading
-------------
-
-.. todo:: TODO
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+.. highlight:: none
+
+.. _bulk-loading:
+
+Bulk Loading  
+============== 
+
+Bulk loading of data in Apache Cassandra is supported by different tools. The data to be bulk loaded must be in the form of SSTables. Cassandra does not support loading data in any other format such as CSV, JSON, and XML directly. Bulk loading could be used to:
+
+- Restore incremental backups and snapshots. Backups and snapshots are already in the form of SSTables.
+- Load existing SSTables into another cluster, which could have a different number of nodes or replication strategy.
+- Load external data into a cluster
+
+Tools for Bulk Loading
+^^^^^^^^^^^^^^^^^^^^^^
+
+Cassandra provides two commands or tools for bulk loading data. These are:
+
+- Cassandra Bulk loader, also called ``sstableloader``
+- The ``nodetool import`` command
+
+The ``sstableloader`` and ``nodetool import`` are accessible if the Cassandra installation ``bin`` directory is in the ``PATH`` environment variable.  Or these may be accessed directly from the ``bin`` directory. We shall discuss each of these next. We shall use the example or sample keyspaces and tables created in the Backups section. 
+
+Using sstableloader
+^^^^^^^^^^^^^^^^^^^
+
+The ``sstableloader`` is the main tool for bulk uploading data. The ``sstableloader`` streams SSTable data files to a running cluster. The ``sstableloader`` loads data conforming to the replication strategy and replication factor. The table to upload data to does need not to be empty.  
+
+The only requirements to run ``sstableloader`` are:
+
+1. One or more comma separated initial hosts to connect to and get ring information.
+2. A directory path for the SSTables to load.
+
+Its usage is as follows.
+
+::
+
+ sstableloader [options] <dir_path>
+
+Sstableloader bulk loads the SSTables found in the directory ``<dir_path>`` to the configured cluster. The   ``<dir_path>`` is used as the target *keyspace/table* name. As an example, to load an SSTable named
+``Standard1-g-1-Data.db`` into ``Keyspace1/Standard1``, you will need to have the
+files ``Standard1-g-1-Data.db`` and ``Standard1-g-1-Index.db`` in a directory ``/path/to/Keyspace1/Standard1/``.
+
+Sstableloader Option to accept Target keyspace name
+**************************************************** 
+Often as part of a backup strategy some Cassandra DBAs store an entire data directory. When corruption in data is found then they would like to restore data in the same cluster (for large clusters 200 nodes) but with different keyspace name.
+
+Currently ``sstableloader`` derives keyspace name from the folder structure. As  an option to specify target keyspace name as part of ``sstableloader``, version 4.0 adds support for the ``--target-keyspace``  option (`CASSANDRA-13884
+<https://issues.apache.org/jira/browse/CASSANDRA-13884>`_).
+
+The supported options are as follows from which only ``-d,--nodes <initial hosts>``  is required.
+
+::
+
+ -alg,--ssl-alg <ALGORITHM>                                   Client SSL: algorithm
+                                                               
+ -ap,--auth-provider <auth provider>                          Custom
+                                                              AuthProvider class name for 
+                                                              cassandra authentication
+ -ciphers,--ssl-ciphers <CIPHER-SUITES>                       Client SSL:
+                                                              comma-separated list of 
+                                                              encryption suites to use
+ -cph,--connections-per-host <connectionsPerHost>             Number of
+                                                              concurrent connections-per-host.
+ -d,--nodes <initial hosts>                                   Required.
+                                                              Try to connect to these hosts (comma separated) initially for ring information
+                                                               
+ -f,--conf-path <path to config file>                         cassandra.yaml file path for streaming throughput and client/server SSL.
+                                                              
+ -h,--help                                                    Display this help message
+                                                             
+ -i,--ignore <NODES>                                          Don't stream to this (comma separated) list of nodes
+
+ -idct,--inter-dc-throttle <inter-dc-throttle>                Inter-datacenter throttle speed in Mbits (default unlimited)
+                                                               
+ -k,--target-keyspace <target keyspace name>                  Target
+                                                              keyspace name
+ -ks,--keystore <KEYSTORE>                                    Client SSL:
+                                                              full path to keystore
+ -kspw,--keystore-password <KEYSTORE-PASSWORD>                Client SSL:
+                                                              password of the keystore
+ --no-progress                                                Don't
+                                                              display progress
+ -p,--port <native transport port>                            Port used
+                                                              for native connection (default 9042)
+ -prtcl,--ssl-protocol <PROTOCOL>                             Client SSL:
+                                                              connections protocol to use (default: TLS)
+ -pw,--password <password>                                    Password for
+                                                              cassandra authentication
+ -sp,--storage-port <storage port>                            Port used
+                                                              for internode communication (default 7000)
+ -spd,--server-port-discovery <allow server port discovery>   Use ports
+                                                              published by server to decide how to connect. With SSL requires StartTLS
+                                                              to be used.
+ -ssp,--ssl-storage-port <ssl storage port>                   Port used
+                                                              for TLS internode communication (default 7001)
+ -st,--store-type <STORE-TYPE>                                Client SSL:
+                                                              type of store
+ -t,--throttle <throttle>                                     Throttle
+                                                              speed in Mbits (default unlimited)
+ -ts,--truststore <TRUSTSTORE>                                Client SSL:
+                                                              full path to truststore
+ -tspw,--truststore-password <TRUSTSTORE-PASSWORD>            Client SSL:
+                                                              Password of the truststore
+ -u,--username <username>                                     Username for
+                                                              cassandra authentication
+ -v,--verbose                                                 verbose
+                                                              output
+
+The ``cassandra.yaml`` file could be provided  on the command-line with ``-f`` option to set up streaming throughput, client and server encryption options. Only ``stream_throughput_outbound_megabits_per_sec``, ``server_encryption_options`` and ``client_encryption_options`` are read from yaml. You can override options read from ``cassandra.yaml`` with corresponding command line options.
+
+A sstableloader Demo
+********************
+We shall demonstrate using ``sstableloader`` by uploading incremental backup data for table ``catalogkeyspace.magazine``.  We shall also use a snapshot of the same table to bulk upload in a different run of  ``sstableloader``.  The backups and snapshots for the ``catalogkeyspace.magazine`` table are listed as follows.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ cd ./cassandra/data/data/catalogkeyspace/magazine- 
+ 446eae30c22a11e9b1350d927649052c
+ [ec2-user@ip-10-0-2-238 magazine-446eae30c22a11e9b1350d927649052c]$ ls -l
+ total 0
+ drwxrwxr-x. 2 ec2-user ec2-user 226 Aug 19 02:38 backups
+ drwxrwxr-x. 4 ec2-user ec2-user  40 Aug 19 02:45 snapshots
+
+The directory path structure of SSTables to be uploaded using ``sstableloader`` is used as the  target keyspace/table.  
+
+We could have directly uploaded from the ``backups`` and ``snapshots`` directories respectively if the directory structure were in the format used by ``sstableloader``. But the directory path of backups and snapshots for SSTables  is ``/catalogkeyspace/magazine-446eae30c22a11e9b1350d927649052c/backups`` and ``/catalogkeyspace/magazine-446eae30c22a11e9b1350d927649052c/snapshots`` respectively, which cannot be used to upload SSTables to ``catalogkeyspace.magazine`` table. The directory path structure must be ``/catalogkeyspace/magazine/`` to use ``sstableloader``. We need to create a new directory structure to upload SSTables with ``sstableloader`` which is typical when using ``sstableloader``. Create a directory structure ``/catalogkeyspace/magazine`` and set its permissions.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo mkdir -p /catalogkeyspace/magazine
+ [ec2-user@ip-10-0-2-238 ~]$ sudo chmod -R 777 /catalogkeyspace/magazine
+
+Bulk Loading from an Incremental Backup
++++++++++++++++++++++++++++++++++++++++
+An incremental backup does not include the DDL for a table. The table must already exist. If the table was dropped it may be created using the ``schema.cql`` generated with every snapshot of a table. As we shall be using ``sstableloader`` to load SSTables to the ``magazine`` table, the table must exist prior to running ``sstableloader``. The table does not need to be empty but we have used an empty table as indicated by a CQL query:
+
+::
+
+ cqlsh:catalogkeyspace> SELECT * FROM magazine;
+
+ id | name | publisher
+ ----+------+-----------
+
+ (0 rows)
+ 
+After the table to upload has been created copy the SSTable files from the ``backups`` directory to the ``/catalogkeyspace/magazine/`` directory that we created.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo cp ./cassandra/data/data/catalogkeyspace/magazine- 
+ 446eae30c22a11e9b1350d927649052c/backups/* /catalogkeyspace/magazine/
+
+Run the ``sstableloader`` to upload SSTables from the ``/catalogkeyspace/magazine/`` directory.
+
+::
+
+ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+
+The output from the ``sstableloader`` command should be similar to the listed: 
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+ Opening SSTables and calculating sections to stream
+ Streaming relevant part of /catalogkeyspace/magazine/na-1-big-Data.db 
+ /catalogkeyspace/magazine/na-2-big-Data.db  to [35.173.233.153:7000, 10.0.2.238:7000, 
+ 54.158.45.75:7000]
+ progress: [35.173.233.153:7000]0:1/2 88 % total: 88% 0.018KiB/s (avg: 0.018KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% total: 176% 33.807KiB/s (avg: 0.036KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% total: 176% 0.000KiB/s (avg: 0.029KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:1/2 39 % total: 81% 0.115KiB/s 
+ (avg: 0.024KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % total: 108% 
+ 97.683KiB/s (avg: 0.033KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:1/2 39 % total: 80% 0.233KiB/s (avg: 0.040KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 88.522KiB/s (avg: 0.049KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 0.000KiB/s (avg: 0.045KiB/s)
+ progress: [35.173.233.153:7000]0:2/2 176% [10.0.2.238:7000]0:2/2 78 % 
+ [54.158.45.75:7000]0:2/2 78 % total: 96% 0.000KiB/s (avg: 0.044KiB/s)
+
+After the ``sstableloader`` has run query the ``magazine`` table and the loaded table should get listed when a query is run.
+
+::
+
+ cqlsh:catalogkeyspace> SELECT * FROM magazine;
+
+ id | name                      | publisher
+ ----+---------------------------+------------------
+  1 |        Couchbase Magazine |        Couchbase
+  0 | Apache Cassandra Magazine | Apache Cassandra
+
+ (2 rows)
+ cqlsh:catalogkeyspace> 
+
+Bulk Loading from a Snapshot
++++++++++++++++++++++++++++++ 
+In this section we shall demonstrate restoring a snapshot of the ``magazine`` table to the ``magazine`` table.  As we used the same table to restore data from a backup the directory structure required by ``sstableloader`` should already exist.  If the directory structure needed to load SSTables to ``catalogkeyspace.magazine`` does not exist create the directories and set their permissions.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo mkdir -p /catalogkeyspace/magazine
+ [ec2-user@ip-10-0-2-238 ~]$ sudo chmod -R 777 /catalogkeyspace/magazine
+
+As we shall be copying the snapshot  files to the directory remove any files that may be in the directory.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo rm /catalogkeyspace/magazine/*
+ [ec2-user@ip-10-0-2-238 ~]$ cd /catalogkeyspace/magazine/
+ [ec2-user@ip-10-0-2-238 magazine]$ ls -l
+ total 0
+
+
+Copy the snapshot files to the ``/catalogkeyspace/magazine`` directory. 
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sudo cp ./cassandra/data/data/catalogkeyspace/magazine- 
+ 446eae30c22a11e9b1350d927649052c/snapshots/magazine/* /catalogkeyspace/magazine
+
+List the files in the ``/catalogkeyspace/magazine`` directory and a ``schema.cql`` should also get listed.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ cd /catalogkeyspace/magazine
+ [ec2-user@ip-10-0-2-238 magazine]$ ls -l
+ total 44
+ -rw-r--r--. 1 root root   31 Aug 19 04:13 manifest.json
+ -rw-r--r--. 1 root root   47 Aug 19 04:13 na-1-big-CompressionInfo.db
+ -rw-r--r--. 1 root root   97 Aug 19 04:13 na-1-big-Data.db
+ -rw-r--r--. 1 root root   10 Aug 19 04:13 na-1-big-Digest.crc32
+ -rw-r--r--. 1 root root   16 Aug 19 04:13 na-1-big-Filter.db
+ -rw-r--r--. 1 root root   16 Aug 19 04:13 na-1-big-Index.db
+ -rw-r--r--. 1 root root 4687 Aug 19 04:13 na-1-big-Statistics.db
+ -rw-r--r--. 1 root root   56 Aug 19 04:13 na-1-big-Summary.db
+ -rw-r--r--. 1 root root   92 Aug 19 04:13 na-1-big-TOC.txt
+ -rw-r--r--. 1 root root  815 Aug 19 04:13 schema.cql
+
+If the ``magazine`` table was dropped run the DDL in the ``schema.cql`` to create the table.  Run the ``sstableloader`` with the following command.
+
+::
+
+ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+
+As the output from the command indicates SSTables get streamed to the cluster.
+
+::
+
+ [ec2-user@ip-10-0-2-238 ~]$ sstableloader --nodes 10.0.2.238  /catalogkeyspace/magazine/
+ 
+ Established connection to initial hosts
+ Opening SSTables and calculating sections to stream
+ Streaming relevant part of /catalogkeyspace/magazine/na-1-big-Data.db  to 
+ [35.173.233.153:7000, 10.0.2.238:7000, 54.158.45.75:7000]
+ progress: [35.173.233.153:7000]0:1/1 176% total: 176% 0.017KiB/s (avg: 0.017KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% total: 176% 0.000KiB/s (avg: 0.014KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% [10.0.2.238:7000]0:1/1 78 % total: 108% 0.115KiB/s 
+ (avg: 0.017KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% [10.0.2.238:7000]0:1/1 78 % 
+ [54.158.45.75:7000]0:1/1 78 % total: 96% 0.232KiB/s (avg: 0.024KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% [10.0.2.238:7000]0:1/1 78 % 
+ [54.158.45.75:7000]0:1/1 78 % total: 96% 0.000KiB/s (avg: 0.022KiB/s)
+ progress: [35.173.233.153:7000]0:1/1 176% [10.0.2.238:7000]0:1/1 78 % 
+ [54.158.45.75:7000]0:1/1 78 % total: 96% 0.000KiB/s (avg: 0.021KiB/s)
+ 
+Some other requirements of ``sstableloader`` that should be kept into consideration are:
+
+- The SSTables to be loaded must be compatible with  the Cassandra version being loaded into. 
+- Repairing tables that have been loaded into a different cluster does not repair the source tables.
+- Sstableloader makes use of port 7000 for internode communication.
+- Before restoring incremental backups run ``nodetool flush`` to backup any data in memtables
+
+Using nodetool import
+^^^^^^^^^^^^^^^^^^^^^
+In this section we shall import SSTables into a table using the ``nodetool import`` command. The ``nodetool refresh`` command is deprecated, and it is recommended to use ``nodetool import`` instead. The ``nodetool refresh`` does not have an option to load new SSTables from a separate directory which the ``nodetool import`` does.
+
+The command usage is as follows.
+
+::
+
+ nodetool <options> -- <keyspace> <table> <directory> ...
 
 Review comment:
   `nodetool <options> import` ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org