Posted to commits@cassandra.apache.org by "Eric Fenderbosch (JIRA)" <ji...@apache.org> on 2015/11/03 15:13:27 UTC

[jira] [Updated] (CASSANDRA-10637) Extract LoaderOptions and refactor BulkLoader to be able to be used from within existing Java code instead of just through main()

     [ https://issues.apache.org/jira/browse/CASSANDRA-10637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Fenderbosch updated CASSANDRA-10637:
-----------------------------------------
    Description: 
We are writing a service to migrate data from various RDBMS tables into Cassandra. We write out a CSV from the source system, use CQLSSTableWriter to write sstables to disk, then call sstableloader to stream them to the Cassandra cluster.

Right now, we either have to:

* return a CSV location from one Java process to a wrapper script which then kicks off sstableloader
* or call sstableloader via Runtime.getRuntime().exec
* or call BulkLoader.main from within our Java code, using a custom SecurityManager to trap the System.exit calls
* or subclass BulkLoader putting the subclass in the org.apache.cassandra.tools package in order to access the package scoped inner classes
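
For illustration, the Runtime.exec style workaround might look like the sketch below. It uses ProcessBuilder rather than Runtime.getRuntime().exec, assumes sstableloader is on the PATH, and passes only the -d (initial hosts) option followed by the sstable directory. The class and method names are hypothetical and are not part of the patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the "shell out to sstableloader" workaround.
final class SstableLoaderProcess {

    // Builds the command line: sstableloader -d <hosts> <directory>
    static List<String> command(String hosts, String directory) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sstableloader"); // assumption: the script is on the PATH
        cmd.add("-d");            // comma-separated initial hosts
        cmd.add(hosts);
        cmd.add(directory);       // directory containing the sstables
        return cmd;
    }

    // Starts the external process and returns its exit code.
    static int run(String hosts, String directory) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(hosts, directory))
                .inheritIO() // forward the loader's output to this JVM's console
                .start();
        return process.waitFor();
    }
}
```

The caller only gets an exit code back, so any failure detail has to be scraped from the process output, which is part of why this route is awkward inside a service.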

None of these solutions is ideal. We should be able to use the functionality of BulkLoader.main directly. I've extracted LoaderOptions to a top-level class that uses the builder pattern, so that it can be used directly as part of a Java migration service.

The options can now be created with a fluent builder interface:

LoaderOptions options = LoaderOptions.builder(). //
                connectionsPerHost(2). //
                directory(directory). //
                hosts(hosts). //
                build();

The builder can also be used to parse command-line arguments:

    LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();

A new load method takes a LoaderOptions parameter and throws BulkLoadException instead of calling System.exit(1).
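
As a rough, self-contained illustration of the shape described above (not the code from the fork), the sketch below pairs an immutable options object built through a fluent builder with a load method that reports failures by throwing a checked exception. Every class name carries a "Sketch" suffix to mark it as hypothetical; the fields, the default of one connection per host, and the validation rules are assumptions, and no streaming is actually performed.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the extracted options class.
final class LoaderOptionsSketch {
    final File directory;
    final List<String> hosts;
    final int connectionsPerHost;

    private LoaderOptionsSketch(Builder b) {
        this.directory = b.directory;
        this.hosts = Collections.unmodifiableList(new ArrayList<>(b.hosts));
        this.connectionsPerHost = b.connectionsPerHost;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private File directory;
        private List<String> hosts = new ArrayList<>();
        private int connectionsPerHost = 1; // assumed default

        Builder directory(File directory) { this.directory = directory; return this; }
        Builder hosts(List<String> hosts) { this.hosts = new ArrayList<>(hosts); return this; }
        Builder connectionsPerHost(int n) { this.connectionsPerHost = n; return this; }
        LoaderOptionsSketch build() { return new LoaderOptionsSketch(this); }
    }
}

// Hypothetical checked exception thrown where main() would have exited the JVM.
class BulkLoadExceptionSketch extends Exception {
    BulkLoadExceptionSketch(String message) {
        super(message);
    }
}

final class BulkLoaderSketch {
    // Validates the options and throws instead of calling System.exit(1).
    static void load(LoaderOptionsSketch options) throws BulkLoadExceptionSketch {
        if (options.directory == null || !options.directory.isDirectory()) {
            throw new BulkLoadExceptionSketch("Not a directory: " + options.directory);
        }
        if (options.hosts.isEmpty()) {
            throw new BulkLoadExceptionSketch("Initial hosts must be specified");
        }
        // The real loader would stream the sstables here; omitted in this sketch.
    }

    public static void main(String[] args) {
        LoaderOptionsSketch options = LoaderOptionsSketch.builder()
                .connectionsPerHost(2)
                .directory(new File(args.length > 0 ? args[0] : "."))
                .hosts(Arrays.asList("127.0.0.1"))
                .build();
        try {
            load(options);
            System.out.println("load finished");
        } catch (BulkLoadExceptionSketch e) {
            // The calling service decides how to react; the JVM keeps running.
            System.err.println("load failed: " + e.getMessage());
        }
    }
}
```

Because the failure surfaces as an exception, a migration service can catch it, log it, and retry or move on, without needing a SecurityManager to intercept System.exit.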

The fork on GitHub can be found here:

https://github.com/efenderbosch/cassandra

  was:
We are writing a service to migrate data from various RDBMS tables into Cassandra. We write out a CSV from the source system, use CQLSSTableWriter to write sstables to disk, then call sstableloader to stream to the Cassandra cluster.

Right now, we either have to:

* return a CSV location from one Java process to a wrapper script which then kicks off sstableloader
* or call sstableloader via Runtime.getRuntime().exec
* or call BulkLoader.main from within our Java code, using a custom SecurityManager to trap the System.exit calls
* or subclass BulkLoader putting the subclass in the org.apache.cassandra.tools package in order to access the package scoped inner classes

None of these solutions are ideal. Ideally, we should be able to use the functionality of BulkLoader.main directly. I've extracted LoaderOptions to a top level class that uses the builder pattern so that it can be used as part of a Java migration service directly.

Creating the builder can now be performed with a fluent builder interface:

```java
LoaderOptions options = LoaderOptions.builder(). //
                connectionsPerHost(2). //
                directory(directory). //
                hosts(hosts). //
                build();
```

Or used to parse command line arguments:

```java
LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
```

A new load method takes a ``LoaderOptions`` parameter and throws ``BulkLoadException`` instead of ```System.exit(1)```.

Fork on github can be found here:

https://github.com/efenderbosch/cassandra


> Extract LoaderOptions and refactor BulkLoader to be able to be used from within existing Java code instead of just through main()
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10637
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10637
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Eric Fenderbosch
>            Priority: Minor
>             Fix For: 3.x
>
>
> We are writing a service to migrate data from various RDBMS tables into Cassandra. We write out a CSV from the source system, use CQLSSTableWriter to write sstables to disk, then call sstableloader to stream them to the Cassandra cluster.
> Right now, we either have to:
> * return a CSV location from one Java process to a wrapper script which then kicks off sstableloader
> * or call sstableloader via Runtime.getRuntime().exec
> * or call BulkLoader.main from within our Java code, using a custom SecurityManager to trap the System.exit calls
> * or subclass BulkLoader putting the subclass in the org.apache.cassandra.tools package in order to access the package scoped inner classes
> None of these solutions are ideal. Ideally, we should be able to use the functionality of BulkLoader.main directly. I've extracted LoaderOptions to a top level class that uses the builder pattern so that it can be used as part of a Java migration service directly.
> Creating the builder can now be performed with a fluent builder interface:
> LoaderOptions options = LoaderOptions.builder(). //
>                 connectionsPerHost(2). //
>                 directory(directory). //
>                 hosts(hosts). //
>                 build();
> Or used to parse command line arguments:
>     LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
> A new load method takes a LoaderOptions parameter and throws BulkLoadException instead of System.exit(1).
> Fork on github can be found here:
> https://github.com/efenderbosch/cassandra



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)