You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Check Peck <co...@gmail.com> on 2015/02/28 05:24:39 UTC

How to extract all the user id from a single table in Cassandra?

I have a Cassandra table like this -

    create table user_record (user_id text, record_name text, record_value
blob, primary key (user_id, record_name));

What is the best way to extract all the user_id from this table? As of now,
I cannot change my data model to do this exercise so I need to find a way
by which I can extract all the user_id from the above table.

I am using Datastax Java driver in my project. Is there any other easy way
apart from code to extract all the user_id from the above table through
come cqlsh utility and dump it into some file?

I am thinking below code might timed out after some time -

    public class TestCassandra {

            private Session session = null;
            private Cluster cluster = null;

            private static class ConnectionHolder {
                static final TestCassandra connection = new TestCassandra();
            }

            public static TestCassandra getInstance() {
                return ConnectionHolder.connection;
            }

            private TestCassandra() {
                Builder builder = Cluster.builder();
                builder.addContactPoints("127.0.0.1");

                PoolingOptions opts = new PoolingOptions();
                opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
opts.getCoreConnectionsPerHost(HostDistance.LOCAL));

                cluster =
builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE).withPoolingOptions(opts)
                        .withLoadBalancingPolicy(new TokenAwarePolicy(new
DCAwareRoundRobinPolicy("PI")))
                        .withReconnectionPolicy(new
ConstantReconnectionPolicy(100L))
                        .build();
                session = cluster.connect();
            }

        private Set<String> getRandomUsers() {
            Set<String> userList = new HashSet<String>();

            String sql = "select user_id from testkeyspace.user_record;";

            try {
                SimpleStatement query = new SimpleStatement(sql);
                query.setConsistencyLevel(ConsistencyLevel.ONE);
                ResultSet res = session.execute(query);

                Iterator<Row> rows = res.iterator();
                while (rows.hasNext()) {
                    Row r = rows.next();

                    String user_id = r.getString("user_id");
                    userList.add(user_id);
                }
            } catch (Exception e) {
                System.out.println("error= " + e);
            }

            return userList;
        }
    }

Adding java-driver group and Cassandra group as well to see whether there
is any better way to execute this?

Re: How to extract all the user id from a single table in Cassandra?

Posted by Jens Rantil <je...@tink.se>.
Hi Check,

Please avoid double posting on mailing lists. It leads to double work
(respect people's time!) and makes it hard for people in the future having
the same issue as you to follow discussions and answers.

That said, if you have a lot of primary keys

    select user_id from testkeyspace.user_record;

will most definitely timeout. Have a look at `SELECT DISTINCT` at [1]. More
importantly, for larger datasets you will also need to split the token
space into smaller segments and iteratively select your primary keys. See
[2].

[1]
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html
[2]
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html?scroll=reference_ds_d35_v2q_xj__paging-through-unordered-results

If you are having specific issues with the Java Driver I suggest you ask on
that mailing list (only).

Cheers,
Jens

On Sun, Mar 1, 2015 at 6:38 PM, Check Peck <co...@gmail.com> wrote:

> Sending again as I didn't got any response on this.
>
> Any thoughts?
>
> On Fri, Feb 27, 2015 at 8:24 PM, Check Peck <co...@gmail.com>
> wrote:
>
>> I have a Cassandra table like this -
>>
>>     create table user_record (user_id text, record_name text,
>> record_value blob, primary key (user_id, record_name));
>>
>> What is the best way to extract all the user_id from this table? As of
>> now, I cannot change my data model to do this exercise so I need to find a
>> way by which I can extract all the user_id from the above table.
>>
>> I am using Datastax Java driver in my project. Is there any other easy
>> way apart from code to extract all the user_id from the above table through
>> come cqlsh utility and dump it into some file?
>>
>> I am thinking below code might timed out after some time -
>>
>>     public class TestCassandra {
>>
>>             private Session session = null;
>>             private Cluster cluster = null;
>>
>>             private static class ConnectionHolder {
>>                 static final TestCassandra connection = new
>> TestCassandra();
>>             }
>>
>>             public static TestCassandra getInstance() {
>>                 return ConnectionHolder.connection;
>>             }
>>
>>             private TestCassandra() {
>>                 Builder builder = Cluster.builder();
>>                 builder.addContactPoints("127.0.0.1");
>>
>>                 PoolingOptions opts = new PoolingOptions();
>>                 opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
>> opts.getCoreConnectionsPerHost(HostDistance.LOCAL));
>>
>>                 cluster =
>> builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE).withPoolingOptions(opts)
>>                         .withLoadBalancingPolicy(new TokenAwarePolicy(new
>> DCAwareRoundRobinPolicy("PI")))
>>                         .withReconnectionPolicy(new
>> ConstantReconnectionPolicy(100L))
>>                         .build();
>>                 session = cluster.connect();
>>             }
>>
>>         private Set<String> getRandomUsers() {
>>             Set<String> userList = new HashSet<String>();
>>
>>             String sql = "select user_id from testkeyspace.user_record;";
>>
>>             try {
>>                 SimpleStatement query = new SimpleStatement(sql);
>>                 query.setConsistencyLevel(ConsistencyLevel.ONE);
>>                 ResultSet res = session.execute(query);
>>
>>                 Iterator<Row> rows = res.iterator();
>>                 while (rows.hasNext()) {
>>                     Row r = rows.next();
>>
>>                     String user_id = r.getString("user_id");
>>                     userList.add(user_id);
>>                 }
>>             } catch (Exception e) {
>>                 System.out.println("error= " + e);
>>             }
>>
>>             return userList;
>>         }
>>     }
>>
>> Adding java-driver group and Cassandra group as well to see whether there
>> is any better way to execute this?
>>
>
>


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook <https://www.facebook.com/#!/tink.se> Linkedin
<http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
 Twitter <https://twitter.com/tink>

Re: How to extract all the user id from a single table in Cassandra?

Posted by Check Peck <co...@gmail.com>.
Sending again as I didn't got any response on this.

Any thoughts?

On Fri, Feb 27, 2015 at 8:24 PM, Check Peck <co...@gmail.com> wrote:

> I have a Cassandra table like this -
>
>     create table user_record (user_id text, record_name text, record_value
> blob, primary key (user_id, record_name));
>
> What is the best way to extract all the user_id from this table? As of
> now, I cannot change my data model to do this exercise so I need to find a
> way by which I can extract all the user_id from the above table.
>
> I am using Datastax Java driver in my project. Is there any other easy way
> apart from code to extract all the user_id from the above table through
> come cqlsh utility and dump it into some file?
>
> I am thinking below code might timed out after some time -
>
>     public class TestCassandra {
>
>             private Session session = null;
>             private Cluster cluster = null;
>
>             private static class ConnectionHolder {
>                 static final TestCassandra connection = new
> TestCassandra();
>             }
>
>             public static TestCassandra getInstance() {
>                 return ConnectionHolder.connection;
>             }
>
>             private TestCassandra() {
>                 Builder builder = Cluster.builder();
>                 builder.addContactPoints("127.0.0.1");
>
>                 PoolingOptions opts = new PoolingOptions();
>                 opts.setCoreConnectionsPerHost(HostDistance.LOCAL,
> opts.getCoreConnectionsPerHost(HostDistance.LOCAL));
>
>                 cluster =
> builder.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE).withPoolingOptions(opts)
>                         .withLoadBalancingPolicy(new TokenAwarePolicy(new
> DCAwareRoundRobinPolicy("PI")))
>                         .withReconnectionPolicy(new
> ConstantReconnectionPolicy(100L))
>                         .build();
>                 session = cluster.connect();
>             }
>
>         private Set<String> getRandomUsers() {
>             Set<String> userList = new HashSet<String>();
>
>             String sql = "select user_id from testkeyspace.user_record;";
>
>             try {
>                 SimpleStatement query = new SimpleStatement(sql);
>                 query.setConsistencyLevel(ConsistencyLevel.ONE);
>                 ResultSet res = session.execute(query);
>
>                 Iterator<Row> rows = res.iterator();
>                 while (rows.hasNext()) {
>                     Row r = rows.next();
>
>                     String user_id = r.getString("user_id");
>                     userList.add(user_id);
>                 }
>             } catch (Exception e) {
>                 System.out.println("error= " + e);
>             }
>
>             return userList;
>         }
>     }
>
> Adding java-driver group and Cassandra group as well to see whether there
> is any better way to execute this?
>