Posted to user@cassandra.apache.org by Sonny Heer <so...@gmail.com> on 2010/03/16 22:30:06 UTC

Dividing the client load between machines in Cassandra

How can I accomplish this?

The way I'm doing it now is by creating a TSocket connection to the
static IP of one of the Cassandra boxes:
        TTransport tr = new TSocket(host, port.intValue());
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        tr.open();

With a larger cluster I would imagine there is a preferred
solution with no single point of failure (e.g. that one box going
down).
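A minimal sketch of one client-side approach to the question above, in plain Java: keep a list of several Cassandra hosts, hand out new connections round-robin, and fail over to the next host when a connect attempt fails. `Connector` and `RoundRobinFailover` are hypothetical names introduced for illustration; they are not part of Thrift or Cassandra, and the sketch leaves out health checks and connection pooling.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical: opens a connection of type T to host:port (e.g. TSocket + open()). */
interface Connector<T> {
    T connect(String host, int port) throws IOException;
}

/**
 * Hypothetical helper: spreads new connections over several hosts in
 * round-robin order and fails over to the next host if a connect fails.
 */
final class RoundRobinFailover<T> {
    private final List<String> hosts;
    private final int port;
    private final Connector<T> connector;
    private final AtomicInteger next = new AtomicInteger(); // shared rotation counter

    RoundRobinFailover(List<String> hosts, int port, Connector<T> connector) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.hosts = new ArrayList<>(hosts); // defensive copy, never mutated
        this.port = port;
        this.connector = connector;
    }

    /** Tries every host at most once, starting at the next one in the rotation. */
    T open() throws IOException {
        int start = Math.floorMod(next.getAndIncrement(), hosts.size());
        IOException last = null;
        for (int i = 0; i < hosts.size(); i++) {
            String host = hosts.get((start + i) % hosts.size());
            try {
                return connector.connect(host, port);
            } catch (IOException e) {
                last = e; // remember the failure and move on to the next host
            }
        }
        throw new IOException("all " + hosts.size() + " hosts failed", last);
    }
}
```

To use it with the Thrift classes from the question, the connector would create the `TSocket` for the given host and port, call `open()` on it (wrapping the checked `TTransportException` in an `IOException`), and return the transport; the `TBinaryProtocol` and `Cassandra.Client` are then built on top of it as before. The host list itself still has to come from configuration or from the cluster.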

Re: Dividing the client load between machines in Cassandra

Posted by Tom Chen <to...@gogii.net>.
Try using the Hector Cassandra client.

It has failover and load balancing built in.

http://github.com/rantav/hector

Tom


On Tue, Mar 16, 2010 at 2:30 PM, Sonny Heer <so...@gmail.com> wrote:

> How can I accomplish this?
>
> The way I'm doing it now it is creating a TSocket connection using a
> static IP of one of the boxes on Cassandra:
>        TTransport tr = new TSocket(host, port.intValue());
>        TProtocol proto = new TBinaryProtocol(tr);
>        Cassandra.Client client = new Cassandra.Client(proto);
>        tr.open();
>
> With a larger cluster I would imagine there is another preferred
> solution with no single point of failure (e.g. that one box  goes
> down).
>



-- 
Tom Chen
Software Architect
GOGII, Inc
tom@gogii.net
650-468-6318

Re: Storing large blobs

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Wed, 17 Mar 2010 22:42:13 -0400 Carlos Sanchez <ca...@riskmetrics.com> wrote: 

CS> We could have blob as large as 50mb compressed (XML compresses quite
CS> well).  Typical documents we would deal with would be between 500K
CS> and 3MB

When I was just starting to use Cassandra, I had serious issues with 0.5
and blobs (compressed JSON) over 500 MB, but that was because of the heap
size, not something inherently broken in Cassandra.

Ted


Re: Storing large blobs

Posted by Avinash Lakshman <av...@gmail.com>.
It is practically a seek followed by a large streaming read, so I do not
believe this would be an issue. I have never run such a workload, but a
simple experiment should clear the air.

Cheers
Avinash

On Wed, Mar 17, 2010 at 7:42 PM, Carlos Sanchez <
carlos.sanchez@riskmetrics.com> wrote:

> We could have blob as large as 50mb compressed (XML compresses quite well).
>  Typical documents we would deal with would be between 500K and 3MB
>
> Carlos

RE: Storing large blobs

Posted by Carlos Sanchez <ca...@riskmetrics.com>.
We could have blobs as large as 50 MB compressed (XML compresses quite well). Typical documents we would deal with would be between 500 KB and 3 MB.

Carlos


________________________________________
From: Avinash Lakshman [avinash.lakshman@gmail.com]
Sent: Wednesday, March 17, 2010 8:49 PM
To: user@cassandra.apache.org
Subject: Re: Storing large blobs

My question would be how large is large? Perhaps you could compress the blobs and then store them. But it depends on the answer to the first question.

Cheers
Avinash

This email message and any attachments are for the sole use of the intended recipients and may contain proprietary and/or confidential information which may be privileged or otherwise protected from disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not an intended recipient, please contact the sender by reply email and destroy the original message and any copies of the message as well as any attachments to the original message.

Re: Storing large blobs

Posted by Avinash Lakshman <av...@gmail.com>.
My question would be: how large is large? Perhaps you could compress the
blobs and then store them. But it depends on the answer to the first
question.
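A minimal sketch of the compress-before-storing idea, using only `java.util.zip` from the JDK: the blob is GZIP-compressed into a `byte[]` before being written as a column value, and decompressed after reading. `BlobCodec` is a hypothetical helper name, and GZIP is an assumed codec choice; note that the whole blob is still held in memory on both sides, which is relevant to the heap-size issue Ted describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: GZIP round trip for blob column values. */
final class BlobCodec {
    private BlobCodec() {}

    /** Compresses the raw bytes; closing the GZIP stream writes the trailer. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    /** Inflates bytes produced by compress() back to the original content. */
    static byte[] decompress(byte[] packed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(packed));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }
}
```

Repetitive text such as XML usually shrinks a lot under GZIP, but the ratio for real documents is something to measure rather than assume.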

Cheers
Avinash

On Wed, Mar 17, 2010 at 5:10 PM, Carlos Sanchez <
carlos.sanchez@riskmetrics.com> wrote:

> Has anyone had experience storing large blobs in Cassandra? Is really
> Cassandra tailored for large content?
>
> Carlos

Re: Storing large blobs

Posted by Jonathan Ellis <jb...@gmail.com>.
It's not tailored for it, but it works "well enough" for some
applications.  Better than having to deal with two different data
stores.

On Wed, Mar 17, 2010 at 8:10 PM, Carlos Sanchez
<ca...@riskmetrics.com> wrote:
> Has anyone had experience storing large blobs in Cassandra? Is really Cassandra tailored for large content?
>
> Carlos

Storing large blobs

Posted by Carlos Sanchez <ca...@riskmetrics.com>.
Has anyone had experience storing large blobs in Cassandra? Is Cassandra really tailored for large content?

Carlos

Re: Dividing the client load between machines in Cassandra

Posted by Sonny Heer <so...@gmail.com>.
Oops. Yep, thanks!

On Wed, Mar 17, 2010 at 1:47 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> You didn't call tr.open() ?

Re: Dividing the client load between machines in Cassandra

Posted by Jonathan Ellis <jb...@gmail.com>.
You didn't call tr.open()?

On Wed, Mar 17, 2010 at 3:45 PM, Sonny Heer <so...@gmail.com> wrote:
> I'm getting:
> org.apache.thrift.transport.TTransportException: Cannot write to null
> outputStream
>        at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:137)
>        at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:152)
>        at org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryProtocol.java:80)
>        at org.apache.cassandra.service.Cassandra$Client.send_get_string_property(Cassandra.java:681)
>        at org.apache.cassandra.service.Cassandra$Client.get_string_property(Cassandra.java:675)
>        at com.atsid.cassandra.ngram.test.TestRingConnection.main(TestRingConnection.java:26)
>
>
> when running:
>
>        TTransport tr = new TSocket("localhost", 9160);
>        TProtocol proto = new TBinaryProtocol(tr);
>        Cassandra.Client client = new Cassandra.Client(proto);
>                try {
>                        String jsonServerList = client.get_string_property("token map");
>
>
> What am I doing wrong here?

Re: Dividing the client load between machines in Cassandra

Posted by Sonny Heer <so...@gmail.com>.
I'm getting:
org.apache.thrift.transport.TTransportException: Cannot write to null outputStream
	at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:137)
	at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:152)
	at org.apache.thrift.protocol.TBinaryProtocol.writeMessageBegin(TBinaryProtocol.java:80)
	at org.apache.cassandra.service.Cassandra$Client.send_get_string_property(Cassandra.java:681)
	at org.apache.cassandra.service.Cassandra$Client.get_string_property(Cassandra.java:675)
	at com.atsid.cassandra.ngram.test.TestRingConnection.main(TestRingConnection.java:26)


when running:

        TTransport tr = new TSocket("localhost", 9160);
        TProtocol proto = new TBinaryProtocol(tr);
        Cassandra.Client client = new Cassandra.Client(proto);
        try {
            String jsonServerList = client.get_string_property("token map");


What am I doing wrong here?

On Wed, Mar 17, 2010 at 11:33 AM, Sonny Heer <so...@gmail.com> wrote:
> Cool thanks Todd.  I'd be interested at some point to see the updated
> .6 version as well.  Thanks again!

Re: Dividing the client load between machines in Cassandra

Posted by Sonny Heer <so...@gmail.com>.
Cool, thanks Todd. I'd be interested at some point to see the updated
0.6 version as well. Thanks again!

On Wed, Mar 17, 2010 at 9:24 AM, B. Todd Burruss <bb...@real.com> wrote:
> below is the commented out code i once used.  i think it is from the 0.5
> days, so it might not even work now.  not sure.  the bootstrapHostArr is
> simply a list of host information used to bootstrap the process.
>  connectToHost is a method used to generate a Cassandra.Client object.
>  there is sample code on cassandra wiki for doing this.  good luck!

Re: Dividing the client load between machines in Cassandra

Posted by "B. Todd Burruss" <bb...@real.com>.
Below is the commented-out code I once used. I think it is from the 0.5
days, so it might not even work now; not sure. bootstrapHostArr is
simply a list of host information used to bootstrap the process, and
connectToHost is a method used to generate a Cassandra.Client object.
There is sample code on the Cassandra wiki for doing this. Good luck!

// can't use this on cassandra because the tokens returned are for the "internal cassandra server comm", not the thrift IPs
//        String    hostList = null;
//        for ( HostInfo hi : bootstrapHostArr ) {
//            Cassandra.Client    client = null;
//            try {
//                client = connectToHost( hi.getHostName(), hi.getPort() );
//                hostList = client.get_string_property( "token map" );
//                break;
//            }
//            catch ( TTransportException e ) {
//                logger.error( "cannot connect to bootstrap node - will try another if available : " + hi.getNameAndPort() );
//            }
//            catch ( TException e ) {
//                logger.error( "cannot retrieve host list from node - will try another if available : " + hi.getNameAndPort() );
//            }
//            finally {
//                if ( null != client ) {
//                    disconnectFromCluster( client );
//                }
//            }
//        }
//
//        if ( null != hostList ) {
//            ArrayList<String>    newArr;
//            try {
//                JSONObject    jsonObj = new JSONObject( hostList );
//                String[]    ringArr = JSONObject.getNames( jsonObj );
//                newArr = new ArrayList<String>( ringArr.length );
//
//                for ( int i=0;i < ringArr.length;i++ ) {
//                    String    hostName = jsonObj.getString( ringArr[i] );
//                    if ( !hostIgnoreSet.contains(hostName) ) {
//                        newArr.add( hostName );
//                    }
//                }
//            }
//            catch ( JSONException e ) {
//                throw new ClusterRuntimeException( "Could not parse JSON returned from Cassandra - don't know what to do?  ARRRRGGGG" );
//            }
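The commented-out code above boils down to: read the "token map", take the host for each token, and skip ignored hosts. A minimal sketch of that step in plain Java, assuming the JSON has already been decoded into a `Map` of token to address, with one hypothetical addition aimed at the ListenAddress/ThriftAddress problem raised in the quoted messages: an operator-maintained `listenToThrift` map that translates each node's internal address into the address Thrift clients should use. `RingHosts` and `listenToThrift` are illustrative names, not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: turns a decoded token map into client-facing hosts. */
final class RingHosts {
    private RingHosts() {}

    /**
     * tokenMap: token -> node address (per this thread, the ListenAddress).
     * listenToThrift: hypothetical ListenAddress -> ThriftAddress mapping;
     * nodes without an entry are used as-is.
     * ignore: listen addresses to skip, like hostIgnoreSet above.
     */
    static List<String> thriftHosts(Map<String, String> tokenMap,
                                    Map<String, String> listenToThrift,
                                    Set<String> ignore) {
        Set<String> hosts = new LinkedHashSet<>(); // de-duplicates, keeps first-seen order
        for (String listenAddr : tokenMap.values()) {
            if (ignore.contains(listenAddr)) {
                continue;
            }
            hosts.add(listenToThrift.getOrDefault(listenAddr, listenAddr));
        }
        return new ArrayList<>(hosts);
    }
}
```

The resulting list could then serve as the host list for a round-robin or failover client, with the caveat from this thread that the token map itself is an internal representation.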


Sonny Heer wrote:
> Is there some example code on how to do this?

Re: Dividing the client load between machines in Cassandra

Posted by Sonny Heer <so...@gmail.com>.
Is there some example code on how to do this?

On Tue, Mar 16, 2010 at 3:07 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> The token map is an internal representation, so returning the internal IPs
> is correct, even though this makes it slightly harder for Thrift clients
> to use.

Re: Dividing the client load between machines in Cassandra

Posted by Jonathan Ellis <jb...@gmail.com>.
The token map is an internal representation, so returning the internal IPs
is correct, even though this makes it slightly harder for Thrift clients
to use.

On Tue, Mar 16, 2010 at 4:55 PM, B. Todd Burruss <bb...@real.com> wrote:
> If you choose #3 (get_string_property("token map")), keep in mind that
> the IPs returned from this call are the IPs set by the "ListenAddress"
> param in storage-conf.xml.  In my case we have two NICs, and I set
> ListenAddress to an IP that is used only for node-to-node communication.
> The "ThriftAddress" param is the one I really want.  Maybe this has been
> changed ("fixed" ;)); I haven't tested it in a while.

Re: Dividing the client load between machines in Cassandra

Posted by "B. Todd Burruss" <bb...@real.com>.
If you choose #3 (get_string_property("token map")), keep in mind that
the IPs returned from this call are the IPs set by the "ListenAddress"
param in storage-conf.xml.  In my case we have two NICs, and I set
ListenAddress to an IP that is used only for node-to-node communication.
The "ThriftAddress" param is the one I really want.  Maybe this has been
changed ("fixed" ;)); I haven't tested it in a while.
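Given the ListenAddress/ThriftAddress split described here, one workaround is to translate the token map's listen IPs into Thrift IPs on the client. The sketch below assumes the token map JSON has already been parsed into a `Map` (token to ListenAddress IP, as the commented-out code in this thread does with `JSONObject`) and that the operator maintains a listen-to-Thrift mapping. That mapping, `ThriftHosts`, and `fromTokenMap` are all hypothetical; Cassandra does not return the ThriftAddress through this call.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ThriftHosts {
    private ThriftHosts() {}

    // tokenMap:       token -> ListenAddress IP (parsed from the "token map" JSON).
    // listenToThrift: hypothetical, operator-maintained ListenAddress -> ThriftAddress map.
    // ignore:         ListenAddress IPs the client should skip.
    static List<String> fromTokenMap(Map<String, String> tokenMap,
                                     Map<String, String> listenToThrift,
                                     Set<String> ignore) {
        Set<String> result = new LinkedHashSet<>(); // drops duplicates, keeps ring order
        for (String listenIp : tokenMap.values()) {
            if (ignore.contains(listenIp)) {
                continue;
            }
            String thriftIp = listenToThrift.get(listenIp);
            // Assumption: with no mapping, the node serves Thrift on its listen IP
            // (true for single-NIC setups, not for the two-NIC case above).
            result.add(thriftIp != null ? thriftIp : listenIp);
        }
        return new ArrayList<>(result);
    }
}
```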

Jonathan Ellis wrote:
> http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to

Re: Dividing the client load between machines in Cassandra

Posted by Jonathan Ellis <jb...@gmail.com>.
http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to

On Tue, Mar 16, 2010 at 4:30 PM, Sonny Heer <so...@gmail.com> wrote:
> How can I accomplish this?
>
> The way I'm doing it now is creating a TSocket connection to the
> static IP of one of the Cassandra boxes:
>        TTransport tr = new TSocket(host, port.intValue());
>        TProtocol proto = new TBinaryProtocol(tr);
>        Cassandra.Client client = new Cassandra.Client(proto);
>        tr.open();
>
> With a larger cluster I would imagine there is a preferred
> solution with no single point of failure (e.g. that one box going
> down).
>
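One way to stop depending on a single box is client-side round robin over a list of nodes, failing over to the next node when one is down (Hector, mentioned elsewhere in this thread, builds failover and load balancing in). A minimal sketch, assuming a statically configured host list; `Request` and `RoundRobinHosts` are hypothetical names, and `Request.run` is assumed to wrap the TSocket/TBinaryProtocol/Cassandra.Client code from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for "open a Thrift connection to host and run one request".
interface Request<T> {
    T run(String host) throws Exception;
}

// Client-side round robin with failover over a static list of Thrift endpoints.
final class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.hosts = new ArrayList<>(hosts);
    }

    // Start at the next host in rotation so load spreads across nodes; on failure,
    // try each remaining host once before giving up.
    <T> T execute(Request<T> request) {
        // floorMod keeps the index non-negative even after the counter overflows.
        int start = Math.floorMod(next.getAndIncrement(), hosts.size());
        Exception last = null;
        for (int i = 0; i < hosts.size(); i++) {
            String host = hosts.get((start + i) % hosts.size());
            try {
                return request.run(host);
            } catch (Exception e) {
                last = e; // node down or unreachable: fail over to the next one
            }
        }
        throw new IllegalStateException("all hosts failed", last);
    }
}
```

Each call starts at a different node, so requests spread across the cluster, and a dead node costs only one extra attempt. A real client would also pool connections and temporarily skip hosts that keep failing; this sketch does neither.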