Posted to user@cassandra.apache.org by Alexandru Sicoe <ad...@gmail.com> on 2012/04/03 18:18:15 UTC

2 questions DataStax Enterprise

Hi guys,
 I'm trying out DSE and looking for the best way to arrange the cluster. I
have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6
outside the gateway that are supposed to take replicas from the other 3 and
serve reads and analytics jobs.

1. Is it ok to run the 3 nodes as normal Cassandra nodes and run the other
6 nodes as analytics? Can I serve both real time reads and M/R jobs from
the 6 nodes? How will these affect each other performancewise?

I know that the way the system is supposed to be used is to separate
analytics from real time queries. I've already explored a possible 3DC
setup with Tyler in another message and it indeed works but I'm afraid it
is too complex and would require me to send 2 replicas across the firewall
which it can't handle very well at peak times, affecting other applications.

2. I started the cluster in the setup described in 1 (3 normal, 6
analytics) and as soon as the Analytics nodes start up they start
outputting this message:

INFO [TASK-TRACKER-INIT] 2012-04-03 17:54:59,575 Client.java (line 629)
Retrying connect to server: IP_OF_NORMAL_CASSANDRA_SEED_NODE:8012. Already
tried 10 time(s).
....

So it seems my analytics nodes are trying to contact the normal Cassandra
seed node on port 8012 which I read is a "Hadoop Job Tracker client port".
It doesn't seem like this is the normal behavior. Why is it getting
confused? In the .yaml of each node I'm using endpoint_snitch:
com.datastax.bdp.snitch.DseSimpleSnitch and putting in the Analytics seed
node before the normal cassandra seed node in the seeds.

Cheers,
Alex

Re: 2 questions DataStax Enterprise

Posted by Jake Luciani <ja...@gmail.com>.

Hi, replies inline.

On Tue, Apr 3, 2012 at 12:18 PM, Alexandru Sicoe <ad...@gmail.com> wrote:

> Hi guys,
>  I'm trying out DSE and looking for the best way to arrange the cluster. I
> have 9 nodes: 3 behind a gateway taking in writes from my collectors and 6
> outside the gateway that are supposed to take replicas from the other 3 and
> serve reads and analytics jobs.
>
> 1. Is it ok to run the 3 nodes as normal Cassandra nodes and run the other
> 6 nodes as analytics? Can I serve both real time reads and M/R jobs from
> the 6 nodes? How will these affect each other performancewise?
>

If you plan to use CFS heavily, it will affect the performance of the other
nodes. If you raise the RF of your column families, it should be fine as
long as you run MapReduce at CL=ONE.


>
> I know that the way the system is supposed to be used is to separate
> analytics from real time queries. I've already explored a possible 3DC
> setup with Tyler in another message and it indeed works but I'm afraid it
> is too complex and would require me to send 2 replicas across the firewall
> which it can't handle very well at peak times, affecting other applications.
>
> 2. I started the cluster in the setup described in 1 (3 normal, 6
> analytics) and as soon as the Analytics nodes start up they start
> outputting this message:
>
> INFO [TASK-TRACKER-INIT] 2012-04-03 17:54:59,575 Client.java (line 629)
> Retrying connect to server: IP_OF_NORMAL_CASSANDRA_SEED_NODE:8012. Already
> tried 10 time(s).
> ....
>
> So it seems my analytics nodes are trying to contact the normal Cassandra
> seed node on port 8012 which I read is a "Hadoop Job Tracker client port".
> It doesn't seem like this is the normal behavior. Why is it getting
> confused? In the .yaml of each node I'm using endpoint_snitch:
> com.datastax.bdp.snitch.DseSimpleSnitch and putting in the Analytics seed
> node before the normal cassandra seed node in the seeds.
>


You can run dsetool movejt to move the job tracker to one of the known
Hadoop nodes.
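
As a rough sketch of what that could look like, assuming dsetool is on the
PATH of a node in the cluster: ANALYTICS_NODE_IP is a placeholder, and the
move_jobtracker wrapper and DSETOOL variable are hypothetical helpers added
only for illustration. Check the dsetool help output for your DSE version
before relying on the exact arguments.

```shell
#!/bin/sh
# Sketch only: move the Hadoop job tracker onto an analytics node.
# DSETOOL is a hypothetical override (for example, set it to "echo" for a dry run).
DSETOOL="${DSETOOL:-dsetool}"

# Hypothetical wrapper around `dsetool movejt <ip>`.
move_jobtracker() {
    target_ip="$1"
    "$DSETOOL" movejt "$target_ip"
}

# Example, run on a node in the cluster:
#   dsetool jobtracker                  # print where the job tracker currently is
#   move_jobtracker ANALYTICS_NODE_IP   # placeholder for a real analytics node IP
```

Running dsetool jobtracker before and after the move is a simple way to
confirm that the job tracker now points at an analytics node rather than the
normal Cassandra seed.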


>
> Cheers,
> Alex
>
>


-- 
http://twitter.com/tjake