Posted to user@storm.apache.org by Aaron Zimmerman <az...@sproutsocial.com> on 2014/08/11 19:22:51 UTC

Trident without Zookeeper?

Is it possible to use the Trident API without having to track the
transaction state in ZooKeeper?

I have a high-throughput topology, essentially ETL, and I don't need
exactly-once processing. The topology keeps dying, or tuples time out, with
ZooKeeper connection errors. I've raised the connection timeout to 25 seconds
and the session timeout to 60 seconds, and this hasn't seemed to help much.

Thanks,

Aaron Zimmerman
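
For reference, a minimal sketch of how the timeouts mentioned above map onto
configuration values. It assumes the Storm keys
storm.zookeeper.connection.timeout and storm.zookeeper.session.timeout (both
in milliseconds), and that the ZooKeeper servers run with tickTime=2000 and no
explicit minSessionTimeout/maxSessionTimeout, in which case the server keeps
the negotiated session timeout between 2 and 20 ticks. ZkTimeouts and its
methods are hypothetical helpers for illustration, not part of Storm or
ZooKeeper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Storm or ZooKeeper.
class ZkTimeouts {
    // Storm-side overrides; both keys take milliseconds.
    static Map<String, Object> stormOverrides(int connectionSecs, int sessionSecs) {
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put("storm.zookeeper.connection.timeout", connectionSecs * 1000);
        conf.put("storm.zookeeper.session.timeout", sessionSecs * 1000);
        return conf;
    }

    // Session timeout the server would grant, assuming the default bounds
    // of 2 * tickTime (minimum) and 20 * tickTime (maximum).
    static int negotiatedSessionMs(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    public static void main(String[] args) {
        Map<String, Object> conf = stormOverrides(25, 60);
        System.out.println(conf);
        int requested = (Integer) conf.get("storm.zookeeper.session.timeout");
        System.out.println("granted session timeout (ms): "
                + negotiatedSessionMs(requested, 2000));
    }
}
```

Under these assumptions a 60-second request would be granted less than was
asked for, so it may be worth checking maxSessionTimeout on the servers before
raising the client-side value further.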

Re: Trident without Zookeeper?

Posted by Aaron Zimmerman <az...@sproutsocial.com>.
Usually none. I've seen an occasional out-of-memory exception. There
isn't a lot of data in ZooKeeper itself, so I'm under the impression that
it is more of a garbage collection issue than a lack of resources. So
I was looking for a way to use ZooKeeper less as one possible solution.
I'm working on getting ZooKeeper more memory as well.

I've noticed this once:
[SendWorker:5:QuorumCnxManager$SendWorker@679] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)


But in the Storm logs I get many errors like:

supervisor.log-java.net.ConnectException: Connection refused
supervisor.log- at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.6.0_45]
supervisor.log- at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599) ~[na:1.6.0_45]
supervisor.log- at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) ~[zookeeper-3.3.3.jar:3.3.3-1073969]

AND

supervisor.log:2014-06-02 19:37:07 c.n.c.ConnectionState [ERROR] Connection timed out
supervisor.log-org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
supervisor.log- at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72) ~[curator-client-1.0.1.jar:na]
supervisor.log- at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74) [curator-client-1.0.1.jar:na]

and

worker-6703.log:2014-08-11 03:04:54 c.n.c.f.i.CuratorFrameworkImpl [ERROR] Background operation retry gave up
worker-6703.log-org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
worker-6703.log- at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) ~[zookeeper-3.3.3.jar:3.3.3-1073969]
worker-6703.log- at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:380) ~[curator-framework-1.0.1.jar:na]



On Mon, Aug 11, 2014 at 1:34 PM, Danijel Schiavuzzi <da...@schiavuzzi.com>
wrote:

> What errors are you getting in your Zookeeper logs?

Re: Trident without Zookeeper?

Posted by Danijel Schiavuzzi <da...@schiavuzzi.com>.
What errors are you getting in your Zookeeper logs?




-- 
Danijel Schiavuzzi

E: danijel@schiavuzzi.com
W: www.schiavuzzi.com
T: +385 98 9035562
Skype: danijel.schiavuzzi

Re: Trident without Zookeeper?

Posted by Andrew Xor <an...@gmail.com>.
Well, Trident topologies compile down to normal topologies, so I think you
can get away with plain spouts and bolts, albeit with a bit more code.


-- 
Kindly yours,

Andrew Grammenos

-- PGP PKey --
<https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt

Re: Trident without Zookeeper?

Posted by Aaron Zimmerman <az...@sproutsocial.com>.
Just because I really like the API. I'm merging various data streams and
then operating on them, storing each in a few places. That is a bit awkward
to do with the usual spouts and bolts.



Re: Trident without Zookeeper?

Posted by Andrew Xor <an...@gmail.com>.
Why use a Trident topology then? Non-transactional topologies have an
at-least-once guarantee without the throughput penalty imposed by using a
transactional topology.
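
To illustrate what an at-least-once guarantee means in practice, here is a
minimal sketch in plain Java. It is not Storm's API: ReplayingSource is a
hypothetical stand-in for a reliable spout that remembers each emitted record
until it is acked and re-queues it when it fails. No transaction state is
tracked anywhere, and the price is that a record can reach the sink more than
once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reliable spout; not Storm's API.
class ReplayingSource {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<Integer, String> pending = new HashMap<>();
    private int nextId = 0;

    ReplayingSource(List<String> records) {
        queue.addAll(records);
    }

    // Emits the next record into the sink and keeps it pending until acked.
    // Returns the message id, or -1 when nothing is queued.
    int emit(List<String> sink) {
        String record = queue.poll();
        if (record == null) {
            return -1;
        }
        int id = nextId++;
        pending.put(id, record);
        sink.add(record);
        return id;
    }

    // Downstream processing succeeded: forget the record.
    void ack(int id) {
        pending.remove(id);
    }

    // Downstream processing failed or timed out: queue the record for replay.
    void fail(int id) {
        String record = pending.remove(id);
        if (record != null) {
            queue.addLast(record);
        }
    }

    // Runs a tiny scenario in which the first record fails once.
    static List<String> demo() {
        ReplayingSource source = new ReplayingSource(Arrays.asList("a", "b"));
        List<String> sink = new ArrayList<>();
        int first = source.emit(sink);
        source.fail(first);
        int id;
        while ((id = source.emit(sink)) != -1) {
            source.ack(id);
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo the sink sees the failed record twice, which is acceptable for an
ETL job only if the downstream writes are idempotent or duplicates are
tolerated.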


