Posted to dev@storm.apache.org by Sachin Pasalkar <Sa...@symantec.com> on 2015/09/10 08:47:00 UTC

How to rotate a file in case a Storm worker fails?

Hi,

I was looking at the code where HDFSBolt writes incoming tuples to a file. I also had a look at JIRA STORM-969 <https://issues.apache.org/jira/browse/STORM-969>, and I have the following questions about it:

1) Let's say I have set up a fileRotation policy at 64 MB, and I have written the file up to 59 MB. Now my worker fails, and the file I was writing to will never get rotated to its final location.
2) As per STORM-969, they have added the forceSync approach, but it keeps all tuples in memory and delays the acks sent back to the spout. In our case, writing 64 MB of data means holding 5,400,000 processed messages, which is a lot of data in memory. This may lead to unnecessary replay of tuples from the spout. (I am aware the guarantee is at-least-once, and I can increase TOPOLOGY_MESSAGE_TIMEOUT_SECS to fulfill my requirement, but is there another way?)
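A rough way to see the trade-off in point 2 is the following Storm-free Java model. It is a sketch under stated assumptions, not HdfsBolt's code: the class DeferredAckModel and its count-based sync trigger are hypothetical, acks are assumed to be released only when a sync happens, and the 12-byte message size is simply 64 MB / 5,400,000 (about 12.4 bytes) rounded down. It compares holding acks until rotation with releasing them every 1,000 tuples.

```java
/**
 * Hypothetical model (not HdfsBolt): files rotate at a byte threshold, and
 * acks are held until the next sync. Only counts are tracked; a real bolt
 * would keep the Tuple objects themselves until it acks them.
 */
public class DeferredAckModel {
    private final long rotateBytes; // rotation threshold, e.g. 64 MB
    private final int syncEvery;    // hypothetical count-based sync trigger
    private long bytesInFile = 0;
    private int pending = 0;        // tuples written but not yet acked
    private int maxPending = 0;
    private int rotations = 0;

    public DeferredAckModel(long rotateBytes, int syncEvery) {
        this.rotateBytes = rotateBytes;
        this.syncEvery = syncEvery;
    }

    /** Simulates writing one tuple whose serialized form is sizeBytes long. */
    public void write(int sizeBytes) {
        bytesInFile += sizeBytes;
        pending++; // ack deferred: the tuple stays in memory
        maxPending = Math.max(maxPending, pending);
        if (pending >= syncEvery) {
            sync();
        }
        if (bytesInFile >= rotateBytes) {
            sync(); // flush and ack before the file is closed and moved
            rotations++;
            bytesInFile = 0;
        }
    }

    /** Stands in for flushing the stream and acking every held tuple. */
    private void sync() {
        pending = 0;
    }

    public int maxPending() { return maxPending; }
    public int rotations() { return rotations; }

    public static void main(String[] args) {
        long rotate = 64L * 1024 * 1024; // 64 MB
        int msgBytes = 12;               // 64 MB / 5,400,000 is about 12.4 bytes
        long messagesPerFile = rotate / msgBytes + 1; // enough to cross the threshold once

        // Acks released only at rotation: the count-based trigger never fires.
        DeferredAckModel untilRotation = new DeferredAckModel(rotate, Integer.MAX_VALUE);
        // Acks released every 1,000 tuples.
        DeferredAckModel periodic = new DeferredAckModel(rotate, 1_000);

        for (long i = 0; i < messagesPerFile; i++) {
            untilRotation.write(msgBytes);
            periodic.write(msgBytes);
        }
        System.out.println("max unacked, ack at rotation: " + untilRotation.maxPending());
        System.out.println("max unacked, periodic sync:   " + periodic.maxPending());
    }
}
```

With acks tied to rotation, the number of unacked tuples grows to roughly one file's worth of messages (about 5.6 million in this model); with a periodic sync it never exceeds the sync interval, at the cost of more frequent flushes.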

Thanks,
Sachin

Re: How to rotate a file in case a Storm worker fails?

Posted by Sachin Pasalkar <Sa...@symantec.com>.
I don’t see why it can’t happen, even though it uses the tick tuple. What if the worker dies in between? The file will never get rotated to its final destination.
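One possible way to handle this case, offered only as a hypothetical sketch and not as something HdfsBolt or STORM-969 is known to do, is a recovery step that runs when the bolt (re)starts and moves any file left in its in-progress directory to the final location. The class OrphanedFileRecovery and the directory names are made up, and local java.nio.file paths stand in for HDFS so the example runs on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical recovery step: when a bolt task starts, move any files a dead
 * worker left in its in-progress directory into the final directory.
 * Local java.nio.file paths stand in for HDFS here.
 */
public class OrphanedFileRecovery {

    /** Moves every regular file in workDir into finalDir and returns their names. */
    public static List<String> recover(Path workDir, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        // Collect first, then move, so the directory is not modified mid-iteration.
        List<Path> leftovers = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(workDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    leftovers.add(entry);
                }
            }
        }
        List<String> moved = new ArrayList<>();
        for (Path file : leftovers) {
            Path target = finalDir.resolve(file.getFileName());
            Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
            moved.add(file.getFileName().toString());
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hdfsbolt-sketch");
        Path work = Files.createDirectories(root.resolve("in-progress"));
        Path done = root.resolve("final");
        // Simulate a partially written file left behind by a crashed worker.
        Files.write(work.resolve("part-0001.txt"), "partial data\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("recovered: " + recover(work, done));
    }
}
```

This assumes each bolt task has its own in-progress directory; with a shared directory, a restarting task could move a file that another live task is still writing. On real HDFS, a file left open by a dead writer may also need lease recovery before it is treated as complete.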

From: Arun Iyer <ai...@hortonworks.com>
Reply-To: "dev@storm.apache.org" <de...@storm.apache.org>
Date: Thursday, 10 September 2015 12:35 pm
To: "dev@storm.apache.org" <de...@storm.apache.org>
Subject: Re: How to rotate a file in case a Storm worker fails?

Sachin,

STORM-969 makes use of tick tuples to periodically ack and flush the buffered tuples, so the scenario you mentioned would not happen. The tickTupleInterval is configurable.

- Arun


Re: How to rotate a file in case a Storm worker fails?

Posted by Arun Iyer <ai...@hortonworks.com>.
Sachin,

STORM-969 makes use of tick tuples to periodically ack and flush the buffered tuples, so the scenario you mentioned would not happen. The tickTupleInterval is configurable.

- Arun
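The tick-tuple approach described above can be illustrated with the following Storm-free Java model of a bolt that buffers tuples and, on each tick tuple, flushes and acks everything it holds. It is a hypothetical sketch, not the STORM-969 patch: the class TickFlushModel and its Msg type are stand-ins for Storm's bolt and Tuple, and the arrival rate of 500 tuples per tick interval is an assumed number. In real Storm, tick tuples for a component are requested through the topology.tick.tuple.freq.secs setting (Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Storm-free model of "buffer, then flush and ack on each tick". */
public class TickFlushModel {

    /** Minimal stand-in for a Storm tuple; isTick marks a system tick tuple. */
    static final class Msg {
        final long id;
        final boolean isTick;
        Msg(long id, boolean isTick) { this.id = id; this.isTick = isTick; }
    }

    private final List<Msg> pending = new ArrayList<>(); // written, not yet acked
    private final List<Long> acked = new ArrayList<>();
    private int flushes = 0;
    private int maxPending = 0;

    public void execute(Msg msg) {
        if (msg.isTick) {
            flushAndAck();
            return;
        }
        // A real bolt would write msg to the open HDFS stream here.
        pending.add(msg);
        maxPending = Math.max(maxPending, pending.size());
    }

    private void flushAndAck() {
        if (pending.isEmpty()) return;          // nothing to sync on this tick
        flushes++;                              // stands in for flushing the stream
        for (Msg m : pending) acked.add(m.id);  // stands in for collector.ack(tuple)
        pending.clear();
    }

    public int maxPending() { return maxPending; }
    public int flushes() { return flushes; }
    public int pendingCount() { return pending.size(); }
    public List<Long> acked() { return acked; }

    public static void main(String[] args) {
        TickFlushModel bolt = new TickFlushModel();
        int perInterval = 500; // assumed arrival rate: tuples per tick interval
        long id = 0;
        for (int tick = 0; tick < 10; tick++) {
            for (int i = 0; i < perInterval; i++) bolt.execute(new Msg(id++, false));
            bolt.execute(new Msg(-1, true));
        }
        // prints acked=5000 maxPending=500 flushes=10
        System.out.println("acked=" + bolt.acked().size()
                + " maxPending=" + bolt.maxPending()
                + " flushes=" + bolt.flushes());
    }
}
```

In this model the number of tuples held in memory is bounded by roughly the arrival rate times the tick interval rather than by the file size. The model does not, by itself, move a partially written file to its final location if the worker dies.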