Posted to user@flume.apache.org by mahendran m <ma...@hotmail.com> on 2014/11/11 11:20:32 UTC

File channel data lost



Hi All,
I have just implemented a Flume agent with the configuration below.
Configuration
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44440

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.fileSuffix = .txt
a1.sinks.k1.hdfs.rollSize = 1048576
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.batchSize = 1000
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume/MemoryChannel/Avro

# Use the file channel
a1.channels.c1.type = file
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
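
For reference, an agent using this file would typically be launched along these lines (assuming a stock Flume install; the paths and logger flag here are illustrative, not from the original post):

bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console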

Now I am sending batches of 1000 events to the Flume Avro source, with each event's UID incremented by one. HDFS creates text files of 1 MB each, as per my configuration, plus a file with a .tmp extension (the file currently being written). Now I stop the Flume agent and start it again. Below were my two expectations on restarting the agent:
1. The agent will resend events starting from the one after the last successfully received event (in my case the .tmp file ends with the event whose UID is 12000, so the next event should have UID 12001). But what happens is that it starts with UID 12500; events 12001 to 12499 are completely lost.
2. The agent will resume appending events to the file where it left off, i.e. the incomplete file (the one with the .tmp extension). But the agent did not resume appending to that file; it created a new text file and started appending to it.
Can anyone explain why my two expectations failed?
Also, files keep their .tmp extension once I stop the agent; the agent doesn't remove it. Does anyone know why this happens?
Regards,
Mahendran

RE: File channel data lost

Posted by Hari Shreedharan <hs...@cloudera.com>.
Running on Windows is something I have no idea about. Perhaps Roshan can help?


Thanks,
Hari

On Tue, Nov 11, 2014 at 9:08 PM, mahendran m <ma...@hotmail.com>
wrote:

> Hi Hari,
> Thanks for your reply.
> Q: How did you stop the agent?
> A: I started the agent from the Windows command prompt and simply closed it by clicking the close button of the command prompt window.
> Q: The application you wrote - does it handle resending the events when the Avro source throws an exception? It looks like the Avro source received a bunch of events, then you killed the agent and did not resend.
> A: Yes, I was sending events to the source and killed the agent while a send was in progress. Can you please explain how I can implement the re-sending of events in my application?
> Thanks,
> Mahendran

RE: File channel data lost

Posted by mahendran m <ma...@hotmail.com>.
Hi Hari,
Thanks for your reply.

Q: How did you stop the agent?
A: I started the agent from the Windows command prompt and simply closed it by clicking the close button of the command prompt window.

Q: The application you wrote - does it handle resending the events when the Avro source throws an exception? It looks like the Avro source received a bunch of events, then you killed the agent and did not resend.
A: Yes, I was sending events to the source and killed the agent while a send was in progress. Can you please explain how I can implement the re-sending of events in my application?
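
A minimal sketch of the usual client-side pattern with Flume's RPC client API (the host, port, batch size, and UID scheme below are assumptions modeled on this thread, not the poster's actual code): when appendBatch() throws EventDeliveryException, rebuild the client and resend the same batch. This gives at-least-once delivery, so duplicates are possible after a crash, but no events are silently dropped.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class ResendingAvroClient {
    private static final String HOST = "localhost"; // assumption: agent host
    private static final int PORT = 44440;          // port from the config above

    public static void main(String[] args) throws InterruptedException {
        RpcClient client = RpcClientFactory.getDefaultInstance(HOST, PORT);
        try {
            long uid = 1;
            while (uid <= 100000) { // assumption: total number of events to send
                // Build one batch of 1000 events, each carrying the next UID.
                List<Event> batch = new ArrayList<Event>(1000);
                for (int i = 0; i < 1000; i++, uid++) {
                    batch.add(EventBuilder.withBody("UID=" + uid,
                            StandardCharsets.UTF_8));
                }
                boolean delivered = false;
                while (!delivered) {
                    try {
                        // appendBatch() returns normally only after the source
                        // has committed the whole batch to the channel.
                        client.appendBatch(batch);
                        delivered = true;
                    } catch (EventDeliveryException e) {
                        // Agent down or restarting: rebuild the client and
                        // resend the SAME batch so no UIDs are dropped.
                        client.close();
                        Thread.sleep(1000);
                        client = RpcClientFactory.getDefaultInstance(HOST, PORT);
                    }
                }
            }
        } finally {
            client.close();
        }
    }
}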
Thanks,
Mahendran

Re: File channel data lost

Posted by Hari Shreedharan <hs...@cloudera.com>.
The application you wrote - does it handle resending the events when the Avro source throws an exception? It looks like the Avro source received a bunch of events, then you killed the agent and did not resend.

#2 is expected. We don't append to the old file; instead we create a new file. There is nothing wrong with that.

How did you stop the agent? If you killed it with kill -9 then the rename will not happen.
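
As a sketch of the distinction (the PID placeholder is illustrative): a plain SIGTERM lets Flume's shutdown hooks stop the HDFS sink, which closes open files and drops the .tmp suffix, while SIGKILL bypasses that cleanup entirely:

kill <flume-agent-pid>      # SIGTERM: shutdown hooks run, open HDFS files are closed and renamed
kill -9 <flume-agent-pid>   # SIGKILL: no cleanup, files keep the .tmp suffix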


Thanks,
Hari
