Posted to general@hadoop.apache.org by Scott Mackenzie <sc...@meeza.com.qa> on 2009/05/17 10:35:57 UTC

Question - Remote Logging to Hadoop

All,

Does anyone have a working method to capture system and application logging for remote servers directly into HDFS in real-time?

We have tested a few methods but all seem reliant on a batch process run every 5 to 10 minutes.

Any ideas would be appreciated.



Warm Regards,


Scott

________________________________
Disclaimer: This email and any attachments thereto are confidential and it may contain information protected by intellectual Property Laws or otherwise legally privileged. It is intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender immediately and delete this email and any attachments from your system and destroy any printout thereof. Any unauthorized use or disclosure of this email and its attachments is strictly prohibited and may be unlawful. MEEZA QSTP-LLC (MEEZA) does not certify that this email or any attachments are free of viruses or defects and recommends that you undertake your own virus checking procedures before opening this e-mail or any attachments. MEEZA does not accept any liability for any loss or damage resulting from the use of this e-mail or any attachments. The contents of this e-mail and any attachments do not necessarily represent the views or polices of MEEZA. As the integrity of e-mails across the internet cannot be guaranteed, MEEZA does not accept any liability for any e-mails and attachments which may violate the relevant applicable laws.


Re: Question - Remote Logging to Hadoop

Posted by Guo Leitao <le...@gmail.com>.
Have you ever tried X-Trace?
http://research.yahoo.com/files/andy_konwinski_x-tracing_hadoop.pdf

2009/5/17 Scott Mackenzie <sc...@meeza.com.qa>

> All,
>
> Does anyone have a working method to capture system and application logging
> for remote servers directly into HDFS in real-time?
>
> We have tested a few methods but all seem reliant on a batch process run
> every 5 to 10 minutes.
>
> Any ideas would be appreciated?

Re: Question - Remote Logging to Hadoop

Posted by Dhruba Borthakur <dh...@gmail.com>.
We have deployed a pilot in which Scribe logs directly into an HDFS 0.19
cluster. We will be pushing these Scribe changes to the open-source
location this week. A few changes to HDFS were also required to make this
work smoothly, e.g. HADOOP-2757. These changes are still being debated to
figure out what will be pushed into HDFS trunk.

Also, here are some of the configuration settings:

        <property>
          <name>ipc.client.idlethreshold</name>
          <value>10000</value>
          <description>Defines the threshold number of connections after
            which connections will be inspected for idleness.
          </description>
        </property>
        <property>
          <name>ipc.client.connection.maxidletime</name>
          <value>10000</value>
          <description>The maximum time in msec after which a client will
            bring down the connection to the server.
          </description>
        </property>
        <property>
          <name>ipc.client.connect.max.retries</name>
          <value>2</value>
          <description>Indicates the number of retries a client will make
            to establish a server connection.
          </description>
        </property>
        <property>
          <name>ipc.server.listen.queue.size</name>
          <value>128</value>
          <description>Indicates the length of the listen queue for servers
            accepting client connections.
          </description>
        </property>
        <property>
          <name>ipc.server.tcpnodelay</name>
          <value>true</value>
          <description>Turn on/off Nagle's algorithm for the TCP socket
            connection on the server. Setting to true disables the
            algorithm and may decrease latency with a cost of more/smaller
            packets.
          </description>
        </property>
        <property>
          <name>ipc.client.tcpnodelay</name>
          <value>true</value>
          <description>Turn on/off Nagle's algorithm for the TCP socket
            connection on the client. Setting to true disables the
            algorithm and may decrease latency with a cost of more/smaller
            packets.
          </description>
        </property>
        <property>
          <name>ipc.ping.interval</name>
          <value>5000</value>
          <description>The client sends a ping message to the server every
            period. This is helpful to detect socket connections that were
            idle and have been terminated by a failed server.
          </description>
        </property>
        <property>
          <name>ipc.client.connect.maxwaittime</name>
          <value>5000</value>
          <description>The client waits for this much time for a socket
            connect call to be established with the server.
          </description>
        </property>
        <property>
          <name>dfs.datanode.socket.write.timeout</name>
          <value>20000</value>
          <description>The dfs client waits for this much time for a socket
            write call to the datanode.
          </description>
        </property>
        <property>
          <name>dfs.leaserenewal.timeout</name>
          <value>10000</value>
          <description>The dfs writer waits for this much time after the
            last successful lease renewal before aborting the write to the
            file.
          </description>
        </property>
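
For anyone who wants to sanity-check a site file carrying settings like the ones above, here is a minimal sketch that is not part of the original post. It assumes the properties sit inside a <configuration> root element, as in a Hadoop site file such as hadoop-site.xml. The sample string and the load_properties helper are illustrative names only, and just two of the properties are repeated to keep it short.

```python
import xml.etree.ElementTree as ET

# Two of the properties from the post, wrapped in the <configuration>
# root element that Hadoop site files use (assumed layout for this sketch).
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>ipc.client.tcpnodelay</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>20000</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> under <configuration>."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected <configuration> root, got <%s>" % root.tag)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        # A property without both children is almost certainly a typo.
        if name is None or value is None:
            raise ValueError("property is missing <name> or <value>")
        props[name.strip()] = value.strip()
    return props


if __name__ == "__main__":
    for name, value in sorted(load_properties(SAMPLE_SITE_XML).items()):
        print("%s = %s" % (name, value))
```

This only checks that the file is well-formed and lists what it sets. It does not tell you whether a given Hadoop build actually honours each property name.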


On Mon, May 18, 2009 at 12:22 AM, Amr Awadallah <aa...@cloudera.com> wrote:

> Scribe from Facebook:
>
> http://developers.facebook.com/scribe/
>
> -- amr
>
>> All,
>>
>> Does anyone have a working method to capture system and application
>> logging for remote servers directly into HDFS in real-time?
>>
>> We have tested a few methods but all seem reliant on a batch process run
>> every 5 to 10 minutes.
>>
>> Any ideas would be appreciated?

Re: Question - Remote Logging to Hadoop

Posted by Amr Awadallah <aa...@cloudera.com>.
Scribe from Facebook:

http://developers.facebook.com/scribe/

-- amr
> All,
>
> Does anyone have a working method to capture system and application logging for remote servers directly into HDFS in real-time?
>
> We have tested a few methods but all seem reliant on a batch process run every 5 to 10 minutes.
>
> Any ideas would be appreciated?



Re: Question - Remote Logging to Hadoop

Posted by Jerome Boulon <jb...@yahoo-inc.com>.
The documentation for Chukwa has not been released yet, but you can access it here:
http://people.apache.org/~eyang/docs/r0.1.2/index.html

/Jerome.


On 5/18/09 12:39 AM, "Arun C Murthy" <ac...@yahoo-inc.com> wrote:



On May 17, 2009, at 1:35 AM, Scott Mackenzie wrote:

> All,
>
> Does anyone have a working method to capture system and application
> logging for remote servers directly into HDFS in real-time?
>
> We have tested a few methods but all seem reliant on a batch process
> run every 5 to 10 minutes.
>
> Any ideas would be appreciated?
>
>

Not exactly 'real-time', but you might want to take a look at Chukwa (http://svn.apache.org/viewvc/hadoop/chukwa/).

Unfortunately, I can't seem to find documentation for it. I'll try and
push the chukwa-devs.

Arun




Re: Question - Remote Logging to Hadoop

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On May 17, 2009, at 1:35 AM, Scott Mackenzie wrote:

> All,
>
> Does anyone have a working method to capture system and application  
> logging for remote servers directly into HDFS in real-time?
>
> We have tested a few methods but all seem reliant on a batch process  
> run every 5 to 10 minutes.
>
> Any ideas would be appreciated?
>
>

Not exactly 'real-time', but you might want to take a look at Chukwa (http://svn.apache.org/viewvc/hadoop/chukwa/).

Unfortunately, I can't seem to find documentation for it. I'll try and
push the chukwa-devs.

Arun
