Posted to dev@chukwa.apache.org by "Mac Yang (JIRA)" <ji...@apache.org> on 2009/04/08 02:56:13 UTC

[jira] Created: (CHUKWA-102) Provide streaming adapter

Provide streaming adapter
-------------------------

                 Key: CHUKWA-102
                 URL: https://issues.apache.org/jira/browse/CHUKWA-102
             Project: Hadoop Chukwa
          Issue Type: New Feature
          Components: input tools
            Reporter: Mac Yang


Enable applications to send data directly to the agent without writing to disk first.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739103#action_12739103 ] 

Ari Rabkin commented on CHUKWA-102:
-----------------------------------

OK.  I didn't understand that that was the purpose of this JIRA.  Sounds good.  Just to be clear-- this is bypassing the agent process?


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739094#action_12739094 ] 

Jerome Boulon commented on CHUKWA-102:
--------------------------------------

I'm currently working on a streaming solution for Chukwa, but I'm open to sharing the work ;-)
Just let me know what you have in mind.

My current plan is to use Thrift, for two main reasons:
1- A lot of people are forced to use Scribe since there's no other option, and for many of them this will be the only part of their system written in C. Using Thrift will provide an easy migration path for them and make Chukwa even more attractive.
2- Thrift works relatively well, which will save us a lot of time and energy in getting to a stable version.

So I'm planning on adding:
- a new log4j appender
- a new Thrift server that can run in standalone mode or embedded inside the current collector
- Thrift will be used only for the transport; the current writer will still be responsible for writing the data to a SequenceFile.
- a new writer that could be used to send data across the network to another Thrift server/collector


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788988#action_12788988 ] 

Eric Yang commented on CHUKWA-102:
----------------------------------

A log4j SocketAppender should be used by the source, and I will write a socket adaptor for this JIRA.


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738841#action_12738841 ] 

Ari Rabkin commented on CHUKWA-102:
-----------------------------------

Is anything happening with this?  Jerome, would you mind if someone else took a crack at it?


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739128#action_12739128 ] 

Jerome Boulon commented on CHUKWA-102:
--------------------------------------

The initial idea when Mac created this JIRA was to stream data from an application to the local agent, but my proposal is to send directly to the collector and scale horizontally if possible; if you can no longer scale horizontally (too many files on HDFS, for example), then scale by creating a DAG of collectors (similar to what Scribe does).

So in my proposal there's no need for an agent; the agent could be as simple as a cron job that reads a file and streams it over to a remote collector.
I may put some logic inside the log4j appender to fall back to disk in case of network issues, or to have the same kind of checkpointing directly in the log4j appender. But I'm still working on the design.
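
For illustration, the "cron job" agent mentioned above might look roughly like the following Java sketch (Java 11+). It is a hypothetical example, not part of Chukwa: the class name CronFileStreamer, the checkpoint file holding a single byte offset, and the Consumer<byte[]> sink standing in for a connection to a remote collector are all assumptions. Each run forwards only the bytes appended since the previous run.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical cron-style "agent" (not a Chukwa class): each run forwards the
// bytes appended to a log file since the previous run, then saves the new offset.
final class CronFileStreamer {

    private CronFileStreamer() {}

    // Returns the saved byte offset, or 0 if this is the first run.
    static long readCheckpoint(Path checkpoint) throws IOException {
        if (!Files.exists(checkpoint)) {
            return 0L;
        }
        return Long.parseLong(Files.readString(checkpoint, StandardCharsets.UTF_8).trim());
    }

    // Sends new bytes from logFile to sink (a stand-in for a collector client)
    // and returns the updated offset.
    static long runOnce(Path logFile, Path checkpoint, Consumer<byte[]> sink) throws IOException {
        long offset = readCheckpoint(checkpoint);
        try (RandomAccessFile in = new RandomAccessFile(logFile.toFile(), "r")) {
            if (in.length() < offset) {
                offset = 0L; // file shrank (truncated or rotated): start from the beginning
            }
            in.seek(offset);
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                sink.accept(Arrays.copyOf(buf, n));
                offset += n;
            }
        }
        // Advance the checkpoint only after every chunk was handed to the sink,
        // so a failed run is retried from the old offset (at-least-once delivery).
        Files.writeString(checkpoint, Long.toString(offset), StandardCharsets.UTF_8);
        return offset;
    }
}
```

Because the checkpoint is written only after the sink has accepted every chunk, a run that fails partway re-sends data on the next run instead of skipping it, so duplicates are possible but gaps are not.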




[jira] Assigned: (CHUKWA-102) Provide streaming adapter

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang reassigned CHUKWA-102:
--------------------------------

    Assignee: Eric Yang


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739131#action_12739131 ] 

Ari Rabkin commented on CHUKWA-102:
-----------------------------------

Yes. The price of this approach is that you can't handle failures well, you can't share connections between different appenders, and you can't gracefully support anything where you don't control the logging. So I think for most purposes the Agent design is an improvement. That said, there are obviously cases where you can change the log4j appender but can't add a daemon.

I'm sort of skeptical that a DAG of collectors really helps you that much. The thing it accomplishes is bounding the number of incoming connections per collector, but it's not clear that that's a serious scaling limit.


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783514#action_12783514 ] 

Eric Yang commented on CHUKWA-102:
----------------------------------

Resurrecting this JIRA. The components needed to make this feature useful are:

Streaming adaptor - listens on a port, receives chunks, and sends them directly to the collector; spills to disk if it cannot send a chunk.
Streaming appender - a chunk creator that can be configured to send to either an agent or a collector. There is no fail-safe buffering in the appender.

For small clusters, the streaming appender could send directly to the collector to avoid setting up an agent. For large clusters, it should be configured to send to the agent, which then forwards to the collector. These utilities support DAG configurations and avoid the TCP incast problem. When the streaming appender and streaming adaptor are deployed on the same system, they should communicate over the local loopback interface. This provides a best-effort log delivery system with minimal implementation effort.
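
A rough Java sketch of the "send directly to the collector, spill to disk if it cannot send" step of the streaming adaptor. SpillingForwarder and CollectorClient are hypothetical names, and the 4-byte length prefix in the spill file is an assumed format (chosen so spilled chunks could later be read back one at a time); this is not Chukwa's actual adaptor API. Listening on the port and replaying the spill file are left out to keep it short.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch (not Chukwa's API) of the streaming adaptor's
// "send directly to the collector, spill to disk if the send fails" behavior.
final class SpillingForwarder {

    // Hypothetical transport: returns true if the collector accepted the chunk.
    interface CollectorClient {
        boolean send(byte[] chunk);
    }

    private final CollectorClient collector;
    private final Path spillFile;

    SpillingForwarder(CollectorClient collector, Path spillFile) {
        this.collector = collector;
        this.spillFile = spillFile;
    }

    // Forwards one received chunk; returns true if it went to the collector,
    // false if it was spilled to disk instead.
    boolean forward(byte[] chunk) throws IOException {
        if (trySend(chunk)) {
            return true;
        }
        spill(chunk);
        return false;
    }

    private boolean trySend(byte[] chunk) {
        try {
            return collector.send(chunk);
        } catch (RuntimeException e) {
            return false; // treat transport errors like a rejected send
        }
    }

    // Appends a length-prefixed record so spilled chunks can later be read back one by one.
    private void spill(byte[] chunk) throws IOException {
        ByteBuffer record = ByteBuffer.allocate(Integer.BYTES + chunk.length);
        record.putInt(chunk.length).put(chunk);
        Files.write(spillFile, record.array(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

The appender side described above would have no such fallback; only the adaptor pays the cost of writing to local disk, and only while the collector is unreachable.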


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783520#action_12783520 ] 

Ari Rabkin commented on CHUKWA-102:
-----------------------------------

I was going to do the buffer-to-disk as CHUKWA-395.  I expect to get to it in about a week.


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697089#action_12697089 ] 

Jerome Boulon commented on CHUKWA-102:
--------------------------------------

I actually wrote a prototype a long time ago, in fact before the FileAdaptor.
I understand your concerns about buffering, network issues and so on; however, there are persistent requests for something that does not write to disk. The contract will be clear: the StreamingAdaptor cannot guarantee data delivery, so if the agent is not there, the data will be dropped. Later on we may add some kind of buffering, but that will not be the case for the first iteration. If losing data is not acceptable, then a file-based adaptor should be used.

I plan to work on this within the next 3 weeks, and I was planning to use Avro.
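
The "best effort" contract above (no buffering; if the agent is not there, the data is dropped) could be sketched in Java as follows. BestEffortSender, the 4-byte length framing, and the idea of the agent listening on a TCP port are assumptions for illustration, not the actual StreamingAdaptor protocol. For brevity it opens a new connection per chunk; a real appender would keep one connection open.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical best-effort sender: one attempt per chunk, no buffering,
// and the chunk is dropped (and counted) if the agent cannot be reached.
final class BestEffortSender {
    private final InetSocketAddress agent;
    private final int connectTimeoutMs;
    private final AtomicLong dropped = new AtomicLong();

    BestEffortSender(String host, int port, int connectTimeoutMs) {
        this.agent = new InetSocketAddress(host, port);
        this.connectTimeoutMs = connectTimeoutMs;
    }

    // Returns true if the chunk was written to the socket, false if it was dropped.
    // "true" only means the bytes reached the local TCP stack, not that the agent
    // processed them: there is no acknowledgement in this sketch.
    boolean send(byte[] chunk) {
        try (Socket socket = new Socket()) {
            socket.connect(agent, connectTimeoutMs);
            DataOutputStream out = new DataOutputStream(socket.getOutputStream());
            out.writeInt(chunk.length); // assumed framing: length, then payload
            out.write(chunk);
            out.flush();
            return true;
        } catch (IOException e) {
            dropped.incrementAndGet(); // no agent, or the write failed: the chunk is lost
            return false;
        }
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

Counting drops does not recover the data, but it makes the loss visible, which matters when the contract is explicitly "no delivery guarantee".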


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789005#action_12789005 ] 

Eric Yang commented on CHUKWA-102:
----------------------------------

I am sending to the agent for large-scale deployments to avoid excessive local disk usage and the TCP incast problem. I believe your version is good for small/simple deployments. Both implementations should be committed.


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788991#action_12788991 ] 

Jerome Boulon commented on CHUKWA-102:
--------------------------------------

Eric, are you planning to send chunks from log4j to collectors or just to an agent?

I'm testing my version of a Log4JStreaming appender that uses Thrift and sends to a remote collector (an agentless solution).
So could you clarify what you want to do?


[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696862#action_12696862 ] 

Ari Rabkin commented on CHUKWA-102:
-----------------------------------

I don't mind including code for it, but I think as a matter of policy this isn't a good idea.  Buffering everything on disk gives us a lot of tolerance for network failures and load spikes, at modest performance cost.   A direct pipe forces either Chukwa or the app to do explicit buffering and such.  

I believe there may be code for this floating around; Jerome and Eric wrote it for the prototype.  I'd be interested to see a performance comparison with file tailing.




[jira] Commented: (CHUKWA-102) Provide streaming adapter

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739153#action_12739153 ] 

Ari Rabkin commented on CHUKWA-102:
-----------------------------------

Just to be clear -- I'm all in favor of the approach you outline, so long as it's a supplement to agents, not a replacement, going forwards.
