Posted to dev@chukwa.apache.org by "Ari Rabkin (JIRA)" <ji...@apache.org> on 2009/07/12 20:25:15 UTC

[jira] Created: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Chunk.application and Chunk.streamName are redundant
----------------------------------------------------

                 Key: CHUKWA-345
                 URL: https://issues.apache.org/jira/browse/CHUKWA-345
             Project: Hadoop Chukwa
          Issue Type: Bug
          Components: data collection
            Reporter: Ari Rabkin


The chunk interface has both an "application" field and a "stream name" field, but they map to the same value. We should cut one of those names, for clarity.

I think "application" is the less descriptive name, and should be cut.
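To make the redundancy concrete, here is a minimal sketch of an interface that exposes one value under two names. The names ExampleChunk, ExampleChunkImpl, and the sample path are hypothetical and chosen for illustration only; this is not the actual Chukwa source.

```java
// Hypothetical illustration only; not the actual Chukwa Chunk interface.
interface ExampleChunk {
    String getStreamName();
    String getApplication(); // second name for the same value
}

final class ExampleChunkImpl implements ExampleChunk {
    // Only one backing field exists.
    private final String streamName;

    ExampleChunkImpl(String streamName) {
        this.streamName = streamName;
    }

    @Override
    public String getStreamName() {
        return streamName;
    }

    // Returns the same field under a different name: this is the redundancy.
    @Override
    public String getApplication() {
        return streamName;
    }
}

final class RedundantChunkDemo {
    public static void main(String[] args) {
        ExampleChunk chunk = new ExampleChunkImpl("/var/log/hadoop/namenode.log");
        // Both accessors yield the same string.
        System.out.println(chunk.getStreamName().equals(chunk.getApplication()));
    }
}
```

Dropping getApplication() from such an interface would leave the stored data unchanged; only callers of the removed accessor would need updating.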

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773074#action_12773074 ] 

Ari Rabkin commented on CHUKWA-345:
-----------------------------------

Comments on this?



[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773288#action_12773288 ] 

Ari Rabkin commented on CHUKWA-345:
-----------------------------------

Eric, is that a +1 to commit this patch?

I'm open to a more general metadata framework; the intent of this patch was to clean up the status quo, and make sure that the interface correctly describes what's going on underneath.  I don't think it breaks backward-compatibility, unless there's a substantial volume of code for processing Chukwa data that we don't know about.



[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773096#action_12773096 ] 

Jerome Boulon commented on CHUKWA-345:
--------------------------------------

Instead of adding or removing fields, it would be easier to support a Map at the tag level; that way everyone can put in whatever makes sense for them.

Chunk
 - DataType
 - Source
 - SeqId (uuid)
 - Map<String,String>
 - List<String> messages

A Map could easily be serialized using whatever format makes sense (SerDe, JSON, Avro, ...) and could be used for routing, priority, and so on.

Also, if we want to do this, since it's going to break compatibility, why not move to a well-defined serialization format (Avro, Thrift, or JSON)?
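A rough sketch of the structure listed above, in Java. The class name MapChunkSketch, the "priority" and "streamName" keys, and the sample values are all assumptions for illustration; this is not an existing Chukwa class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the proposed layout; not the actual Chukwa Chunk class.
final class MapChunkSketch {
    final String dataType;
    final String source;
    final UUID seqId;
    // Free-form metadata: each user stores whatever keys make sense for them.
    final Map<String, String> metadata = new LinkedHashMap<>();
    final List<String> messages = new ArrayList<>();

    MapChunkSketch(String dataType, String source) {
        this.dataType = dataType;
        this.source = source;
        this.seqId = UUID.randomUUID();
    }

    // Example of metadata-driven routing; "priority" is an assumed key name.
    String priority() {
        return metadata.getOrDefault("priority", "normal");
    }
}

final class MapChunkDemo {
    public static void main(String[] args) {
        MapChunkSketch chunk = new MapChunkSketch("HadoopLog", "host-1");
        chunk.metadata.put("streamName", "/var/log/hadoop/namenode.log");
        chunk.metadata.put("priority", "high");
        chunk.messages.add("example log line");
        System.out.println(chunk.priority());
    }
}
```

With this shape, a field such as the stream name becomes just another map entry, so adding or dropping one would not change the class itself.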



[jira] Updated: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated CHUKWA-345:
------------------------------

    Attachment: CHUKWA-345.patch



[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773351#action_12773351 ] 

Ari Rabkin commented on CHUKWA-345:
-----------------------------------

I don't think this patch has anything to do with the transport protocol. It doesn't change the serialization format. My only goal is to remove a redundancy from the Chunk API. I don't intend to break pig -- I'm happy to revise the patch to make sure that pig stays OK.



[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736089#action_12736089 ] 

Jerome Boulon commented on CHUKWA-345:
--------------------------------------

The initial idea was to have one field for the application, like Hadoop, and to set the stream name to the actual file path.
If you remove it, make sure that you update all dependent source code, like the Pig queries.



[jira] Updated: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated CHUKWA-345:
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.3.0
           Status: Resolved  (was: Patch Available)

I just committed this, after consulting privately with Jerome.

@Eric, Jerome: I'm open to changing the serialization format or metadata structure if the need arises, but it isn't clear to me that we have any such need at present.  I think the current tagging mechanism satisfies most needs for general-purpose custom metadata.



[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736180#action_12736180 ] 

Ari Rabkin commented on CHUKWA-345:
-----------------------------------

*nod*  However, we never implemented it that way.  There really isn't any such field.   

Is there a case for actually implementing it?  (Which would also require touching a lot of code)



[jira] Assigned: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin reassigned CHUKWA-345:
---------------------------------

    Assignee: Ari Rabkin



[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Jerome Boulon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773322#action_12773322 ] 

Jerome Boulon commented on CHUKWA-345:
--------------------------------------

Eric, I'm not sure why you want to define the serialization format and the transport protocol at the same time. How you serialize the data and how you send it are two separate things.

If your idea is to lock everyone into a single transport protocol, HTTP, then I'll put a -1 on it, since there's no reason to lock the transport protocol, and HTTP is not the one I'm using, for example.
Regarding the serialization format, this should apply only to the datasink file format, the final output. For the same reason, if our serialization format is not supported in the target language/environment, then I want to be free to use whatever makes more sense for me. Of course, I will have to convert it before writing the chunk to the dataSink file, but hopefully the serialization format will work for everyone.

Ari, if we're going to do the Map, why would you want to commit that patch? The Pig contrib project needs to be updated if you want to commit that patch.





[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783708#action_12783708 ] 

Hudson commented on CHUKWA-345:
-------------------------------

Integrated in Chukwa-trunk #213 (See [http://hudson.zones.apache.org/hudson/job/Chukwa-trunk/213/])
    



[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773272#action_12773272 ] 

Eric Yang commented on CHUKWA-345:
----------------------------------

+1 on removing application and keeping stream name.

+1 on the Map<String, String> approach, but we should define the serialization method over the HTTP protocol upfront.  Revising the Chunk protocol should be in another JIRA with more discussion.



[jira] Commented: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Eric Yang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773615#action_12773615 ] 

Eric Yang commented on CHUKWA-345:
----------------------------------

+1 to commit this patch.

Serialization format and protocol format are somewhat coupled.  It's not possible to use certain protocols if the serialization format uses the same keywords.  For example, GET / HTTP/1.0\r\n\r\n may need to be escaped in the serialization format if Content-Length: is not honored and HTTP is chosen to wrap the serialized data.  The same escaping rules apply to any transport protocol and serialization format.  Hence it's best to define the serialization format and protocol upfront; the implementation would be easier and less error-prone.  However, the protocol and serialization changes should not be in scope of this JIRA.
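As a rough illustration of the escaping concern, the sketch below frames a payload with an explicit length prefix, which plays the role that an honored Content-Length: header would play over HTTP. The class name LengthPrefixedFraming and the 4-byte prefix are assumptions for the example, not part of Chukwa or of any protocol proposed in this issue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: length-prefixed framing, so the payload needs no escaping.
final class LengthPrefixedFraming {
    // Prepends a 4-byte big-endian length to the payload.
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Reads the length prefix, then exactly that many payload bytes.
    static byte[] unframe(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        int length = buf.getInt();
        byte[] payload = new byte[length];
        buf.get(payload);
        return payload;
    }
}

final class FramingDemo {
    public static void main(String[] args) {
        // A payload that happens to look like an HTTP request line.
        byte[] payload = "GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = LengthPrefixedFraming.unframe(LengthPrefixedFraming.frame(payload));
        System.out.println(Arrays.equals(payload, roundTripped));
    }
}
```

Because the reader relies on the declared length rather than scanning for delimiters, protocol keywords inside the payload are carried through untouched. A delimiter-based scheme would instead need the escaping rules described above.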



[jira] Updated: (CHUKWA-345) Chunk.application and Chunk.streamName are redundant

Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ari Rabkin updated CHUKWA-345:
------------------------------

    Status: Patch Available  (was: Open)

StreamName seemed more descriptive to me. This patch removes Chunk.application.
