Posted to dev@hive.apache.org by "Suresh Antony (JIRA)" <ji...@apache.org> on 2009/08/31 21:31:32 UTC

[jira] Created: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Create a copier to copy data from scribe hdfs cluster to main DW cluster
------------------------------------------------------------------------

                 Key: HIVE-809
                 URL: https://issues.apache.org/jira/browse/HIVE-809
             Project: Hadoop Hive
          Issue Type: New Feature
            Reporter: Suresh Antony
            Assignee: Suresh Antony
            Priority: Minor
             Fix For: 0.4.0


Currently we have scribe hdfs, which writes scribe data directly to an HDFS cluster. But in most cases this cluster will not be used for accessing the data.
This data needs to be copied to a cluster from which you can access the scribe data using Hive or some other tool.
This copier should be able to copy large amounts of data on a near-real-time basis.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Posted by "Suresh Antony (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Antony updated HIVE-809:
-------------------------------

    Status: Patch Available  (was: Open)

Submitted a patch for the scribe HDFS to main HDFS copier.




[jira] Updated: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-809:
----------------------------

    Fix Version/s:     (was: 0.5.0)



[jira] Commented: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829485#action_12829485 ] 

Zheng Shao commented on HIVE-809:
---------------------------------

Some syntactic improvements are needed. See http://java.sun.com/docs/codeconv/ for details.

1. Variable names should NOT contain "_"
2. Import classes that are used repeatedly so we don't need to spell out the full name again and again, e.g. org.apache.hadoop.record.meta.TypeID

It would also be great to add javadocs for all public classes/methods.
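
As a rough illustration of the points above (not code from the patch), a minimal sketch might look like this. The class, field, and method names are hypothetical, and java.util.concurrent.atomic.AtomicLong stands in for a Hadoop class such as TypeID only so that the example compiles without Hadoop on the classpath.

```java
// Import once at the top instead of writing
// java.util.concurrent.atomic.AtomicLong at every use.
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical example class used only to illustrate naming, imports,
 * and javadocs; it is not part of the HIVE-809 patch.
 */
class ConventionExample {
    // camelCase field name (bytesCopied) rather than bytes_copied.
    private final AtomicLong bytesCopied = new AtomicLong();

    /**
     * Adds the size of one copied file to the running total.
     *
     * @param numBytes number of bytes just copied
     * @return the total number of bytes copied so far
     */
    public long recordCopy(long numBytes) {
        return bytesCopied.addAndGet(numBytes);
    }
}
```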






[jira] Commented: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758027#action_12758027 ] 

Namit Jain commented on HIVE-809:
---------------------------------

It contains authors' names in a few places.
It contains System.out.println() in a few places.
There is no unit test. Can you use this to copy from one directory to another and add a test for that?
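
A minimal sketch of what a directory-to-directory copy check with logging instead of System.out.println() could look like is given here. It is an assumption-laden stand-in, not the code from the patch: DirCopier, copyDir, and selfTest are hypothetical names, it uses java.nio.file on the local file system rather than the Hadoop FileSystem API, and java.util.logging rather than whatever logging library the project uses. In a real patch the check in selfTest would normally live in a unit test class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local-file-system stand-in for the copier. */
class DirCopier {
    private static final Logger LOG = Logger.getLogger(DirCopier.class.getName());

    /**
     * Copies every regular file directly under srcDir into dstDir,
     * creating dstDir if needed.
     *
     * @return the number of files copied
     */
    public static int copyDir(Path srcDir, Path dstDir) throws IOException {
        Files.createDirectories(dstDir);
        List<Path> files;
        // Close the directory stream before copying.
        try (Stream<Path> entries = Files.list(srcDir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path src : files) {
            Files.copy(src, dstDir.resolve(src.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            // Logged rather than printed to standard output.
            LOG.fine("Copied " + src);
        }
        return files.size();
    }

    /**
     * Copies one small file between two temporary directories and
     * reports whether the content arrived unchanged.
     */
    public static boolean selfTest() {
        try {
            Path src = Files.createTempDirectory("copier-src");
            Path dst = Files.createTempDirectory("copier-dst");
            Files.write(src.resolve("part-0"), "hello".getBytes(StandardCharsets.UTF_8));
            int copied = copyDir(src, dst);
            String content = new String(
                    Files.readAllBytes(dst.resolve("part-0")), StandardCharsets.UTF_8);
            return copied == 1 && content.equals("hello");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same shape should carry over to a test against a Hadoop local or mini cluster file system, but that wiring is not shown here.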



[jira] Updated: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Posted by "Suresh Antony (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Antony updated HIVE-809:
-------------------------------

    Attachment: patch_809_1.txt

Patch for the scribe data copier.



[jira] Updated: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-809:
----------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Posted by "Suresh Antony (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793704#action_12793704 ] 

Suresh Antony commented on HIVE-809:
------------------------------------

I am on vacation from 12/11/09 to 1/5/10.




[jira] Commented: (HIVE-809) Create a copier to copy data from scribe hdfs cluster to main DW cluster

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758031#action_12758031 ] 

Namit Jain commented on HIVE-809:
---------------------------------

The default copier has some Facebook location in it.

