Posted to dev@giraph.apache.org by "Avery Ching (Created) (JIRA)" <ji...@apache.org> on 2011/11/17 05:38:51 UTC

[jira] [Created] (GIRAPH-93) Hive input / output format

Hive input / output format
--------------------------

                 Key: GIRAPH-93
                 URL: https://issues.apache.org/jira/browse/GIRAPH-93
             Project: Giraph
          Issue Type: New Feature
            Reporter: Avery Ching
            Assignee: Avery Ching


It would be great to be able to load/store data from/to Hive tables.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460328#comment-13460328 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

+1, this is contrib, so let's add it in.
                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178607#comment-13178607 ] 

Dmitriy V. Ryaboy commented on GIRAPH-93:
-----------------------------------------

I suppose one could fake this by adding a local mvn repo to giraph, and configuring mvn to check there?


                

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460389#comment-13460389 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

Wrong patch (old one), re-committed with the new one.
                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459160#comment-13459160 ] 

Nitay Joffe commented on GIRAPH-93:
-----------------------------------

Okay, I'll look into these options. Also, it seems like the dependent HCatalog task you mention is actually progressing well. If they get a jar checked in to Maven, I presume we can always enable this diff, yes?
                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458971#comment-13458971 ] 

Eli Reisman commented on GIRAPH-93:
-----------------------------------

I'm not familiar with running those tests for giraph-contrib. Does someone know whether this is normal and can be ignored?

Thanks for picking up this work.
                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Jaeho Shin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425325#comment-13425325 ] 

Jaeho Shin commented on GIRAPH-93:
----------------------------------

Based on Avery's initial work, I managed to make Giraph read and write Hive tables using HCatalog 0.4.0 and run vertex code.  Under the giraph-formats-contrib subproject, I added HCatalogInputFormat, HCatalogOutputFormat, and a HiveGiraphRunner, which handles additional setup for GiraphJob.

However, this also involves changes to HCatalog so that it publishes jars to a local Maven repo, so submitting a patch for Giraph alone won't work (it needs HCATALOG-132).  If there's interest, I'd like to post my minimal patch to HCatalog and instructions to build it.

Besides, I don't have a good minimal test yet, and writing one seems tricky as well.  Please advise if you have a good idea for writing effective tests involving Hive or Pig.  Or, if it's OK, I'd like to post my patch without test code.
                

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151805#comment-13151805 ] 

Jakob Homan commented on GIRAPH-93:
-----------------------------------

Do you mean RCFile specifically?  Hive can handle data in any format there's a SerDe for.  I've been meaning to open a jira for handling Avro-encoded data as well (and possibly specifying a graph schema for it).  For directly loading tables in/out of Hive, it may be better to target HCatalog, as that will also give access to Pig (and whatever else HCatalog eventually supports).
                

        

[jira] [Updated] (GIRAPH-93) Hive input / output format

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated GIRAPH-93:
------------------------------

    Attachment: GIRAPH-93.txt

Here's the workaround. I moved the hcatalog stuff to a different src folder so it's not compiled by default and added a profile called "hcatalog" that adds the code. I tested by enabling this profile and compiling using Facebook's internal repository that has the hcatalog jar.
                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151867#comment-13151867 ] 

Jakob Homan commented on GIRAPH-93:
-----------------------------------

Actually, I see what you're trying to do. You want to store the output of a Vertex job into Hive. You can do this by writing the data to an HDFS directory that Hive is watching as a table.  It's safe to do this with an external table, though with the new indexing features, etc., it shouldn't be done for a managed table.  For a managed table, you'd go directly through the metastore API, although this is what HCatalog is also working on, so it is worth making sure we don't duplicate work.  Also, via HCatalog we get Pig for free.
                

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151859#comment-13151859 ] 

Jakob Homan commented on GIRAPH-93:
-----------------------------------

> I've been trying to find any examples for using loading/storing from Hive tables to MapReduce jobs and can't find much unfortunately. I'd appreciate any pointers.

Hive will deal with any data stored on HDFS via a SerDe that will expose it in a way Hive wants.  I wrote the Avro Hive SerDe, if you're looking for an example of that: https://github.com/jghoman/haivvreo. But I'm not sure what you're asking. Do you want to be able to spawn a Giraph vertex job from Hive? I've been thinking it would be good to be able to do that from Pig, so that you could do Giraph processing as a step in your Pig workflow.

HCatalog is relatively young, but it is being heavily developed and will eventually be the gatekeeper for data both in Hive and Pig (if all goes according to spec).  The HCatalog crew is doing a lot of work right now on the Hive metastore to make this happen.
                

        

[jira] [Assigned] (GIRAPH-93) Hive input / output format

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe reassigned GIRAPH-93:
---------------------------------

    Assignee: Nitay Joffe  (was: Avery Ching)
    

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458452#comment-13458452 ] 

Nitay Joffe commented on GIRAPH-93:
-----------------------------------

When I run mvn test in giraph-formats-contrib I get failures for the following:

  testAccumuloInputOutput(org.apache.giraph.format.accumulo.TestAccumuloVertexFormat)
  testHBaseInputOutput(org.apache.giraph.format.hbase.TestHBaseRootMarkerVertextFormat)

However, I get the same failures when I run mvn test on a clean trunk, so I'm assuming this diff is safe in terms of those.
                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151871#comment-13151871 ] 

Dmitriy V. Ryaboy commented on GIRAPH-93:
-----------------------------------------

Going through HCat will be a bit gnarly (though I agree with Jakob that it's the only sensible way when you are dealing with Hive-managed tables).  Writing to a directory and having Hive treat it as an external table will be far easier. Oh, and Hive (and Pig) can read tab-delimited files, if it's just a matter of getting basic pipelining to happen initially.
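
As a minimal sketch of the tab-delimited route (not taken from any patch attached to this issue), the Java below renders one vertex per line so that an external Hive table declared with tab-separated fields could read the output directory. The class name TabDelimitedVertexRow, the three-column layout (id, value, comma-separated neighbor ids), and the long/double types are all assumptions for illustration; a real job would emit such lines from its own vertex output format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Giraph): one vertex per tab-delimited line. */
final class TabDelimitedVertexRow {
    private TabDelimitedVertexRow() { }

    /** Renders "id<TAB>value<TAB>n1,n2,..."; the third column is empty when there are no neighbors. */
    static String format(long vertexId, double value, List<Long> neighborIds) {
        StringBuilder row = new StringBuilder();
        row.append(vertexId).append('\t').append(value).append('\t');
        for (int i = 0; i < neighborIds.size(); i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append(neighborIds.get(i));
        }
        return row.toString();
    }

    /** Reads the neighbor ids back from a line produced by format(). */
    static List<Long> parseNeighborIds(String row) {
        // Limit -1 keeps a trailing empty column instead of dropping it.
        String[] columns = row.split("\t", -1);
        List<Long> neighborIds = new ArrayList<>();
        if (columns.length < 3 || columns[2].isEmpty()) {
            return neighborIds;
        }
        for (String id : columns[2].split(",")) {
            neighborIds.add(Long.parseLong(id));
        }
        return neighborIds;
    }
}
```

Keeping the neighbor list in a single comma-separated column means the row always has exactly three tab-separated fields, which keeps the table definition on the Hive side simple; whether that layout suits a given graph is a separate design question.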

Jakob, maybe we can discuss on the list, but starting a Giraph job from Pig should be as simple as a mapreduce invocation (http://pig.apache.org/docs/r0.9.1/basic.html#mapreduce). 
                

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170631#comment-13170631 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

Just wanted to update that I did get this to work with HCatalog a while ago.  And amazingly it actually works!  I'll put together a diff to get this into Giraph.
                

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460388#comment-13460388 ] 

Hudson commented on GIRAPH-93:
------------------------------

Integrated in Giraph-trunk-Commit #203 (See [https://builds.apache.org/job/Giraph-trunk-Commit/203/])
    Fixed GIRAPH-93 to be using the latest patch. (Revision 1388386)

     Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1388386
Files : 
* /giraph/trunk/giraph-formats-contrib/pom.xml
* /giraph/trunk/giraph-formats-contrib/src/hcatalog
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org/apache
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org/apache/giraph
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org/apache/giraph/format
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org/apache/giraph/format/hcatalog
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org/apache/giraph/format/hcatalog/HCatalogVertexInputFormat.java
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org/apache/giraph/format/hcatalog/HCatalogVertexOutputFormat.java
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org/apache/giraph/format/hcatalog/HiveGiraphRunner.java
* /giraph/trunk/giraph-formats-contrib/src/hcatalog/java/org/apache/giraph/format/hcatalog/package-info.java
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog/HCatalogVertexInputFormat.java
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog/HCatalogVertexOutputFormat.java
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog/HiveGiraphRunner.java
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog/package-info.java

                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460332#comment-13460332 ] 

Hudson commented on GIRAPH-93:
------------------------------

Integrated in Giraph-trunk-Commit #202 (See [https://builds.apache.org/job/Giraph-trunk-Commit/202/])
    GIRAPH-93: Hive input / output format. (nitayj via aching) (Revision 1388358)

     Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1388358
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/giraph-formats-contrib/pom.xml
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog/HCatalogVertexInputFormat.java
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog/HCatalogVertexOutputFormat.java
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog/HiveGiraphRunner.java
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/format/hcatalog/package-info.java

                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178616#comment-13178616 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

Interesting idea.  Maybe I'll look around for an example (unless you know of one).
                

        

[jira] [Updated] (GIRAPH-93) Hive input / output format

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated GIRAPH-93:
------------------------------

    Attachment: GIRAPH-93.txt

I'm going to be taking over Jaeho's work so we can get the work he did at Facebook committed. Here's the patch for this issue.  
                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459164#comment-13459164 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

Yeah, we just need a workaround in the meantime.
                

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151866#comment-13151866 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

Specifically, we have a lot of data stored in Hive tables and I'd like to be able to do graph computation on them with Giraph and then store the results back in Hive tables so Hive queries can operate against them as well.
                

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151848#comment-13151848 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

I think RCFile initially, then other formats in Hive.  HCatalog is certainly a good idea for the long term, though I'm not sure how ready it is now.

Dmitriy, could you send me your code (don't worry about getting it to compile)?  I'd like to take a look at any examples.

I've been trying to find examples of loading/storing Hive tables from MapReduce jobs and can't find much, unfortunately.  I'd appreciate any pointers.
                

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151868#comment-13151868 ] 

Jakob Homan commented on GIRAPH-93:
-----------------------------------

Cool. The first step will be to get a VertexInput/OutputFormat that can work with RCFiles.  Storing the result back in the metastore can be tackled after that.
                

        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Arun Suresh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151851#comment-13151851 ] 

Arun Suresh commented on GIRAPH-93:
-----------------------------------

Avery, this might not be an optimal solution, but I'm just putting it out there. I understand Hive exposes a JDBC interface. One can use the JDBC interface and DBInputFormat (http://www.cloudera.com/blog/2009/03/database-access-with-hadoop/) to load data from a Hive table for a MapReduce job.
                


        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151809#comment-13151809 ] 

Dmitriy V. Ryaboy commented on GIRAPH-93:
-----------------------------------------

FWIW I already have a Thrift one lying around, which both Hive and Pig read via Elephant-Bird. It might not compile against the current trunk, as I've been working on other stuff and you guys have been coding like mad, but I can post something over the Thanksgiving weekend.

                


        

[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459062#comment-13459062 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

This is great stuff.  The only thing is that AFAIK HCatalog isn't in Maven yet (https://issues.apache.org/jira/browse/HCATALOG-132).  This causes the build to fail unless you have installed it into your local Maven repository.  It would be good to be able to either

1) not build the HCatalogInputFormat stuff unless we have the dependencies, or
2) use Maven profiles to build the HCatalog stuff, so that folks can install the appropriate dependencies into their own local Maven repo when they want it, while the build still works fine in the other cases.
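
For illustration, option 2 might look roughly like the pom.xml fragment below. This is only a sketch, not the attached patch: the profile id "hcatalog", the source directory src/hcatalog/java, and the build-helper-maven-plugin version are assumptions, and the hcatalog coordinates and the 0.3.0-dev version are placeholders borrowed from the systemPath warning quoted elsewhere in this thread.

```xml
<profiles>
  <profile>
    <!-- Hypothetical profile id; enable with: mvn -Phcatalog package -->
    <id>hcatalog</id>
    <dependencies>
      <!-- Resolved from the local repository only when the profile is active.
           Version is a placeholder; use whatever jar you installed. -->
      <dependency>
        <groupId>org.apache.hcatalog</groupId>
        <artifactId>hcatalog</artifactId>
        <version>0.3.0-dev</version>
      </dependency>
    </dependencies>
    <build>
      <plugins>
        <!-- Adds the HCatalog-dependent sources to the compile only in
             this profile, so the default build never sees them. -->
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>build-helper-maven-plugin</artifactId>
          <version>1.7</version> <!-- assumed version -->
          <executions>
            <execution>
              <id>add-hcatalog-sources</id>
              <phase>generate-sources</phase>
              <goals>
                <goal>add-source</goal>
              </goals>
              <configuration>
                <sources>
                  <!-- Assumed location for the HCatalog formats -->
                  <source>src/hcatalog/java</source>
                </sources>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

With something like this in place, a locally built jar would first be installed with "mvn install:install-file -Dfile=hcatalog-0.3.0-dev.jar -DgroupId=org.apache.hcatalog -DartifactId=hcatalog -Dversion=0.3.0-dev -Dpackaging=jar", and the default build (no -P flag) would skip the HCatalog code entirely.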
                
> Hive input / output format
> --------------------------
>
>                 Key: GIRAPH-93
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-93
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-93.txt
>
>
> It would be great to be able to load/store data from/to Hive tables.  


[jira] [Commented] (GIRAPH-93) Hive input / output format

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170831#comment-13170831 ] 

Avery Ching commented on GIRAPH-93:
-----------------------------------

Argh, since HCatalog is not published to Maven, this is a bit of a problem.  We could add a system dependency, but it's a little messy (yucky warnings).  

I can get it to build with my compiled jar, but get warnings like:

[WARNING] 'dependencies.dependency.systemPath' for org.apache.hcatalog:hcatalog:jar should not point at files within the project directory, ${basedir}/lib/hcatalog-0.3.0-dev.jar will be unresolvable by dependent projects @ line 527, column 19
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Incubator Giraph 0.70
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for org.apache.hadoop:hadoop-core:jar:0.20.1 is missing, no dependency information available
[WARNING] The POM for org.apache.hadoop:hadoop-core:jar:0.20.3-CDH3-SNAPSHOT is missing, no dependency information available
[WARNING] Could not transfer metadata asm:asm/maven-metadata.xml from/to local.repository (file:../../local.repository/trunk): No connector available to access repository local.repository (file:../../local.repository/trunk) of type legacy using the available factories WagonRepositoryConnectorFactory
[INFO] 
[INFO] --- maven-enforcer-plugin:1.0.1:enforce (enforce-maven) @ giraph ---
[INFO] 
[INFO] --- maven-resources-plugin:2.4.3:resources (default-resources) @ giraph ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/aching/giraph/src/main/resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ giraph ---
[INFO] Compiling 122 source files to /home/aching/giraph/target/classes
[INFO] 
[INFO] --- maven-assembly-plugin:2.2:single (build-fat-jar) @ giraph ---
[WARNING] Missing POM for org.apache.hadoop:hadoop-core:jar:0.20.1
[WARNING] Missing POM for org.apache.hadoop:hadoop-core:jar:0.20.3-CDH3-SNAPSHOT

Maybe wait on HCATALOG-132?
                
