Posted to user@giraph.apache.org by Ahmet Emre Aladağ <em...@agmlab.com> on 2013/07/18 16:49:51 UTC

HBase EdgeInputFormat

Hi,

Question: Will there be an HBaseEdgeInputFormat class, or is there a restriction in HBase that prevents us from implementing one?

HBaseVertexInputFormat is fine for vertex-centric reading, i.e. when each row in HBase corresponds to exactly one Vertex. But it does not let me create several vertices with the same ID and have them merged.
My case is different: many rows in HBase can correspond to one Vertex, with each row holding a subset of that vertex's edges.

Example:
a1 - x y z
a2 - t p
a3 - k

will be

vertex "a" with edges to x y z t p k

My intuition is that an HBaseEdgeInputFormat would solve this case, but it doesn't exist yet.
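
To make the desired result concrete, here is a minimal plain-Java sketch of the merge described above. It is not Giraph code: the class RowMerger and its methods are hypothetical, and it assumes that a row key is the vertex ID followed by a numeric suffix ("a1" maps to "a"), which is only a guess based on the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Giraph code: merges several rows into one vertex.
class RowMerger {
    // ASSUMPTION: a row key is the vertex ID plus a numeric suffix ("a1" -> "a").
    static String vertexIdOf(String rowKey) {
        return rowKey.replaceAll("\\d+$", "");
    }

    // Collects the targets of every row under the vertex ID derived from its key.
    static Map<String, List<String>> merge(Map<String, List<String>> rows) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            adjacency.computeIfAbsent(vertexIdOf(row.getKey()), k -> new ArrayList<>())
                     .addAll(row.getValue());
        }
        return adjacency;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("a1", Arrays.asList("x", "y", "z"));
        rows.put("a2", Arrays.asList("t", "p"));
        rows.put("a3", Arrays.asList("k"));
        System.out.println(merge(rows)); // prints {a=[x, y, z, t, p, k]}
    }
}
```

An edge-oriented input format would make this grouping unnecessary in user code, since each (source, target) pair could be emitted on its own and grouped by source ID during loading.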




Re: HBase EdgeInputFormat

Posted by Ahmet Emre Aladağ <em...@agmlab.com>.
Thank you,

TextVertexInputFormat has a getEdges() method, but EdgeInputFormat has
no equivalent (it does not represent a vertex), and it does not support
returning multiple edges per record. Normally a row would hold only one
edge, but in my case (Nutch2) each row holds multiple edges.

key: URL
value: ol:URL2, ol:URL3, ol:URL4, ...

That is, each row holds multiple outlinks.

Is there a way to overcome this?
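
One possible approach, sketched below in plain Java, is to buffer the outlinks of the current row and hand them out one edge per call. This is not a Giraph class: BufferedEdgeReader is hypothetical, the method names only loosely mirror the nextEdge() / getCurrentSourceId() style that I understand Giraph's EdgeReader to use, and the in-memory map stands in for an HBase scan. The "ol:" column prefix is left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not a Giraph class: emits one edge per call even
// though each input row carries several outlinks.
class BufferedEdgeReader {
    private final Iterator<Map.Entry<String, List<String>>> rows;
    private final Deque<String> pendingTargets = new ArrayDeque<>();
    private String currentSource;
    private String currentTarget;

    // ASSUMPTION: the map stands in for the rows returned by a table scan.
    BufferedEdgeReader(Map<String, List<String>> table) {
        this.rows = table.entrySet().iterator();
    }

    // Advances to the next edge, pulling a new row only when the buffer is
    // empty. Rows without outlinks are skipped.
    boolean nextEdge() {
        while (pendingTargets.isEmpty()) {
            if (!rows.hasNext()) {
                return false;
            }
            Map.Entry<String, List<String>> row = rows.next();
            currentSource = row.getKey();
            pendingTargets.addAll(row.getValue());
        }
        currentTarget = pendingTargets.poll();
        return true;
    }

    String getCurrentSourceId() { return currentSource; }

    String getCurrentTargetId() { return currentTarget; }

    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("URL", Arrays.asList("URL2", "URL3", "URL4"));
        BufferedEdgeReader reader = new BufferedEdgeReader(table);
        while (reader.nextEdge()) {
            System.out.println(reader.getCurrentSourceId()
                    + " -> " + reader.getCurrentTargetId());
        }
    }
}
```

The point of the buffer is that the caller still sees a one-edge-at-a-time interface, so a "one record, one edge" contract can be kept while the underlying storage holds many edges per row.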



On 07/19/2013 01:03 AM, Avery Ching wrote:
> I don't think it will be hard to implement.  Just start with the 
> HBaseVertexInputFormat and have it extend EdgeInputFormat.  You can 
> look at TableEdgeInputFormat for an example.  It sounds like a good 
> contribution to Giraph.

Re: HBase EdgeInputFormat

Posted by Avery Ching <ac...@apache.org>.
I don't think it will be hard to implement.  Just start with the 
HBaseVertexInputFormat and have it extend EdgeInputFormat.  You can look 
at TableEdgeInputFormat for an example.  It sounds like a good 
contribution to Giraph.

On 7/18/13 1:57 PM, Puneet Jain wrote:
> I also need this feature. Will be really helpful.


Re: HBase EdgeInputFormat

Posted by Puneet Jain <pu...@gmail.com>.
I also need this feature. Will be really helpful.




-- 
--Puneet