Posted to user@giraph.apache.org by Ahmet Emre Aladağ <em...@agmlab.com> on 2013/07/18 16:49:51 UTC
HBase EdgeInputFormat
Hi,
Question: Will there be an HBaseEdgeInputFormat class, or is there a restriction in HBase that prevents us from implementing it?
HBaseVertexInputFormat is fine for vertex-centric reading, i.e. each row in HBase corresponds to one Vertex. But it does not allow me to create duplicate vertices with the same ID.
Now I have the case "many rows in HBase can correspond to one Vertex, each row representing a set of edges."
Example:
a1 - x y z
a2 - t p
a3 - k
will be
vertex "a" with edges to x y z t p k
My intuition is that if an HBaseEdgeInputFormat existed, I could solve this case. But it doesn't exist yet.
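To make the example concrete, here is a minimal plain-Java sketch of the merge I am after. It does not use any Giraph or HBase classes. The class name RowGrouping and the rule that maps row keys "a1", "a2", "a3" to vertex "a" (stripping trailing digits) are assumptions for illustration only; a real table would need its own key scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowGrouping {
    /** Hypothetical key scheme: "a1", "a2", "a3" all belong to vertex "a". */
    static String vertexIdOf(String rowKey) {
        return rowKey.replaceAll("\\d+$", "");
    }

    /** Merges the edge targets of all rows that map to the same vertex ID. */
    static Map<String, List<String>> groupEdges(Map<String, List<String>> rows) {
        Map<String, List<String>> edgesByVertex = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            edgesByVertex
                .computeIfAbsent(vertexIdOf(row.getKey()), id -> new ArrayList<>())
                .addAll(row.getValue());
        }
        return edgesByVertex;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("a1", Arrays.asList("x", "y", "z"));
        rows.put("a2", Arrays.asList("t", "p"));
        rows.put("a3", Arrays.asList("k"));
        System.out.println(groupEdges(rows)); // prints {a=[x, y, z, t, p, k]}
    }
}
```

With edge-based input I would expect the framework to do this grouping by source ID for me, so the input format would only have to emit the individual edges.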
Re: HBase EdgeInputFormat
Posted by Ahmet Emre Aladağ <em...@agmlab.com>.
Thank you,
TextVertexInputFormat has a getEdges() method, but EdgeInputFormat does
not have one (since it's not a vertex), and it does not support returning
multiple edges per record. Normally, a row should have only one edge, but
in my case (Nutch2), we have multiple edges per row.
key: URL
value: ol:URL2, ol:URL3, ol:URL4, ...
Indicating multiple outlinks per row.
Is there a way to overcome this?
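One approach I can think of is to buffer the outlinks of each row inside the reader and hand them out one edge per call. Below is a rough plain-Java sketch of that idea, not a Giraph implementation. The class MultiEdgeRowReader and its method names are hypothetical and only loosely modeled on a pull-style edge reader; the rows are simulated with an in-memory map instead of an HBase scan.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiEdgeRowReader {
    private final Iterator<Map.Entry<String, List<String>>> rows;
    // Targets of the current row that have not been emitted yet.
    private final Deque<String> pendingTargets = new ArrayDeque<>();
    private String currentSource;
    private String currentTarget;

    MultiEdgeRowReader(Iterator<Map.Entry<String, List<String>>> rows) {
        this.rows = rows;
    }

    /** Advances to the next edge, pulling a new row only when the buffer is empty. */
    boolean nextEdge() {
        while (pendingTargets.isEmpty()) {
            if (!rows.hasNext()) {
                return false; // no rows left, no edges left
            }
            Map.Entry<String, List<String>> row = rows.next();
            currentSource = row.getKey();
            pendingTargets.addAll(row.getValue()); // rows without outlinks are skipped
        }
        currentTarget = pendingTargets.poll();
        return true;
    }

    String getCurrentSourceId() { return currentSource; }

    String getCurrentTargetId() { return currentTarget; }

    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("URL", Arrays.asList("URL2", "URL3", "URL4"));
        MultiEdgeRowReader reader = new MultiEdgeRowReader(table.entrySet().iterator());
        while (reader.nextEdge()) {
            System.out.println(reader.getCurrentSourceId() + " -> " + reader.getCurrentTargetId());
        }
    }
}
```

Each call to nextEdge() then yields exactly one edge, even though a single row carries several outlinks.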
On 07/19/2013 01:03 AM, Avery Ching wrote:
> I don't think it will be hard to implement. Just start with the
> HBaseVertexInputFormat and have it extend EdgeInputFormat. You can
> look at TableEdgeInputFormat for an example. It sounds like a good
> contribution to Giraph.
Re: HBase EdgeInputFormat
Posted by Avery Ching <ac...@apache.org>.
I don't think it will be hard to implement. Just start with the
HBaseVertexInputFormat and have it extend EdgeInputFormat. You can look
at TableEdgeInputFormat for an example. It sounds like a good
contribution to Giraph.
On 7/18/13 1:57 PM, Puneet Jain wrote:
> I also need this feature. Will be really helpful.
Re: HBase EdgeInputFormat
Posted by Puneet Jain <pu...@gmail.com>.
I also need this feature. Will be really helpful.
--
--Puneet