Posted to user@giraph.apache.org by Garimella Kiran <ki...@aalto.fi> on 2014/11/28 13:02:59 UTC

Best way to know the assignment of vertices to workers

Hi all,

Is there a clean way to find out which worker a particular vertex is assigned to?

From what I tried out, I found that, given n workers, each vertex is assigned to the worker with id (vertex_id % n). Is it safe to rely on that?
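
One reason to be careful with the (vertex_id % n) observation: as far as I understand Giraph's design, vertices are first placed into partitions and partitions are then handed to workers, so the worker depends on both steps. The sketch below is a toy model of that two-step lookup, not Giraph code. AssignmentSketch, partitionOf, workerOf and both ownership tables are hypothetical, and Long.hashCode merely stands in for whatever hashCode() the vertex id type provides.

```java
// Toy model only: vertex -> partition -> worker.
// The hashing and the partition-to-worker tables are assumptions, not Giraph internals.
final class AssignmentSketch {
    private AssignmentSketch() { }

    /** Hash-style partition choice; Long.hashCode is a stand-in for the id's hashCode(). */
    static int partitionOf(long vertexId, int partitionCount) {
        return Math.abs(Long.hashCode(vertexId) % partitionCount);
    }

    /** Looks up the worker that owns the vertex's partition in a given ownership table. */
    static int workerOf(long vertexId, int[] partitionToWorker) {
        return partitionToWorker[partitionOf(vertexId, partitionToWorker.length)];
    }

    public static void main(String[] args) {
        int[] roundRobin = {0, 1, 2, 0, 1, 2}; // 6 partitions spread over 3 workers
        int[] blocks = {0, 0, 1, 1, 2, 2};     // same workers, different ownership
        for (long id = 0; id < 6; id++) {
            System.out.println("vertex " + id
                + ": id % 3 = " + (id % 3)
                + ", roundRobin worker = " + workerOf(id, roundRobin)
                + ", blocks worker = " + workerOf(id, blocks));
        }
    }
}
```

With the round-robin table every id from 0 to 5 lands on worker id % 3, which matches the observation; with the block table vertex 1 lands on worker 0 instead. So the modulo rule can hold in one configuration without being something to depend on.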

I’ve had a look at previous discussions, but most of them have no answer.

—————————

Why I need it:

In my application, each vertex needs some additional metadata, which is loaded from a file. This metadata file is huge (>50 GB), so on each worker I only want to load the metadata corresponding to the vertices present on that worker.
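
To make the goal concrete, here is a minimal sketch of the filtering step, independent of how the set of local vertex ids is obtained (which is the open question of this thread). Everything in it is an assumption for illustration: the class name LocalMetadataLoader, the method loadLocal, and a metadata format of one "id<TAB>payload" record per line.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: streams metadata records and keeps those whose id is local.
// The "id<TAB>payload" line format is hypothetical.
final class LocalMetadataLoader {
    private LocalMetadataLoader() { }

    /** Returns id -> payload for every record whose id is contained in localIds. */
    static Map<Long, String> loadLocal(Iterable<String> lines, Set<Long> localIds) {
        Map<Long, String> kept = new HashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // no separator: skip the line
            }
            long id = Long.parseLong(line.substring(0, tab));
            if (localIds.contains(id)) {
                kept.put(id, line.substring(tab + 1));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1\talpha", "2\tbeta", "3\tgamma");
        Set<Long> localIds = new HashSet<>(Arrays.asList(1L, 3L));
        System.out.println(loadLocal(lines, localIds));
    }
}
```

Only one record is examined at a time, so memory use is bounded by the metadata of the local vertices rather than by the size of the whole file. Each worker still scans the full file, though; avoiding that scan would need the file to be split along the same lines as the vertex assignment.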

—————————


Previous discussions:
1. http://mail-archives.apache.org/mod_mbox/giraph-user/201310.mbox/%3C7EC16F82718A6D4A920A99FE46CE7F4E2861F779%40MERCMBX19R.na.SAS.com%3E
2. http://mail-archives.apache.org/mod_mbox/giraph-user/201403.mbox/%3CCAMf08QYE%2BRgUv9otXT6oPJorTNjQ-Ay8p4NUiuhds8%2BzgDzs1w%40mail.gmail.com%3E



Regards,
Kiran

RE: Best way to know the assignment of vertices to workers

Posted by Pavan Kumar A <pa...@outlook.com>.
I looked at the code again, and it does not seem that workerList is sorted, so knowing a worker number gives no consistent way to tell the actual worker details each time. Lukas was working on such a diff some time back; perhaps he can say more.
From: pavanka@outlook.com
To: user@giraph.apache.org
Subject: RE: Best way to know the assignment of vertices to workers
Date: Sat, 29 Nov 2014 11:23:39 +0530




I wrote a diff some time ago that lets you do that easily.
You can find implementation details at https://issues.apache.org/jira/browse/GIRAPH-908 and https://reviews.apache.org/r/22234/

Some options you can use are:

    -Dgiraph.mappingStoreClass=org.apache.giraph.mapping.LongByteMappingStore
    -Dgiraph.lbMappingStoreUpper=1987000
    -Dgiraph.lbMappingStoreLower=4096
    # Mapping store ops information
    -Dgiraph.mappingStoreOpsClass=org.apache.giraph.mapping.DefaultEmbeddedLongByteOps
    # Embed mapping information
    -Dgiraph.edgeTranslationClass=org.apache.giraph.mapping.translate.LongByteTranslateEdge
    # PartitionerFactory to be used
    -Dgiraph.graphPartitionerFactoryClass=org.apache.giraph.partition.LongMappingStorePartitionerFactory

And, like vertex input and edge input, we now have a mapping input. I only implemented all of this for giraph-hive, so if you have a Hive table with the mapping vertexId -> workerNum, then you can pass the mapping input like

    "org.apache.giraph.hive.input.mapping.examples.LongInt2ByteHiveToMapping, $mapping_table, $mapping_partition"

You can go through the code for each of these options to see what they do.

Using this you can, in effect, pre-assign workers to vertex ids. If you assign two vertices to the same worker number, say worker-1, they are guaranteed to end up on the same worker. The numbering (i.e. identification/naming) of workers is consistent in that sense: if a and b are assigned worker-x, they are guaranteed to be on the same worker, but we do not know ahead of time which worker that will be. The user cannot set the actual worker explicitly (which is what you want to do, from what I can tell).

If you are using something other than Hive, you will have to implement all the interfaces of MappingInputFormat, and then you can easily achieve what you want.
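
The guarantee described above can be illustrated with a small stand-alone sketch. This is not the GIRAPH-908 code: MappingSketch, coLocated and groupByLabel are hypothetical names, and a plain Map<Long, Byte> stands in for the vertexId -> workerNum table. The point is that the mapping tells you which vertices travel together, not which physical worker receives them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a vertexId -> label table. Equal labels mean "same worker";
// the label says nothing about which physical worker that will be.
final class MappingSketch {
    private MappingSketch() { }

    /** True when both vertices carry the same label and are therefore co-located. */
    static boolean coLocated(Map<Long, Byte> mapping, long a, long b) {
        Byte labelA = mapping.get(a);
        Byte labelB = mapping.get(b);
        return labelA != null && labelA.equals(labelB);
    }

    /** Groups vertex ids by label, e.g. to split side data along the same lines. */
    static Map<Byte, List<Long>> groupByLabel(Map<Long, Byte> mapping) {
        Map<Byte, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<Long, Byte> entry : mapping.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), label -> new ArrayList<>())
                  .add(entry.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<Long, Byte> mapping = new HashMap<>();
        mapping.put(10L, (byte) 1);
        mapping.put(11L, (byte) 1);
        mapping.put(12L, (byte) 2);
        System.out.println(coLocated(mapping, 10L, 11L)); // same label
        System.out.println(coLocated(mapping, 10L, 12L)); // different labels
        System.out.println(groupByLabel(mapping).keySet());
    }
}
```

A grouping like this could also be used to split a large side file into one shard per label ahead of time, so that whichever worker ends up holding a label only needs the matching shard.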

Re: Best way to know the assignment of vertices to workers

Posted by Matthew Saltz <sa...@gmail.com>.
Kiran,

To answer your question directly, in an AbstractComputation class (or
whatever descendant you're using), you may call
getWorkerContext().getMyWorkerIndex() (here
<https://giraph.apache.org/apidocs/org/apache/giraph/worker/WorkerContext.html>).
However, if each vertex has metadata associated with it, I think the best way
to go would be to define a custom VertexReader
<https://giraph.apache.org/apidocs/org/apache/giraph/io/class-use/VertexReader.html>
and custom Vertex type to take that into account when reading the vertex.
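
As a rough illustration of the second suggestion, the sketch below parses an input line that carries the metadata next to the vertex id and its edges, which is the kind of work a custom reader would do before building the vertex. It is plain Java, not an implementation of Giraph's VertexReader: the class VertexLine, its fields, and the "id<TAB>metadata<TAB>neighbor,neighbor" format are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: one input record holding id, metadata and neighbor ids.
// The three-field, tab-separated layout is hypothetical.
final class VertexLine {
    final long id;
    final String metadata;
    final List<Long> neighbors;

    private VertexLine(long id, String metadata, List<Long> neighbors) {
        this.id = id;
        this.metadata = metadata;
        this.neighbors = neighbors;
    }

    /** Parses "id<TAB>metadata<TAB>n1,n2,..."; the neighbor field may be empty. */
    static VertexLine parse(String line) {
        String[] fields = line.split("\t", -1); // -1 keeps a trailing empty field
        if (fields.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        List<Long> neighbors = new ArrayList<>();
        if (!fields[2].isEmpty()) {
            for (String neighbor : fields[2].split(",")) {
                neighbors.add(Long.parseLong(neighbor.trim()));
            }
        }
        return new VertexLine(Long.parseLong(fields[0].trim()), fields[1],
                Collections.unmodifiableList(neighbors));
    }

    public static void main(String[] args) {
        VertexLine v = parse("42\tcolor=red\t7,9");
        System.out.println(v.id + " " + v.metadata + " " + v.neighbors);
    }
}
```

The appeal of this route is that the metadata arrives with the vertex wherever it is placed, so nobody has to know the vertex-to-worker assignment at all. The cost is a preprocessing step that joins the metadata file with the graph input.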

Best,
Matthew
