Posted to dev@spark.apache.org by James <al...@gmail.com> on 2015/02/12 15:26:50 UTC

Why a program would receive null from send message of mapReduceTriplets

Hello,

When I run the code on a much bigger graph, I get a
NullPointerException.

I found that this happens because the sendMessage() function receives a
triplet whose edge.srcAttr or edge.dstAttr is null. I wonder why this
happens, as I am sure every vertex has an attribute.

Any replies are appreciated.

Alcaid


2015-02-11 19:30 GMT+08:00 James <al...@gmail.com>:

> Hello,
>
> Recently I have been trying to estimate the average distance of a big
> graph using Spark with the help of [HyperAnf](
> http://dl.acm.org/citation.cfm?id=1963493).
>
> It works like the Connected Components algorithm, except that the
> attribute of each vertex is a HyperLogLog counter which, at the k-th
> iteration, estimates the number of vertices it can reach in fewer than
> k hops.
>
> I have successfully run the code on a graph with 20M vertices. But I still
> need help:
>
>
> *I think the code could work more efficiently, especially the "Send
> message" function, but I am not sure what will happen if a vertex
> receives no message in an iteration.*
>
> Here is my code: https://github.com/alcaid1801/Erdos
>
> Any replies are appreciated.
>
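
On the question of a vertex that receives no message: in GraphX, aggregateMessages returns a VertexRDD that contains only the vertices that received at least one message, so such a vertex is simply absent from the result. The sketch below is one possible way to keep the old attribute in that case, using outerJoinVertices. It is an illustration only, not the code from the linked repository: exact Set[VertexId] values stand in for the HyperLogLog counters, and the object name, the step function and the toy graph are all hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object NoMessageSketch {
  // One propagation step. Exact sets are a stand-in for HyperLogLog counters.
  def step(g: Graph[Set[VertexId], Int]): Graph[Set[VertexId], Int] = {
    // Only vertices that receive at least one message appear in msgs.
    val msgs = g.aggregateMessages[Set[VertexId]](
      ctx => ctx.sendToDst(ctx.srcAttr), // send the source's set along each edge
      _ ++ _)                            // merge messages arriving at the same vertex
    // msgOpt is None for a vertex without messages; keep its old attribute then.
    g.outerJoinVertices(msgs) { (_, old, msgOpt) =>
      msgOpt.map(old ++ _).getOrElse(old)
    }
  }

  // Hypothetical toy graph 1 -> 2 -> 3; every vertex starts with its own id.
  def toyGraph(sc: SparkContext): Graph[Set[VertexId], Int] = {
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    Graph.fromEdges(edges, 0).mapVertices((vid, _) => Set(vid))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("no-message-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Vertex 1 has no incoming edge, so it receives nothing and keeps Set(1).
      step(toyGraph(sc)).vertices.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With joinVertices instead of outerJoinVertices the effect would be similar, since vertices without a match keep their original value, but the Option makes the "no message" case explicit.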

Re: Why a program would receive null from send message of mapReduceTriplets

Posted by James <al...@gmail.com>.
I have a question:

*How do the vertex attributes in a graph's triplets get updated after a
mapVertices() call?*

My code:

```
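// Assumption: HyperLogLog is the stream-lib implementation
import com.clearspring.analytics.stream.cardinality.HyperLogLog

// Initialize the graph: assign each vertex a counter that contains only
// its own vertex id
var anfGraph = graph.mapVertices { case (vid, _) =>
  val counter = new HyperLogLog(5)
  counter.offer(vid)
  counter
}

// Pick one triplet whose source attribute is null
val nullTriplet = anfGraph.triplets.filter(edge => edge.srcAttr == null).first

// Look up that triplet's source vertex in the vertex RDD
anfGraph.vertices.filter(_._1 == nullTriplet.srcId).first
// Here I can see that the vertex has a non-null attribute

// messages = anfGraph.aggregateMessages(msgFun, mergeMessage)   // <- NullPointerException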

```

I found that the vertex attributes are null in some triplets, but not in
all of them.


Alcaid
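
The check described above (null attributes in the triplets view although the vertex RDD looks fine) can be turned into counts, which may be easier to compare across runs than inspecting a single triplet. This is a minimal diagnostic sketch, not part of the original code; the object and function names and the toy graph are hypothetical, and String attributes stand in for the HyperLogLog counters.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object NullAttrCheck {
  // Returns (vertices with a null attribute,
  //          triplets with a null srcAttr,
  //          triplets with a null dstAttr).
  def nullCounts[VD <: AnyRef, ED](g: Graph[VD, ED]): (Long, Long, Long) = {
    val nullVertices = g.vertices.filter { case (_, attr) => attr == null }.count()
    val nullSrc = g.triplets.filter(t => t.srcAttr == null).count()
    val nullDst = g.triplets.filter(t => t.dstAttr == null).count()
    (nullVertices, nullSrc, nullDst)
  }

  // Hypothetical toy graph 1 -> 2 -> 3 with a non-null String on every vertex.
  def toyGraph(sc: SparkContext): Graph[String, Int] = {
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    Graph.fromEdges(edges, "init").mapVertices((vid, _) => s"v$vid")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("null-attr-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(nullCounts(toyGraph(sc)))
    } finally {
      sc.stop()
    }
  }
}
```

If the first count is zero while the second or third is not, the nulls are introduced when vertex attributes are shipped to the edges rather than by the vertex data itself.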



Re: Why a program would receive null from send message of mapReduceTriplets

Posted by James <al...@gmail.com>.
I am trying to run the job in spark-shell to find out whether something is
wrong with the code or the data, since I can only reproduce the error on a
graph with 50B edges.


Re: Why a program would receive null from send message of mapReduceTriplets

Posted by Reynold Xin <rx...@databricks.com>.
Then maybe you actually had a null in your vertex attribute?



Re: Why a program would receive null from send message of mapReduceTriplets

Posted by James <al...@gmail.com>.
I changed the mapReduceTriplets() func to aggregateMessages(), but it still
failed.



Re: Why a program would receive null from send message of mapReduceTriplets

Posted by Reynold Xin <rx...@databricks.com>.
Can you use the new aggregateMessages method? I suspect the null is coming
from "automatic join elimination", which inspects the bytecode of your
function to see whether you need the src or dst vertex data. Occasionally
that detection can fail. In the new aggregateMessages API, the caller needs
to explicitly specify which fields are needed, making it more robust.
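
To illustrate what "explicitly specify" can look like: GraphX's aggregateMessages takes an optional third argument of type TripletFields that states which parts of the triplet the send function reads. The sketch below is only an illustration under assumptions, not the poster's program: the object name, the toy graph and the Set[VertexId] attributes (standing in for HyperLogLog counters) are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexId, VertexRDD}

object ExplicitTripletFields {
  // Sends each source attribute to the destination vertex and declares,
  // via TripletFields.Src, that only the source attribute is read.
  def sendSrcToDst(g: Graph[Set[VertexId], Int]): VertexRDD[Set[VertexId]] =
    g.aggregateMessages[Set[VertexId]](
      ctx => ctx.sendToDst(ctx.srcAttr), // reads srcAttr only
      _ ++ _,                            // merge messages for the same vertex
      TripletFields.Src)                 // must cover every field the send function reads

  // Hypothetical toy graph 1 -> 2 -> 3; every vertex starts with its own id.
  def toyGraph(sc: SparkContext): Graph[Set[VertexId], Int] = {
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    Graph.fromEdges(edges, 0).mapVertices((vid, _) => Set(vid))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("triplet-fields").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      sendSrcToDst(toyGraph(sc)).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If the send function reads both endpoints, TripletFields.All (the default) is the safe choice; declaring fewer fields than the function actually reads is the kind of mismatch that can surface as null attributes.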

