Posted to user@mahout.apache.org by Karl Wettin <ka...@gmail.com> on 2012/08/29 11:54:10 UTC

Voronoi

Hi all! Long time no see!

I'm searching for a Voronoi-implementation where the nodes/sites are represented by a polygon rather than a point. Anyone seen such a thing? 


			karl

Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by 戴清灏 <ro...@gmail.com>.
PFP Growth consists of three MapReduce jobs.
Which job did you modify?
And do you mean the total number of reduce tasks, or
just the reduce tasks that are currently running?

Regards,
Q



2012/8/30 C.V.Krishnakumar Iyer <cv...@me.com>

> Hi,
>
> I've already tried setting it in the code using job.setNumReduceTasks()
> and conf.set("mapred.reduce.tasks","100").
> However, it does not seem to take the number of reducers at all, even for
> the job that does parallel counting. Any advice would be appreciated.
> Regards,
> Krishnakumar.
> On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:
>
> > I suspect the config is being specified in the Hadoop config XML file.
> >
> > --
> > Regards,
> > Q
> >
> >
> >
> > 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
> >
> >> Hi,
> >>
> >> Quick question regarding PFPGrowth in Mahout 0.6:
> >>
> >> I see that there are no options to set the number of reducers in the
> >> parallel counting phase of PFP Growth. It is just simple word count - so
> >> I'm guessing it should be parallelized. But for some reason it is not!
> >>
> >> Is that intentional?
> >>
> >> Regards,
> >> Krishnakumar.
> >>
>
>

Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by "C.V. Krishnakumar Iyer" <cv...@me.com>.
Hi,

Thanks for the reply. We verified the configurations. When we debugged through the driver locally, we see that the conf *does* have the property  mapred.reduce.tasks set to 100. However, when the job launches, we see that the number of reducers is 1. 
Do you know of any possible places where this property could be overwritten?

Thanks,
Krishnakumar

On Aug 30, 2012, at 2:05 AM, Sean Owen wrote:

> Block size and input size should not matter for the reducer. You do have to
> explicitly set the number of reducers.
> 
> It defaults to 1. You do set it with just these methods. Make sure you are
> setting on the right object and before you run. Look for other things that
> may be overriding it.
> 
> I don't know this job; maybe it is forcing 1 for some reason.
> On Aug 30, 2012 9:58 AM, "Paritosh Ranjan" <pr...@xebia.com> wrote:
> 
>> If the problem is only the number of reduce tasks, then you can try to
>> reduce the dfs block size. This might help in triggering multiple reducers.
>> Also check the size of the mapper's output: if it's greater than the block
>> size (or the mapper output is scattered across multiple files), only then
>> will multiple reducers be triggered.
>> 
>> HTH,
>> Paritosh
>> 
>> On 30-08-2012 12:08, C.V.Krishnakumar Iyer wrote:
>> 
>>> Hi,
>>> 
>>> I've already tried setting it in the code using job.setNumReduceTasks()
>>> and conf.set("mapred.reduce.tasks","100").
>>> However, it does not seem to take the number of reducers at all, even for
>>> the job that does parallel counting. Any advice would be appreciated.
>>> Regards,
>>> Krishnakumar.
>>> On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:
>>> 
>>>> I suspect the config is being specified in the Hadoop config XML file.
>>>> 
>>>> --
>>>> Regards,
>>>> Q
>>>> 
>>>> 
>>>> 
>>>> 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Quick question regarding PFPGrowth in Mahout 0.6:
>>>>> 
>>>>> I see that there are no options to set the number of reducers in the
>>>>> parallel counting phase of PFP Growth. It is just simple word count - so
>>>>> I'm guessing it should be parallelized. But for some reason it is not!
>>>>> 
>>>>> Is that intentional?
>>>>> 
>>>>> Regards,
>>>>> Krishnakumar.
>>>>> 
>>>>> 
>> 
>> 
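
One classic way for mapred.reduce.tasks to end up pinned at 1 despite job-side settings is a cluster-side configuration entry marked final: Hadoop's Configuration ignores job-level overrides (job.setNumReduceTasks(), conf.set(...)) for any property the site XML declares final, logging at most a warning. A hypothetical mapred-site.xml fragment of that kind:

```xml
<!-- mapred-site.xml on the cluster (hypothetical entry, shown only to
     illustrate the override mechanism): a property marked final cannot be
     overridden by job.setNumReduceTasks() or conf.set(...) -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <final>true</final>
</property>
```

Checking the cluster's mapred-site.xml (and any -D overrides baked into launcher scripts) for such an entry is a cheap first step.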


Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by Sean Owen <sr...@gmail.com>.
Block size and input size should not matter for the reducer. You do have to
explicitly set the number of reducers.

It defaults to 1. You do set it with just these methods. Make sure you are
setting on the right object and before you run. Look for other things that
may be overriding it.

I don't know this job; maybe it is forcing 1 for some reason.
 On Aug 30, 2012 9:58 AM, "Paritosh Ranjan" <pr...@xebia.com> wrote:

> If the problem is only the number of reduce tasks, then you can try to
> reduce the dfs block size. This might help in triggering multiple reducers.
> Also check the size of the mapper's output: if it's greater than the block
> size (or the mapper output is scattered across multiple files), only then
> will multiple reducers be triggered.
>
> HTH,
> Paritosh
>
> On 30-08-2012 12:08, C.V.Krishnakumar Iyer wrote:
>
>> Hi,
>>
>> I've already tried setting it in the code using job.setNumReduceTasks()
>> and conf.set("mapred.reduce.tasks","100").
>> However, it does not seem to take the number of reducers at all, even for
>> the job that does parallel counting. Any advice would be appreciated.
>> Regards,
>> Krishnakumar.
>> On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:
>>
>>> I suspect the config is being specified in the Hadoop config XML file.
>>>
>>> --
>>> Regards,
>>> Q
>>>
>>>
>>>
>>> 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
>>>
>>>> Hi,
>>>>
>>>> Quick question regarding PFPGrowth in Mahout 0.6:
>>>>
>>>> I see that there are no options to set the number of reducers in the
>>>> parallel counting phase of PFP Growth. It is just simple word count - so
>>>> I'm guessing it should be parallelized. But for some reason it is not!
>>>>
>>>> Is that intentional?
>>>>
>>>> Regards,
>>>> Krishnakumar.
>>>>
>>>>
>
>

Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by Paritosh Ranjan <pr...@xebia.com>.
If the problem is only the number of reduce tasks, then you can try to
reduce the DFS block size. This might help in triggering multiple reducers.
Also check the size of the mapper's output: if it's greater than the block
size (or the mapper output is scattered across multiple files), only then
will multiple reducers be triggered.

HTH,
Paritosh

On 30-08-2012 12:08, C.V.Krishnakumar Iyer wrote:
> Hi,
>
> I've already tried setting it in the code using job.setNumReduceTasks() and conf.set("mapred.reduce.tasks","100").
> However, it does not seem to take the number of reducers at all, even for the job that does parallel counting. Any advice would be appreciated.
> Regards,
> Krishnakumar.
> On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:
>
>> I suspect the config is being specified in the Hadoop config XML file.
>>
>> --
>> Regards,
>> Q
>>
>>
>>
>> 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
>>
>>> Hi,
>>>
>>> Quick question regarding PFPGrowth in Mahout 0.6:
>>>
>>> I see that there are no options to set the number of reducers in the
>>> parallel counting phase of PFP Growth. It is just simple word count - so
>>> I'm guessing it should be parallelized. But for some reason it is not!
>>>
>>> Is that intentional?
>>>
>>> Regards,
>>> Krishnakumar.
>>>



Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by "C.V.Krishnakumar Iyer" <cv...@me.com>.
Hi,

I've already tried setting it in the code using job.setNumReduceTasks() and conf.set("mapred.reduce.tasks","100"). 
However, it does not seem to take the number of reducers at all, even for the job that does parallel counting. Any advice would be appreciated.
Regards,
Krishnakumar.
On Aug 29, 2012, at 11:28 PM, 戴清灏 <ro...@gmail.com> wrote:

> I suspect the config is being specified in the Hadoop config XML file.
> 
> --
> Regards,
> Q
> 
> 
> 
> 2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>
> 
>> Hi,
>> 
>> Quick question regarding PFPGrowth in Mahout 0.6:
>> 
>> I see that there are no options to set the number of reducers in the
>> parallel counting phase of PFP Growth. It is just simple word count - so
>> I'm guessing it should be parallelized. But for some reason it is not!
>> 
>> Is that intentional?
>> 
>> Regards,
>> Krishnakumar.
>> 


Re: Number of Reducers in PFP Growth is always 1 !!!

Posted by 戴清灏 <ro...@gmail.com>.
I suspect the config is being specified in the Hadoop config XML file.

--
Regards,
Q



2012/8/30 C.V. Krishnakumar Iyer <cv...@me.com>

> Hi,
>
> Quick question regarding PFPGrowth in Mahout 0.6:
>
> I see that there are no options to set the number of reducers in the
> parallel counting phase of PFP Growth. It is just simple word count - so
> I'm guessing it should be parallelized. But for some reason it is not!
>
> Is that intentional?
>
> Regards,
> Krishnakumar.
>

Number of Reducers in PFP Growth is always 1 !!!

Posted by "C.V. Krishnakumar Iyer" <cv...@me.com>.
Hi,

Quick question regarding PFPGrowth in Mahout 0.6:

I see that there are no options to set the number of reducers in the parallel counting phase of PFP Growth. It is just a simple word count, so I'm guessing it should be parallelized. But for some reason it is not!

Is that intentional?

Regards,
Krishnakumar.

Re: Voronoi

Posted by Ted Dunning <te...@gmail.com>.
Yes.  Essentially this means construct the Voronoi tessellation for all
points and for each post code, use the union of the regions for each point
in that post code.  You will not necessarily have convex hulls for each
post-code, but you will have hulls and will almost certainly have a single
hull for each post code.

On Thu, Aug 30, 2012 at 5:09 AM, Dawid Weiss <da...@cs.put.poznan.pl> wrote:

> > My original question was based on the thought that I could create a hull
> from the known points in each postal code and use those hulls as the sites
> in a Voronoi.
>
> This seems to make sense since you'll be effectively creating a convex
> polygon around each point for which you know the postal code. So if
> the original location of postal codes follows contiguous patterns then
> a simple algorithm much like coloring should yield groups of polygons
> with the same postal code. Whether this would look nice and
> eye-pleasing is another question ;)
>
> D.
>
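
Ted's union-of-cells suggestion can be sketched with scipy. The coordinates and postal codes below are invented for illustration, and the distant dummy sites are just a crude way of keeping every real cell bounded (a site's cell is bounded whenever the site lies strictly inside the convex hull of all sites):

```python
import numpy as np
from scipy.spatial import Voronoi

# Invented toy data: a few known points per postal code.
points = np.array([
    [0.0, 0.0], [1.0, 0.0], [0.5, 1.0],   # postal code "111 11"
    [3.0, 0.0], [4.0, 0.0], [3.5, 1.0],   # postal code "222 22"
])
codes = ["111 11"] * 3 + ["222 22"] * 3

# Crude trick: distant dummy sites keep every real cell a bounded polygon.
dummies = np.array([[-100.0, -100.0], [100.0, -100.0],
                    [100.0, 100.0], [-100.0, 100.0]])
vor = Voronoi(np.vstack([points, dummies]))

def shoelace_area(pts):
    # Area of a simple polygon whose vertices are given in order.
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# Collect one Voronoi cell per real point, grouped by postal code.
cells = {}
for i, code in enumerate(codes):
    region = vor.regions[vor.point_region[i]]
    if -1 in region:          # unbounded cell; the dummies should prevent this
        continue
    cells.setdefault(code, []).append(vor.vertices[region])

# The cells of one code tile its region; unioning them (e.g. with Shapely's
# unary_union) would yield one polygon per postal code. Their total areas:
area_per_code = {code: sum(shoelace_area(c) for c in polys)
                 for code, polys in cells.items()}
```

A real implementation would follow this with a polygon union per code, which is exactly the "union of the regions" Ted describes.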

Re: Voronoi

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
> My original question was based on the thought that I could create a hull from the known points in each postal code and use those hulls as the sites in a Voronoi.

This seems to make sense since you'll be effectively creating a convex
polygon around each point for which you know the postal code. So if
the original location of postal codes follows contiguous patterns then
a simple algorithm much like coloring should yield groups of polygons
with the same postal code. Whether this would look nice and
eye-pleasing is another question ;)

D.
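
The hull-per-postal-code step Karl and Dawid discuss can be done directly with scipy's ConvexHull (coordinates invented for illustration):

```python
import numpy as np
from scipy.spatial import ConvexHull

# Invented sample: known points for one postal code, one of them interior.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0], [1.0, 1.0]])
hull = ConvexHull(pts)

hull_corners = pts[hull.vertices]   # hull vertices in counterclockwise order
hull_area = hull.volume             # in 2D, ConvexHull.volume is the area
```

The interior point (1, 1) drops out, leaving the 2x2 square as the hull; these per-code hulls would then serve as the "sites" in Karl's scheme.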

Re: Voronoi

Posted by Karl Wettin <ka...@gmail.com>.
Hi Ted,

let me explain the original problem together with, rather than just, the solution I've thought of. Please excuse my somewhat messed-up terminology:

I'm attempting to come up with an estimated map of the Swedish postal code system. For each postal code I'm aware of a number of points. Sometimes I have such great detail that I can create a good enough hull from the points (I would know when this is true, and generally speaking this is only in urban areas); sometimes I'm just aware of a handful of points (generally speaking, in the countryside).

My original question was based on the thought that I could create a hull from the known points in each postal code and use those hulls as the sites in a Voronoi. 

The second post was the thought that I could create a Voronoi where each point in the postal code hulls would be a site, and then, in a post-processing iteration, merge the clusters created from the same hull.

I would then have to come up with something that replaces the clusters created from postal code hulls already known to be of great quality, in order to improve the quality of their neighbors. I don't think that will be a major problem.

(In the end this data is to be fed to OSM, causing a range of potential future problems that are out of scope for this forum.)


			karl

On 29 Aug 2012, at 23:39, Ted Dunning wrote:

> Karl,
> 
> I don't think that I understand your request.
> 
> What I think I hear is that you want an implementation (with unknown inputs
> and outputs) that encodes a Voronoi tessellation using boundary vertices
> instead of centroids.
> 
> Is that correct?
> 
> If so, it is relatively easy to go from centroid form to boundary vertex
> form.  Boundary edges are segments of the equidistant lines between
> centroids that form a minimal convex hull around the centroids.  The
> intersections of adjacent boundary edge segments are the vertices.
> 
> Going back is also relatively easy.  Centroids are the intersection of
> perpendicular bisectors of the boundary edges.
> 
> As far as concrete implementations go, I have found the tripack library in
> R quite useful.  For example, I generated the image at
> https://dl.dropbox.com/u/36863361/k-means-3.png by using the following R
> code:
> 
> png("k-means-3.png", width=600, height=600)
> plot(voronoi.mosaic(c$centers[,1], c$centers[,2],duplicate="remove"),
> main="", xlab="", sub="")
> points(x[,1], x[,2], cex=0.5, col='red', type='p')
> points(c$centers[,1], c$centers[,2])
> points(c$centers[,1], c$centers[,2], cex=0.5)
> points(c$centers[,1], c$centers[,2], cex=0.2)
> dev.off()
> 
> 
> 
> 
> On Wed, Aug 29, 2012 at 8:10 AM, Karl Wettin <ka...@gmail.com> wrote:
> 
>> 
>> On 29 Aug 2012, at 11:54, Karl Wettin wrote:
>> 
>>> I'm searching for a Voronoi-implementation where the nodes/sites are
>> represented by a polygon rather than a point. Anyone seen such a thing?
>> 
>> I suppose one solution would be to use each point in the polygon as a site
>> and then merge all clusters created from the points from the same polygon.
>> Is that the only solution?
>> 
>> 
>>                        karl


Re: Voronoi

Posted by Ted Dunning <te...@gmail.com>.
Karl,

I don't think that I understand your request.

What I think I hear is that you want an implementation (with unknown inputs
and outputs) that encodes a Voronoi tessellation using boundary vertices
instead of centroids.

Is that correct?

If so, it is relatively easy to go from centroid form to boundary vertex
form.  Boundary edges are segments of the equidistant lines between
centroids that form a minimal convex hull around the centroids.  The
intersections of adjacent boundary edge segments are the vertices.

Going back is also relatively easy.  Centroids are the intersection of
perpendicular bisectors of the boundary edges.

As far as concrete implementations go, I have found the tripack library in
R quite useful.  For example, I generated the image at
https://dl.dropbox.com/u/36863361/k-means-3.png by using the following R
code:

# Assumes 'c' is a kmeans() result and 'x' the data matrix from the same session
png("k-means-3.png", width=600, height=600)
plot(voronoi.mosaic(c$centers[,1], c$centers[,2], duplicate="remove"),
     main="", xlab="", sub="")
points(x[,1], x[,2], cex=0.5, col='red', type='p')
points(c$centers[,1], c$centers[,2])
points(c$centers[,1], c$centers[,2], cex=0.5)
points(c$centers[,1], c$centers[,2], cex=0.2)
dev.off()




On Wed, Aug 29, 2012 at 8:10 AM, Karl Wettin <ka...@gmail.com> wrote:

>
> On 29 Aug 2012, at 11:54, Karl Wettin wrote:
>
> > I'm searching for a Voronoi-implementation where the nodes/sites are
> represented by a polygon rather than a point. Anyone seen such a thing?
>
> I suppose one solution would be to use each point in the polygon as a site
> and then merge all clusters created from the points from the same polygon.
> Is that the only solution?
>
>
>                         karl
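
Ted's "going back" direction can be made concrete: each Voronoi vertex is the circumcenter of a Delaunay triangle, i.e. the intersection of the perpendicular bisectors of the triangle's sides. A minimal sketch of that construction:

```python
import numpy as np

def circumcenter(a, b, c):
    """Intersection of the perpendicular bisectors of a triangle's sides,
    i.e. the point equidistant from all three vertices."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return np.array([ux, uy])

# The right triangle (0,0), (2,0), (0,2) has its circumcenter at (1,1),
# equidistant from all three corners.
center = circumcenter((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
```

Applied over a whole tessellation, this is the centroid-form-to-boundary-vertex relationship Ted outlines.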

Re: Voronoi

Posted by Karl Wettin <ka...@gmail.com>.
On 29 Aug 2012, at 11:54, Karl Wettin wrote:

> I'm searching for a Voronoi-implementation where the nodes/sites are represented by a polygon rather than a point. Anyone seen such a thing? 

I suppose one solution would be to use each point in the polygon as a site and then merge all clusters created from the points from the same polygon. Is that the only solution?


			karl