Posted to dev@madlib.apache.org by Frank McQuillan <fm...@pivotal.io> on 2016/02/17 03:21:08 UTC

Follow up questions from community call

Thank you to Chenliang Wang for presenting Geographically Weighted
Regression (GWR) analysis of spatial data at the MADlib community meeting.

Here are some follow up questions that we did not get to in the meeting.
Chenliang, could you briefly respond?

1) what type of matrix operations are required?

2) how will the algorithm be parallelized?

3) is raster support for PostGIS a requirement?  (GPDB currently does not
support it, as per
http://gpdb.docs.pivotal.io/4340/ref_guide/postGIS.html#topic_wy2_rkb_3p)

4) what does the 160 refer to in your slides?

Thanks again,
Frank

Re: Follow up questions from community call

Posted by Ivan Novick <in...@pivotal.io>.
Note that RASTER support will hopefully be in the GPDB MASTER branch soon.

Cheers,
Ivan


Re: Follow up questions from community call

Posted by chenliang wang <hi...@msn.com>.
Hi Frank,

I am sorry I did not explain it clearly. I had planned to give an
example demonstrating the usage of GWR, but there were a number of
problems with my connection and the demo was cancelled.

To answer your questions:

1) As for the required operations, the estimation process involves some
common functions: extracting the diagonal elements of a given matrix to
produce a matrix or vector, and AIC/AICc and adjusted R-squared
functions for linear regression.
In addition, a high-performance weighted least squares (WLS) fitting
function (e.g., QR-based), which GWR mainly relies on, would be
beneficial for the implementation. I am not sure all of these operations
are available in the current package. If they are not available, I will
implement them.

2) The computational burden of GWR is the loop that fits a model at
every regression point. The iterations of the loop are independent
because each one only writes its own coefficients. GWR is similar to
running a separate OLS fit at every point, each with its own weights. I
think we can divide the loop into several groups to parallelize the
algorithm, using OpenMP on a single multicore machine and MPP technology
on GPDB. I would also welcome any suggestions you may have for further
improving the parallelized implementation so that it can better serve
our (future) needs.
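
The grouping idea in point 2 can be sketched as follows. This is only
an illustration under several assumptions: a single predictor plus
intercept, a Gaussian distance kernel with a fixed bandwidth, and
hypothetical names (fit_at, fit_group, gwr). The worker pool is a
stand-in for OpenMP threads or GPDB segments. Python threads will not
speed up this CPU-bound loop, so the sketch shows the independence of
the groups rather than real performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fit_at(point, coords, x, y, bandwidth):
    """Local weighted fit y ~ b0 + b1 * x at one regression point."""
    px, py = point
    # Gaussian kernel weights (an assumed kernel choice).
    w = [math.exp(-0.5 * (math.hypot(cx - px, cy - py) / bandwidth) ** 2)
         for cx, cy in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return (my - b1 * mx, b1)


def fit_group(group, coords, x, y, bandwidth):
    """Fit every regression point in one group; groups share no state."""
    return [fit_at(p, coords, x, y, bandwidth) for p in group]


def gwr(points, coords, x, y, bandwidth, n_groups=4, executor=None):
    """Split the regression points into groups and fit each group.

    With executor=None the groups run serially; otherwise each group
    is submitted to the given concurrent.futures executor.
    """
    size = max(1, math.ceil(len(points) / n_groups))
    groups = [points[i:i + size] for i in range(0, len(points), size)]
    if executor is None:
        results = [fit_group(g, coords, x, y, bandwidth) for g in groups]
    else:
        futures = [executor.submit(fit_group, g, coords, x, y, bandwidth)
                   for g in groups]
        results = [f.result() for f in futures]
    # Flatten while preserving the original order of the points.
    return [coef for chunk in results for coef in chunk]


# Usage: a 5 x 5 grid where y = 2 + 3 * x holds exactly everywhere.
coords_demo = [(float(i), float(j)) for i in range(5) for j in range(5)]
x_demo = [float((7 * i + 3 * j) % 11) for i in range(5) for j in range(5)]
y_demo = [2.0 + 3.0 * xi for xi in x_demo]
serial = gwr(coords_demo, coords_demo, x_demo, y_demo, bandwidth=2.0)
with ThreadPoolExecutor(max_workers=2) as pool:
    grouped = gwr(coords_demo, coords_demo, x_demo, y_demo,
                  bandwidth=2.0, executor=pool)
```

Because each group reads the shared data and writes only its own
coefficients, the serial and grouped runs return the same results in
the same order.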

3) The raster type was added to PostGIS relatively recently. The good
news is that, as Ivan mentioned, GPDB will support the raster type in
the near future. Maybe we could disable the raster functions when raster
support is not available.

4) It is the recommended minimum sample size for GWR. According to Páez
et al. (2011), basic GWR is not an appropriate method for small sample
sizes (<160). The larger the sample size, the more accurate the
estimates. However, no existing GWR implementation has been developed
for very large data sets. Our colleagues are always complaining about
the poor performance and inefficiency of performing GWR on large amounts
of data.

Best,
Chenliang Wang

