Posted to dev@sedona.apache.org by Jerrod Wagnon <Je...@jbhunt.com> on 2021/02/23 16:41:59 UTC

Attribute Columns Question

I'm sure this is something simple I'm missing.  Caveat: I'm not a developer, but I can manage.

Is there something different that needs to be done in Sedona vs. the previous snapshot version for Spark 3.0 to get additional columns to carry through in the JoinQuery.SpatialJoinQueryFlat results?  Previously, I just passed the columns in with Adapter.toSpatialRDD and they carried through.  Now I'm just getting my two Geometry columns when converting back to a dataframe.  I've tried passing the left and right field names into Adapter.toDf, but that results in an error when displaying the resulting dataframe.  I'm using Scala in Databricks.  I've read the online documentation, but can't seem to find examples that help in this scenario.

Sedona:

[screenshot of the Sedona code and results omitted]

Snapshot:

[screenshot of the snapshot code and results omitted]
Thanks,

Jerrod
This email contains confidential material for the sole use of the intended recipient(s). Any review, use, distribution, or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message.

Re: Attribute Columns Question

Posted by Jia Yu <ji...@apache.org>.
You should still use the same import:

*import org.locationtech.jts.geom.Geometry*

because the Sedona Python Adapter packages JTS internally:
https://github.com/apache/incubator-sedona/blob/master/python-adapter/pom.xml#L41
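As a sketch of what that looks like in practice (assuming the python-adapter jar is attached to the cluster; the RDD name is taken from your snippet):

```scala
// JTS classes come bundled inside the sedona-python-adapter jar,
// so no separate JTS dependency is needed on the cluster.
import org.locationtech.jts.geom.Geometry
import org.apache.sedona.core.spatialRDD.SpatialRDD

var pointRDD = new SpatialRDD[Geometry]()
```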


RE: Attribute Columns Question

Posted by Jerrod Wagnon <Je...@jbhunt.com>.
Thank you for the follow-up.  Testing out those specific libraries, but I'm not sure where to source the Geometry type from now, since I'm not using JTS anymore.

import org.locationtech.jts.geom.Geometry

command-4367909143506860:37: error: not found: type Geometry
var pointRDD = new SpatialRDD[Geometry]()
                             ^

Thanks,

Jerrod


Re: Attribute Columns Question

Posted by Jia Yu <ji...@apache.org>.
BTW, in fact, you just need to use these four artifacts:

sedona-python-adapter, sedona-viz, geotools-wrapper, sernetcdf.

The JTS / jts2geojson jars are not needed, so the JTS / jts2geojson conflict
issue should be gone.
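For reference, on Spark 3.0 / Scala 2.12 the Maven coordinates for those four artifacts would look roughly like the following. The version strings here are illustrative from memory — confirm the exact ones on the Sedona download page linked elsewhere in this thread.

```xml
<!-- Illustrative coordinates for Sedona 1.0.0-incubating on Spark 3.0 / Scala 2.12;
     verify the exact versions on the Sedona download page before using. -->
<dependency>
    <groupId>org.apache.sedona</groupId>
    <artifactId>sedona-python-adapter-3.0_2.12</artifactId>
    <version>1.0.0-incubating</version>
</dependency>
<dependency>
    <groupId>org.apache.sedona</groupId>
    <artifactId>sedona-viz-3.0_2.12</artifactId>
    <version>1.0.0-incubating</version>
</dependency>
<dependency>
    <groupId>org.datasyslab</groupId>
    <artifactId>geotools-wrapper</artifactId>
    <version>geotools-24.0</version>
</dependency>
<dependency>
    <groupId>org.datasyslab</groupId>
    <artifactId>sernetcdf</artifactId>
    <version>0.1.0</version>
</dependency>
```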


Re: Attribute Columns Question

Posted by Jia Yu <ji...@apache.org>.
Jerrod,

Just updated the doc. We were struggling to document all these packaging
issues in Sedona 1.0.0. The latest doc on the Scala and Java dependencies
should be much clearer.

You may need to open the link in an incognito window, which avoids the
stale browser cache.

http://sedona.apache.org/download/GeoSpark-All-Modules-Maven-Central-Coordinates/#spark-30-scala-212


Re: Attribute Columns Question

Posted by Jia Yu <ji...@apache.org>.
Hi Jerrod,

If you look into that code example, there is actually one more line before
it: import scala.collection.JavaConversions._

Just add this line and you will be fine.
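A minimal sketch of why that import matters here (the RDD and session names are illustrative): SpatialRDD.fieldNames is a java.util.List[String], while Adapter.toDf expects Seq[String], and JavaConversions supplies the implicit bridge between the two.

```scala
import scala.collection.JavaConversions._ // implicit java.util.List -> Seq conversions
import org.apache.sedona.sql.utils.Adapter

// fieldNames is a java.util.List[String]; with JavaConversions in scope it can be
// passed where Adapter.toDf expects Seq[String].
val joinResultDf = Adapter.toDf(resultPairRDD, polygonRDD.fieldNames, pointRDD.fieldNames, spark)
```

On Scala 2.12 and later, scala.collection.JavaConverters with an explicit .asScala is the non-deprecated alternative to JavaConversions.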

Sedona 1.0.0 has to use JTS 1.18, so you may hit other errors later on. The
jts2geojson library internally packages JTS 1.16; you will probably need to
exclude it:
https://github.com/bjornharrtell/jts2geojson/blob/master/pom.xml#L58

We will update our docs shortly.
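If jts2geojson must stay on the classpath, the exclusion might look like this in a Maven POM (a sketch; the version shown is illustrative, so adjust it to whatever you actually use):

```xml
<dependency>
    <groupId>org.wololo</groupId>
    <artifactId>jts2geojson</artifactId>
    <version>0.14.3</version>
    <exclusions>
        <!-- Sedona 1.0.0 requires JTS 1.18; drop the 1.16 copy that jts2geojson pulls in. -->
        <exclusion>
            <groupId>org.locationtech.jts</groupId>
            <artifactId>jts-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```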



On Tue, Feb 23, 2021 at 1:35 PM Jerrod Wagnon <Je...@jbhunt.com>
wrote:

> Thanks.  There was a conflict with that method and the rdd.fieldNames
> (java list vs a sequence) when I tried to pass those in from the rdd.
>
>
>
> command-4367909143506860:54: error: type mismatch; found :
> java.util.List[String] required: Seq[String] var joinResultTest =
> Adapter.toDf(resultPairRDD, polygonRDD.fieldNames, pointRDD.fieldNames,
> spark)
>
>
>
> However, I was able to create the sequences manually and it seems to be
> working fine now.  I had tried this previously, but had the geometry
> columns included in the sequence, which isn’t needed.
>
>
>
> *var joinResultDf = Adapter.toDf(resultPairRDD, Seq("LocationCode"),
> Seq("UniqueID","EquipmentID","MKT_AREA"), spark)*
>
>
>
> Also, in Databricks, I was having issues with the latest library for
> locationtech.  1.18 was having a conflict with wololo, so I ended up just
> loading 1.16 instead.  That might not be a good solution, but it was the
> only way I could get all the dependencies to load properly on cluster
> restart.  Just wanted to mention that in case anyone else runs into issues
> with Databricks Spark 3.0.1 and Scala 2.12.  Everything seems to be working
> for my use cases with these libraries.
>
>
>
>
>
> Thanks again.  Really appreciate the help and your quick response!
>
>
>
> Jerrod

Re: Attribute Columns Question

Posted by Jia Yu <ji...@apache.org>.
Hi Jerrod,

Can you try to use this method:
https://github.com/apache/incubator-sedona/blob/master/sql/src/test/scala/org/apache/sedona/sql/adapterTestScala.scala#L138

Basically, you need to pass rdd.fieldNames as input parameters. I think our
doc missed this part.
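Putting this together, an end-to-end sketch (dataframe, column, and attribute names here are illustrative, not from a real dataset, and this assumes the Adapter.toSpatialRDD overload that takes a geometry column name plus field names):

```scala
import scala.collection.JavaConversions._ // java.util.List -> Seq for fieldNames
import org.apache.sedona.core.spatialOperator.JoinQuery
import org.apache.sedona.sql.utils.Adapter

// Carry the attribute columns into the spatial RDDs alongside the geometry.
val polygonRDD = Adapter.toSpatialRDD(polygonDf, "geometry", Seq("LocationCode"))
val pointRDD   = Adapter.toSpatialRDD(pointDf, "geometry", Seq("UniqueID", "EquipmentID", "MKT_AREA"))

polygonRDD.analyze()
pointRDD.analyze()
// ... spatialPartitioning / index building as in your existing code ...

// Same argument order as in your existing SpatialJoinQueryFlat call.
val resultPairRDD = JoinQuery.SpatialJoinQueryFlat(pointRDD, polygonRDD, false, true)

// Pass the stored field names back so the attribute columns survive the round trip.
val joinResultDf = Adapter.toDf(resultPairRDD, polygonRDD.fieldNames, pointRDD.fieldNames, spark)
```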

Our Adapter implementation for join query result is here:
https://github.com/apache/incubator-sedona/blob/master/sql/src/main/scala/org/apache/sedona/sql/utils/Adapter.scala#L130

Thanks,
Jia
