Posted to dev@sedona.apache.org by Mehmet Kalich <m....@addland.com> on 2022/07/26 15:19:44 UTC

Sedona on AWS EMR Cluster

Dear Sedona team,

I work at a Geospatial research company in London, and we are trying to install Sedona on an AWS EMR Cluster.

The main issue is that when we add the jars in the EMR bootstrap steps, we get this error:

[Inline image of the error; not preserved in this plain-text archive]
As a result, the JAR files cannot be opened.

If you could write back with a link to articles or support on using Sedona in EMR, that would be greatly appreciated.

Best wishes,

Mehmet Kalich

Platform Engineer
Addland

RE: Sedona on AWS EMR Cluster

Posted by Mehmet Kalich <m....@addland.com>.
Jia, thank you for your helpful reply.

With regard to “In short, I copy all jars into /usr/lib/spark/jars.”, could you please let me know where exactly this directory should be? We’ve been trying to place these jars in an S3 bucket and then copy them over to the file system of one of the EMR nodes on launch. We think this is what is causing the error.
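A minimal sketch of the S3-to-node copy step described above, written as a Python helper that only builds the command line. The bucket name is hypothetical, and the use of sudo and of the `aws s3 cp` filter flags is an assumption about how the node is set up, not something confirmed in this thread. EMR bootstrap actions run before the cluster's applications are installed, so whether /usr/lib/spark/jars already exists when the copy runs is worth checking.

```python
import shlex
from typing import List

# Directory named in the Gitter comment quoted in this thread.
SPARK_JARS_DIR = "/usr/lib/spark/jars"


def jar_copy_command(s3_prefix: str, dest_dir: str = SPARK_JARS_DIR) -> List[str]:
    """Build an `aws s3 cp` command that copies only the .jar files under an S3 prefix."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("expected an s3:// URI, got %r" % s3_prefix)
    return [
        "sudo", "aws", "s3", "cp",           # sudo is an assumption: the target is a system directory
        s3_prefix.rstrip("/") + "/",         # source prefix
        dest_dir.rstrip("/") + "/",          # destination directory on the node
        "--recursive",
        "--exclude", "*",                    # skip everything ...
        "--include", "*.jar",                # ... except jar files
    ]


if __name__ == "__main__":
    # "s3://my-bucket/sedona-jars" is a hypothetical location.
    print(shlex.join(jar_copy_command("s3://my-bucket/sedona-jars")))
```

The printed line could be placed in a script that runs on every node, so that the jars end up on the local file system of each node rather than only one.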

Many thanks for your help,

Mehmet

From: Jia Yu <ji...@apache.org>
Sent: 29 July 2022 04:03
To: dev@sedona.apache.org; Mehmet Kalich <m....@addland.com>
Subject: Re: Sedona on AWS EMR Cluster

OK. I saw the image now.

Here is a user comment from Sedona Gitter


Hi. I use Sedona on EMR 6.3 without issue. In short, I copy all jars into /usr/lib/spark/jars. The jars include spark_spark-avro_2.12.2.4.4.jar, sedona-python-adapter-3.0_2.12-1.0.1-incubating.jar, sedona-core, sedona-sql, and geotools-wrapper. I also set up a Python virtualenv and pip install all dependencies there on all nodes. The last part is to set an EMR Configuration on the core instance group: Classification: spark-env.export, PYSPARK_PYTHON, /home/hadoop/venv/bin/python. That ensures that your spark-submitted jobs use the virtualenv you've created (named venv in this case).

Sedona is configured as a SQL extension, so to use it in your spark-submitted app, include --conf spark.sql.extensions="org.apache.sedona.sql.SedonaSqlExtensions". I don't think I did anything else to make it available to submitted apps or to Zeppelin notebooks. It just works.


One more thing. I'm still using Sedona 1.0.1. To use shapefiles I had to keep them zipped to load them. The zip file included the .shp, .shx, and .dbf files.

Re: Sedona on AWS EMR Cluster

Posted by Jia Yu <ji...@apache.org>.
OK. I saw the image now.

Here is a user comment from Sedona Gitter

Hi. I use Sedona on EMR 6.3 without issue. In short, I copy all jars into
/usr/lib/spark/jars. The jars include spark_spark-avro_2.12.2.4.4.jar,
sedona-python-adapter-3.0_2.12-1.0.1-incubating.jar, sedona-core,
sedona-sql, and geotools-wrapper. I also set up a Python virtualenv and pip
install all dependencies there on all nodes. The last part is to set an EMR
Configuration on the core instance group: Classification: spark-env.export,
PYSPARK_PYTHON, /home/hadoop/venv/bin/python. That ensures that your
spark-submitted jobs use the virtualenv you've created (named venv in this
case).
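
The EMR Configuration mentioned above can be sketched as follows. This is an illustration, not the commenter's actual file: it assumes the usual EMR layout in which an "export" classification is nested inside "spark-env", and the helper name is made up for this example. The virtualenv path is the one given in the comment.

```python
import json
from typing import Any, Dict, List


def pyspark_python_configuration(python_path: str) -> List[Dict[str, Any]]:
    """Return an EMR configuration list that exports PYSPARK_PYTHON via spark-env."""
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    # Nested classification: entries here become `export` lines in spark-env.
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Path of the virtualenv interpreter from the comment (virtualenv named "venv").
    config = pyspark_python_configuration("/home/hadoop/venv/bin/python")
    print(json.dumps(config, indent=2))
```

The printed JSON is the kind of document that can be supplied as a configuration when creating the cluster or an instance group.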

Sedona is configured as a SQL extension, so to use it in your
spark-submitted app, include --conf
spark.sql.extensions="org.apache.sedona.sql.SedonaSqlExtensions". I don't
think I did anything else to make it available to submitted apps or to
Zeppelin notebooks. It just works.
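
As a rough sketch, the spark-submit invocation described above could be assembled like this. The extension class name comes from the comment; the helper name and the application path are hypothetical, and any further options a real job needs are left out.

```python
import shlex
from typing import List

# Extension class quoted in the comment above.
SEDONA_SQL_EXTENSION = "org.apache.sedona.sql.SedonaSqlExtensions"


def spark_submit_command(app_path: str) -> List[str]:
    """Build a spark-submit command line that registers Sedona as a SQL extension."""
    return [
        "spark-submit",
        "--conf", "spark.sql.extensions=" + SEDONA_SQL_EXTENSION,
        app_path,
    ]


if __name__ == "__main__":
    # "my_sedona_job.py" is a hypothetical application file.
    print(shlex.join(spark_submit_command("my_sedona_job.py")))
```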


One more thing. I'm still using Sedona 1.0.1. To use shapefiles I had to
keep them zipped to load them. The zip file included the .shp, .shx, and
.dbf files.
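
One way to produce such a zip, sketched with the Python standard library only. The function name is made up for this example, and it assumes the three component files sit next to each other and share a base name; how the archive is then loaded by Sedona is not covered here.

```python
import zipfile
from pathlib import Path

# The three shapefile components listed in the comment above.
SHAPEFILE_SUFFIXES = (".shp", ".shx", ".dbf")


def zip_shapefile(shp_path, zip_path):
    """Bundle a .shp file and its .shx/.dbf siblings into a single zip archive."""
    shp = Path(shp_path)
    parts = [shp.with_suffix(suffix) for suffix in SHAPEFILE_SUFFIXES]

    # Fail before creating the archive if any component is missing.
    missing = [str(part) for part in parts if not part.is_file()]
    if missing:
        raise FileNotFoundError("missing shapefile components: " + ", ".join(missing))

    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # Store each file at the archive root under its own name.
            archive.write(part, arcname=part.name)
    return zip_path
```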

On Thu, Jul 28, 2022 at 7:50 PM Jia Yu <ji...@apache.org> wrote:

> Hi Mehmet,
>
> The figure in your email is not visible. Can you copy it as text? Many
> Sedona users are using EMR. Sedona should work fine there.
>
> Thanks,
> Jia

Re: Sedona on AWS EMR Cluster

Posted by Jia Yu <ji...@apache.org>.
Hi Mehmet,

The figure in your email is not visible. Can you copy it as text? Many
Sedona users are using EMR. Sedona should work fine there.

Thanks,
Jia
