Posted to dev@sedona.apache.org by Martin Andersson <u....@gmail.com> on 2023/02/28 13:53:04 UTC

Re: How to use raster GeoTiff

Hi again Pedro,

Since https://github.com/apache/sedona/pull/773 got merged you should now
be able to use Apache Sedona for your GeoTiff processing needs. It will be
included in the next Sedona release.

All feedback is welcome!

Br
Martin Andersson


On Mon, 23 Jan 2023 at 10:45, Pedro Mano Fernandes <
pedromorfeu@gmail.com> wrote:

> Hi Martin,
>
> I've tested your proposal (reading binary and UDF getValue) and it works
> fine. I've actually converted the code to Scala easily. Now it's a matter
> of building/optimizing around it (spatial join, aggregate points per
> geotiff).
>
> Best,
>
> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <
> u.martin.andersson@gmail.com> wrote:
>
>> Yes, there are lots of things to consider when processing large blobs in
>> Spark. What I have come to learn:
>>  - Do the spatial join (points and the geotiff extent) with as few
>> columns as possible. Ideally an id only for the geotiff. After that join
>> you can join back the geotiff using the id.
>>  - Aggregate the points to an array of points per geotiff. Your getValue
>> udf should take an array of points and return an array of values. That way
>> each geotiff is only loaded once.
>>  - Parquet in Spark is not very good at handling large blobs. If reading
>> parquet with geotiffs is slow, you can repartition() into a very large
>> number of partitions to force smaller row groups when writing, or use
>> Avro instead.
>> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
>>
>> Good luck!
>>
>> Br,
>> Martin Andersson
>>
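The second point above (one array of points in, one array of values out, so
each geotiff is decoded only once) can be sketched roughly as follows. This is
a minimal illustration, not code from the thread: BatchLookup and getValues are
hypothetical names, the positions are assumed to be already converted to grid
(column, row) coordinates, and only band 0 is read. Decoding the geotiff bytes
into a java.awt.image.Raster is assumed to happen once in the caller.

```java
import java.awt.image.Raster;

// Hypothetical helper: look up many grid positions against one decoded raster.
final class BatchLookup {

    /**
     * Returns the band-0 sample for each {column, row} pair in gridPoints.
     * Positions outside the raster bounds yield null instead of throwing.
     */
    static Integer[] getValues(Raster raster, int[][] gridPoints) {
        Integer[] values = new Integer[gridPoints.length];
        int minX = raster.getMinX();
        int minY = raster.getMinY();
        int maxX = minX + raster.getWidth();   // exclusive upper bound
        int maxY = minY + raster.getHeight();  // exclusive upper bound
        for (int i = 0; i < gridPoints.length; i++) {
            int col = gridPoints[i][0];
            int row = gridPoints[i][1];
            if (col >= minX && col < maxX && row >= minY && row < maxY) {
                values[i] = raster.getSample(col, row, 0);
            } else {
                values[i] = null; // point is outside the extent
            }
        }
        return values;
    }
}
```

Checking the bounds explicitly avoids relying on an exception for points
outside the extent, and the per-geotiff cost of decoding is paid once no
matter how many points are looked up.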
>>
>> On Fri, 20 Jan 2023 at 13:08, Pedro Mano Fernandes <
>> pedromorfeu@gmail.com> wrote:
>>
>>> Thanks Martin, it sounds promising. I'll actually give it a try before
>>> going with geotiff conversions.
>>>
>>> I'm foreseeing some concerns, though:
>>>
>>>    - I'm afraid it won't be optimal for a big geotiff - I may have to
>>>    split the geotiff into smaller geotiffs
>>>    - I wonder how the spatial partitioning optimization will behave with
>>>    such an approach - I may have to load smaller geotiffs and use their
>>>    geometry to join (my coordinates against envelope boundaries) before
>>>    calculating the getValue for my coordinates
>>>
>>> Best,
>>>
>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <
>>> u.martin.andersson@gmail.com> wrote:
>>>
>>>> I would read the geotiff files as binary:
>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
>>>>
>>>> Then you can define a udf to extract values directly from the geotiffs.
>>>> If you're on Python you can use rasterio to do that.
>>>>
>>>> In Java it would look something like this:
>>>>
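>>>>   // GeoTools imports; newer GeoTools releases moved org.opengis
>>>>   // classes to org.geotools.api.
>>>>   import java.awt.image.Raster;
>>>>   import java.io.ByteArrayInputStream;
>>>>   import java.io.IOException;
>>>>   import org.geotools.coverage.grid.GridCoordinates2D;
>>>>   import org.geotools.coverage.grid.GridCoverage2D;
>>>>   import org.geotools.coverage.grid.GridGeometry2D;
>>>>   import org.geotools.gce.geotiff.GeoTiffReader;
>>>>   import org.geotools.geometry.DirectPosition2D;
>>>>   import org.opengis.referencing.operation.TransformException;
>>>>
>>>>   // Returns the first band's value at world coordinate (x, y),
>>>>   // or null when the point lies outside the raster extent.
>>>>   Integer getValue(byte[] geotiff, double x, double y)
>>>>       throws IOException, TransformException {
>>>>     try (ByteArrayInputStream inputStream =
>>>>         new ByteArrayInputStream(geotiff)) {
>>>>       GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
>>>>       GridCoverage2D grid = geoTiffReader.read(null);
>>>>       Raster raster = grid.getRenderedImage().getData();
>>>>       GridGeometry2D gridGeometry = grid.getGridGeometry();
>>>>
>>>>       DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
>>>>       GridCoordinates2D gridCoordinates2D =
>>>>           gridGeometry.worldToGrid(directPosition2D);
>>>>       try {
>>>>         int[] pixel = raster.getPixel(
>>>>             gridCoordinates2D.x, gridCoordinates2D.y, new int[1]);
>>>>         return pixel[0];
>>>>       } catch (ArrayIndexOutOfBoundsException exc) {
>>>>         // point is outside the extent
>>>>         return null;
>>>>       }
>>>>     }
>>>>   }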
>>>>
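For intuition, the world-to-grid step in the method above reduces to simple
arithmetic when the raster is north-up (no rotation or skew). The sketch below
assumes exactly that case. NorthUpGrid and its fields are hypothetical names
used for illustration only; they are not part of GeoTools, whose worldToGrid
handles the general transform.

```java
// Hypothetical stand-in for the world-to-grid mapping of a north-up raster.
final class NorthUpGrid {
    final double originX;      // world x of the raster's left edge
    final double originY;      // world y of the raster's top edge
    final double pixelWidth;   // world units per column, must be > 0
    final double pixelHeight;  // world units per row, must be > 0
    final int width;           // number of columns
    final int height;          // number of rows

    NorthUpGrid(double originX, double originY,
                double pixelWidth, double pixelHeight,
                int width, int height) {
        this.originX = originX;
        this.originY = originY;
        this.pixelWidth = pixelWidth;
        this.pixelHeight = pixelHeight;
        this.width = width;
        this.height = height;
    }

    /** Returns {column, row}, or null when (x, y) is outside the raster. */
    int[] worldToGrid(double x, double y) {
        // Columns grow with x; rows grow as y decreases (row 0 is the top).
        int col = (int) Math.floor((x - originX) / pixelWidth);
        int row = (int) Math.floor((originY - y) / pixelHeight);
        if (col < 0 || col >= width || row < 0 || row >= height) {
            return null;
        }
        return new int[] {col, row};
    }
}
```

For example, with an origin of (100, 200), 10 x 10 unit pixels and a 4 x 3
grid, the world point (135, 175) maps to column 3, row 2, while (140, 175)
falls just outside the right edge.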
>>>> Br,
>>>> Martin Andersson
>>>>
>>>> On Wed, 18 Jan 2023 at 17:59, Pedro Mano Fernandes <
>>>> pedromorfeu@gmail.com> wrote:
>>>>
>>>>> Thanks for the update, guys.
>>>>>
>>>>> I'm not ready to contribute yet.
>>>>>
>>>>> In the meantime, the solution could perhaps be to convert GeoTiff to
>>>>> another format supported by Sedona. If anyone has had this use case before
>>>>> or has any idea, please share.
>>>>>
>>>>> Best,
>>>>>
>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <
>>>>> u.martin.andersson@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I think you are looking for something like this:
>>>>>> https://postgis.net/docs/RT_ST_Value.html
>>>>>>
>>>>>> The raster support in Sedona is very limited at the moment. The lack
>>>>>> of a proper raster type makes implementing st_value impossible. We had a
>>>>>> brief discussion about that recently.
>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
>>>>>>
>>>>>> If you want to make a contribution and need some guidance, please let
>>>>>> me know!
>>>>>>
>>>>>> Br,
>>>>>> Martin Andersson
>>>>>>
>>>>>> On Wed, 18 Jan 2023 at 05:45, Jia Yu <ji...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Pedro,
>>>>>>>
>>>>>>> I got your point. Unfortunately, we don't have this function yet in
>>>>>>> Sedona.
>>>>>>> But we welcome anyone who wants to contribute this to Sedona!
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jia
>>>>>>>
>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <
>>>>>>> pedromorfeu@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Hi all,
>>>>>>> >
>>>>>>> > Any clue? Or any documentation I can refer to?
>>>>>>> >
>>>>>>> > Here goes a dummy example to better explain myself: in QGIS I can
>>>>>>> > click a point (coordinates) of the geotiff and get the value in that
>>>>>>> > point (in this case 231 of Band 1).
>>>>>>> >
>>>>>>> > [image: image.png]
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> >
>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <
>>>>>>> pedromorfeu@gmail.com>
>>>>>>> > wrote:
>>>>>>> >
>>>>>>> >> Hi Jia,
>>>>>>> >>
>>>>>>> >> Thanks for the fast response.
>>>>>>> >>
>>>>>>> >> With the regular spatial join I’ll get the array of data of the
>>>>>>> >> whole geotiff polygon. I was hoping to get the data element for
>>>>>>> >> specific coordinates inside that polygon. In other words: I guess
>>>>>>> >> the array of data corresponds to all the positions in the polygon,
>>>>>>> >> but I want to fetch specific positions.
>>>>>>> >>
>>>>>>> >> Thanks,
>>>>>>> >>
>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <ji...@apache.org> wrote:
>>>>>>> >>
>>>>>>> >>> Hi Pedro,
>>>>>>> >>>
>>>>>>> >>> Once you use the Sedona geotiff reader to read those geotiffs, you
>>>>>>> >>> will get a dataframe with the following schema:
>>>>>>> >>>
>>>>>>> >>>  |-- image: struct (nullable = true)
>>>>>>> >>>  |    |-- origin: string (nullable = true)
>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
>>>>>>> >>>  |    |-- height: integer (nullable = true)
>>>>>>> >>>  |    |-- width: integer (nullable = true)
>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
>>>>>>> >>>  |    |-- data: array (nullable = true)
>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>> You can use the following way to fetch the geometry column and
>>>>>>> >>> perform the spatial join:
>>>>>>> >>>
>>>>>>> >>> geotiffDF = geotiffDF.selectExpr(
>>>>>>> >>>     "image.origin as origin",
>>>>>>> >>>     "ST_GeomFromWkt(image.geometry) as Geom",
>>>>>>> >>>     "image.height as height",
>>>>>>> >>>     "image.width as width",
>>>>>>> >>>     "image.data as data",
>>>>>>> >>>     "image.nBands as bands")
>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
>>>>>>> >>> geotiffDF.show()
>>>>>>> >>>
>>>>>>> >>> More info can be found at:
>>>>>>> >>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>> >>>
>>>>>>> >>> Thanks,
>>>>>>> >>> Jia
>>>>>>> >>>
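If the flattened image.data array from the schema above is all that is
available, a value at a given pixel can in principle be picked out by index
arithmetic. The sketch below assumes the array is band-sequential (all of
band 0, then all of band 1, and so on) and row-major within each band. That
layout is an assumption here and should be checked against the loader
documentation before relying on it. FlatRaster and valueAt are hypothetical
names, and band, col and row are zero-based.

```java
// Hypothetical helper for indexing a flattened, band-sequential raster array.
final class FlatRaster {

    /**
     * Returns the value at (col, row) of the given band.
     * Assumed layout: data = [band 0 rows..., band 1 rows..., ...],
     * with each band stored row by row (row-major).
     */
    static double valueAt(double[] data, int width, int height, int nBands,
                          int col, int row, int band) {
        if (data.length != width * height * nBands) {
            throw new IllegalArgumentException(
                "data length does not match width * height * nBands");
        }
        if (col < 0 || col >= width || row < 0 || row >= height
                || band < 0 || band >= nBands) {
            throw new IndexOutOfBoundsException("position outside the raster");
        }
        int bandOffset = band * width * height; // skip the preceding bands
        return data[bandOffset + row * width + col];
    }
}
```

Turning a lon/lat pair into (col, row) still requires the raster's
georeferencing, which is why a raster-aware lookup is the more robust route.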
>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <
>>>>>>> >>> pedromorfeu@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>>> Hi everyone!
>>>>>>> >>>>
>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I understand
>>>>>>> >>>> how to load the dataset, as described in
>>>>>>> >>>>
>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
>>>>>>> >>>>
>>>>>>> >>>> Now I'm wondering how to join this dataframe with another one that
>>>>>>> >>>> contains coordinates, in order to get the elevation data for those
>>>>>>> >>>> coordinates.
>>>>>>> >>>>
>>>>>>> >>>> Something along these lines:
>>>>>>> >>>>
>>>>>>> >>>> pointsDF
>>>>>>> >>>>   .join(geotiffDF, ...)
>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
>>>>>>> >>>>
>>>>>>> >>>> Are there any examples or documentation I can follow to
>>>>>>> >>>> accomplish this?
>>>>>>> >>>>
>>>>>>> >>>> Thanks,
>>>>>>> >>>>
>>>>>>> >>>> --
>>>>>>> >>>> Pedro Mano Fernandes
>>>>>>> >>>>
>>>>>>> >>> --
>>>>>>> >> Pedro Mano Fernandes
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Pedro Mano Fernandes
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Martin
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Pedro Mano Fernandes
>>>>>
>>>>
>>>
>>> --
>>> Pedro Mano Fernandes
>>>
>>
>
> --
> Pedro Mano Fernandes
>

Re: How to use raster GeoTiff

Posted by Pedro Mano Fernandes <pe...@gmail.com>.
Thanks Jia and Martin,

I didn't notice I was using the staged version. My bad!

Regards,

On Wed, 22 Mar 2023 at 08:55, Martin Andersson <u....@gmail.com>
wrote:

> Thanks for trying it out Pedro!
>
> Unfortunately I found a bug in ST_Values. But there is also a workaround.
> I'm working on a fix.
>
> https://issues.apache.org/jira/browse/SEDONA-266
>
> Br,
> Martin Andersson
>
> On Tue, 21 Mar 2023 at 18:39, Jia Yu <ji...@apache.org> wrote:
>
>> Hi Pedro,
>>
>> You should use sedona.apache.org instead of sedona.staged.apache.org.
>> `staged` website is for us to test the website template. We haven't
>> been updating that website for more than 1 year.
>>
>> Here is the doc for Martin's RasterUDT:
>> https://sedona.apache.org/1.4.0/api/sql/Raster-loader/
>>
>> Thanks,
>> Jia
>>
>> On Tue, Mar 21, 2023 at 8:30 AM Pedro Mano Fernandes
>> <pe...@gmail.com> wrote:
>> >
>> > Hi Martin,
>> >
>> > It's weird I don't see your new Raster features in the docs in
>> > https://sedona.staged.apache.org/api/sql/Raster-loader/. I thought the
>> > documentation was already up-to-date after the release of sedona-1.4.0.
>> >
>> > Best regards,
>> >
>> > On Wed, 1 Mar 2023 at 10:29, Pedro Mano Fernandes <
>> pedromorfeu@gmail.com>
>> > wrote:
>> >
>> > > Hi Martin,
>> > >
>> > > Great news! I'll give it a go and will let you know.
>> > >
>> > > Thanks for letting me know.
>> > > Best regards,
>> > >
>> > >
>> > > --
>> > > Pedro Mano Fernandes
>> > >
>> >
>> >
>> > --
>> > Pedro Mano Fernandes
>>
>

-- 
Pedro Mano Fernandes

Re: How to use raster GeoTiff

Posted by Martin Andersson <u....@gmail.com>.
Thanks for trying it out Pedro!

Unfortunately I found a bug in ST_Values. But there is also a workaround.
I'm working on a fix.

https://issues.apache.org/jira/browse/SEDONA-266

Br,
Martin Andersson

On Tue, 21 Mar 2023 at 18:39, Jia Yu <ji...@apache.org> wrote:

> Hi Pedro,
>
> You should use sedona.apache.org instead of sedona.staged.apache.org.
> `staged` website is for us to test the website template. We haven't
> been updating that website for more than 1 year.
>
> Here is the doc for Martin's RasterUDT:
> https://sedona.apache.org/1.4.0/api/sql/Raster-loader/
>
> Thanks,
> Jia
>
> On Tue, Mar 21, 2023 at 8:30 AM Pedro Mano Fernandes
> <pe...@gmail.com> wrote:
> >
> > Hi Martin,
> >
> > It's weird I don't see your new Raster features in the docs in
> > https://sedona.staged.apache.org/api/sql/Raster-loader/. I thought the
> > documentation was already up-to-date after the release of sedona-1.4.0.
> >
> > Best regards,
> >
> > On Wed, 1 Mar 2023 at 10:29, Pedro Mano Fernandes <pedromorfeu@gmail.com
> >
> > wrote:
> >
> > > Hi Martin,
> > >
> > > Great news! I'll give it a go and will let you know.
> > >
> > > Thanks for letting me know.
> > > Best regards,
> > >
> > > On Tue, 28 Feb 2023 at 14:53, Martin Andersson <
> > > u.martin.andersson@gmail.com> wrote:
> > >
> > >> Hi again Pedro,
> > >>
> > >> Since https://github.com/apache/sedona/pull/773 got merged you should
> > >> now be able to use Apache Sedona for your GeoTiff processing needs.
> It will
> > >> be included in the next Sedona release.
> > >>
> > >> All feedback is welcome!
> > >>
> > >> Br
> > >> Martin Andersson
> > >>
> > >>
> > >> Den mån 23 jan. 2023 kl 10:45 skrev Pedro Mano Fernandes <
> > >> pedromorfeu@gmail.com>:
> > >>
> > >>> Hi Martin,
> > >>>
> > >>> I've tested your proposal (reading binary and UDF getValue) and it
> works
> > >>> fine. I've actually converted the code to Scala easily. Now it's a
> matter
> > >>> of building/optimizing around it (spatial join, aggregate points per
> > >>> geotiff).
> > >>>
> > >>> Best,
> > >>>
> > >>> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <
> > >>> u.martin.andersson@gmail.com> wrote:
> > >>>
> > >>>> Yes, there are lots of things to consider when processing large
> blobs
> > >>>> in Spark. What I have come to learn:
> > >>>>  - Do the spatial join (points and the geotiff extent) with as few
> > >>>> columns as possible. Ideally an id only for the geotiff. After that
> join
> > >>>> you can join back the geotiff using the id.
> > >>>>  - Aggregate the points to an array of points per geotiff. Your
> > >>>> getValue udf should take an array of points and return an array of
> values.
> > >>>> That way each geotiff is only loaded once.
> > >>>>  - Parquet in Spark is not very good at handling large blobs. If
> > >>>> reading parquet with geotiffs is slow you can repartition() with a
> very
> > >>>> large number to force smaller row groups when writing or use Avro
> instead.
> > >>>> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
> > >>>>
> > >>>> Good luck!
> > >>>>
> > >>>> Br,
> > >>>> Martin Andersson
> > >>>>
> > >>>>
> > >>>> Den fre 20 jan. 2023 kl 13:08 skrev Pedro Mano Fernandes <
> > >>>> pedromorfeu@gmail.com>:
> > >>>>
> > >>>>> Thanks Martin, it sounds promising. I'll actually give it a try
> before
> > >>>>> going with geotiff conversions.
> > >>>>>
> > >>>>> I'm foreseeing some concerns, though:
> > >>>>>
> > >>>>>    - I'm afraid it won't be optimal for a big geotiff - I may have
> to
> > >>>>>    split the geotiff into smaller geotiffs
> > >>>>>    - I wonder how the spatial partitioning optimization will behave
> > >>>>>    in such approach - I may have to load smaller geotiffs and use
> their
> > >>>>>    geometry to join (my coordinates against envelope boundaries)
> before
> > >>>>>    calculating the getValue for my coordinates
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <
> > >>>>> u.martin.andersson@gmail.com> wrote:
> > >>>>>
> > >>>>>> I would read the geotiff files as binary:
> > >>>>>>
> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
> > >>>>>>
> > >>>>>> Then you can define a udf to extract values directly from the
> > >>>>>> geotiffs. If you're on python you can use raster.io to do that.
> > >>>>>>
> > >>>>>> In java it would look some thing like this:
> > >>>>>>
> > >>>>>>   Integer getValue(byte[] geotiff, double x, double y)
> > >>>>>>       throws IOException, TransformException {
> > >>>>>>     try (ByteArrayInputStream inputStream = new
> > >>>>>> ByteArrayInputStream(geotiff)) {
> > >>>>>>       GeoTiffReader geoTiffReader = new
> GeoTiffReader(inputStream);
> > >>>>>>       GridCoverage2D grid = geoTiffReader.read(null);
> > >>>>>>       Raster raster = grid.getRenderedImage().getData();
> > >>>>>>       GridGeometry2D gridGeometry = grid.getGridGeometry();
> > >>>>>>
> > >>>>>>       DirectPosition2D directPosition2D = new DirectPosition2D(x,
> y);
> > >>>>>>       GridCoordinates2D gridCoordinates2D =
> > >>>>>> gridGeometry.worldToGrid(directPosition2D);
> > >>>>>>       try {
> > >>>>>>           int[] pixel = raster.getPixel(gridCoordinates2D.x,
> > >>>>>> gridCoordinates2D.y, new int[1]);
> > >>>>>>           return pixel[0];
> > >>>>>>       } catch (ArrayIndexOutOfBoundsException exc) {
> > >>>>>>           // point is outside the extentent
> > >>>>>>           result.add(null);
> > >>>>>>       }
> > >>>>>>     }
> > >>>>>> }
> > >>>>>>
> > >>>>>> Br,
> > >>>>>> Martin Andersson
> > >>>>>>
> > >>>>>> Den ons 18 jan. 2023 kl 17:59 skrev Pedro Mano Fernandes <
> > >>>>>> pedromorfeu@gmail.com>:
> > >>>>>>
> > >>>>>>> Thanks for the update, guys.
> > >>>>>>>
> > >>>>>>> I'm not ready to contribute yet.
> > >>>>>>>
> > >>>>>>> In the meanwhile, the solution could be perhaps to convert
> GeoTiff
> > >>>>>>> to another format supported by Sedona. If anyone has had this
> use case
> > >>>>>>> before or has any idea, please share.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>>
> > >>>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <
> > >>>>>>> u.martin.andersson@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> I think you are looking for something like this:
> > >>>>>>>> https://postgis.net/docs/RT_ST_Value.html
> > >>>>>>>>
> > >>>>>>>> The raster support in Sedona is very limited at the moment. The
> > >>>>>>>> lack of a proper raster type makes implementing st_value
> impossible. We had
> > >>>>>>>> a brief discussion about that recently.
> > >>>>>>>>
> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
> > >>>>>>>>
> > >>>>>>>> If you want to make a contribution and need some guidance,
> please
> > >>>>>>>> let me know!
> > >>>>>>>>
> > >>>>>>>> Br,
> > >>>>>>>> Martin Andersson
> > >>>>>>>>
> > >>>>>>>> Den ons 18 jan. 2023 kl 05:45 skrev Jia Yu <ji...@apache.org>:
> > >>>>>>>>
> > >>>>>>>>> Hi Pedro,
> > >>>>>>>>>
> > >>>>>>>>> I got your point. Unfortunately, we don't have this function
> yet
> > >>>>>>>>> in Sedona.
> > >>>>>>>>> But we welcome anyone who want to contribute this to Sedona!
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Jia
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <
> > >>>>>>>>> pedromorfeu@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> > Hi all,
> > >>>>>>>>> >
> > >>>>>>>>> > Any clue? Or any documentation I can refer to?
> > >>>>>>>>> >
> > >>>>>>>>> > Here goes a dummy example to better explain myself: in QGIS I
> > >>>>>>>>> can click a
> > >>>>>>>>> > point (coordinates) of the geotiff and get the value in that
> > >>>>>>>>> point (in this
> > >>>>>>>>> > case 231 of Band 1).
> > >>>>>>>>> >
> > >>>>>>>>> > [image: image.png]
> > >>>>>>>>> >
> > >>>>>>>>> > Thanks,
> > >>>>>>>>> >
> > >>>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <
> > >>>>>>>>> pedromorfeu@gmail.com>
> > >>>>>>>>> > wrote:
> > >>>>>>>>> >
> > >>>>>>>>> >> Hi Jia,
> > >>>>>>>>> >>
> > >>>>>>>>> >> Thanks for the fast response.
> > >>>>>>>>> >>
> > >>>>>>>>> >> With the regular spatial join I’ll get the array of data of
> the
> > >>>>>>>>> whole
> > >>>>>>>>> >> geotiff polygon. I was hoping to get the data element for
> > >>>>>>>>> specific
> > >>>>>>>>> >> coordinates inside that polygon. In other words: I guess the
> > >>>>>>>>> array of data
> > >>>>>>>>> >> corresponds to all the positions in the polygon, but I want
> to
> > >>>>>>>>> fetch
> > >>>>>>>>> >> specific positions.
> > >>>>>>>>> >>
> > >>>>>>>>> >> Thanks,
> > >>>>>>>>> >>
> > >>>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <ji...@apache.org>
> wrote:
> > >>>>>>>>> >>
> > >>>>>>>>> >>> Hi Pedro,
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> Once you use Sedona geotiff reader to read those geotiffs,
> you
> > >>>>>>>>> will get
> > >>>>>>>>> >>> a dataframe with the following schema:
> > >>>>>>>>> >>>
> > >>>>>>>>> >>>  |-- image: struct (nullable = true)
> > >>>>>>>>> >>>  |    |-- origin: string (nullable = true)
> > >>>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
> > >>>>>>>>> >>>  |    |-- height: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- width: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
> > >>>>>>>>> >>>  |    |-- data: array (nullable = true)
> > >>>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
> > >>>>>>>>> >>>
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> You can use the following way to fetch the geometry column
> and
> > >>>>>>>>> perform
> > >>>>>>>>> >>> the spatial join;
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> geotiffDF = geotiffDF.selectExpr("image.origin as
> > >>>>>>>>> >>> origin","ST_GeomFromWkt(image.geometry) as Geom",
> > >>>>>>>>> "image.height as height",
> > >>>>>>>>> >>> "image.width as width", "image.data as data",
> "image.nBands as
> > >>>>>>>>> bands")
> > >>>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
> > >>>>>>>>> >>> geotiffDF.show()
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> More info can be found:
> > >>>>>>>>> >>>
> > >>>>>>>>>
> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> Thanks,
> > >>>>>>>>> >>> Jia
> > >>>>>>>>> >>>
> > >>>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <
> > >>>>>>>>> >>> pedromorfeu@gmail.com> wrote:
> > >>>>>>>>> >>>
> > >>>>>>>>> >>>> Hi everyone!
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I understand
> > >>>>>>>>> >>>> how to load the dataset, as described here:
> > >>>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
> > >>>>>>>>> >>>> Now I'm wondering how to join this dataframe with another one that
> > >>>>>>>>> >>>> contains coordinates, in order to get the elevation data for those
> > >>>>>>>>> >>>> coordinates.
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Something along these lines:
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> pointsDF
> > >>>>>>>>> >>>>   .join(geotiffDF, ...)
> > >>>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Are there any examples or documentation I can follow to accomplish
> > >>>>>>>>> >>>> this?
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> Thanks,
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>>> --
> > >>>>>>>>> >>>> Pedro Mano Fernandes
> > >>>>>>>>> >>>>
> > >>>>>>>>> >>> --
> > >>>>>>>>> >> Pedro Mano Fernandes
> > >>>>>>>>> >>
> > >>>>>>>>> >
> > >>>>>>>>> >
> > >>>>>>>>> > --
> > >>>>>>>>> > Pedro Mano Fernandes
> > >>>>>>>>> >
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Hälsningar,
> > >>>>>>>> Martin
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Pedro Mano Fernandes
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Pedro Mano Fernandes
> > >>>>>
> > >>>>
> > >>>
> > >>> --
> > >>> Pedro Mano Fernandes
> > >>>
> > >>
> > >
> > > --
> > > Pedro Mano Fernandes
> > >
> >
> >
> > --
> > Pedro Mano Fernandes
>
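
A note on the question quoted above (fetching the value at specific
coordinates from the flat `data` array of the schema): the index arithmetic
could look roughly like the plain-Python sketch below. It is not Sedona API.
It assumes a north-up raster whose upper-left corner and pixel size are known;
these are not columns of the schema and would have to be derived, for example
from the envelope of `Geometry` divided by `width` and `height`. It also
assumes that `data` is stored band by band in row-major order, which should be
checked against the reader's actual output. The name `value_at` and all of its
parameters are hypothetical.

```python
import math
from typing import Optional, Sequence


def value_at(data: Sequence[float], width: int, height: int, n_bands: int,
             origin_x: float, origin_y: float,
             pixel_w: float, pixel_h: float,
             x: float, y: float, band: int = 1) -> Optional[float]:
    """Return the value of `band` at world coordinate (x, y), or None if the
    point lies outside the raster. (origin_x, origin_y) is the upper-left
    corner; pixel_w and pixel_h are positive pixel sizes in world units."""
    if not 1 <= band <= n_bands:
        raise ValueError(f"band must be in 1..{n_bands}, got {band}")
    # Columns grow with x from the left edge; rows grow downwards from the top.
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((origin_y - y) / pixel_h)
    if not (0 <= col < width and 0 <= row < height):
        return None  # the point is outside the extent
    # ASSUMPTION: band-sequential layout, each band stored row by row.
    return data[(band - 1) * width * height + row * width + col]
```

For a 3x2 single-band raster with upper-left corner (10, 50) and 1x1 pixels,
the point (11.5, 48.5) falls in column 1, row 1 and therefore resolves to
index 4 of `data`; a point such as (13.5, 48.5) lies outside and yields None.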

Re: How to use raster GeoTiff

Posted by Jia Yu <ji...@apache.org>.
Hi Pedro,

You should use sedona.apache.org instead of sedona.staged.apache.org.
The `staged` website is only for us to test the website template. We haven't
updated it for more than a year.

Here is the doc for Martin's RasterUDT:
https://sedona.apache.org/1.4.0/api/sql/Raster-loader/

Thanks,
Jia

On Tue, Mar 21, 2023 at 8:30 AM Pedro Mano Fernandes
<pe...@gmail.com> wrote:
>
> Hi Martin,
>
> It's weird I don't see your new Raster features in the docs in
> https://sedona.staged.apache.org/api/sql/Raster-loader/. I thought the
> documentation was already up-to-date after the release of sedona-1.4.0.
>
> Best regards,
>
> On Wed, 1 Mar 2023 at 10:29, Pedro Mano Fernandes <pe...@gmail.com>
> wrote:
>
> > Hi Martin,
> >
> > Great news! I'll give it a go and will let you know.
> >
> > Thanks for letting me know.
> > Best regards,
> >
> > On Tue, 28 Feb 2023 at 14:53, Martin Andersson <
> > u.martin.andersson@gmail.com> wrote:
> >
> >> Hi again Pedro,
> >>
> >> Since https://github.com/apache/sedona/pull/773 got merged you should
> >> now be able to use Apache Sedona for your GeoTiff processing needs. It will
> >> be included in the next Sedona release.
> >>
> >> All feedback is welcome!
> >>
> >> Br
> >> Martin Andersson
> >>
> >>
> >> Den mån 23 jan. 2023 kl 10:45 skrev Pedro Mano Fernandes <
> >> pedromorfeu@gmail.com>:
> >>
> >>> Hi Martin,
> >>>
> >>> I've tested your proposal (reading binary and UDF getValue) and it works
> >>> fine. I've actually converted the code to Scala easily. Now it's a matter
> >>> of building/optimizing around it (spatial join, aggregate points per
> >>> geotiff).
> >>>
> >>> Best,
> >>>
> >>> On Fri, 20 Jan 2023 at 13:47, Martin Andersson <
> >>> u.martin.andersson@gmail.com> wrote:
> >>>
> >>>> Yes, there are lots of things to consider when processing large blobs
> >>>> in Spark. What I have come to learn:
> >>>>  - Do the spatial join (points and the geotiff extent) with as few
> >>>> columns as possible, ideally only an id for the geotiff. After that join
> >>>> you can join the geotiff back using the id.
> >>>>  - Aggregate the points into an array of points per geotiff. Your
> >>>> getValue udf should take an array of points and return an array of values.
> >>>> That way each geotiff is only loaded once.
> >>>>  - Parquet in Spark is not very good at handling large blobs. If
> >>>> reading parquet with geotiffs is slow, you can repartition() with a very
> >>>> large number when writing to force smaller row groups, or use Avro instead.
> >>>> https://www.uber.com/en-SE/blog/hdfs-file-format-apache-spark/
> >>>>
> >>>> Good luck!
> >>>>
> >>>> Br,
> >>>> Martin Andersson
> >>>>
> >>>>
> >>>> Den fre 20 jan. 2023 kl 13:08 skrev Pedro Mano Fernandes <
> >>>> pedromorfeu@gmail.com>:
> >>>>
> >>>>> Thanks Martin, it sounds promising. I'll actually give it a try before
> >>>>> going with geotiff conversions.
> >>>>>
> >>>>> I'm foreseeing some concerns, though:
> >>>>>
> >>>>>    - I'm afraid it won't be optimal for a big geotiff - I may have to
> >>>>>    split the geotiff into smaller geotiffs
> >>>>>    - I wonder how the spatial partitioning optimization will behave
> >>>>>    with such an approach - I may have to load smaller geotiffs and use
> >>>>>    their geometry to join (my coordinates against envelope boundaries)
> >>>>>    before calculating the getValue for my coordinates
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> On Fri, 20 Jan 2023 at 08:49, Martin Andersson <
> >>>>> u.martin.andersson@gmail.com> wrote:
> >>>>>
> >>>>>> I would read the geotiff files as binary:
> >>>>>> https://spark.apache.org/docs/latest/sql-data-sources-binaryFile.html
> >>>>>>
> >>>>>> Then you can define a udf to extract values directly from the
> >>>>>> geotiffs. If you're on Python you can use rasterio to do that.
> >>>>>>
> >>>>>> In Java it would look something like this:
> >>>>>>
> >>>>>>   // Returns the first band's value at world coordinate (x, y), or
> >>>>>>   // null when the point falls outside the raster extent.
> >>>>>>   Integer getValue(byte[] geotiff, double x, double y)
> >>>>>>       throws IOException, TransformException {
> >>>>>>     try (ByteArrayInputStream inputStream =
> >>>>>>         new ByteArrayInputStream(geotiff)) {
> >>>>>>       GeoTiffReader geoTiffReader = new GeoTiffReader(inputStream);
> >>>>>>       GridCoverage2D grid = geoTiffReader.read(null);
> >>>>>>       Raster raster = grid.getRenderedImage().getData();
> >>>>>>       GridGeometry2D gridGeometry = grid.getGridGeometry();
> >>>>>>
> >>>>>>       // Convert the world coordinate to pixel (grid) coordinates.
> >>>>>>       DirectPosition2D directPosition2D = new DirectPosition2D(x, y);
> >>>>>>       GridCoordinates2D gridCoordinates2D =
> >>>>>>           gridGeometry.worldToGrid(directPosition2D);
> >>>>>>       try {
> >>>>>>         int[] pixel = raster.getPixel(
> >>>>>>             gridCoordinates2D.x, gridCoordinates2D.y, new int[1]);
> >>>>>>         return pixel[0];
> >>>>>>       } catch (ArrayIndexOutOfBoundsException exc) {
> >>>>>>         // The point is outside the extent.
> >>>>>>         return null;
> >>>>>>       }
> >>>>>>     }
> >>>>>>   }
> >>>>>>
> >>>>>> Br,
> >>>>>> Martin Andersson
> >>>>>>
> >>>>>> Den ons 18 jan. 2023 kl 17:59 skrev Pedro Mano Fernandes <
> >>>>>> pedromorfeu@gmail.com>:
> >>>>>>
> >>>>>>> Thanks for the update, guys.
> >>>>>>>
> >>>>>>> I'm not ready to contribute yet.
> >>>>>>>
> >>>>>>> In the meantime, one solution could be to convert GeoTiff to another
> >>>>>>> format supported by Sedona. If anyone has had this use case before or
> >>>>>>> has any idea, please share.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> On Wed, 18 Jan 2023 at 09:47, Martin Andersson <
> >>>>>>> u.martin.andersson@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I think you are looking for something like this:
> >>>>>>>> https://postgis.net/docs/RT_ST_Value.html
> >>>>>>>>
> >>>>>>>> The raster support in Sedona is very limited at the moment. The
> >>>>>>>> lack of a proper raster type makes implementing st_value impossible. We had
> >>>>>>>> a brief discussion about that recently.
> >>>>>>>> https://lists.apache.org/thread/qdfcvxl6z5pb7m7ky5zsksyytyxqwv8c
> >>>>>>>>
> >>>>>>>> If you want to make a contribution and need some guidance, please
> >>>>>>>> let me know!
> >>>>>>>>
> >>>>>>>> Br,
> >>>>>>>> Martin Andersson
> >>>>>>>>
> >>>>>>>> Den ons 18 jan. 2023 kl 05:45 skrev Jia Yu <ji...@apache.org>:
> >>>>>>>>
> >>>>>>>>> Hi Pedro,
> >>>>>>>>>
> >>>>>>>>> I got your point. Unfortunately, we don't have this function yet
> >>>>>>>>> in Sedona.
> >>>>>>>>> But we welcome anyone who wants to contribute this to Sedona!
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Jia
> >>>>>>>>>
> >>>>>>>>> On Tue, Jan 17, 2023 at 9:11 AM Pedro Mano Fernandes <
> >>>>>>>>> pedromorfeu@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> > Hi all,
> >>>>>>>>> >
> >>>>>>>>> > Any clue? Or any documentation I can refer to?
> >>>>>>>>> >
> >>>>>>>>> > Here goes a dummy example to better explain myself: in QGIS I can
> >>>>>>>>> > click a point (coordinates) of the geotiff and get the value at that
> >>>>>>>>> > point (in this case 231 of Band 1).
> >>>>>>>>> >
> >>>>>>>>> > [image: image.png]
> >>>>>>>>> >
> >>>>>>>>> > Thanks,
> >>>>>>>>> >
> >>>>>>>>> > On Sun, 15 Jan 2023 at 16:17, Pedro Mano Fernandes <
> >>>>>>>>> pedromorfeu@gmail.com>
> >>>>>>>>> > wrote:
> >>>>>>>>> >
> >>>>>>>>> >> Hi Jia,
> >>>>>>>>> >>
> >>>>>>>>> >> Thanks for the fast response.
> >>>>>>>>> >>
> >>>>>>>>> >> With the regular spatial join I’ll get the array of data of the whole
> >>>>>>>>> >> geotiff polygon. I was hoping to get the data element for specific
> >>>>>>>>> >> coordinates inside that polygon. In other words: I guess the array
> >>>>>>>>> >> of data corresponds to all the positions in the polygon, but I want
> >>>>>>>>> >> to fetch specific positions.
> >>>>>>>>> >>
> >>>>>>>>> >> Thanks,
> >>>>>>>>> >>
> >>>>>>>>> >> On Sun, 15 Jan 2023 at 01:09, Jia Yu <ji...@apache.org> wrote:
> >>>>>>>>> >>
> >>>>>>>>> >>> Hi Pedro,
> >>>>>>>>> >>>
> >>>>>>>>> >>> Once you use the Sedona GeoTiff reader to read those GeoTiffs,
> >>>>>>>>> >>> you will get a DataFrame with the following schema:
> >>>>>>>>> >>>
> >>>>>>>>> >>>  |-- image: struct (nullable = true)
> >>>>>>>>> >>>  |    |-- origin: string (nullable = true)
> >>>>>>>>> >>>  |    |-- Geometry: string (nullable = true)
> >>>>>>>>> >>>  |    |-- height: integer (nullable = true)
> >>>>>>>>> >>>  |    |-- width: integer (nullable = true)
> >>>>>>>>> >>>  |    |-- nBands: integer (nullable = true)
> >>>>>>>>> >>>  |    |-- data: array (nullable = true)
> >>>>>>>>> >>>  |    |    |-- element: double (containsNull = true)
> >>>>>>>>> >>>
> >>>>>>>>> >>>
> >>>>>>>>> >>> You can fetch the geometry column as follows and then perform
> >>>>>>>>> >>> the spatial join:
> >>>>>>>>> >>>
> >>>>>>>>> >>> geotiffDF = geotiffDF.selectExpr(
> >>>>>>>>> >>>   "image.origin as origin",
> >>>>>>>>> >>>   "ST_GeomFromWkt(image.geometry) as Geom",
> >>>>>>>>> >>>   "image.height as height",
> >>>>>>>>> >>>   "image.width as width",
> >>>>>>>>> >>>   "image.data as data",
> >>>>>>>>> >>>   "image.nBands as bands")
> >>>>>>>>> >>> geotiffDF.createOrReplaceTempView("GeotiffDataframe")
> >>>>>>>>> >>> geotiffDF.show()
> >>>>>>>>> >>>
> >>>>>>>>> >>> More info can be found at:
> >>>>>>>>> >>> https://sedona.apache.org/1.3.1-incubating/api/sql/Raster-loader/#geotiff-dataframe-loader
> >>>>>>>>> >>>
> >>>>>>>>> >>> Thanks,
> >>>>>>>>> >>> Jia
> >>>>>>>>> >>>
> >>>>>>>>> >>> On Sat, Jan 14, 2023 at 9:10 AM Pedro Mano Fernandes <
> >>>>>>>>> >>> pedromorfeu@gmail.com> wrote:
> >>>>>>>>> >>>
> >>>>>>>>> >>>> Hi everyone!
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> I'm trying to use elevation data in GeoTiff format. I understand
> >>>>>>>>> >>>> how to load the dataset, as described here:
> >>>>>>>>> >>>> https://sedona.staged.apache.org/api/sql/Raster-loader/#geotiff-dataframe-loader
> >>>>>>>>> >>>> Now I'm wondering how to join this dataframe with another one that
> >>>>>>>>> >>>> contains coordinates, in order to get the elevation data for those
> >>>>>>>>> >>>> coordinates.
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Something along these lines:
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> pointsDF
> >>>>>>>>> >>>>   .join(geotiffDF, ...)
> >>>>>>>>> >>>>   .select("lon", "lat", "geotiff_data")
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Are there any examples or documentation I can follow to accomplish
> >>>>>>>>> >>>> this?
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> Thanks,
> >>>>>>>>> >>>>
> >>>>>>>>> >>>> --
> >>>>>>>>> >>>> Pedro Mano Fernandes
> >>>>>>>>> >>>>
> >>>>>>>>> >>> --
> >>>>>>>>> >> Pedro Mano Fernandes
> >>>>>>>>> >>
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> > --
> >>>>>>>>> > Pedro Mano Fernandes
> >>>>>>>>> >
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Hälsningar,
> >>>>>>>> Martin
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Pedro Mano Fernandes
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>> Pedro Mano Fernandes
> >>>>>
> >>>>
> >>>
> >>> --
> >>> Pedro Mano Fernandes
> >>>
> >>
> >
> > --
> > Pedro Mano Fernandes
> >
>
>
> --
> Pedro Mano Fernandes
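
Earlier in this thread Martin suggested running the spatial join on as few
columns as possible (ideally just an id per GeoTiff plus its extent) and
joining the heavy GeoTiff payload back by id afterwards. A minimal plain-Python
sketch of that idea follows. In Spark the matching step would be a Sedona
spatial join; the nested loop here is only a stand-in, and the names
`match_points_to_extents`, `join_back`, `extents` and `blobs` are all
hypothetical.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Point = Tuple[float, float]


def match_points_to_extents(extents: Dict[str, BBox],
                            points: List[Point]) -> List[Tuple[str, Point]]:
    """Lightweight join: pair each point with the id of every GeoTiff whose
    extent contains it. Only ids and bounding boxes are touched here."""
    pairs: List[Tuple[str, Point]] = []
    for point in points:
        x, y = point
        for geotiff_id, (min_x, min_y, max_x, max_y) in extents.items():
            if min_x <= x <= max_x and min_y <= y <= max_y:
                pairs.append((geotiff_id, point))
    return pairs


def join_back(pairs: List[Tuple[str, Point]],
              blobs: Dict[str, bytes]) -> List[Tuple[str, Point, bytes]]:
    """Attach the large GeoTiff bytes only after the lightweight join."""
    return [(geotiff_id, point, blobs[geotiff_id])
            for geotiff_id, point in pairs]
```

Because the bounding-box test is inclusive, a point lying exactly on a shared
edge is paired with both neighbouring GeoTiffs; whether that is desirable
depends on the data.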

Re: How to use raster GeoTiff

Posted by Pedro Mano Fernandes <pe...@gmail.com>.
Hi Martin,

It's weird I don't see your new Raster features in the docs in
https://sedona.staged.apache.org/api/sql/Raster-loader/. I thought the
documentation was already up-to-date after the release of sedona-1.4.0.

Best regards,

-- 
Pedro Mano Fernandes
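
Another suggestion from earlier in this thread was to aggregate the points
per GeoTiff and let the getValue UDF take an array of points and return an
array of values, so that each GeoTiff is decoded only once and points outside
the extent come back as null. A plain-Python sketch under stated assumptions:
`decode` stands in for whatever actually parses the GeoTiff bytes (rasterio,
GeoTools, ...), `toy_decode` is a made-up replacement for a real raster, and
every name below is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]
# A decoded raster is modelled as a function (x, y) -> value, None if outside.
Lookup = Callable[[float, float], Optional[int]]


def group_points(pairs: Iterable[Tuple[str, Point]]) -> Dict[str, List[Point]]:
    """Collect the points matched to each GeoTiff id (the aggregate step)."""
    grouped: Dict[str, List[Point]] = defaultdict(list)
    for geotiff_id, point in pairs:
        grouped[geotiff_id].append(point)
    return dict(grouped)


def get_values(geotiff: bytes, points: List[Point],
               decode: Callable[[bytes], Lookup]) -> List[Optional[int]]:
    """Decode the GeoTiff once, then look up every point against it."""
    lookup = decode(geotiff)  # the expensive step, done once per GeoTiff
    return [lookup(x, y) for x, y in points]


def toy_decode(blob: bytes) -> Lookup:
    """Hypothetical stand-in for a real decoder: treats the bytes as a
    one-row raster with unit-wide pixels starting at x = 0 (y is ignored)."""
    def lookup(x: float, y: float) -> Optional[int]:
        return blob[int(x)] if 0 <= x < len(blob) else None
    return lookup
```

With this shape, the number of decode calls equals the number of GeoTiffs
rather than the number of points, which is the point of the batching.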

Re: How to use raster GeoTiff

Posted by Pedro Mano Fernandes <pe...@gmail.com>.
Hi Martin,

Great news! I'll give it a go and will let you know.

Thanks for letting me know.
Best regards,

-- 
Pedro Mano Fernandes