Posted to dev@sedona.apache.org by GitBox <gi...@apache.org> on 2022/04/21 22:29:16 UTC

[GitHub] [incubator-sedona] kanchanchy opened a new pull request, #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

kanchanchy opened a new pull request, #614:
URL: https://github.com/apache/incubator-sedona/pull/614

   ## Is this PR related to a proposed Issue?
   Yes. [SEDONA-117](https://issues.apache.org/jira/browse/SEDONA-117)
   
   ## What changes were proposed in this PR?
   1. Implements a new SQL expression for GeoTiff images. Expression name: RS_AppendNormalizedDifference. This transformation calculates the normalized difference index between two bands and appends the index as a new band in the array of all bands.
   2. Updates the raster notebook in the binder to include the improvements made for version 1.2.1.
   
   ## How was this patch tested?
   Unit tests were added for the new transformation operation.
   
   ## Did this PR include necessary documentation updates?
   Yes. Documentation was updated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-sedona] kanchanchy commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855804030


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   @jiayuasu In the current GeoTiff DataFrame of Apache Sedona, each individual band is not a single column. There are two columns related to bands: i) nBands and ii) data. nBands denotes the number of bands in a GeoTiff image, and data contains all bands in a single array. If there are 4 bands, each of length 10, the size of the data array is 40, where the first 10 elements correspond to the first band, the second 10 elements correspond to the second band, and so on.
   I maintained exactly the same structure as the current raster DataFrame in Apache Sedona.
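
The band-sequential layout described in this comment can be illustrated with a minimal plain-Python sketch. It is not Sedona code: the helper name `get_band` and the 1-based band index are assumptions made for illustration only.

```python
def get_band(data, band_index, n_bands):
    """Return one band from a flat array that stores bands back to back.

    Hypothetical helper for illustration; band_index is assumed to be 1-based.
    """
    if n_bands <= 0 or len(data) % n_bands != 0:
        raise ValueError("len(data) must be a positive multiple of n_bands")
    if not 1 <= band_index <= n_bands:
        raise ValueError("band_index out of range")
    band_len = len(data) // n_bands          # every band has the same length
    start = (band_index - 1) * band_len      # offset of the requested band
    return data[start:start + band_len]


# 4 bands of length 10 stored in one array of 40 values
data = [float(v) for v in range(40)]
second_band = get_band(data, 2, 4)           # elements 10..19 of data
```

Under this layout, nBands is the only extra information needed to recover an individual band from data, since the band length follows from the array size.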





[GitHub] [incubator-sedona] kanchanchy commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855828560


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   Yes, I will update the pull request with the above changes.





[GitHub] [incubator-sedona] kanchanchy commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855820003


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   Yes, I agree with both points. Users can calculate the normalized difference using the RS_NormalizedDifference operator and then pass the result to RS_AppendBand to complete the full requirement. Also, if all RS operators are specific to bands, then the Band postfix is not required.





[GitHub] [incubator-sedona] kanchanchy commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855930007


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   @jiayuasu There is one more operator with Band postfix: RS_GetBand. I think we can keep it unchanged. Please let me know if you want to remove the Band postfix from this operator too.





[GitHub] [incubator-sedona] jiayuasu merged pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
jiayuasu merged PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614




[GitHub] [incubator-sedona] jiayuasu commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855815163


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   OK. I was wrong. Then there is an RS operator which is missing: RS_AppendBand, otherwise the user won't be able to re-generate GeoTiffs after manipulating individual bands using [Raster operators](https://sedona.apache.org/api/sql/Raster-operators/).
   
   1. RS_AppendBand takes at least 2 parameters: data, new_band, nBands.  The new_band will be appended to the end of data. nBands is the num of bands in data, before appending. nBands might come in handy if you need to expand the data array.
   
   With this more generic RS_AppendBand func, RS_AppendNormalizedDifference is no longer needed since we already have the RS_NormalizedDifference func.
   
   2. On a side note, since the RS operator unit is a single band, it seems to me that all "Band" postfix in the RS operators' names are redundant and will produce super long SQL queries. What do you think?
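
The composition proposed in this comment (compute a normalized difference, then append it as a new band) can be sketched in plain Python. This is not the Sedona implementation: the function names are hypothetical, the formula (a - b) / (a + b) is the textbook normalized difference, and returning 0.0 for a zero denominator is an assumption of this sketch.

```python
def normalized_difference(band_a, band_b):
    """Element-wise (a - b) / (a + b); 0.0 where a + b == 0 (assumed convention)."""
    if len(band_a) != len(band_b):
        raise ValueError("bands must have the same length")
    result = []
    for a, b in zip(band_a, band_b):
        total = a + b
        result.append(0.0 if total == 0 else (a - b) / total)
    return result


def append_band(data, new_band, n_bands):
    """Append new_band to the end of a flat, band-sequential data array."""
    if n_bands <= 0 or len(data) % n_bands != 0:
        raise ValueError("len(data) must be a positive multiple of n_bands")
    if len(new_band) != len(data) // n_bands:
        raise ValueError("new_band must match the length of the existing bands")
    return data + new_band                   # the band count becomes n_bands + 1


# 2 bands of length 3 stored back to back
data = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
band1, band2 = data[0:3], data[3:6]
nd = normalized_difference(band2, band1)
new_data = append_band(data, nd, 2)          # now holds 3 bands, 9 values
```

Passing nBands to the append step lets it check that the new band has the same length as the existing ones before the array is extended.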
   





[GitHub] [incubator-sedona] jiayuasu commented on pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#issuecomment-1105858533

   @kanchanchy A quick question: Is df.coalesce(1) required in order to write GeoTiffs?
   
   If we don't use coalesce(1), what will happen? and what is the disk layout of this GeoTiff DataFrame?




[GitHub] [incubator-sedona] jiayuasu commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855801004


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   I am kind of confused by this new function. In the current Sedona Raster DataFrame, each individual band is a single column. So if these GeoTiffs have 9 bands, our raster dataframe will have 9 columns.
   
   But in your case, it seems that you put all bands in one column? This is different from what the GeoTiff reader does.





[GitHub] [incubator-sedona] kanchanchy commented on pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#issuecomment-1105861544

   In the unit tests, I have shown it both with and without coalesce(1). I also showed both versions in the documentation.




[GitHub] [incubator-sedona] jiayuasu commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855953818


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   Yes, I know this one and I think it is better to keep it unchanged







[GitHub] [incubator-sedona] kanchanchy commented on pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#issuecomment-1106856139

   @jiayuasu The pull request is updated with the above discussed changes.




[GitHub] [incubator-sedona] kanchanchy commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855804595


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   You can check lines 43 to 49 in this file:
   https://github.com/apache/incubator-sedona/blob/master/sql/src/main/scala/org/apache/spark/sql/sedona_sql/io/GeotiffSchema.scala





[GitHub] [incubator-sedona] jiayuasu commented on a diff in pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on code in PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#discussion_r855826685


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/Functions.scala:
##########
@@ -771,3 +771,45 @@ case class RS_Normalize(inputExpressions: Seq[Expression])
 }
 
 
+/// Append the Normalized Difference between two bands to the image array data as a new band

Review Comment:
   Awesome. Then can you (1) add RS_Append instead? (2) remove the "Band" postfix from the operators whose names contain "Band"?
   
   * RS_AddBands
   * RS_SubtractBands
   * RS_MultiplyBands
   * RS_DivideBands





[GitHub] [incubator-sedona] kanchanchy commented on pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#issuecomment-1105859935

   @jiayuasu No, coalesce(1) is not mandatory. If it is not provided, writing will be distributed and images will be written in different subfolders in the given destination path.




[GitHub] [incubator-sedona] kanchanchy commented on pull request #614: [SEDONA-117] RS_AppendNormalizedDifference Implemented and Raster Notebook Updated

Posted by GitBox <gi...@apache.org>.
kanchanchy commented on PR #614:
URL: https://github.com/apache/incubator-sedona/pull/614#issuecomment-1105862971

   The same image will not be split across multiple subfolders. If there are 3 images in total, there might be 3 subfolders, each containing 1 image.

