Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/27 07:53:17 UTC

[GitHub] [spark] wangyum opened a new pull request, #37688: [SPARK-40243][DOCS] Enhance Hive UDF support documentation

wangyum opened a new pull request, #37688:
URL: https://github.com/apache/spark/pull/37688

   ### What changes were proposed in this pull request?
   
   Enhance the Hive UDF support documentation with advice on how to write UDFs that perform better.
   
   For example, we have rewritten a heavy UDF:
   
   Default (UTF8String <-> Text <-> String) | Rewritten (UTF8String <-> String)
   -- | --
   ![image](https://user-images.githubusercontent.com/5399861/187020616-cb3f008a-b798-44f8-bf29-599bf21e3367.png) | ![image](https://user-images.githubusercontent.com/5399861/187020624-8f1da19c-c584-493e-9a2a-5fa7dd268563.png)
   
   ### Why are the changes needed?
   
   Enhance documentation.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Manual test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng closed pull request #37688: [SPARK-40243][DOCS] Enhance Hive UDF support documentation

Posted by GitBox <gi...@apache.org>.
zhengruifeng closed pull request #37688: [SPARK-40243][DOCS] Enhance Hive UDF support documentation
URL: https://github.com/apache/spark/pull/37688




[GitHub] [spark] zhengruifeng commented on pull request #37688: [SPARK-40243][DOCS] Enhance Hive UDF support documentation

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #37688:
URL: https://github.com/apache/spark/pull/37688#issuecomment-1230124114

   Merged into master




[GitHub] [spark] wangyum commented on a diff in pull request #37688: [SPARK-40243][DOCS] Enhance Hive UDF support documentation

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #37688:
URL: https://github.com/apache/spark/pull/37688#discussion_r956553540


##########
docs/sql-ref-functions-udf-hive.md:
##########
@@ -52,6 +52,18 @@ SELECT testUDF(value) FROM t;
 |           2.0|
 |           3.0|
 +--------------+
+
+-- Register `UDFSubstr` and use it in Spark SQL.
+-- Note that it can achieve better performance if the return types and method parameters use Java primitives.
+-- E.g., for UDFSubstr, the default data processing path is UTF8String <-> Text <-> String; we can avoid UTF8String <-> Text.

Review Comment:
   Example of how to rewrite UDF: 
   <img width="1681" alt="image" src="https://user-images.githubusercontent.com/5399861/187021044-ade75cb9-6f3e-40be-aac1-0cd6aab97d0b.png">
   
   Previous stack trace:
   ```
   java.lang.String.toCharArray(String.java:2899)
   org.apache.hadoop.io.Text.encode(Text.java:451)
   org.apache.hadoop.io.Text.set(Text.java:198)
   org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:441)
   org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:386)
   org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:435)
   org.apache.spark.sql.hive.HiveSimpleUDF.eval(hiveUDFs.scala:99)
   org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
   org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3(basicPhysicalOperators.scala:236)
   org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3$adapted(basicPhysicalOperators.scala:235)
   org.apache.spark.sql.execution.FilterExec$$Lambda$3203/1487544979.apply(Unknown Source)
   ...
   ```
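   
   The rewrite in the screenshot is not reproduced in this plain-text archive, so the following is only a minimal sketch of the idea, not the actual change. It assumes a simplified `substr` (positive 1-based positions only) and uses a hypothetical `TextLike` class as a stand-in for `org.apache.hadoop.io.Text`, so that it compiles without Hive or Hadoop on the classpath. In a real Hive simple UDF the class would extend `org.apache.hadoop.hive.ql.exec.UDF`; that is omitted here. The `TextLike` overload pays a UTF-8 encode and decode on every call, which is the kind of work `Text.set` and `Text.encode` do in the stack trace above, while the `String` overload skips it.
   
   ```java
   import java.nio.charset.StandardCharsets;
   
   /** Illustrative sketch only; SubstrSketch and TextLike are hypothetical names. */
   public class SubstrSketch {
   
       /** Hypothetical stand-in for org.apache.hadoop.io.Text: wraps UTF-8 bytes. */
       public static final class TextLike {
           private final byte[] utf8;
   
           public TextLike(String s) {
               // Encoding happens on every construction (String -> UTF-8 bytes).
               this.utf8 = s.getBytes(StandardCharsets.UTF_8);
           }
   
           @Override
           public String toString() {
               // Decoding happens on every read (UTF-8 bytes -> String).
               return new String(utf8, StandardCharsets.UTF_8);
           }
       }
   
       /** Default-style signature: Writable-like argument and result. */
       public static TextLike evaluate(TextLike s, int pos, int len) {
           return new TextLike(substr(s.toString(), pos, len));
       }
   
       /** Rewritten-style signature: plain Java types, no intermediate wrapper. */
       public static String evaluate(String s, int pos, int len) {
           return substr(s, pos, len);
       }
   
       // Simplified 1-based substring; negative positions are not handled here.
       private static String substr(String s, int pos, int len) {
           int start = Math.max(pos - 1, 0);
           if (start >= s.length() || len <= 0) {
               return "";
           }
           int end = start + Math.min(len, s.length() - start);
           return s.substring(start, end);
       }
   
       public static void main(String[] args) {
           String input = "Spark SQL";
           // Both paths return the same value; only the conversions differ.
           System.out.println(evaluate(new TextLike(input), 1, 5));
           System.out.println(evaluate(input, 1, 5));
       }
   }
   ```
   
   Both overloads return the same substring. The difference is only in how many conversions each row goes through, which is why the signature choice can matter for a UDF that is called on every row of a filter.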
   







[GitHub] [spark] wangyum commented on a diff in pull request #37688: [SPARK-40243][DOCS] Enhance Hive UDF support documentation

Posted by GitBox <gi...@apache.org>.
wangyum commented on code in PR #37688:
URL: https://github.com/apache/spark/pull/37688#discussion_r956553540


##########
docs/sql-ref-functions-udf-hive.md:
##########
@@ -52,6 +52,18 @@ SELECT testUDF(value) FROM t;
 |           2.0|
 |           3.0|
 +--------------+
+
+-- Register `UDFSubstr` and use it in Spark SQL.
+-- Note that it can achieve better performance if the return types and method parameters use Java primitives.
+-- E.g., for UDFSubstr, the default data processing path is UTF8String <-> Text <-> String; we can avoid UTF8String <-> Text.

Review Comment:
   Example of how to rewrite UDF: 
   <img width="1363" alt="image" src="https://user-images.githubusercontent.com/5399861/187026276-2dd32897-9ce6-491b-b2a6-ab952b9a3cfb.png">
   
   
   Previous stack trace:
   ```
   java.lang.String.toCharArray(String.java:2899)
   org.apache.hadoop.io.Text.encode(Text.java:451)
   org.apache.hadoop.io.Text.set(Text.java:198)
   org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:441)
   org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:386)
   org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.convertIfNecessary(GenericUDFUtils.java:435)
   org.apache.spark.sql.hive.HiveSimpleUDF.eval(hiveUDFs.scala:99)
   org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
   org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3(basicPhysicalOperators.scala:236)
   org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3$adapted(basicPhysicalOperators.scala:235)
   org.apache.spark.sql.execution.FilterExec$$Lambda$3203/1487544979.apply(Unknown Source)
   ...
   ```
   


