Posted to issues@hivemall.apache.org by "Bob (Jira)" <ji...@apache.org> on 2021/05/18 08:49:00 UTC

[jira] [Resolved] (HIVEMALL-313) Missing xxxWrapper classes with latest version when load define-all.spark

     [ https://issues.apache.org/jira/browse/HIVEMALL-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bob resolved HIVEMALL-313.
--------------------------
    Fix Version/s: 0.6.2
       Resolution: Fixed

The problem has been resolved by [https://github.com/apache/incubator-hivemall/pull/244]

Thanks a lot

> Missing xxxWrapper classes with latest version when load define-all.spark
> -------------------------------------------------------------------------
>
>                 Key: HIVEMALL-313
>                 URL: https://issues.apache.org/jira/browse/HIVEMALL-313
>             Project: Hivemall
>          Issue Type: Bug
>    Affects Versions: 0.6.2
>            Reporter: Bob
>            Assignee: Makoto Yui
>            Priority: Major
>             Fix For: 0.6.2
>
>
> Current problem:
> When I work with define-all.spark as described in [http://hivemall.apache.org/userguide/spark/getting_started/installation.html], I get "class not found" errors.
> Please kindly help on this point, thanks.
> Is there any chance to bring those xxxWrapper functions from the previous spark module back into the latest version of Hivemall?
> That is, to keep 'hivemall.knn.lsh.MinHashesUDFWrapper' alongside 'hivemall.knn.lsh.MinHashesUDF' in the same package.
>  
> {code:java}
> spark-shell --jars target/hivemall-all-<version>-incubating-SNAPSHOT.jar
> scala> :load resources/ddl/define-all.spark
> {code}
> (define-all.spark: [https://github.com/apache/incubator-hivemall/blob/master/resources/ddl/define-all.spark])
>  
>  
> In the latest define-all.spark, functions are created from xxxWrapper classes at lines 142, 242, 248, 251, 583, and 666, as shown below:
> {code:java}
> //line 142
> sqlContext.sql("CREATE TEMPORARY FUNCTION minhashes AS 'hivemall.knn.lsh.MinHashesUDFWrapper'")
> //line 242
> sqlContext.sql("CREATE TEMPORARY FUNCTION add_bias AS 'hivemall.ftvec.AddBiasUDFWrapper'")
> //line 248
> sqlContext.sql("CREATE TEMPORARY FUNCTION extract_feature AS 'hivemall.ftvec.ExtractFeatureUDFWrapper'")
> //line 251
> sqlContext.sql("CREATE TEMPORARY FUNCTION extract_weight AS 'hivemall.ftvec.ExtractWeightUDFWrapper'")
> //line 583
> sqlContext.sql("CREATE TEMPORARY FUNCTION rowid AS 'hivemall.tools.mapred.RowIdUDFWrapper'")
> //line 666
> sqlContext.sql("CREATE TEMPORARY FUNCTION lr_datagen AS 'hivemall.dataset.LogisticRegressionDataGeneratorUDTFWrapper'")
> {code}
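> Before registering the functions one at a time, it can help to check which of the referenced classes are actually inside the jar. Below is a minimal, hypothetical pre-check (plain JVM, no Spark session needed; the candidate class names are taken from the script above):

```java
// Hypothetical pre-check: report which classes referenced by
// define-all.spark are actually on the classpath, rather than
// discovering missing ones one CREATE TEMPORARY FUNCTION at a time.
public class ClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "hivemall.knn.lsh.MinHashesUDFWrapper", // wrapper class (reported missing)
            "hivemall.knn.lsh.MinHashesUDF"         // core class
        };
        for (String name : candidates) {
            System.out.println(name + " -> "
                    + (isOnClasspath(name) ? "present" : "missing"));
        }
    }
}
```

> Running this with the Hivemall jar on the classpath shows at a glance which of the wrapper classes are gone.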
> Since no such classes are defined, loading define-all.spark should fail. I tried it and got the errors below:
> {code:java}
> org.apache.spark.sql.AnalysisException: Can not load class 'hivemall.knn.lsh.MinHashesUDFWrapper' when registering the function 'minhashes', please make sure it is on the classpath;
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog$$anonfun$37.apply(SessionCatalog.scala:1180)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog$$anonfun$37.apply(SessionCatalog.scala:1177)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.registerFunction(SessionCatalog.scala:1177)
>   at org.apache.spark.sql.execution.command.CreateFunctionCommand.run(functions.scala:81)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
>   at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3369)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
>   ... 76 elided
> org.apache.spark.sql.AnalysisException: Can not load class 'hivemall.ftvec.AddBiasUDFWrapper' when registering the function 'add_bias', please make sure it is on the classpath;
>   (stack trace identical to the one above)
> ...................{code}
>  
>  
> Previously discussed questions:
> It seems that we need to create some specific functions based on the xxxWrapper classes in a Spark environment, but the latest version of Hivemall no longer contains the spark module.
> That means I have to fall back to a previous version such as 0.5.2.
> Sometimes I have to use function A from hivemall-spark2.2-0.5.2-incubating-with-dependencies.jar and function B from hivemall-all-0.6.2-incubating-SNAPSHOT.jar.
> Mixing the two jars may cause dependency package conflicts.
> I just wonder whether I could use those xxxWrapper classes with the latest version of Hivemall.
> Thanks a lot for your help.
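> As a possible stopgap until this is fixed, I could edit a local copy of define-all.spark so the wrapper lines point at the corresponding core classes instead. Note this is only a guess on my side: it assumes each xxxWrapper has a non-wrapper counterpart shipped in hivemall-all, which I have not verified for every function:
> {code:java}
> // hypothetical edits to a local define-all.spark; core class names unverified
> sqlContext.sql("CREATE TEMPORARY FUNCTION minhashes AS 'hivemall.knn.lsh.MinHashesUDF'")
> sqlContext.sql("CREATE TEMPORARY FUNCTION add_bias AS 'hivemall.ftvec.AddBiasUDF'")
> {code}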



--
This message was sent by Atlassian Jira
(v8.3.4#803005)