Posted to issues@spark.apache.org by "LoicH (Jira)" <ji...@apache.org> on 2020/07/01 08:27:00 UTC
[jira] [Updated] (SPARK-32146) ValueError when loading a PipelineModel on a personal computer
[ https://issues.apache.org/jira/browse/SPARK-32146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
LoicH updated SPARK-32146:
--------------------------
Description:
I have a PipelineModel saved on my computer that I can't load using {{PipelineModel.load(path)}}.
When I run my code on a Databricks cluster, it works. There, {{path}} is the path to my model saved on DBFS, accessible via a mount point: {{path = "/dbfs/path/to/my/model"}}.
However on my machine, calling {{PipelineModel.load("C:\\Users\\path\\to\\my\\model")}} throws a {{ValueError("RDD is empty")}}.
Here is how the model is saved on my computer:
````
\---model
    +---metadata
    |       part-00000
    |       _SUCCESS
    |
    \---stages
        +---0_CountVectorizer_b92625354bf7
        |   +---data
        |   |       part-00000-tid-9156766819779394023-5cf6aecb-8959-48b3-be24-65bfa0543465-62-1-c000.snappy.parquet
        |   |       _committed_9156766819779394023
        |   |       _started_9156766819779394023
        |   |       _SUCCESS
        |   |
        |   \---metadata
        |           part-00000
        |           _SUCCESS
        |
        \---1_LinearSVC_108fa01daf43
            +---data
            |       part-00000-tid-4403060754466700849-27841dd9-de88-4015-9dfa-7854c2a15f15-65-1-c000.snappy.parquet
            |       _committed_4403060754466700849
            |       _started_4403060754466700849
            |       _SUCCESS
            |
            \---metadata
                    part-00000
                    _SUCCESS
````
(I just downloaded the model from my Data Lake to my computer.)
How can I load this model when running my code locally?
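A minimal diagnostic sketch, not a confirmed fix: {{ValueError("RDD is empty")}} is the error PySpark's {{RDD.first()}} raises when it finds no records, which suggests (an assumption, not something this report confirms) that Spark did not read any text from one of the {{metadata}} directories. The hypothetical helper {{check_model_dir}} below uses only the Python standard library to check that every {{metadata}} directory in the layout above contains a non-empty {{part-*}} file. The second helper, {{as_file_uri}} (also hypothetical), builds an explicit {{file:}} URI from a local path; whether passing such a URI to {{PipelineModel.load}} avoids the error on Windows is untested here.

```python
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems found in a saved PipelineModel directory.

    Only the metadata directories are inspected: the pipeline's own
    and one per stage under "stages".
    """
    root = Path(model_dir)
    problems = []

    # The pipeline-level metadata directory is always expected.
    candidates = [root / "metadata"]

    # Each stage directory is expected to hold its own metadata directory.
    stages = root / "stages"
    if stages.is_dir():
        candidates += [s / "metadata" for s in sorted(stages.iterdir()) if s.is_dir()]
    else:
        problems.append(f"missing directory: {stages}")

    for meta in candidates:
        parts = sorted(meta.glob("part-*")) if meta.is_dir() else []
        if not parts:
            problems.append(f"no part-* file in: {meta}")
        elif all(p.stat().st_size == 0 for p in parts):
            problems.append(f"only empty part-* files in: {meta}")
    return problems


def as_file_uri(model_dir):
    """Turn a local path into an absolute file: URI."""
    return Path(model_dir).resolve().as_uri()


# Possible usage (needs a running SparkSession; not verified against this issue):
#   from pyspark.ml import PipelineModel
#   if not check_model_dir(path):
#       model = PipelineModel.load(as_file_uri(path))
```

If {{check_model_dir}} returns an empty list, the metadata files are present and non-empty on disk, so the cause would more likely lie in how Spark resolves or reads the path than in the download itself.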
was:
I have a PipelineModel saved on my computer that I can't load using {{PipelineModel.load(path)}}.
When I run my code on a Databricks cluster, it works. There, {{path}} is the path to my model saved on DBFS, accessible via a mount point: {{path = "/dbfs/path/to/my/model"}}.
However on my machine, calling {{PipelineModel.load("C:\\Users\\path\\to\\my\\model")}} throws a {{ValueError("RDD is empty")}}.
Here is how the model is saved on my computer:
{{{{
\---model
    +---metadata
    |       part-00000
    |       _SUCCESS
    |
    \---stages
        +---0_CountVectorizer_b92625354bf7
        |   +---data
        |   |       part-00000-tid-9156766819779394023-5cf6aecb-8959-48b3-be24-65bfa0543465-62-1-c000.snappy.parquet
        |   |       _committed_9156766819779394023
        |   |       _started_9156766819779394023
        |   |       _SUCCESS
        |   |
        |   \---metadata
        |           part-00000
        |           _SUCCESS
        |
        \---1_LinearSVC_108fa01daf43
            +---data
            |       part-00000-tid-4403060754466700849-27841dd9-de88-4015-9dfa-7854c2a15f15-65-1-c000.snappy.parquet
            |       _committed_4403060754466700849
            |       _started_4403060754466700849
            |       _SUCCESS
            |
            \---metadata
                    part-00000
                    _SUCCESS
}}}}
(I just downloaded the model from my Data Lake to my computer.)
How can I load this model when running my code locally?
> ValueError when loading a PipelineModel on a personal computer
> --------------------------------------------------------------
>
> Key: SPARK-32146
> URL: https://issues.apache.org/jira/browse/SPARK-32146
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.4.5
> Environment: * OS: Windows
> * SparkSession: spark = SparkSession.builder.appName("annonces_organiques").getOrCreate()
> Reporter: LoicH
> Priority: Blocker
>