Posted to issues@spark.apache.org by "LoicH (Jira)" <ji...@apache.org> on 2020/07/01 08:27:00 UTC
[jira] [Updated] (SPARK-32146) ValueError when loading a PipelineModel on a personal computer

     [ https://issues.apache.org/jira/browse/SPARK-32146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LoicH updated SPARK-32146:
--------------------------
    Description: 
I have a PipelineModel saved on my computer that I can't load using {{PipelineModel.load(path)}}.

When I run my code on a Databricks cluster, loading works. There, {{path}} points to my model saved on DBFS, accessible via a mount point: {{path = "/dbfs/path/to/my/model"}}.

However, on my machine, calling {{PipelineModel.load("C:\\Users\\path\\to\\my\\model")}} throws a {{ValueError("RDD is empty")}}.

Here is how the model is saved on my computer:

````
\---model
    +---metadata
    |       part-00000
    |       _SUCCESS
    |
    \---stages
        +---0_CountVectorizer_b92625354bf7
        |   +---data
        |   |       part-00000-tid-9156766819779394023-5cf6aecb-8959-48b3-be24-65bfa0543465-62-1-c000.snappy.parquet
        |   |       _committed_9156766819779394023
        |   |       _started_9156766819779394023
        |   |       _SUCCESS
        |   |
        |   \---metadata
        |           part-00000
        |           _SUCCESS
        |
        \---1_LinearSVC_108fa01daf43
            +---data
            |       part-00000-tid-4403060754466700849-27841dd9-de88-4015-9dfa-7854c2a15f15-65-1-c000.snappy.parquet
            |       _committed_4403060754466700849
            |       _started_4403060754466700849
            |       _SUCCESS
            |
            \---metadata
                    part-00000
                    _SUCCESS
````

(I downloaded the model from my data lake to my computer without modifying it.)

How can I load this model when running my code locally?
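
A minimal diagnostic sketch for the situation described above, not a confirmed fix. It assumes the {{ValueError("RDD is empty")}} comes from reading a {{metadata}} part file as text and finding no lines, which would happen if a {{part-00000}} file arrived empty after the download or if the path is not resolved against the local filesystem. The helper names {{empty_metadata_parts}} and {{windows_path_to_file_uri}} are hypothetical and are not part of Spark.

```python
import os
from pathlib import PureWindowsPath


def empty_metadata_parts(model_dir):
    """Return the zero-byte part files found under any 'metadata' directory."""
    empty = []
    for root, _dirs, files in os.walk(model_dir):
        # Only the 'metadata' directories hold the text files read at load time.
        if os.path.basename(root) != "metadata":
            continue
        for name in files:
            full = os.path.join(root, name)
            if name.startswith("part-") and os.path.getsize(full) == 0:
                empty.append(full)
    return sorted(empty)


def windows_path_to_file_uri(path):
    """Turn an absolute Windows path into an explicit file:// URI."""
    return PureWindowsPath(path).as_uri()
```

If {{empty_metadata_parts}} returns an empty list, the metadata files at least contain data, and the next thing to try would be passing the output of {{windows_path_to_file_uri}} to {{PipelineModel.load}} instead of the raw Windows path. Whether that resolves the error on Spark 2.4.5 is an assumption that has not been verified here.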


> ValueError when loading a PipelineModel on a personal computer
> --------------------------------------------------------------
>
>                 Key: SPARK-32146
>                 URL: https://issues.apache.org/jira/browse/SPARK-32146
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.5
>         Environment: * OS: Windows
>  * SparkSession: spark = SparkSession.builder.appName("annonces_organiques").getOrCreate()
>            Reporter: LoicH
>            Priority: Blocker


