Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2020/09/10 03:39:00 UTC

[jira] [Resolved] (SPARK-32589) NoSuchElementException: None.get for needsUnsafeRowConversion

     [ https://issues.apache.org/jira/browse/SPARK-32589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh resolved SPARK-32589.
---------------------------------
    Resolution: Duplicate

> NoSuchElementException: None.get for needsUnsafeRowConversion
> -------------------------------------------------------------
>
>                 Key: SPARK-32589
>                 URL: https://issues.apache.org/jira/browse/SPARK-32589
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Adam Binford
>            Priority: Minor
>
> I have run into an error, somewhat non-deterministically, where a query fails with
> {{NoSuchElementException: None.get}}
> which happens at [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L182]
> getActiveSession is apparently returning None. I only use PySpark, and I think this is a threading issue, since the active session comes from an InheritableThreadLocal. I encounter this both when I manually use threading to run multiple jobs at the same time and occasionally when I have multiple streams active at the same time. I tried setting the PYSPARK_PIN_THREAD flag, but it didn't seem to help. For the former case I worked around it in my manual threading code by calling
> {{spark._jvm.SparkSession.setActiveSession(spark._jvm.SparkSession.builder().getOrCreate())}}
> at the start of each new thread, but even this doesn't always work reliably.
> I see this was mentioned in this [issue|https://issues.apache.org/jira/browse/SPARK-21418?focusedCommentId=16174642&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16174642]
> I'm not sure whether the problem (or its solution) lies with Python threads, with adding a default value, or with some other change to this function. One other note: I started encountering this when using Delta Lake OSS, which reads Parquet files as part of the transaction log, and that is always where the error occurs. It doesn't seem to be anything specific to that library, though; nothing it does should cause this issue.
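The failure mode described above can be sketched without Spark at all. As a rough analogy (the names below are illustrative, not Spark internals), Python's threading.local stands in for the thread-local state behind SparkSession.getActiveSession: a value set on one thread is simply not present on another, which is why a PySpark call routed through a fresh thread can see no active session.

```python
import threading

# Stand-in for the per-thread "active session" slot. In the JVM this is an
# InheritableThreadLocal, but a Py4J call from a new Python thread can land
# on a JVM thread that never inherited the value, so the effect is the same
# as plain thread-local state: the value is missing and None.get blows up.
active = threading.local()
active.session = "main-session"  # set on the main thread only

seen = []

def worker():
    # The child thread has no 'session' attribute; this is the analogue of
    # getActiveSession returning None in the reported stack trace.
    seen.append(getattr(active, "session", None))

t = threading.Thread(target=worker)
t.start()
t.join()
# seen == [None]: the worker thread did not see the main thread's value.
```

This is why the reporter's workaround, calling setActiveSession at the top of each new thread, helps: it re-populates the per-thread slot before any query runs on that thread.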



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org