Posted to issues@spark.apache.org by "Rodrigo Boavida (Jira)" <ji...@apache.org> on 2022/02/05 10:13:00 UTC
[jira] [Updated] (SPARK-36986) Improving schema filtering flexibility
[ https://issues.apache.org/jira/browse/SPARK-36986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rodrigo Boavida updated SPARK-36986:
------------------------------------
Summary: Improving schema filtering flexibility (was: Improving external schema management flexibility)
> Improving schema filtering flexibility
> --------------------------------------
>
> Key: SPARK-36986
> URL: https://issues.apache.org/jira/browse/SPARK-36986
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Rodrigo Boavida
> Priority: Major
>
> Our Spark usage requires us to build an external schema and pass it in when creating a Dataset.
> While working through this, I found that a couple of optimizations would greatly improve Spark's flexibility in handling external schema management.
> Scope: the ability to look up a field by name and retrieve its index and StructField in a single call, returned as a tuple.
> This means extending the StructType class with an additional method.
> This is what the function would look like:
> /**
>  * Returns the index and field structure by name.
>  * If it doesn't find it, returns None.
>  * Avoids two client calls/loops to obtain consolidated field info.
>  */
> def getIndexAndFieldByName(name: String): Option[(Int, StructField)] = {
>   val field = nameToField.get(name)
>   if (field.isDefined) {
>     Some((fieldIndex(name), field.get))
>   } else {
>     None
>   }
> }
> This is particularly useful for efficiency when parsing a JSON structure, where for every field we want to check the name and field type already defined in the schema.
> I will create a corresponding branch for PR review, assuming that there are no concerns with the above proposal.
>
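For comparison, the same lookup can be approximated from client code using only the public StructType API. The following is a minimal sketch, not the proposed implementation: SchemaLookupSketch, indexAndFieldByName, the sample schema, and the jsonKeys list are hypothetical names introduced for illustration, and the Spark SQL types are assumed to be on the classpath. It relies on StructType.fieldIndex, which throws when the name is absent, so the call is wrapped in Try.

```scala
import scala.util.Try
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical client-side helper; not part of Spark.
object SchemaLookupSketch {

  // Returns the index and the field for `name`, or None if the schema has no such field.
  // fieldIndex throws for unknown names, so Try converts that case to None.
  def indexAndFieldByName(schema: StructType, name: String): Option[(Int, StructField)] =
    Try(schema.fieldIndex(name)).toOption.map(i => (i, schema(i)))

  def main(args: Array[String]): Unit = {
    // Sample external schema (assumption, for illustration only).
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("name", StringType)
    ))

    // Stand-in for the keys encountered while parsing a JSON document.
    val jsonKeys = Seq("id", "name", "extra")

    jsonKeys.foreach { key =>
      indexAndFieldByName(schema, key) match {
        case Some((idx, field)) =>
          println(s"$key -> index $idx, type ${field.dataType.simpleString}")
        case None =>
          println(s"$key is not defined in the schema")
      }
    }
  }
}
```

This wrapper hides the two steps behind one call, but it still performs a name lookup followed by a positional access. A method inside StructType, as proposed in the issue, could read both values from the class's internal maps directly.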
--
This message was sent by Atlassian Jira
(v8.20.1#820001)