Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/10/28 09:59:58 UTC

[jira] [Resolved] (SPARK-18150) Spark 2.* fails to create partitions for Avro files

     [ https://issues.apache.org/jira/browse/SPARK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-18150.
-------------------------------
    Resolution: Invalid

Please review the contributing guide first, then start on the mailing list with a more detailed question.

>  Spark 2.* fails to create partitions for Avro files
> -----------------------------------------------------
>
>                 Key: SPARK-18150
>                 URL: https://issues.apache.org/jira/browse/SPARK-18150
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Streaming
>            Reporter: Sunil Kumar
>            Priority: Blocker
>
> I am using Apache Spark 2.0.1 to process an Avro file on Grid HDFS. However, Spark does not distribute the job across multiple tasks: it uses a single task, and all the operations (read, load, filter, show) run in sequence within that one task.
> This means I am not able to leverage distributed parallel processing.
> I tried the same operations on a JSON file on HDFS and they work well: the job is distributed across multiple tasks and partitions, and I see parallelism.
> I then tested the same job on Spark 1.6, and there the data is partitioned. It looks like there is a bug in Spark 2.*. If not, can someone help me understand how to achieve the same with Avro files? Do I need to do something special for them?
> Note:
> I explored the Spark options "spark.default.parallelism", "spark.sql.files.maxPartitionBytes", "--num-executors" and "spark.sql.shuffle.partitions". They did not help much. "spark.default.parallelism" did produce multiple tasks, but a single task still ended up performing all the operations.
> I am using com.databricks.spark.avro (3.0.1) for Spark 2.0.1.
> Thanks,
> Sunil
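For context on the size settings mentioned in the report, the Python sketch below is a hypothetical, simplified model of how Spark 2.x file-based sources turn input files into partitions, assuming the documented defaults of 128 MB for spark.sql.files.maxPartitionBytes and 4 MB for spark.sql.files.openCostInBytes. It is not Spark's code, and the function names are made up for illustration. The point it shows: a file that its format reports as non-splittable becomes one split read by one task, however the size settings are tuned. Whether com.databricks.spark.avro 3.0.1 reports Avro files as splittable is not confirmed here; the `splittable` flag is an assumption to toggle.

```python
# Hypothetical, simplified model of Spark 2.x file-source split planning.
# Assumption: mirrors the general idea only; it is not Spark's actual code.

MIB = 1024 * 1024
MAX_PARTITION_BYTES = 128 * MIB  # default of spark.sql.files.maxPartitionBytes
OPEN_COST_IN_BYTES = 4 * MIB     # default of spark.sql.files.openCostInBytes


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=MAX_PARTITION_BYTES,
                    open_cost=OPEN_COST_IN_BYTES):
    """Target split size: capped by maxPartitionBytes, floored by the open cost."""
    total_bytes = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost, bytes_per_core))


def plan_splits(file_sizes, splittable, split_size):
    """Cut each file into (file_index, offset, length) splits."""
    splits = []
    for i, size in enumerate(file_sizes):
        if splittable:
            for offset in range(0, size, split_size):
                splits.append((i, offset, min(split_size, size - offset)))
        else:
            # A non-splittable file becomes exactly one split, read by one task.
            splits.append((i, 0, size))
    return splits


def pack_partitions(splits, split_size, open_cost=OPEN_COST_IN_BYTES):
    """Greedily pack splits (largest first) into partitions of about split_size."""
    partitions, current, current_size = [], [], 0
    for split in sorted(splits, key=lambda s: s[2], reverse=True):
        length = split[2]
        if current and current_size + length > split_size:
            partitions.append(current)
            current, current_size = [], 0
        current.append(split)
        current_size += length + open_cost
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    sizes = [1024 * MIB]  # one 1 GiB file
    split = max_split_bytes(sizes, default_parallelism=8)
    for splittable in (True, False):
        parts = pack_partitions(plan_splits(sizes, splittable, split), split)
        print(f"splittable={splittable}: {len(parts)} partition(s)")
```

In this model, one 1 GiB file with a default parallelism of 8 yields eight 128 MiB partitions when splittable, but a single partition when not, which is why raising parallelism alone would not spread the read of a non-splittable file.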



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
