Posted to issues@spark.apache.org by "Adam Roberts (JIRA)" <ji...@apache.org> on 2015/12/08 19:25:11 UTC

[jira] [Commented] (SPARK-9858) Introduce an ExchangeCoordinator to estimate the number of post-shuffle partitions.

    [ https://issues.apache.org/jira/browse/SPARK-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047194#comment-15047194 ] 

Adam Roberts commented on SPARK-9858:
-------------------------------------

Several potential issues here, though they may well not lie in this code itself. I'm consistently encountering problems on two different big-endian (BE) platforms while testing this.

1) Is this thread safe? I've noticed that if we print the rowBuffer when using more than one thread for our SQLContext, the ordering of elements is not consistent, and we sometimes see two rows printed consecutively.

2) For the aggregate, join, and complex query 2 tests, I consistently receive more bytes per partition, and instead of estimating (0, 2) for the indices we get (0, 2, 4). I know we're using the UnsafeRowSerializer, so I'm wary that the issue may lie there instead; I see it uses Google's ByteStreams class to read in the bytes. Specifically, I get 800, 800, 800, 800, 720 bytes per partition instead of 600, 600, 600, 600, 600.

3) Where do the values used in the assertions for the test suite come from?
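
To make point 2 concrete, here is a minimal sketch of how larger per-partition sizes can shift the estimated start indices. It is not the ExchangeCoordinator code: the function name, the greedy packing rule, and the target size of 1500 bytes are all assumptions chosen for illustration, so the results are not expected to match the values asserted in the test suite.

```python
def estimate_start_indices(bytes_per_partition, target_size):
    """Greedily pack consecutive pre-shuffle partitions.

    Hypothetical helper: a new post-shuffle partition starts once the
    running total of bytes reaches target_size.
    """
    start_indices = [0]
    running = 0
    last = len(bytes_per_partition) - 1
    for i, size in enumerate(bytes_per_partition):
        running += size
        if running >= target_size:
            # Only open a new partition if there is input left to put in it.
            if i < last:
                start_indices.append(i + 1)
            running = 0
    return start_indices


# Sizes quoted in point 2; the target of 1500 is an assumed value.
expected_sizes = [600, 600, 600, 600, 600]
observed_sizes = [800, 800, 800, 800, 720]

print(estimate_start_indices(expected_sizes, 1500))
print(estimate_start_indices(observed_sizes, 1500))
```

Under this assumed rule the larger sizes cross the target sooner, so the same data ends up split into more post-shuffle partitions. That is the kind of shift seen between (0, 2) and (0, 2, 4).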

If we print the rows, we see differences between the two platforms: the 63 and 70 are from our BE platform, and this value differs each time we run the test.

This works perfectly on various little-endian (LE) architectures, hence the current endianness/serialization theory. Apologies if this would be better suited to the dev mailing list, although I expect I'm one of the few testing this on BE.
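
As a generic illustration of the endianness theory, and not a reproduction of the UnsafeRowSerializer path, the sketch below writes a 64-bit value in one byte order and reads it back in the other. The value 600 is borrowed from the partition sizes above; the variable names are assumptions.

```python
import struct

value = 600

# The same 64-bit signed integer encoded in each byte order.
big_endian_bytes = struct.pack(">q", value)
little_endian_bytes = struct.pack("<q", value)

# Reading big-endian bytes as if they were little-endian gives another number.
misread = struct.unpack("<q", big_endian_bytes)[0]

print(big_endian_bytes.hex())
print(little_endian_bytes.hex())
print(misread != value)
```

If any code on the write or read path assumes the platform's native byte order, sizes or field values could be misinterpreted in this way on BE hardware. Whether that is what happens here is still unconfirmed.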

> Introduce an ExchangeCoordinator to estimate the number of post-shuffle partitions.
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-9858
>                 URL: https://issues.apache.org/jira/browse/SPARK-9858
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>             Fix For: 1.6.0
>
>



