Posted to dev@spark.apache.org by Marek Wiewiorka <ma...@gmail.com> on 2015/03/22 16:39:51 UTC

lower&upperBound not working/spark 1.3

Hi All - I'm trying to use the new SQLContext API to populate a DataFrame
from a JDBC data source, like this:

val jdbcDF = sqlContext.jdbc(
  url = "jdbc:postgresql://localhost:5430/dbname?user=user&password=111",
  table = "se_staging.exp_table3",
  columnName = "cs_id",
  lowerBound = 1,
  upperBound = 10000,
  numPartitions = 12)

No matter how I set the lower and upper bounds, I always get all the rows
from my table.
The API is marked as experimental, so I assume there might be some bugs in
it, but did anybody come across a similar issue?

Thanks!
Marek

Re: lower&upperBound not working/spark 1.3

Posted by "alessandro.andrioni" <al...@dafiti.com.br>.
If I'm reading this comment[1] correctly, this is expected behavior: the
lower and upper bounds are used to make the partitioning more efficient, not
to limit the data returned.

> /**
> * Given a partitioning schematic (a column of integral type, a number of
> * partitions, and upper and lower bounds on the column's value), generate
> * WHERE clauses for each partition so that each row in the table appears
> * exactly once.  The parameters minValue and maxValue are advisory in that
> * incorrect values may cause the partitioning to be poor, but no data
> * will fail to be represented.
> */

I also got bit by this recently.

[1]:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala#L49-L56
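
To make the "advisory" part concrete, here is a minimal sketch (plain Scala,
mirroring the stride arithmetic in the columnPartition code linked above; the
exact clause text is an assumption based on the Spark 1.3 source) of the
per-partition WHERE clauses generated for the call in the first message:

// Sketch: rebuild the per-partition predicates for
// columnName = "cs_id", lowerBound = 1, upperBound = 10000, numPartitions = 12.
val lower = 1L
val upper = 10000L
val numPartitions = 12
// Spark 1.3 divides each operand separately to sidestep overflow.
val stride = upper / numPartitions - lower / numPartitions // 833

val clauses = (0 until numPartitions).map { i =>
  val lo = if (i == 0) None else Some(s"cs_id >= ${lower + stride * i}")
  val hi = if (i == numPartitions - 1) None
           else Some(s"cs_id < ${lower + stride * (i + 1)}")
  Seq(lo, hi).flatten.mkString(" AND ")
}
clauses.foreach(println)
// cs_id < 834                      <- first clause has no lower bound
// cs_id >= 834 AND cs_id < 1667
// ...
// cs_id >= 9164                    <- last clause has no upper bound

The first and last clauses are open-ended, which is why every row comes back
no matter what bounds you pass. If the intent is to actually restrict the scan
to cs_id between 1 and 10000, filter the resulting DataFrame (JDBCRelation is
a PrunedFilteredScan, so simple comparison filters like these should be pushed
down to the database):

val limited = jdbcDF.filter("cs_id >= 1 AND cs_id <= 10000")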


Re: lower&upperBound not working/spark 1.3

Posted by Marek Wiewiorka <ma...@gmail.com>.
Ok, thanks Michael - I will do another series of tests to confirm this and
then report an issue.

Regards,
Marek

2015-03-22 22:19 GMT+01:00 Michael Armbrust <mi...@databricks.com>:

> I have not heard this reported yet, but your invocation looks correct to
> me.  Can you open a JIRA?
>
> On Sun, Mar 22, 2015 at 8:39 AM, Marek Wiewiorka <
> marek.wiewiorka@gmail.com> wrote:
>
>> Hi All - I'm trying to use the new SQLContext API to populate a DataFrame
>> from a JDBC data source, like this:
>>
>> val jdbcDF = sqlContext.jdbc(
>>   url = "jdbc:postgresql://localhost:5430/dbname?user=user&password=111",
>>   table = "se_staging.exp_table3",
>>   columnName = "cs_id",
>>   lowerBound = 1,
>>   upperBound = 10000,
>>   numPartitions = 12)
>>
>> No matter how I set the lower and upper bounds, I always get all the rows
>> from my table.
>> The API is marked as experimental, so I assume there might be some bugs in
>> it, but did anybody come across a similar issue?
>>
>> Thanks!
>> Marek
>>
>
>

Re: lower&upperBound not working/spark 1.3

Posted by Michael Armbrust <mi...@databricks.com>.
I have not heard this reported yet, but your invocation looks correct to
me.  Can you open a JIRA?

On Sun, Mar 22, 2015 at 8:39 AM, Marek Wiewiorka <ma...@gmail.com>
wrote:

> Hi All - I'm trying to use the new SQLContext API to populate a DataFrame
> from a JDBC data source, like this:
>
> val jdbcDF = sqlContext.jdbc(
>   url = "jdbc:postgresql://localhost:5430/dbname?user=user&password=111",
>   table = "se_staging.exp_table3",
>   columnName = "cs_id",
>   lowerBound = 1,
>   upperBound = 10000,
>   numPartitions = 12)
>
> No matter how I set the lower and upper bounds, I always get all the rows
> from my table.
> The API is marked as experimental, so I assume there might be some bugs in
> it, but did anybody come across a similar issue?
>
> Thanks!
> Marek
>