Posted to user@spark.apache.org by Daniel Lopes <da...@onematch.com.br> on 2016/09/07 23:37:54 UTC

year out of range

Hi,

I'm importing a few CSVs with the spark-csv package.
Whenever I run a select on each table individually it looks OK,
but when I join them with sqlContext.sql I get the error below.

All the tables have timestamp fields,
but the joins are not on those date columns.


*Py4JJavaError: An error occurred while calling o643.showString.*
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 54 in stage 92.0 failed 10 times, most recent failure: Lost task 54.9 in stage 92.0 (TID 6356, yp-spark-dal09-env5-0036): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py", line 1563, in <lambda>
    func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)), it)
  File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
    else time.mktime(dt.timetuple()))
*ValueError: year out of range  *

Has anyone seen this problem?

Best,

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes

www.onematch.com.br
<http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>

Re: year out of range

Posted by Daniel Lopes <da...@onematch.com.br>.
Thanks Ayan!


On Thu, Sep 8, 2016 at 7:54 PM, ayan guha <gu...@gmail.com> wrote:

> Another way of debugging would be writing another UDF that returns a string.
> Also, in that function, put something useful in the catch block, so you can
> filter those records from the df.
> On 9 Sep 2016 03:41, "Daniel Lopes" <da...@onematch.com.br> wrote:
>
>> Thanks Mike,
>>
>> A good way to debug! That was exactly it!
>>
>> Best,
>>
>>
>> On Thu, Sep 8, 2016 at 2:26 PM, Mike Metzger <mi...@flexiblecreations.com>
>> wrote:
>>
>>> My guess is there's some row that does not match up with the expected
>>> data.  While slower, I've found RDDs to be easier to troubleshoot this kind
>>> of thing until you sort out exactly what's happening.
>>>
>>> Something like:
>>>
>>> raw_data = sc.textFile("<path to text file(s)>")
>>> rowcounts = raw_data.map(lambda x: (len(x.split(",")), 1)).reduceByKey(lambda x, y: x + y)
>>> rowcounts.take(5)
>>>
>>> badrows = raw_data.filter(lambda x: len(x.split(",")) != <expected number of columns>)
>>> if badrows.count() > 0:
>>>     badrows.saveAsTextFile("<path to malformed.csv>")
>>>
>>>
>>> You should be able to tell if there are any rows with column counts that
>>> don't match up (the thing that usually bites me with CSV conversions).
>>> Assuming these all match to what you want, I'd try mapping the unparsed
>>> date column out to separate fields and try to see if a year field isn't
>>> matching the expected values.
>>>
>>> Thanks
>>>
>>> Mike
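[Editor's note: Mike's column-count check can be mirrored without Spark on a local slice of the file. A plain-Python sketch, assuming comma splitting with no quoted commas, just like the RDD version above:]

```python
from collections import Counter

def column_count_histogram(lines):
    # Count how many rows have each field count (cf. the reduceByKey above).
    return Counter(len(line.split(',')) for line in lines)

def bad_rows(lines, expected):
    # Rows whose field count differs from the expected schema width.
    return [line for line in lines if len(line.split(',')) != expected]

sample = ['a,b,c', '1,2,3', 'x,y']
print(column_count_histogram(sample))
print(bad_rows(sample, 3))
```

Running this over `head -n 1000 file.csv` (a hypothetical local slice) quickly shows whether any rows would shift columns under the CSV parse.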
>>>
>>>
>>> On Thu, Sep 8, 2016 at 8:15 AM, Daniel Lopes <da...@onematch.com.br>
>>> wrote:
>>>
>>>> Thanks,
>>>>
>>>> I *tested* the function offline and it works.
>>>> I also tested with a select * after converting the data, and the new
>>>> data looks good,
>>>> *but* if I *register it as a temp table* to *join another table*, it
>>>> still shows *the same error*:
>>>>
>>>> ValueError: year out of range
>>>>
>>>> Best,
>>>>
>>>>
>>>> On Thu, Sep 8, 2016 at 9:43 AM, Marco Mistroni <mm...@gmail.com>
>>>> wrote:
>>>>
>>>>> Daniel
>>>>> Test the parse date offline to make sure it returns what you expect.
>>>>> If it does, in spark shell create a df with 1 row only and run your
>>>>> UDF. You should be able to see the issue.
>>>>> If not, send me a reduced CSV file at my email and I'll give it a try
>>>>> this eve ....hopefully someone else will be able to assist in the meantime.
>>>>> You don't need to run a full spark app to debug the issue.
>>>>> Your problem is either in the parse date or in what gets passed to the
>>>>> UDF.
>>>>> Hth
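[Editor's note: Marco's offline check amounts to calling the parse function directly on one literal value before Spark is involved at all. A minimal sketch; the locale call is dropped here so it runs in a plain Python shell with the format string used later in the thread:]

```python
from datetime import datetime

def parse_date(argument, format_date='%b %d %Y %H:%M'):
    # Same shape as the thread's UDF body, minus locale.setlocale, so it
    # can be exercised interactively first.
    try:
        return datetime.strptime(argument, format_date)
    except ValueError:
        return None

# An English-looking month abbreviation parses; a pt_BR one does not.
print(parse_date('Jul 20 2015 12:00'))   # 2015-07-20 12:00:00
print(parse_date('Abr 20 2015 12:00'))   # None in the default C locale
```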
>>>>>
>>>>> On 8 Sep 2016 1:31 pm, "Daniel Lopes" <da...@onematch.com.br> wrote:
>>>>>
>>>>>> Thanks Marco for your response.
>>>>>>
>>>>>> The field comes from SQL Server encoded in the pt_BR locale.
>>>>>>
>>>>>> The code I am using to format it is:
>>>>>>
>>>>>> --------------------------
>>>>>> def parse_date(argument, format_date='%Y-%m%d %H:%M:%S'):
>>>>>>     try:
>>>>>>         locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>>>>>>         return datetime.strptime(argument, format_date)
>>>>>>     except:
>>>>>>         return None
>>>>>>
>>>>>> convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
>>>>>> TimestampType())
>>>>>>
>>>>>> transacoes = transacoes.withColumn('tr_Vencimento',
>>>>>>                                    convert_date(transacoes.tr_Vencimento))
>>>>>>
>>>>>> --------------------------
>>>>>>
>>>>>> the sample is
>>>>>>
>>>>>> -------------------------
>>>>>> (wide show() output condensed to the relevant columns; every other
>>>>>> column in this sample is null, 0, or the constant 254.35)
>>>>>>
>>>>>> +-----------------+-------------------+--------+--------------------+
>>>>>> |tr_NumeroContrato|      tr_Vencimento|tr_Valor|  tr_DataNotificacao|
>>>>>> +-----------------+-------------------+--------+--------------------+
>>>>>> | 0000992600153001|*Jul 20 2015 12:00*|  254.35|2015-07-20 12:00:...|
>>>>>> | 0000992600153001|*Abr 20 2015 12:00*|  254.35|                null|
>>>>>> | 0000992600153001|  Nov 20 2015 12:00|  254.35|2015-11-20 12:00:...|
>>>>>> | 0000992600153001|  Dez 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|  Fev 20 2016 12:00|  254.35|                null|
>>>>>> | 0000992600153001|  Fev 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|  Jun 20 2015 12:00|  254.35|2015-06-20 12:00:...|
>>>>>> | 0000992600153001|  Ago 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|  Jan 20 2016 12:00|  254.35|2016-01-20 12:00:...|
>>>>>> | 0000992600153001|  Jan 20 2015 12:00|  254.35|2015-01-20 12:00:...|
>>>>>> | 0000992600153001|  Set 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|  Mai 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|  Out 20 2015 12:00|  254.35|                null|
>>>>>> | 0000992600153001|  Mar 20 2015 12:00|  254.35|2015-03-20 12:00:...|
>>>>>> +-----------------+-------------------+--------+--------------------+
>>>>>> -------------------------
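[Editor's note: one locale-independent alternative (a sketch, not the fix confirmed in the thread): `locale.setlocale` inside the UDF only works if pt_BR.utf8 is installed on every executor, so the Portuguese month abbreviations visible in the sample can instead be mapped by hand:]

```python
from datetime import datetime

# pt_BR month abbreviations as they appear in tr_Vencimento; 'Fev', 'Abr',
# 'Mai', 'Ago', 'Set', 'Out' and 'Dez' differ from the English %b names.
PT_MONTHS = {'Jan': 1, 'Fev': 2, 'Mar': 3, 'Abr': 4, 'Mai': 5, 'Jun': 6,
             'Jul': 7, 'Ago': 8, 'Set': 9, 'Out': 10, 'Nov': 11, 'Dez': 12}

def parse_date_pt(value):
    # Expected shape: 'Abr 20 2015 12:00'
    try:
        month, rest = value.split(' ', 1)
        return datetime.strptime('%02d %s' % (PT_MONTHS[month], rest),
                                 '%m %d %Y %H:%M')
    except (AttributeError, KeyError, ValueError):
        # Anything malformed (or a null) yields None instead of raising.
        return None
```

Wrapped in `funcspk.udf(parse_date_pt, TimestampType())` this would replace the locale-dependent version; whether it resolves the `year out of range` depends on what the malformed rows actually contain.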
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <mm...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Please paste the code and a sample CSV.
>>>>>>> I'm guessing it has to do with formatting the time?
>>>>>>> Kr
>>>>>>>
>>>>>>> On 8 Sep 2016 12:38 am, "Daniel Lopes" <da...@onematch.com.br>
>>>>>>> wrote:

Re: year out of range

Posted by ayan guha <gu...@gmail.com>.
Another way of debugging would be writing another UDF that returns a string.
Also, in that function, put something useful in the catch block, so you can
filter those records from the df.
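[Editor's note: a sketch of such a string-returning debug function (plain Python; the Spark wiring around it is an assumption, not from the thread):]

```python
from datetime import datetime

def parse_date_debug(value, format_date='%b %d %Y %H:%M'):
    """Debug twin of the timestamp UDF: return a string instead of a
    timestamp, and put the failure detail in the 'catch' branch so the
    offending rows can be filtered out of the DataFrame."""
    try:
        datetime.strptime(value, format_date)
        return 'OK'
    except Exception as e:
        # Keep the raw input in the message so the bad value is visible.
        return 'BAD: %r (%s)' % (value, e)
```

Registered with `funcspk.udf(parse_date_debug, StringType())` and applied to the date column, a filter on values starting with `'BAD'` (hypothetical column names) isolates exactly the records that break the real UDF.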
On 9 Sep 2016 03:41, "Daniel Lopes" <da...@onematch.com.br> wrote:


Re: year out of range

Posted by Daniel Lopes <da...@onematch.com.br>.
Thanks Mike,

A good way to debug! That was exactly it!

Best,


On Thu, Sep 8, 2016 at 2:26 PM, Mike Metzger <mi...@flexiblecreations.com>
wrote:

> My guess is there's some row that does not match up with the expected
> data.  While slower, I've found RDDs to be easier to troubleshoot this kind
> of thing until you sort out exactly what's happening.
>
> Something like:
>
> raw_data = sc.textFile("<path to text file(s)>")
> rowcounts = raw_data.map(lambda x: (len(x.split(",")),
> 1)).reduceByKey(lambda x,y: x+y)
> rowcounts.take(5)
>
> badrows = raw_data.filter(lambda x: len(x.split(",")) != <expected number
> of columns>)
> if badrows.count() > 0:
>     badrows.saveAsTextFile("<path to malformed.csv>")
>
>
> You should be able to tell if there are any rows with column counts that
> don't match up (the thing that usually bites me with CSV conversions).
> Assuming these all match to what you want, I'd try mapping the unparsed
> date column out to separate fields and try to see if a year field isn't
> matching the expected values.
>
> Thanks
>
> Mike
>
>
> On Thu, Sep 8, 2016 at 8:15 AM, Daniel Lopes <da...@onematch.com.br>
> wrote:
>
>> Thanks,
>>
>> I *tested* the function offline and works
>> Tested too with select * from after convert the data and see the new data
>> good
>> *but* if I *register as temp table* to *join other table* stilll shows *the
>> same error*.
>>
>> ValueError: year out of range
>>
>> Best,
>>
>> *Daniel Lopes*
>> Chief Data and Analytics Officer | OneMatch
>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>
>> www.onematch.com.br
>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>
>> On Thu, Sep 8, 2016 at 9:43 AM, Marco Mistroni <mm...@gmail.com>
>> wrote:
>>
>>> Daniel
>>> Test the parse date offline to make sure it returns what you expect.
>>> If it does, then in the Spark shell create a df with 1 row only and run
>>> your UDF. You should be able to see the issue.
>>> If not, send me a reduced CSV file at my email and I'll give it a try
>>> this evening... hopefully someone else will be able to assist in the
>>> meantime.
>>> You don't need to run a full Spark app to debug the issue.
>>> Your problem is either in the parse date or in what gets passed to the
>>> UDF.
>>> Hth
>>>
>>> On 8 Sep 2016 1:31 pm, "Daniel Lopes" <da...@onematch.com.br> wrote:
>>>
>>>> Thanks Marco for your response.
>>>>
>>>> The field comes from SQL Server encoded in the pt_BR locale.
>>>>
>>>> The code that I am formatting it with is:
>>>>
>>>> --------------------------
>>>> def parse_date(argument, format_date='%Y-%m-%d %H:%M:%S'):
>>>>     try:
>>>>         locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>>>>         return datetime.strptime(argument, format_date)
>>>>     except:
>>>>         return None
>>>>
>>>> convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
>>>> TimestampType())
>>>>
>>>> transacoes = transacoes.withColumn('tr_Vencimento',
>>>> convert_date(transacoes.*tr_Vencimento*))
>>>>
>>>> --------------------------
>>>>
>>>> the sample is
>>>>
>>>> -------------------------
>>>> +-----------------+----------------+-----------------+------
>>>> --+------------------+-----------+-----------------+--------
>>>> -------------+------------------+--------------+------------
>>>> ----+-------------+-------------+----------------------+----
>>>> ------------------------+--------------------+--------+-----
>>>> ---+------------------+----------------+--------+----------+
>>>> -----------------+----------+
>>>> |tr_NumeroContrato|tr_TipoDocumento|    *tr_Vencimento*|tr_Valor|tr_Dat
>>>> aRecebimento|tr_TaxaMora|tr_DescontoMaximo|tr_DescontoMaximo
>>>> Corr|tr_ValorAtualizado|tr_ComGarantia|tr_ValorDesconto|tr_V
>>>> alorJuros|tr_ValorMulta|tr_DataDevolucaoCheque|tr_ValorCorrigidoContratante|
>>>>  tr_DataNotificacao|tr_Banco|tr_Praca|tr_DescricaoAlinea|tr_
>>>> Enquadramento|tr_Linha|tr_Arquivo|tr_DataImportacao|tr_Agencia|
>>>> +-----------------+----------------+-----------------+------
>>>> --+------------------+-----------+-----------------+--------
>>>> -------------+------------------+--------------+------------
>>>> ----+-------------+-------------+----------------------+----
>>>> ------------------------+--------------------+--------+-----
>>>> ---+------------------+----------------+--------+----------+
>>>> -----------------+----------+
>>>> | 0000992600153001|                |*Jul 20 2015 12:00*|  254.35|
>>>>          null|       null|             null|                 null|
>>>>      null|             0|            null|         null|         null|
>>>>              null|                      254.35|2015-07-20 12:00:...|
>>>>  null|    null|              null|            null|    null|      null|
>>>>         null|      null|
>>>> | 0000992600153001|                |*Abr 20 2015 12:00*|  254.35|
>>>>          null|       null|             null|                 null|
>>>>      null|             0|            null|         null|         null|
>>>>              null|                      254.35|                null|
>>>>  null|    null|              null|            null|    null|      null|
>>>>         null|      null|
>>>> | 0000992600153001|                |Nov 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|2015-11-20 12:00:...|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Dez 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|                null|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Fev 20 2016 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|                null|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Fev 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|                null|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Jun 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|2015-06-20 12:00:...|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Ago 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|                null|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Jan 20 2016 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|2016-01-20 12:00:...|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Jan 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|2015-01-20 12:00:...|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Set 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|                null|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Mai 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|                null|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Out 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|                null|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> | 0000992600153001|                |Mar 20 2015 12:00|  254.35|
>>>>      null|       null|             null|                 null|
>>>>  null|             0|            null|         null|         null|
>>>>          null|                      254.35|2015-03-20 12:00:...|    null|
>>>>  null|              null|            null|    null|      null|
>>>> null|      null|
>>>> +-----------------+----------------+-----------------+------
>>>> --+------------------+-----------+-----------------+--------
>>>> -------------+------------------+--------------+------------
>>>> ----+-------------+-------------+----------------------+----
>>>> ------------------------+--------------------+--------+-----
>>>> ---+------------------+----------------+--------+----------+
>>>> -----------------+----------+
>>>>
>>>> -------------------------
>>>>
>>>> *Daniel Lopes*
>>>> Chief Data and Analytics Officer | OneMatch
>>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>>
>>>> www.onematch.com.br
>>>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>>>
>>>> On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <mm...@gmail.com>
>>>> wrote:
>>>>
>>>>> Pls paste code and sample CSV
>>>>> I'm guessing it has to do with formatting time?
>>>>> Kr
>>>>>
>>>>> On 8 Sep 2016 12:38 am, "Daniel Lopes" <da...@onematch.com.br> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm *importing a few CSVs* with the spark-csv package.
>>>>>> Whenever I run a select on each one it looks ok,
>>>>>> but when I join them with sqlContext.sql it gives me this error.
>>>>>>
>>>>>> all tables have timestamp fields
>>>>>>
>>>>>> the joins are not on these dates
>>>>>>
>>>>>>
>>>>>> *Py4JJavaError: An error occurred while calling o643.showString.*
>>>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>>> Task 54 in stage 92.0 failed 10 times, most recent failure: Lost task 54.9
>>>>>> in stage 92.0 (TID 6356, yp-spark-dal09-env5-0036):
>>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>>>>> call last):
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>>> lib/pyspark.zip/pyspark/worker.py", line 111, in main
>>>>>>     process()
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>>> lib/pyspark.zip/pyspark/worker.py", line 106, in process
>>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>>> lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>>>>>>     vs = list(itertools.islice(iterator, batch))
>>>>>>   File "/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py",
>>>>>> line 1563, in <lambda>
>>>>>>     func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)),
>>>>>> it)
>>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>>> lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
>>>>>>     else time.mktime(dt.timetuple()))
>>>>>> *ValueError: year out of range  *
>>>>>>
>>>>>> Does anyone know this problem?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> *Daniel Lopes*
>>>>>> Chief Data and Analytics Officer | OneMatch
>>>>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>>>>
>>>>>> www.onematch.com.br
>>>>>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>>>>>
>>>>>
>>>>
>>
>

Re: year out of range

Posted by Mike Metzger <mi...@flexiblecreations.com>.
My guess is there's some row that does not match up with the expected
data.  While slower, I've found RDDs easier for troubleshooting this kind
of thing until you sort out exactly what's happening.

Something like:

# Count rows by how many comma-separated columns they contain
raw_data = sc.textFile("<path to text file(s)>")
rowcounts = raw_data.map(lambda x: (len(x.split(",")),
1)).reduceByKey(lambda x, y: x + y)
rowcounts.take(5)

# Dump any rows whose column count is off so they can be inspected
badrows = raw_data.filter(lambda x: len(x.split(",")) != <expected number
of columns>)
if badrows.count() > 0:
    badrows.saveAsTextFile("<path to malformed.csv>")


You should be able to tell if there are any rows with column counts that
don't match up (the thing that usually bites me with CSV conversions).
Assuming those all match what you expect, I'd try mapping the unparsed
date column out into separate fields to see whether the year field matches
the expected values.
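That offline check doesn't need Spark at all. A minimal plain-Python sketch
(the "Mon DD YYYY HH:MM" format and sample values are taken from the
tr_Vencimento column in the quoted sample; the 1900-2100 sanity range is my
own assumption):

```python
# Extract the year token from raw "Mon DD YYYY HH:MM" strings (the
# tr_Vencimento format) and flag rows whose year is missing or implausible.

def year_of(raw):
    """Return the year field of a 'Mon DD YYYY HH:MM' string, or None."""
    parts = raw.split()
    if len(parts) < 3 or not parts[2].isdigit():
        return None
    return int(parts[2])

samples = ["Jul 20 2015 12:00", "Fev 20 2016 12:00", "not a date at all"]
bad = [s for s in samples if year_of(s) is None
       or not 1900 <= year_of(s) <= 2100]
print(bad)  # -> ['not a date at all']
```

Once the predicate behaves on known-good samples, the same function can be
dropped into an RDD filter just like the column-count check.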

Thanks

Mike


On Thu, Sep 8, 2016 at 8:15 AM, Daniel Lopes <da...@onematch.com.br> wrote:

> Thanks,
>
> I *tested* the function offline and it works.
> I also tested with a select * after converting the data, and the new data
> looks good,
> *but* if I *register it as a temp table* to *join another table*, it still
> shows *the same error*.
>
> ValueError: year out of range
>
> Best,
>
> *Daniel Lopes*
> Chief Data and Analytics Officer | OneMatch
> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>
> www.onematch.com.br
> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>
> On Thu, Sep 8, 2016 at 9:43 AM, Marco Mistroni <mm...@gmail.com>
> wrote:
>
>> Daniel
>> Test the parse date offline to make sure it returns what you expect.
>> If it does, then in the Spark shell create a df with 1 row only and run
>> your UDF. You should be able to see the issue.
>> If not, send me a reduced CSV file at my email and I'll give it a try
>> this evening... hopefully someone else will be able to assist in the
>> meantime.
>> You don't need to run a full Spark app to debug the issue.
>> Your problem is either in the parse date or in what gets passed to the
>> UDF.
>> Hth
>>
>> On 8 Sep 2016 1:31 pm, "Daniel Lopes" <da...@onematch.com.br> wrote:
>>
>>> Thanks Marco for your response.
>>>
>>> The field comes from SQL Server encoded in the pt_BR locale.
>>>
>>> The code that I am formatting it with is:
>>>
>>> --------------------------
>>> def parse_date(argument, format_date='%Y-%m-%d %H:%M:%S'):
>>>     try:
>>>         locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>>>         return datetime.strptime(argument, format_date)
>>>     except:
>>>         return None
>>>
>>> convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
>>> TimestampType())
>>>
>>> transacoes = transacoes.withColumn('tr_Vencimento',
>>> convert_date(transacoes.*tr_Vencimento*))
>>>
>>> --------------------------
>>>
>>> the sample is
>>>
>>> -------------------------
>>> +-----------------+----------------+-----------------+------
>>> --+------------------+-----------+-----------------+--------
>>> -------------+------------------+--------------+------------
>>> ----+-------------+-------------+----------------------+----
>>> ------------------------+--------------------+--------+-----
>>> ---+------------------+----------------+--------+----------+
>>> -----------------+----------+
>>> |tr_NumeroContrato|tr_TipoDocumento|    *tr_Vencimento*|tr_Valor|tr_Dat
>>> aRecebimento|tr_TaxaMora|tr_DescontoMaximo|tr_DescontoMaximo
>>> Corr|tr_ValorAtualizado|tr_ComGarantia|tr_ValorDesconto|tr_
>>> ValorJuros|tr_ValorMulta|tr_DataDevolucaoCheque|tr_ValorCorrigidoContratante|
>>>  tr_DataNotificacao|tr_Banco|tr_Praca|tr_DescricaoAlinea|tr_
>>> Enquadramento|tr_Linha|tr_Arquivo|tr_DataImportacao|tr_Agencia|
>>> +-----------------+----------------+-----------------+------
>>> --+------------------+-----------+-----------------+--------
>>> -------------+------------------+--------------+------------
>>> ----+-------------+-------------+----------------------+----
>>> ------------------------+--------------------+--------+-----
>>> ---+------------------+----------------+--------+----------+
>>> -----------------+----------+
>>> | 0000992600153001|                |*Jul 20 2015 12:00*|  254.35|
>>>        null|       null|             null|                 null|
>>>    null|             0|            null|         null|         null|
>>>            null|                      254.35|2015-07-20 12:00:...|    null|
>>>    null|              null|            null|    null|      null|
>>>   null|      null|
>>> | 0000992600153001|                |*Abr 20 2015 12:00*|  254.35|
>>>        null|       null|             null|                 null|
>>>    null|             0|            null|         null|         null|
>>>            null|                      254.35|                null|    null|
>>>    null|              null|            null|    null|      null|
>>>   null|      null|
>>> | 0000992600153001|                |Nov 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|2015-11-20 12:00:...|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Dez 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|                null|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Fev 20 2016 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|                null|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Fev 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|                null|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Jun 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|2015-06-20 12:00:...|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Ago 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|                null|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Jan 20 2016 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|2016-01-20 12:00:...|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Jan 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|2015-01-20 12:00:...|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Set 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|                null|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Mai 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|                null|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Out 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|                null|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> | 0000992600153001|                |Mar 20 2015 12:00|  254.35|
>>>      null|       null|             null|                 null|
>>>  null|             0|            null|         null|         null|
>>>          null|                      254.35|2015-03-20 12:00:...|    null|
>>>  null|              null|            null|    null|      null|
>>> null|      null|
>>> +-----------------+----------------+-----------------+------
>>> --+------------------+-----------+-----------------+--------
>>> -------------+------------------+--------------+------------
>>> ----+-------------+-------------+----------------------+----
>>> ------------------------+--------------------+--------+-----
>>> ---+------------------+----------------+--------+----------+
>>> -----------------+----------+
>>>
>>> -------------------------
>>>
>>> *Daniel Lopes*
>>> Chief Data and Analytics Officer | OneMatch
>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>
>>> www.onematch.com.br
>>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>>
>>> On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <mm...@gmail.com>
>>> wrote:
>>>
>>>> Pls paste code and sample CSV
>>>> I'm guessing it has to do with formatting time?
>>>> Kr
>>>>
>>>> On 8 Sep 2016 12:38 am, "Daniel Lopes" <da...@onematch.com.br> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm *importing a few CSVs* with the spark-csv package.
>>>>> Whenever I run a select on each one it looks ok,
>>>>> but when I join them with sqlContext.sql it gives me this error.
>>>>>
>>>>> all tables have timestamp fields
>>>>>
>>>>> the joins are not on these dates
>>>>>
>>>>>
>>>>> *Py4JJavaError: An error occurred while calling o643.showString.*
>>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>> Task 54 in stage 92.0 failed 10 times, most recent failure: Lost task 54.9
>>>>> in stage 92.0 (TID 6356, yp-spark-dal09-env5-0036):
>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>>>> call last):
>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>> lib/pyspark.zip/pyspark/worker.py", line 111, in main
>>>>>     process()
>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>> lib/pyspark.zip/pyspark/worker.py", line 106, in process
>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>> lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
>>>>>     vs = list(itertools.islice(iterator, batch))
>>>>>   File "/usr/local/src/spark160master/spark/python/pyspark/sql/functions.py",
>>>>> line 1563, in <lambda>
>>>>>     func = lambda _, it: map(lambda x: returnType.toInternal(f(*x)),
>>>>> it)
>>>>>   File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/
>>>>> lib/pyspark.zip/pyspark/sql/types.py", line 191, in toInternal
>>>>>     else time.mktime(dt.timetuple()))
>>>>> *ValueError: year out of range  *
>>>>>
>>>>> Does anyone know this problem?
>>>>>
>>>>> Best,
>>>>>
>>>>> *Daniel Lopes*
>>>>> Chief Data and Analytics Officer | OneMatch
>>>>> c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes
>>>>>
>>>>> www.onematch.com.br
>>>>> <http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>
>>>>>
>>>>
>>>
>

Re: year out of range

Posted by Daniel Lopes <da...@onematch.com.br>.
Thanks,

I *tested* the function offline and it works.
I also tested with a select * after converting the data, and the new data
looks good,
*but* if I *register it as a temp table* to *join another table*, it still
shows *the same error*.

ValueError: year out of range
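One possibility worth checking here: a plain select can succeed while the
join is what finally forces the UDF to run over every row, and '%b' only
matches 'Abr', 'Fev', 'Set' etc. if locale.setlocale(locale.LC_TIME,
'pt_BR.utf8') actually succeeds on every executor. A locale-free
alternative is to translate the Portuguese month abbreviations by hand --
a minimal sketch, with the abbreviation table assumed from the sample data:

```python
from datetime import datetime

# Portuguese month abbreviations (assumed from the sample data) -> month no.
PT_MONTHS = {"Jan": 1, "Fev": 2, "Mar": 3, "Abr": 4, "Mai": 5, "Jun": 6,
             "Jul": 7, "Ago": 8, "Set": 9, "Out": 10, "Nov": 11, "Dez": 12}

def parse_date_pt(raw):
    """Parse 'Mes DD YYYY HH:MM' without touching the process locale."""
    try:
        mon, day, year, hhmm = raw.split()
        hour, minute = hhmm.split(":")
        return datetime(int(year), PT_MONTHS[mon], int(day),
                        int(hour), int(minute))
    except (ValueError, KeyError, AttributeError):
        return None  # malformed rows become null instead of killing the job

print(parse_date_pt("Abr 20 2015 12:00"))  # -> 2015-04-20 12:00:00
```

If this parses the known-good rows, it can be wrapped exactly like the
quoted convert_date UDF: funcspk.udf(parse_date_pt, TimestampType()).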

Best,

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes

www.onematch.com.br
<http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>

On Thu, Sep 8, 2016 at 9:43 AM, Marco Mistroni <mm...@gmail.com> wrote:

> Daniel
> Test the parse date offline to make sure it returns what you expect.
> If it does, then in the Spark shell create a df with 1 row only and run
> your UDF. You should be able to see the issue.
> If not, send me a reduced CSV file at my email and I'll give it a try
> this evening... hopefully someone else will be able to assist in the
> meantime.
> You don't need to run a full Spark app to debug the issue.
> Your problem is either in the parse date or in what gets passed to the
> UDF.
> Hth
>
> On 8 Sep 2016 1:31 pm, "Daniel Lopes" <da...@onematch.com.br> wrote:
>
>> Thanks Marco for your response.
>>
>> The field comes from SQL Server encoded in the pt_BR locale.
>>
>> The code that I am formatting it with is:
>>
>> --------------------------
>> def parse_date(argument, format_date='%Y-%m-%d %H:%M:%S'):
>>     try:
>>         locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
>>         return datetime.strptime(argument, format_date)
>>     except:
>>         return None
>>
>> convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
>> TimestampType())
>>
>> transacoes = transacoes.withColumn('tr_Vencimento',
>> convert_date(transacoes.*tr_Vencimento*))
>>
>> --------------------------
>>
>> the sample is
>>
>> -------------------------
>> +-----------------+----------------+-----------------+------
>> --+------------------+-----------+-----------------+--------
>> -------------+------------------+--------------+------------
>> ----+-------------+-------------+----------------------+----
>> ------------------------+--------------------+--------+-----
>> ---+------------------+----------------+--------+----------+
>> -----------------+----------+
>> |tr_NumeroContrato|tr_TipoDocumento|    *tr_Vencimento*|tr_Valor|tr_Dat
>> aRecebimento|tr_TaxaMora|tr_DescontoMaximo|tr_DescontoMaxi
>> moCorr|tr_ValorAtualizado|tr_ComGarantia|tr_ValorDesconto|tr
>> _ValorJuros|tr_ValorMulta|tr_DataDevolucaoCheque|tr_ValorCorrigidoContratante|
>>  tr_DataNotificacao|tr_Banco|tr_Praca|tr_DescricaoAlinea|tr_
>> Enquadramento|tr_Linha|tr_Arquivo|tr_DataImportacao|tr_Agencia|
>> +-----------------+----------------+-----------------+------
>> --+------------------+-----------+-----------------+--------
>> -------------+------------------+--------------+------------
>> ----+-------------+-------------+----------------------+----
>> ------------------------+--------------------+--------+-----
>> ---+------------------+----------------+--------+----------+
>> -----------------+----------+
>> | 0000992600153001|                |*Jul 20 2015 12:00*|  254.35|
>>        null|       null|             null|                 null|
>>    null|             0|            null|         null|         null|
>>            null|                      254.35|2015-07-20 12:00:...|    null|
>>    null|              null|            null|    null|      null|
>>   null|      null|
>> | 0000992600153001|                |*Abr 20 2015 12:00*|  254.35|
>>        null|       null|             null|                 null|
>>    null|             0|            null|         null|         null|
>>            null|                      254.35|                null|    null|
>>    null|              null|            null|    null|      null|
>>   null|      null|
>> | 0000992600153001|                |Nov 20 2015 12:00|  254.35|
>>    null|       null|             null|                 null|
>>  null|             0|            null|         null|         null|
>>          null|                      254.35|2015-11-20 12:00:...|    null|
>>  null|              null|            null|    null|      null|
>> null|      null|
>> | 0000992600153001|                |Dez 20 2015 12:00|  254.35|
>>    null|       null|             null|                 null|
>>  null|             0|            null|         null|         null|
>>          null|                      254.35|                null|    null|
>>  null|              null|            null|    null|      null|
>> null|      null|
>> | 0000992600153001|                |Fev 20 2016 12:00|  254.35|
>>    null|       null|             null|                 null|
>>  null|             0|            null|         null|         null|
>>          null|                      254.35|                null|    null|
>>  null|              null|            null|    null|      null|
>> null|      null|
>> | 0000992600153001|                |Fev 20 2015 12:00|  254.35|
>>    null|       null|             null|                 null|
>>  null|             0|            null|         null|         null|
>>          null|                      254.35|                null|    null|
>>  null|              null|            null|    null|      null|
>> null|      null|
>> | 0000992600153001|                |Jun 20 2015 12:00|  254.35|
>>    null|       null|             null|                 null|
>>  null|             0|            null|         null|         null|
>>          null|                      254.35|2015-06-20 12:00:...|    null|
>>  null|              null|            null|    null|      null|
>> null|      null|
>> | 0000992600153001|                |Ago 20 2015 12:00|  254.35|
>>    null|       null|             null|                 null|
>>  null|             0|            null|         null|         null|
>>          null|                      254.35|                null|    null|
>>  null|              null|            null|    null|      null|
>> null|      null|
>> | 0000992600153001|                |Jan 20 2016 12:00|  254.35|
>>    null|       null|             null|                 null|
>>  null|             0|            null|         null|         null|
>>          null|                      254.35|2016-01-20 12:00:...|    null|
>>  null|              null|            null|    null|      null|
>> null|      null|
>> | 0000992600153001|                |Jan 20 2015 12:00|  254.35|
>>    null|       null|             null|                 null|
>>  null|             0|            null|         null|         null|
>>          null|                      254.35|2015-01-20 12:00:...|    null|
>>  null|              null|            null|    null|      null|
>> null|      null|

Re: year out of range

Posted by Marco Mistroni <mm...@gmail.com>.
Daniel
Test the parse_date function offline to make sure it returns what you expect.
If it does, create a DataFrame with only one row in the spark shell and run
your UDF on it; you should be able to see the issue. If not, send me a
reduced CSV file at my email and I'll give it a try this evening; hopefully
someone else will be able to assist in the meantime.
You don't need to run a full Spark app to debug the issue. Your problem is
either in the date parsing or in what gets passed to the UDF.
Hth
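A minimal way to follow that advice offline, as a rough sketch: feed candidate values through the parser in plain CPython and flag any result whose year would trip time.mktime. The sample strings below are made up for illustration, and parse_date here merely stands in for the UDF's Python body:

```python
from datetime import datetime

def parse_date(argument, format_date='%b %d %Y %H:%M'):
    # Stand-in for the UDF's Python body (locale handling omitted).
    try:
        return datetime.strptime(argument, format_date)
    except (TypeError, ValueError):
        return None

# Hypothetical values standing in for the CSV column; the second one
# mimics a corrupt row with a year that time.mktime cannot represent.
samples = ['Jul 20 2015 12:00', 'Jul  1 0001 00:00', 'garbage']

suspicious = []
for s in samples:
    dt = parse_date(s)
    # Spark 1.6's TimestampType.toInternal goes through time.mktime,
    # which rejects years far outside the Unix epoch range; 1970-2037
    # is used here only as a conservative heuristic.
    if dt is not None and not (1970 <= dt.year <= 2037):
        suspicious.append(s)

print(suspicious)  # ['Jul  1 0001 00:00']
```

Any value printed by this sketch would be a candidate for the row that makes the job fail.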

On 8 Sep 2016 1:31 pm, "Daniel Lopes" <da...@onematch.com.br> wrote:

> [quoted text trimmed]

Re: year out of range

Posted by Daniel Lopes <da...@onematch.com.br>.
Thanks Marco for your response.

The field was exported by SQL Server encoded in the pt_BR locale.

The code I am using to format it is:

--------------------------
import locale
from datetime import datetime

from pyspark.sql import functions as funcspk
from pyspark.sql.types import TimestampType

def parse_date(argument, format_date='%Y-%m-%d %H:%M:%S'):
    try:
        # Use the Brazilian Portuguese locale so that month
        # abbreviations such as 'Fev' or 'Set' can be parsed.
        locale.setlocale(locale.LC_TIME, 'pt_BR.utf8')
        return datetime.strptime(argument, format_date)
    except (TypeError, ValueError, locale.Error):
        # Unparseable values (or a missing locale) become null.
        return None

convert_date = funcspk.udf(lambda x: parse_date(x, '%b %d %Y %H:%M'),
                           TimestampType())

transacoes = transacoes.withColumn('tr_Vencimento',
                                   convert_date(transacoes.tr_Vencimento))
--------------------------
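Since setlocale is process-global and requires pt_BR.utf8 to be installed on every executor, a locale-free variant of the parser can be more robust inside a UDF. A minimal sketch; the month table below is an assumption based on the abbreviations visible in the sample data:

```python
from datetime import datetime

# pt_BR month abbreviations mapped to month numbers (taken from the
# abbreviations in the sample data; adjust if the source casing differs).
PT_MONTHS = {'Jan': 1, 'Fev': 2, 'Mar': 3, 'Abr': 4, 'Mai': 5, 'Jun': 6,
             'Jul': 7, 'Ago': 8, 'Set': 9, 'Out': 10, 'Nov': 11, 'Dez': 12}

def parse_date_pt(argument):
    """Parse strings like 'Fev 20 2016 12:00' without touching the locale."""
    try:
        mon, day, year, hhmm = argument.split()
        hour, minute = hhmm.split(':')
        return datetime(int(year), PT_MONTHS[mon], int(day),
                        int(hour), int(minute))
    except (AttributeError, KeyError, ValueError):
        # Nulls, unknown months, and malformed strings all become None.
        return None

print(parse_date_pt('Fev 20 2016 12:00'))  # 2016-02-20 12:00:00
print(parse_date_pt('Jul 20 2015 12:00'))  # 2015-07-20 12:00:00
```

Wrapped in funcspk.udf(..., TimestampType()), it can stand in for parse_date without needing pt_BR.utf8 on the workers.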

The sample is:

-------------------------
+-----------------+-----------------+--------+--------------------+
|tr_NumeroContrato|    tr_Vencimento|tr_Valor|  tr_DataNotificacao|
+-----------------+-----------------+--------+--------------------+
| 0000992600153001|Jul 20 2015 12:00|  254.35|2015-07-20 12:00:...|
| 0000992600153001|Abr 20 2015 12:00|  254.35|                null|
| 0000992600153001|Nov 20 2015 12:00|  254.35|2015-11-20 12:00:...|
| 0000992600153001|Dez 20 2015 12:00|  254.35|                null|
| 0000992600153001|Fev 20 2016 12:00|  254.35|                null|
| 0000992600153001|Fev 20 2015 12:00|  254.35|                null|
| 0000992600153001|Jun 20 2015 12:00|  254.35|2015-06-20 12:00:...|
| 0000992600153001|Ago 20 2015 12:00|  254.35|                null|
| 0000992600153001|Jan 20 2016 12:00|  254.35|2016-01-20 12:00:...|
| 0000992600153001|Jan 20 2015 12:00|  254.35|2015-01-20 12:00:...|
| 0000992600153001|Set 20 2015 12:00|  254.35|                null|
| 0000992600153001|Mai 20 2015 12:00|  254.35|                null|
| 0000992600153001|Out 20 2015 12:00|  254.35|                null|
| 0000992600153001|Mar 20 2015 12:00|  254.35|2015-03-20 12:00:...|
+-----------------+-----------------+--------+--------------------+

(On every row, tr_TipoDocumento is blank, tr_ComGarantia is 0,
tr_ValorCorrigidoContratante is 254.35, and all remaining columns are
null; they are omitted here to keep the lines readable.)

-------------------------
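One pattern worth noting in the sample: the only rows with a converted timestamp are those whose month abbreviation coincides with the English one (Jan, Mar, Jun, Jul, Nov), while every pt_BR-only abbreviation (Fev, Abr, Mai, Ago, Set, Out, Dez) came back null. That is consistent with the dates having been parsed under the default C locale rather than pt_BR, e.g. if pt_BR.utf8 is not installed on the executors. A quick local check of this hypothesis, in plain Python with no Spark needed:

```python
import locale
from datetime import datetime

# Under the default C locale, %b only matches English month
# abbreviations; pt_BR-only names raise ValueError.
locale.setlocale(locale.LC_TIME, 'C')

def try_parse(s):
    try:
        return datetime.strptime(s, '%b %d %Y %H:%M')
    except ValueError:
        return None

print(try_parse('Jul 20 2015 12:00'))  # 2015-07-20 12:00:00 ('Jul' matches English)
print(try_parse('Fev 20 2015 12:00'))  # None ('Fev' is pt_BR only)
```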

*Daniel Lopes*
Chief Data and Analytics Officer | OneMatch
c: +55 (18) 99764-2733 | https://www.linkedin.com/in/dslopes

www.onematch.com.br
<http://www.onematch.com.br/?utm_source=EmailSignature&utm_term=daniel-lopes>

On Thu, Sep 8, 2016 at 5:33 AM, Marco Mistroni <mm...@gmail.com> wrote:

> [quoted text trimmed]
>

Re: year out of range

Posted by Marco Mistroni <mm...@gmail.com>.
Please paste the code and a sample CSV.
I'm guessing it has to do with formatting the time?
Kr

On 8 Sep 2016 12:38 am, "Daniel Lopes" <da...@onematch.com.br> wrote:

> [quoted text trimmed]