Posted to dev@spark.apache.org by Hyukjin Kwon <gu...@gmail.com> on 2017/08/05 07:41:06 UTC

Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Hi all,

I am seeing flaky Python tests from time to time, mostly on
amp-jenkins-worker-05 if I am not mistaken:


======================================================================
ERROR: test_filtered_frame (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 25, in <module>
    from pandas import hashtable, tslib, lib
ImportError: cannot import name 'hashtable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/tests.py", line 3057, in test_filtered_frame
    pdf = df.filter("i < 0").toPandas()
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql/dataframe.py", line 1727, in toPandas
    import pandas as pd
  File "/home/anaconda/envs/py3k/lib/python3.4/site-packages/pandas/__init__.py", line 31, in <module>
    "the C extensions first.".format(module))
ImportError: C extension: 'hashtable' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

======================================================================
ERROR: test_null_conversion (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

======================================================================
ERROR: test_pandas_round_trip (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...

======================================================================
ERROR: test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
...


It sounds like an environment problem, apparently caused by the missing
'hashtable' C extension (which I believe should have been compiled and
importable).

I suspect a few possibilities, such as a bug somewhere or an unsuccessful
manual build of pandas from source, but I am unable to reproduce or verify
this, so this is only my guess.
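One way to check whether a worker's pandas installation is usable before the Arrow tests run might be a small import check like the one below. This is a minimal sketch, not part of the Spark test suite, and the helper name check_pandas_import is hypothetical. On a worker in the state shown in the traceback above, the import inside it would raise the same ImportError, which the helper reports instead of propagating.

```python
import importlib
import sys


def check_pandas_import():
    """Try to import pandas and report which installation was picked up.

    Hypothetical helper: returns an (ok, message) tuple instead of raising,
    so it can run as a sanity check on a CI worker before the tests start.
    """
    try:
        pd = importlib.import_module("pandas")
    except ImportError as exc:
        # A broken install (e.g. "C extension: 'hashtable' not built") or a
        # missing one ends up here.
        return False, "pandas import failed on Python %s: %s" % (
            sys.version.split()[0], exc)
    # Report the version and the path it was loaded from, which helps spot
    # a worker that picked up a different pandas than the others.
    return True, "pandas %s loaded from %s" % (pd.__version__, pd.__file__)
```

For example, printing check_pandas_import()[1] at the start of a job would show which pandas version and path each worker actually loads.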


Does anyone know whether this is an environment problem and, if so, how to fix it?

Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Posted by Hyukjin Kwon <gu...@gmail.com>.
Thank you, Shane.

2017-08-06 8:30 GMT+09:00 shane knapp <sk...@berkeley.edu>:

> ok, first test to run post-fix is green:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80289/
>
> i'll keep an eye on this worker over the next few days.
>
> shane

Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Posted by shane knapp <sk...@berkeley.edu>.
ok, first test to run post-fix is green:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80289/

i'll keep an eye on this worker over the next few days.

shane

On Sat, Aug 5, 2017 at 11:06 AM, shane knapp <sk...@berkeley.edu> wrote:
> amp-jenkins-worker-05 had 0.20.3 installed for some reason.  it's now
> been downgraded to 0.19.2 and matches the other workers.
>
> shane

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Posted by shane knapp <sk...@berkeley.edu>.
amp-jenkins-worker-05 had 0.20.3 installed for some reason.  it's now
been downgraded to 0.19.2 and matches the other workers.

shane
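Since the fix here came down to one worker running a different pandas release than the rest, a check that flags such outliers might help catch this earlier. Below is a minimal sketch, assuming you already have a mapping from worker name to installed version (how it is collected, e.g. by reading pandas.__version__ on each worker, is left open). find_version_outliers is a hypothetical helper, and the inventory below is illustrative: only amp-jenkins-worker-05 and the two version numbers come from this thread.

```python
from collections import Counter


def find_version_outliers(worker_versions):
    """Return the workers whose version differs from the most common one.

    worker_versions: dict mapping a worker name to the version string
    installed on it. The most common version is treated as the baseline.
    """
    if not worker_versions:
        return {}
    # most_common(1) returns [(version, count)] for the most frequent version.
    baseline, _ = Counter(worker_versions.values()).most_common(1)[0]
    return {w: v for w, v in worker_versions.items() if v != baseline}


# Illustrative inventory (hypothetical worker names except worker-05).
versions = {
    "worker-a": "0.19.2",
    "worker-b": "0.19.2",
    "amp-jenkins-worker-05": "0.20.3",
    "worker-c": "0.19.2",
}
print(find_version_outliers(versions))  # {'amp-jenkins-worker-05': '0.20.3'}
```

Using the majority as the baseline avoids hard-coding a version; pinning an explicit expected version (0.19.2 here) would be the stricter alternative.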

On Sat, Aug 5, 2017 at 2:01 AM, Liang-Chi Hsieh <vi...@gmail.com> wrote:
>
> Maybe a possible fix:
> https://stackoverflow.com/questions/31495657/development-build-of-pandas-giving-importerror-c-extension-hashtable-not-bui


Re: Question, Flaky tests: pyspark.sql.tests.ArrowTests tests in Jenkins worker 5(?)

Posted by Liang-Chi Hsieh <vi...@gmail.com>.
Maybe a possible fix:
https://stackoverflow.com/questions/31495657/development-build-of-pandas-giving-importerror-c-extension-hashtable-not-bui


-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Question-Flaky-tests-pyspark-sql-tests-ArrowTests-tests-in-Jenkins-worker-5-tp22085p22086.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
