Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/11/06 00:24:58 UTC

[GitHub] [beam] TheNeuralBit commented on pull request #15909: [BEAM-12550] Parallelizable kurtosis Implementation

TheNeuralBit commented on pull request #15909:
URL: https://github.com/apache/beam/pull/15909#issuecomment-962291253


   The pandas/_libs/testing.pyx errors look like real errors in [`test_dataframe_agg_method`](https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L1490):
   ```
   >   ???
   E   AssertionError: Series are different
   E   
   E   Series values are different (50.0 %)
   E   [index]: [A, B]
   E   [left]:  [-1.1999999999999993, -1.699511634587763]
   E   [right]: [-1.200000000000001, -0.40130739795918835]
   ```
   They just happen to come from `pd.testing.assert_frame_equal`, which we use to verify that DataFrame results are equivalent: https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L175-L176
   
   I also see a couple of failures for `test_series_cov_corr` indicating it may be a little flaky, like [this one](https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/20405/testReport/junit/apache_beam.dataframe.frames_test/DeferredFrameTest/test_series_cov_corr_8/):
   
   ```
   apache_beam/dataframe/frames_test.py:191: in _run_test
       self.assertTrue(
   E   AssertionError: False is not true : Expected:
   E   
   E   -1.2
   E   
   E   Actual:
   E   
   E   -1.1999545602598247
   ```
   
   That's off by only about 4.5e-5, but I guess that's enough for `np.isclose` (default `rtol=1e-5`, `atol=1e-8`) to consider the values different. If we can rule out an actual cause for this difference, we may want to plumb through an option for increasing the tolerance, like we discussed for skew. But it seems like something else may be going on here.
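To make the tolerance point concrete, here is a minimal sketch using the two values from the failure above. It assumes the comparison uses `np.isclose` with its default tolerances and treats the expected value as the reference; the relaxed `rtol=1e-4` is only an illustrative choice, not a value proposed in this PR.

```python
import numpy as np

# Values copied from the test_series_cov_corr failure above.
expected = -1.2
actual = -1.1999545602598247

# np.isclose(a, b) checks |a - b| <= atol + rtol * |b|,
# with defaults rtol=1e-5 and atol=1e-8.
abs_diff = abs(actual - expected)
default_allowed = 1e-8 + 1e-5 * abs(expected)

# With the default tolerances the values are treated as different.
default_ok = bool(np.isclose(actual, expected))

# A looser relative tolerance (illustrative only) accepts the same pair.
relaxed_ok = bool(np.isclose(actual, expected, rtol=1e-4))

print(abs_diff, default_allowed, default_ok, relaxed_ok)
```

The absolute difference is several times larger than what the defaults allow, so a tolerance option would hide the discrepancy rather than explain it.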
   
   The error in `test_dataframe_agg_method` does look like a hard failure. If you look [here](https://ci-beam.apache.org/job/beam_PreCommit_Python_Commit/20405/testReport/junit/apache_beam.dataframe.frames_test/AggregationTest/), you can see it failed in every run:
   ![image](https://user-images.githubusercontent.com/675055/140591167-ef72f201-b323-4c4b-86e8-9750867a4fc8.png)
   
   and it's consistently producing -0.4 rather than -1.7 for column B. I'd suggest looking closer at the column B case from that test: https://github.com/apache/beam/blob/a3bb58dbd4fc6a59f93f69f9ab1980f8232b6e82/sdks/python/apache_beam/dataframe/frames_test.py#L1491
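One way to look closer at a single column is to compute kurtosis from per-partition summaries outside of Beam and compare the result with pandas directly. The sketch below is not the implementation in this PR: the helper names `power_sums` and `kurtosis_from_sums` are hypothetical, the sample data is made up rather than taken from the test, and it combines raw power sums, which is simple but less numerically stable than merging central moments.

```python
import numpy as np
import pandas as pd


def power_sums(part: pd.Series) -> np.ndarray:
    """Per-partition summary: count and the sums of x, x^2, x^3, x^4."""
    x = part.dropna().to_numpy(dtype=float)
    return np.array(
        [x.size, x.sum(), (x**2).sum(), (x**3).sum(), (x**4).sum()])


def kurtosis_from_sums(sums: np.ndarray) -> float:
    """Bias-corrected excess kurtosis from combined power sums."""
    n, s1, s2, s3, s4 = sums
    if n < 4:
        return float("nan")
    mean = s1 / n
    # Central moments expressed through raw moments.
    m2 = s2 / n - mean**2
    m4 = s4 / n - 4 * mean * s3 / n + 6 * mean**2 * s2 / n - 3 * mean**4
    if m2 == 0:
        # Assumption for this sketch: report NaN for a constant column.
        return float("nan")
    g2 = m4 / m2**2 - 3
    return float((n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6))


# Made-up data standing in for one column, split into two partitions.
column = pd.Series([2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0, 19.0])
partitions = [column.iloc[:3], column.iloc[3:]]

# Power sums are additive, so partition summaries combine by addition.
combined = sum(power_sums(p) for p in partitions)
parallel_kurt = kurtosis_from_sums(combined)
pandas_kurt = column.kurtosis()

print(parallel_kurt, pandas_kurt, np.isclose(parallel_kurt, pandas_kurt))
```

Running the same comparison on the actual column B data, with different partition splits, should show whether the -0.4 result depends on how the data is partitioned or on the final formula.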
   
   
   

