Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/01/25 18:17:00 UTC

[jira] [Created] (ARROW-15454) [Python] Try to make CSV cancellation test more robust

Antoine Pitrou created ARROW-15454:
--------------------------------------

             Summary: [Python] Try to make CSV cancellation test more robust
                 Key: ARROW-15454
                 URL: https://issues.apache.org/jira/browse/ARROW-15454
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Antoine Pitrou
            Assignee: Antoine Pitrou
             Fix For: 8.0.0


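The CSV cancellation test (test_cancellation in pyarrow/tests/test_csv.py) fails intermittently: the read completes without raising KeyboardInterrupt. The symptoms can be seen here: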
https://github.com/ursacomputing/crossbow/runs/4920924293?check_suite_focus=true#step:4:12555

{code}
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> captured stdout >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
workload size: 100000
workload size: 300000
workload size: 900000
workload size: 2700000
workload size: 8100000
workload size: 24300000
workload size: 72900000
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

self = <pyarrow.tests.test_csv.TestSerialCSVTableRead object at 0x1215b9f70>

    def test_cancellation(self):
        if (threading.current_thread().ident !=
                threading.main_thread().ident):
            pytest.skip("test only works from main Python thread")
        # Skips test if not available
        raise_signal = util.get_raise_signal()
    
        # Make the interruptible workload large enough to not finish
        # before the interrupt comes, even in release mode on fast machines.
        last_duration = 0.0
        workload_size = 100_000
    
        while last_duration < 1.0:
            print("workload size:", workload_size)
            large_csv = b"a,b,c\n" + b"1,2,3\n" * workload_size
            t1 = time.time()
            self.read_bytes(large_csv)
            last_duration = time.time() - t1
            workload_size = workload_size * 3
    
        def signal_from_thread():
            time.sleep(0.2)
            raise_signal(signal.SIGINT)
    
        t1 = time.time()
        try:
            try:
                t = threading.Thread(target=signal_from_thread)
                with pytest.raises(KeyboardInterrupt) as exc_info:
                    t.start()
>                   self.read_bytes(large_csv)
E                   Failed: DID NOT RAISE <class 'KeyboardInterrupt'>

pyarrow/tests/test_csv.py:1400: Failed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
{code}
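
For readers unfamiliar with the test, the calibration loop shown in the traceback can be sketched in isolation. This is only an illustrative sketch, not the pyarrow code: the function name calibrate_workload, its parameters, and the injectable clock are hypothetical names introduced here, and time.monotonic is swapped in for the time.time call used in the test.

```python
import time


def calibrate_workload(run, min_duration=1.0, start_size=100_000,
                       factor=3, clock=time.monotonic):
    """Grow the workload until a single run takes at least min_duration.

    `run` is any callable taking a workload size (hypothetical stand-in for
    building the CSV bytes and reading them). Returns (size, duration) for
    the last, sufficiently slow run.
    """
    size = start_size
    while True:
        t0 = clock()
        run(size)
        duration = clock() - t0
        if duration >= min_duration:
            return size, duration
        # Same growth rule as in the test: triple the workload and retry.
        size *= factor
```

The calibration only measures how long an uninterrupted run took once; it cannot guarantee that a later run of the same size is still in progress when the signal arrives, nor that the signal is turned into a KeyboardInterrupt. The captured output above shows a failure after the loop had already grown the workload to 72900000 rows.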



--
This message was sent by Atlassian Jira
(v8.20.1#820001)