You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Bihui Jin (JIRA)" <ji...@apache.org> on 2018/10/18 02:25:00 UTC

[jira] [Reopened] (SPARK-25733) The method toLocalIterator() with dataframe doesn't work

     [ https://issues.apache.org/jira/browse/SPARK-25733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bihui Jin reopened SPARK-25733:
-------------------------------

This issue isn't a duplicate of SPARK-23961

In SPARK-23961, toLocalIterator() is working but throw an exception if we do not consume all records when spark context stopped. In this issue,  toLocalIterator() is not working and we can't get records from this iterator.

> The method toLocalIterator() with dataframe doesn't work
> --------------------------------------------------------
>
>                 Key: SPARK-25733
>                 URL: https://issues.apache.org/jira/browse/SPARK-25733
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1
>         Environment: Spark in standalone mode, and 48 cores are available.
> spark-defaults.conf as blew:
> spark.pyshark.python /usr/bin/python3.6
> spark.driver.memory 4g
> spark.executor.memory 8g
>  
> other configurations are at default.
>            Reporter: Bihui Jin
>            Priority: Major
>         Attachments: report_dataset.zip.001, report_dataset.zip.002
>
>
> {color:#FF0000}The dataset which I used attached.{color}
>  
> First I loaded a dataframe from local disk:
> df = spark.read.load('report_dataset')
> there are about 200 partitions stored in s3, and the max size of partitions is 28.37MB.
>  
> after data loaded,  I execute "df.take(1)" to test the dataframe, and expected output printed "[Row(s3_link='https://dcm-ul-phy.s3-china-1.eecloud.nsn-net.net/normal/run2/pool1/Tests.NbIot.NBCellSetupDelete.LTE3374_CellSetup_4x5M_2RX_3CELevel_Loop100.html', sequences=[364, 15, 184, 34, 524, 49, 30, 527, 44, 366, 125, 85, 69, 524, 49, 389, 575, 29, 179, 447, 168, 3, 223, 116, 573, 524, 49, 30, 527, 56, 366, 125, 85, 524, 118, 295, 440, 123, 389, 32, 575, 529, 192, 524, 49, 389, 575, 29, 179, 29, 140, 268, 96, 508, 389, 32, 575, 529, 192, 524, 49, 389, 575, 29, 179, 180, 451, 69, 286, 524, 49, 389, 575, 29, 42, 553, 451, 37, 125, 524, 49, 389, 575, 29, 42, 553, 451, 37, 125, 524, 49, 389, 575, 29, 42, 553, 451, 368, 125, 88, 588, 524, 49, 389, 575, 29, 42, 553, 451, 368, 125, 88, 588, 524, 49, 389, 575, 29, 42, 553, 451, 368, 125, 88, 588, 524, 49, 389], next_word=575, line_num=12)]" 
>  
> Then I try to convert dataframe to the local iterator and want to print one row in dataframe for testing, and blew code is used:
> for row in df.toLocalIterator():
>     print(row)
>     break
> {color:#ff0000}*But there is no output printed after that code executed.*{color}
>  
> Then I execute "df.take(1)" and blew error is reported:
> ERROR:root:Exception while sending command.
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1159, in send_command
> raise Py4JNetworkError("Answer from Java side is empty")
> py4j.protocol.Py4JNetworkError: Answer from Java side is empty
> During handling of the above exception, another exception occurred:
> ERROR:root:Exception while sending command.
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1159, in send_command
> raise Py4JNetworkError("Answer from Java side is empty")
> py4j.protocol.Py4JNetworkError: Answer from Java side is empty
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 985, in send_command
> response = connection.send_command(command)
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1164, in send_command
> "Error while receiving", e, proto.ERROR_ON_RECEIVE)
> py4j.protocol.Py4JNetworkError: Error while receiving
> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37735)
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
> exec(code_obj, self.user_global_ns, self.user_ns)
> File "<ipython-input-7-3959105b378f>", line 1, in <module>
> df.take(1)
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 504, in take
> return self.limit(num).collect()
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 493, in limit
> jdf = self._jdf.limit(num)
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
> return f(*a, **kw)
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/protocol.py", line 336, in get_return_value
> format(target_id, ".", name))
> py4j.protocol.Py4JError: An error occurred while calling o29.limit
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 1863, in showtraceback
> stb = value._render_traceback_()
> AttributeError: 'Py4JError' object has no attribute '_render_traceback_'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 929, in _get_connection
> connection = self.deque.pop()
> IndexError: pop from an empty deque
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1067, in start
> self.socket.connect((self.address, self.port))
> ConnectionRefusedError: [Errno 111] Connection refused
> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37735)
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
> exec(code_obj, self.user_global_ns, self.user_ns)
> File "<ipython-input-7-3959105b378f>", line 1, in <module>
> df.take(1)
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 504, in take
> return self.limit(num).collect()
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 493, in limit
> jdf = self._jdf.limit(num)
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
> return f(*a, **kw)
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/protocol.py", line 336, in get_return_value
> format(target_id, ".", name))
> py4j.protocol.Py4JError: An error occurred while calling o29.limit
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 1863, in showtraceback
> stb = value._render_traceback_()
> AttributeError: 'Py4JError' object has no attribute '_render_traceback_'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 929, in _get_connection
> connection = self.deque.pop()
> IndexError: pop from an empty deque
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1067, in start
> self.socket.connect((self.address, self.port))
> ConnectionRefusedError: [Errno 111] Connection refused
> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37735)
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
> exec(code_obj, self.user_global_ns, self.user_ns)
> File "<ipython-input-7-3959105b378f>", line 1, in <module>
> df.take(1)
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 504, in take
> return self.limit(num).collect()
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 493, in limit
> jdf = self._jdf.limit(num)
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File "/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
> return f(*a, **kw)
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/protocol.py", line 336, in get_return_value
> format(target_id, ".", name))
> py4j.protocol.Py4JError: An error occurred while calling o29.limit
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 1863, in showtraceback
> stb = value._render_traceback_()
> AttributeError: 'Py4JError' object has no attribute '_render_traceback_'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 929, in _get_connection
> connection = self.deque.pop()
> IndexError: pop from an empty deque
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
> File "/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py", line 1067, in start
> self.socket.connect((self.address, self.port))
> ConnectionRefusedError: [Errno 111] Connection refused
>  
>  {color:#e75c58}---------------------------------------------------------------------------{color}{color:#e75c58}Py4JError{color} Traceback (most recent call last){color:#00a250}<ipython-input-7-3959105b378f>{color} in {color:#60c6c8}<module>{color}{color:#208ffb}(){color}{color:#00a250}----> 1{color} df{color:#208ffb}.{color}take{color:#208ffb}({color}{color:#60c6c8}1{color}{color:#208ffb}){color}{color:#00a250}/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/dataframe.py{color} in {color:#60c6c8}take{color}{color:#208ffb}(self, num){color}{color:#00a250} 502{color} {color:#208ffb}[{color}Row{color:#208ffb}({color}age{color:#208ffb}={color}{color:#60c6c8}2{color}{color:#208ffb},{color} name{color:#208ffb}={color}{color:#208ffb}u'Alice'{color}{color:#208ffb}){color}{color:#208ffb},{color} Row{color:#208ffb}({color}age{color:#208ffb}={color}{color:#60c6c8}5{color}{color:#208ffb},{color} name{color:#208ffb}={color}{color:#208ffb}u'Bob'{color}{color:#208ffb}){color}{color:#208ffb}]{color}{color:#00a250} 503{color} """{color:#00a250}--> 504{color} {color:#00a250}return{color} self{color:#208ffb}.{color}limit{color:#208ffb}({color}num{color:#208ffb}){color}{color:#208ffb}.{color}collect{color:#208ffb}({color}{color:#208ffb}){color}{color:#00a250} 505{color}{color:#00a250} 506{color} {color:#208ffb}@{color}since{color:#208ffb}({color}{color:#60c6c8}1.3{color}{color:#208ffb}){color}{color:#00a250}/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/dataframe.py{color} in {color:#60c6c8}limit{color}{color:#208ffb}(self, num){color}{color:#00a250} 491{color} {color:#208ffb}[{color}{color:#208ffb}]{color}{color:#00a250} 492{color} """{color:#00a250}--> 493{color} jdf {color:#208ffb}={color} self{color:#208ffb}.{color}_jdf{color:#208ffb}.{color}limit{color:#208ffb}({color}num{color:#208ffb}){color}{color:#00a250} 494{color} {color:#00a250}return{color} DataFrame{color:#208ffb}({color}jdf{color:#208ffb},{color} self{color:#208ffb}.{color}sql_ctx{color:#208ffb}){color}{color:#00a250} 495{color}{color:#00a250}/opt/k2-v02/lib/python3.6/site-packages/py4j/java_gateway.py{color} in {color:#60c6c8}__call__{color}{color:#208ffb}(self, *args){color}{color:#00a250} 1255{color} answer {color:#208ffb}={color} self{color:#208ffb}.{color}gateway_client{color:#208ffb}.{color}send_command{color:#208ffb}({color}command{color:#208ffb}){color}{color:#00a250} 1256{color} return_value = get_return_value({color:#00a250}-> 1257{color}{color:#e75c58} answer, self.gateway_client, self.target_id, self.name){color}{color:#00a250} 1258{color}{color:#00a250} 1259{color} {color:#00a250}for{color} temp_arg {color:#00a250}in{color} temp_args{color:#208ffb}:{color}{color:#00a250}/opt/k2-v02/lib/python3.6/site-packages/pyspark/sql/utils.py{color} in {color:#60c6c8}deco{color}{color:#208ffb}(*a, **kw){color}{color:#00a250} 61{color} {color:#00a250}def{color} deco{color:#208ffb}({color}{color:#208ffb}*{color}a{color:#208ffb},{color} {color:#208ffb}**{color}kw{color:#208ffb}){color}{color:#208ffb}:{color}{color:#00a250} 62{color} {color:#00a250}try{color}{color:#208ffb}:{color}{color:#00a250}---> 63{color} {color:#00a250}return{color} f{color:#208ffb}({color}{color:#208ffb}*{color}a{color:#208ffb},{color} {color:#208ffb}**{color}kw{color:#208ffb}){color}{color:#00a250} 64{color} {color:#00a250}except{color} py4j{color:#208ffb}.{color}protocol{color:#208ffb}.{color}Py4JJavaError {color:#00a250}as{color} e{color:#208ffb}:{color}{color:#00a250} 65{color} s {color:#208ffb}={color} e{color:#208ffb}.{color}java_exception{color:#208ffb}.{color}toString{color:#208ffb}({color}{color:#208ffb}){color}{color:#00a250}/opt/k2-v02/lib/python3.6/site-packages/py4j/protocol.py{color} in {color:#60c6c8}get_return_value{color}{color:#208ffb}(answer, gateway_client, target_id, name){color}{color:#00a250} 334{color} raise Py4JError({color:#00a250} 335{color} {color:#208ffb}"An error occurred while calling \{0}{1}\{2}"{color}{color:#208ffb}.{color}{color:#00a250}--> 336{color}{color:#e75c58} format(target_id, ".", name)){color}{color:#00a250} 337{color} {color:#00a250}else{color}{color:#208ffb}:{color}{color:#00a250} 338{color} type {color:#208ffb}={color} answer{color:#208ffb}[{color}{color:#60c6c8}1{color}{color:#208ffb}]{color}{color:#e75c58}Py4JError{color}: An error occurred while calling o29.limit
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org