Posted to issues@spark.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2018/11/07 16:34:00 UTC

[jira] [Updated] (SPARK-25958) error: [Errno 97] Address family not supported by protocol in dataframe.take()

     [ https://issues.apache.org/jira/browse/SPARK-25958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruslan Dautkhanov updated SPARK-25958:
--------------------------------------
    Description: 
The following error happens on a heavy Spark job after ~4 hours of runtime:
{code}
2018-11-06 14:35:56,604 - data_vault.py - ERROR - Exited with exception: [Errno 97] Address family not supported by protocol
Traceback (most recent call last):
  File "/home/mwincek/svn/data_vault/data_vault.py", line 64, in data_vault
    item.create_persistent_data()
  File "/home/mwincek/svn/data_vault/src/table_recipe/amf_table_recipe.py", line 53, in create_persistent_data
    single_obj.create_persistent_data()
  File "/home/mwincek/svn/data_vault/src/table_processing/table_processing.py", line 21, in create_persistent_data
    main_df = self.generate_dataframe_main()
  File "/home/mwincek/svn/data_vault/src/table_processing/table_processing.py", line 98, in generate_dataframe_main
    raw_disc_dv_df = self.get_raw_data_with_metadata_and_aggregation()
  File "/home/mwincek/svn/data_vault/src/table_processing/satellite_binary_dates_table_processing.py", line 16, in get_raw_data_with_metadata_and_aggregation
    main_df = self.get_dataframe_using_binary_date_aggregation_on_dataframe(input_df=raw_disc_dv_df)
  File "/home/mwincek/svn/data_vault/src/table_processing/satellite_binary_dates_table_processing.py", line 60, in get_dataframe_using_binary_date_aggregation_on_dataframe
    return_df = self.get_dataframe_from_binary_value_iteration(input_df)
  File "/home/mwincek/svn/data_vault/src/table_processing/satellite_binary_dates_table_processing.py", line 136, in get_dataframe_from_binary_value_iteration
    combine_df = self.get_dataframe_from_binary_value(input_df=input_df, binary_value=count)
  File "/home/mwincek/svn/data_vault/src/table_processing/satellite_binary_dates_table_processing.py", line 154, in get_dataframe_from_binary_value
    if len(results_of_filter_df.take(1)) == 0:
  File "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/dataframe.py", line 504, in take
    return self.limit(num).collect()
  File "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/sql/dataframe.py", line 467, in collect
    return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
  File "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/rdd.py", line 148, in _load_from_socket
    sock = socket.socket(af, socktype, proto)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/socket.py", line 191, in __init__
    _sock = _realsocket(family, type, proto)
error: [Errno 97] Address family not supported by protocol
{code}
Looking at the failing line in lib/spark2/python/pyspark/rdd.py, line 148:
{code:python}
def _load_from_socket(sock_info, serializer):
    port, auth_secret = sock_info
    sock = None
    # Support for both IPv4 and IPv6.
    # On most of IPv6-ready systems, IPv6 will take precedence.
    for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = socket.socket(af, socktype, proto)
        try:
            sock.settimeout(15)
            sock.connect(sa)
        except socket.error:
            sock.close()
            sock = None
            continue
        break
    if not sock:
        raise Exception("could not open socket")
    # The RDD materialization time is unpredicable, if we set a timeout for socket reading
    # operation, it will very possibly fail. See SPARK-18281.
    sock.settimeout(None)

    sockfile = sock.makefile("rwb", 65536)
    do_server_auth(sockfile, auth_secret)

    # The socket will be automatically closed when garbage-collected.
    return serializer.load_stream(sockfile)
{code}
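Note that inside that loop, the sock = socket.socket(af, socktype, proto) call sits outside the try/except, so when getaddrinfo() returns an address family the kernel cannot create a socket for (for example an AF_INET6 result on a node where IPv6 is disabled), the error propagates out of _load_from_socket() instead of falling through to the next resolved address. That matches the traceback above, where the failure comes from socket.socket(), not from connect(). A minimal sketch of a more defensive loop (a hypothetical rewrite for illustration, not Spark's actual fix):
{code:python}
import socket

def _connect_localhost(port, timeout=15):
    # Sketch only: creating the socket inside the try/except lets the
    # loop skip address families the kernel does not support (e.g. an
    # AF_INET6 entry on a host with IPv6 disabled) instead of raising
    # "[Errno 97] Address family not supported by protocol".
    sock = None
    for af, socktype, proto, canonname, sa in socket.getaddrinfo(
            "localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        try:
            sock = socket.socket(af, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sa)
        except socket.error:
            if sock is not None:
                sock.close()
            sock = None
            continue
        break
    if not sock:
        raise Exception("could not open socket")
    return sock
{code}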
The culprit is this line in lib/spark2/python/pyspark/rdd.py:
{code:python}
socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM)
{code}
So the error "[Errno 97] *Address family* not supported by protocol" seems to be caused by socket.AF_UNSPEC being passed as the third argument to the socket.getaddrinfo() call.

I tried a similar socket.getaddrinfo() call locally, outside of PySpark, and it worked fine.
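A check along these lines (a standalone diagnostic, not part of Spark; the port number is arbitrary) can show which address families "localhost" resolves to on a given node and whether a socket can actually be created for each:
{code:python}
import socket

# Standalone diagnostic: list what getaddrinfo() resolves for
# "localhost" and test socket creation for each address family.
for af, socktype, proto, canonname, sa in socket.getaddrinfo(
        "localhost", 12345, socket.AF_UNSPEC, socket.SOCK_STREAM):
    try:
        s = socket.socket(af, socktype, proto)
        s.close()
        print("family=%s addr=%s -> socket OK" % (af, sa))
    except socket.error as e:
        print("family=%s addr=%s -> %s" % (af, sa, e))
{code}
If an AF_INET6 entry fails with Errno 97 while the AF_INET one succeeds, the node resolves localhost to ::1 without usable IPv6 support, which would explain the intermittent failure.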

RHEL 7.5.


> error: [Errno 97] Address family not supported by protocol in dataframe.take()
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-25958
>                 URL: https://issues.apache.org/jira/browse/SPARK-25958
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark, Spark Core
>    Affects Versions: 2.3.1, 2.3.2
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>


