Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/04/23 01:45:00 UTC

[jira] [Resolved] (SPARK-24047) use spark package to load csv file

     [ https://issues.apache.org/jira/browse/SPARK-24047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24047.
----------------------------------
    Resolution: Invalid

This sounds more like a question. Please ask it on the mailing list, where you are more likely to get a good answer. Please reopen this if it turns out to be an actual issue.

> use spark package to load csv file
> ----------------------------------
>
>                 Key: SPARK-24047
>                 URL: https://issues.apache.org/jira/browse/SPARK-24047
>             Project: Spark
>          Issue Type: IT Help
>          Components: Input/Output
>    Affects Versions: 2.3.0
>            Reporter: Jijiao Zeng
>            Priority: Major
>
> I am new to Spark. I used the spark.read.csv() function to read a local CSV file,
> but I got the following error:
>  
> File "<stdin>", line 1, in <module>
>   File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/readwriter.py", line 439, in csv
>     return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
>   File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1160, in __call__
>   File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/Users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o58.csv.
> : java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths:
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/lib
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/pythonconverters
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/ml
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/hello
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/resources
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib/stat
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/als
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/sql
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark.egg-info
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/hello/sub_hello
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/licenses
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/parquet_partitioned
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/sbin
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/orc_partitioned
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/tests/testthat
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/sql
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/profile
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql/hive
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/kubernetes/dockerfiles/spark
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/html
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib/linalg
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/jars
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/worker
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/graphx
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images/multi-channel
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/streaming/clickstream
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/conf
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/ml
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/sql/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/docs
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/ridge-data
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/help
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/ml
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/mllib
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/r/ml
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/meta
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/r/lib/sparkr/r
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/mllib
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/test_support/sql
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/bin
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/mllib
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/python/pyspark
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python/mllib
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/yarn
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/sql/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/python
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/python/pyspark/ml/param
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/ml
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/java/org/apache/spark/examples/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/streaming
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/examples/src/main/scala/org/apache/spark/examples/sql/hive
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/jars
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/mllib/images/kittens
>  file:/users/jzeng/spark-2.3.0-bin-hadoop2.7/data/graphx
>
> If provided paths are partition directories, please set "basePath" in the options of the data source to specify the root directory of the table. If there are multiple root directories, please load them separately and then union them.
>  at scala.Predef$.assert(Predef.scala:170)
>  at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:133)
>  at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitions(PartitioningUtils.scala:98)
>  at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.inferPartitioning(PartitioningAwareFileIndex.scala:153)
>  at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.partitionSpec(InMemoryFileIndex.scala:71)
>  at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.partitionSchema(PartitioningAwareFileIndex.scala:50)
>  at org.apache.spark.sql.execution.datasources.DataSource.combineInferredAndUserSpecifiedPartitionSchema(DataSource.scala:115)
>  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:166)
>  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
>  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
>  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:594)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>  at py4j.Gateway.invoke(Gateway.java:282)
>  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>  at py4j.commands.CallCommand.execute(CallCommand.java:79)
>  at py4j.GatewayConnection.run(GatewayConnection.java:214)
>  at java.base/java.lang.Thread.run(Thread.java:844)
>  
> Any suggestion will be appreciated. Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org