Posted to issues@spark.apache.org by "Hiren Ghinaiya (JIRA)" <ji...@apache.org> on 2016/07/13 14:47:20 UTC

[jira] [Comment Edited] (SPARK-16428) Spark file system watcher not working on Windows

    [ https://issues.apache.org/jira/browse/SPARK-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375054#comment-15375054 ] 

Hiren Ghinaiya edited comment on SPARK-16428 at 7/13/16 2:47 PM:
-----------------------------------------------------------------

Hello, this looks more like a Hadoop setup issue on Windows. You need to provide a directory hosted on a Hadoop-compatible file system (HCFS); otherwise Spark will not detect new files. Follow the steps at https://wiki.apache.org/hadoop/Hadoop2OnWindows when running Hadoop on Windows.

Instead of compiling Hadoop myself, I used the Hadoop binaries compiled for 64-bit Windows 7 hosted at https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries. To use this Hadoop version, I needed the Spark build that is pre-built for a user-provided Hadoop. I set SPARK_DIST_CLASSPATH as described in https://spark.apache.org/docs/latest/hadoop-provided.html, and also put %HADOOP_HOME%\lib\native on PATH. Once set up, I followed steps 3.1, 3.3, 3.4 and 3.5 at https://wiki.apache.org/hadoop/Hadoop2OnWindows to start a local HDFS. When running HdfsWordCount, I needed to pass hdfs:///tmp as the directory path argument. With that, Spark is able to detect new files showing up in HDFS.
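For reference, the steps above can be sketched as a Windows cmd session. This is a hypothetical sketch, not commands from this ticket: the install locations C:\hadoop-2.7.1 and C:\spark-1.6.2-bin-without-hadoop are placeholder paths you would replace with your own, and it assumes HDFS has already been configured per the Hadoop2OnWindows wiki page.

```
:: Hypothetical sketch; C:\hadoop-2.7.1 and the Spark folder are placeholder paths.
set HADOOP_HOME=C:\hadoop-2.7.1
set PATH=%PATH%;%HADOOP_HOME%\bin;%HADOOP_HOME%\lib\native

:: Point the "Hadoop free" Spark build at the user-provided Hadoop,
:: per https://spark.apache.org/docs/latest/hadoop-provided.html
:: (inside a .cmd script, double the % on the loop variable: %%i).
for /f "delims=" %i in ('hadoop classpath') do set SPARK_DIST_CLASSPATH=%i

:: Format and start a local HDFS (steps 3.1-3.5 of Hadoop2OnWindows).
%HADOOP_HOME%\bin\hdfs namenode -format
%HADOOP_HOME%\sbin\start-dfs.cmd

:: Run the example against an HDFS path, not a local Windows path.
cd C:\spark-1.6.2-bin-without-hadoop
bin\run-example org.apache.spark.examples.streaming.HdfsWordCount hdfs:///tmp
```

The key point is the last line: the directory argument must be an hdfs:// URI on the running HDFS, not a plain Windows directory.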


was (Author: hghina0):
Hello, I used the Hadoop binaries compiled for 64-bit Windows 7 hosted at https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries. To use this Hadoop version, I needed the Spark build that is pre-built for a user-provided Hadoop. I set SPARK_DIST_CLASSPATH as described in https://spark.apache.org/docs/latest/hadoop-provided.html, and also put %HADOOP_HOME%\lib\native on PATH. Once set up, I followed steps 3.1, 3.3, 3.4 and 3.5 at https://wiki.apache.org/hadoop/Hadoop2OnWindows to start a local HDFS. When running HdfsWordCount, I needed to pass hdfs:///tmp as the directory path argument. With that, Spark is able to detect new files showing up in HDFS.

> Spark file system watcher not working on Windows
> ------------------------------------------------
>
>                 Key: SPARK-16428
>                 URL: https://issues.apache.org/jira/browse/SPARK-16428
>             Project: Spark
>          Issue Type: Bug
>          Components: Examples, Input/Output, Spark Core, Windows
>    Affects Versions: 1.6.2
>         Environment: Ubuntu 15.10 64 bit,  Windows 7 Enterprise 64 bit
>            Reporter: John-Michael Reed
>            Priority: Blocker
>
> Two people tested Apache Spark on their computers...
> [Spark Download - http://i.stack.imgur.com/z1oqu.png]
> We downloaded the version of Spark prebuilt for Hadoop 2.6, went to the folder /spark-1.6.2-bin-hadoop2.6/, created a "tmp" directory there, and ran:
> $ bin/run-example org.apache.spark.examples.streaming.HdfsWordCount tmp
> I added arbitrary files content1 and content2dssdgdg to that "tmp" directory.
> -------------------------------------------
> Time: 1467921704000 ms
> -------------------------------------------
> (content1,1)
> (content2dssdgdg,1)
> -------------------------------------------
> Time: 1467921706000 ms
> Spark detected those files with the above terminal output on my Ubuntu 15.10 laptop, but not on my colleague's Windows 7 Enterprise laptop.
> This is preventing us from getting work done with Spark.
> Link: http://stackoverflow.com/questions/38254405/spark-file-system-watcher-not-working-on-windows



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org