Posted to user@spark.apache.org by ABHISHEK <ab...@gmail.com> on 2016/09/23 07:33:18 UTC

Spark Yarn Cluster with Reference File

Hello there,

I have a Spark application which refers to an external file ‘abc.drl’ that
contains unstructured data.
The application is able to find this reference file if I run the app in local
mode, but in YARN cluster mode it is not able to find the file at the
specified path.
I tried both a local and an HDFS path with the --files option, but it didn't
work.


What is working?
1. The current Spark application runs fine if I run it in local mode as
shown below.
In the command below, the file path is a local path, not HDFS.
spark-submit --master local[*]  --class "com.abc.StartMain"
abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar /home/abhietc/abc/abc.drl

2. I want to run this Spark application on YARN in cluster mode.
For that, I used the commands below, but the application is not able to find
the path for the reference file abc.drl. I tried giving both local and HDFS
paths, but it didn't work.

spark-submit --master yarn --deploy-mode cluster --files
/home/abhietc/abc/abc.drl --class com.abc.StartMain
abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar /home/abhietc/abc/abc.drl

spark-submit --master yarn --deploy-mode cluster --files
hdfs://abhietc.com:8020/user/abhietc/abc.drl --class com.abc.StartMain
abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar
hdfs://abhietc.com:8020/user/abhietc/abc.drl

spark-submit --master yarn --deploy-mode cluster --files
hdfs://abc.com:8020/tmp/abc.drl --class com.abc.StartMain
abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar hdfs://abc.com:8020/tmp/abc.drl


Error Messages:
Surprisingly, we are not doing any write operation on the reference file, but
the log shows the application trying to write the file instead of reading it.
The log also shows a FileNotFoundException for both the HDFS and local paths.
-------------
16/09/20 14:49:50 ERROR scheduler.JobScheduler: Error running job streaming
job 1474363176000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage
1.0 (TID 4, abc.com): java.lang.RuntimeException: Unable to write Resource:
FileResource[file=hdfs:/abc.com:8020/user/abhietc/abc.drl]
        at
org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:71)
        at
com.hmrc.taxcalculator.KieSessionFactory$.getNewSession(KieSessionFactory.scala:49)
        at
com.hmrc.taxcalculator.KieSessionFactory$.getKieSession(KieSessionFactory.scala:21)
        at
com.hmrc.taxcalculator.KieSessionFactory$.execute(KieSessionFactory.scala:27)
        at
com.abc.StartMain$$anonfun$main$1$$anonfun$4.apply(TaxCalculatorMain.scala:124)
        at
com.abc.StartMain$$anonfun$main$1$$anonfun$4.apply(TaxCalculatorMain.scala:124)
        at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException:
hdfs:/abc.com:8020/user/abhietc/abc.drl (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at
org.drools.core.io.impl.FileSystemResource.getInputStream(FileSystemResource.java:123)
        at
org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:58)
        ... 19 more
--------------
Cheers,
Abhishek

Re: Spark Yarn Cluster with Reference File

Posted by Aditya <ad...@augmentiq.co.in>.
Hi Abhishek,

 From your spark-submit it seems you're passing the file as a parameter to 
the driver program, so it depends on what exactly you are doing with 
that parameter. With the --files option the file will be available on all the 
worker nodes, but if your code references it by the original path, in 
distributed mode it won't find the file on the worker nodes.

If you can share the snippet of code it will be easy to debug.
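
To illustrate the difference (a minimal sketch, not your code -- the object
name and the toy RDD here are made up): a file shipped with --files is
localized into each YARN container's working directory, so task code should
open it by its bare name rather than by the submit-time path.

import org.apache.spark.{SparkConf, SparkContext}

object DistributedFileSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("drl-file-sketch"))

    val report = sc.parallelize(1 to 4).mapPartitions { iter =>
      // On YARN, a file passed via --files is localized into each container's
      // working directory, so open it by its bare name, not by the path that
      // existed on the submitting machine.
      val localDrl = new java.io.File("abc.drl")
      iter.map(i => s"item $i, abc.drl visible on this executor: ${localDrl.exists}")
    }

    report.collect().foreach(println)
    sc.stop()
  }
}

Submitted with --master yarn --deploy-mode cluster --files hdfs://.../abc.drl,
the tasks above should see the file, whereas opening /home/abhietc/abc/abc.drl
or an hdfs:// URI with java.io will not work on the executors.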

On Friday 23 September 2016 01:03 PM, ABHISHEK wrote:
> Hello there,
>
> I have Spark Application which refer to an external file ‘abc.drl’ and 
> having unstructured data.
> Application is able to find this reference file if I  run app in Local 
> mode but in Yarn with Cluster mode, it is not able to  find the file 
> in the specified path.
> I tried with both local and hdfs path with --files option but it 
> didn't work.
>
>
> What is working ?
> 1.Current  Spark Application runs fine if I run it in Local mode as 
> mentioned below.
> In below command   file path is local path not HDFS.
> spark-submit --master local[*]  --class "com.abc.StartMain" 
> abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar /home/abhietc/abc/abc.drl
>
> 3.I want to run this Spark application using Yarn with cluster mode.
> For that, I used below command but application is not able to find the 
> path for the reference file abc.drl.I tried giving both local and HDFS 
> path but didn't work.
>
> spark-submit --master yarn --deploy-mode cluster  --files 
> /home/abhietc/abc/abc.drl --class com.abc.StartMain 
> abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar /home/abhietc/abc/abc.drl
>
> spark-submit --master yarn --deploy-mode cluster  --files 
> hdfs://abhietc.com:8020/user/abhietc/abc.drl --class 
> com.abc.StartMain abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
> hdfs://abhietc.com:8020/user/abhietc/abc.drl
>
> spark-submit --master yarn --deploy-mode cluster  --files 
> hdfs://abc.com:8020/tmp/abc.drl 
> --class com.abc.StartMain abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar 
> hdfs://abc.com:8020/tmp/abc.drl
>
>
> Error Messages:
> Surprising we are not doing any Write operation on reference file but 
> still log shows that application is trying to write to file instead 
> reading the file.
> Also log shows File not found exception for both HDFS and Local path.
> -------------
> 16/09/20 14:49:50 ERROR scheduler.JobScheduler: Error running job 
> streaming job 1474363176000 ms.0
> org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 
> in stage 1.0 (TID 4, abc.com): 
> java.lang.RuntimeException: Unable to write Resource: 
> FileResource[file=hdfs:/abc.com:8020/user/abhietc/abc.drl]
>         at 
> org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:71)
>         at 
> com.hmrc.taxcalculator.KieSessionFactory$.getNewSession(KieSessionFactory.scala:49)
>         at 
> com.hmrc.taxcalculator.KieSessionFactory$.getKieSession(KieSessionFactory.scala:21)
>         at 
> com.hmrc.taxcalculator.KieSessionFactory$.execute(KieSessionFactory.scala:27)
>         at 
> com.abc.StartMain$$anonfun$main$1$$anonfun$4.apply(TaxCalculatorMain.scala:124)
>         at 
> com.abc.StartMain$$anonfun$main$1$$anonfun$4.apply(TaxCalculatorMain.scala:124)
>         at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>         at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>         at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: 
> hdfs:/abc.com:8020/user/abhietc/abc.drl (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>         at 
> org.drools.core.io.impl.FileSystemResource.getInputStream(FileSystemResource.java:123)
>         at 
> org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:58)
>         ... 19 more
> --------------
> Cheers,
> Abhishek
>




Re: Spark Yarn Cluster with Reference File

Posted by ayan guha <gu...@gmail.com>.
You may try copying the file to the same location on all nodes and reading it
from that place.
On 24 Sep 2016 00:20, "ABHISHEK" <ab...@gmail.com> wrote:

> I have tried with hdfs/tmp location but it didn't work. Same error.
>
> On 23 Sep 2016 19:37, "Aditya" <ad...@augmentiq.co.in> wrote:
>
>> Hi Abhishek,
>>
>> Try below spark submit.
>> spark-submit --master yarn --deploy-mode cluster  --files
>> hdfs://abc.com:8020/tmp/abc.drl --class com.abc.StartMain
>> abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar abc.drl
>>
>> On Friday 23 September 2016 07:29 PM, ABHISHEK wrote:
>>
>> Thanks for your response Aditya and Steve.
>> Steve:
>> I have tried specifying both /tmp/filename in hdfs and local path but it
>> didn't work.
>> You may be write that Kie session is configured  to  access files from
>> Local path.
>> I have attached code here for your reference and if you find some thing
>> wrong, please help to correct it.
>>
>> Aditya:
>> I have attached code here for reference. --File option will distributed
>> reference file to all node but  Kie session is not able  to pickup it.
>>
>> Thanks,
>> Abhishek
>>
>> On Fri, Sep 23, 2016 at 2:25 PM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>>
>>>
>>> On 23 Sep 2016, at 08:33, ABHISHEK <ab...@gmail.com> wrote:
>>>
>>>         at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.io.FileNotFoundException: hdfs:/abc.com:8020/user/abhiet
>>> c/abc.drl (No such file or directory)
>>>         at java.io.FileInputStream.open(Native Method)
>>>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>>>         at org.drools.core.io.impl.FileSystemResource.getInputStream(Fi
>>> leSystemResource.java:123)
>>>         at org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write
>>> (KieFileSystemImpl.java:58)
>>>
>>>
>>>
>>> Looks like this .KieFileSystemImpl class only works with local files, so
>>> when it gets an HDFS path in it tries to open it and gets confused.
>>>
>>> you may need to write to a local FS temp file then copy it into HDFS
>>>
>>
>>
>>
>>

Re: Spark Yarn Cluster with Reference File

Posted by ABHISHEK <ab...@gmail.com>.
I have tried with hdfs/tmp location but it didn't work. Same error.

On 23 Sep 2016 19:37, "Aditya" <ad...@augmentiq.co.in> wrote:

> Hi Abhishek,
>
> Try below spark submit.
> spark-submit --master yarn --deploy-mode cluster  --files
> hdfs://abc.com:8020/tmp/abc.drl --class com.abc.StartMain
> abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar abc.drl
>
> On Friday 23 September 2016 07:29 PM, ABHISHEK wrote:
>
> Thanks for your response Aditya and Steve.
> Steve:
> I have tried specifying both /tmp/filename in hdfs and local path but it
> didn't work.
> You may be write that Kie session is configured  to  access files from
> Local path.
> I have attached code here for your reference and if you find some thing
> wrong, please help to correct it.
>
> Aditya:
> I have attached code here for reference. --File option will distributed
> reference file to all node but  Kie session is not able  to pickup it.
>
> Thanks,
> Abhishek
>
> On Fri, Sep 23, 2016 at 2:25 PM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
>>
>> On 23 Sep 2016, at 08:33, ABHISHEK <ab...@gmail.com> wrote:
>>
>>         at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.FileNotFoundException: hdfs:/abc.com:8020/user/abhiet
>> c/abc.drl (No such file or directory)
>>         at java.io.FileInputStream.open(Native Method)
>>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>>         at org.drools.core.io.impl.FileSystemResource.getInputStream(Fi
>> leSystemResource.java:123)
>>         at org.drools.compiler.kie.builder.impl.KieFileSystemImpl.
>> write(KieFileSystemImpl.java:58)
>>
>>
>>
>> Looks like this .KieFileSystemImpl class only works with local files, so
>> when it gets an HDFS path in it tries to open it and gets confused.
>>
>> you may need to write to a local FS temp file then copy it into HDFS
>>
>
>
>
>

Re: Spark Yarn Cluster with Reference File

Posted by Aditya <ad...@augmentiq.co.in>.
Hi Abhishek,

Try the spark-submit below.
spark-submit --master yarn --deploy-mode cluster --files 
hdfs://abc.com:8020/tmp/abc.drl --class com.abc.StartMain 
abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar abc.drl
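
The point of passing the bare name abc.drl as the application argument is
that --files places the file in the YARN container's working directory. A
rough sketch of how a main class might hand that local name to Drools,
assuming the plain kie-api (your KieSessionFactory presumably wraps something
similar, so treat the exact calls as illustrative, not as your method):

import java.io.File
import org.kie.api.KieServices
import org.kie.api.runtime.KieSession

object KieFromLocalFile {
  // Build a KieSession from a *local* .drl path, e.g. args(0) == "abc.drl"
  def buildSession(drlPath: String): KieSession = {
    val ks  = KieServices.Factory.get()
    val kfs = ks.newKieFileSystem()
    // newFileSystemResource only understands local files; handing it an
    // hdfs:// URI is what produces the FileNotFoundException in the log.
    kfs.write(ks.getResources.newFileSystemResource(new File(drlPath)))
    ks.newKieBuilder(kfs).buildAll()
    ks.newKieContainer(ks.getRepository.getDefaultReleaseId).newKieSession()
  }
}

If the session is created inside mapPartitions, each task would call
buildSession("abc.drl") against its own localized copy of the file.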

On Friday 23 September 2016 07:29 PM, ABHISHEK wrote:
> Thanks for your response Aditya and Steve.
> Steve:
> I have tried specifying both /tmp/filename in hdfs and local path but 
> it didn't work.
> You may be write that Kie session is configured  to  access files from 
> Local path.
> I have attached code here for your reference and if you find some 
> thing wrong, please help to correct it.
>
> Aditya:
> I have attached code here for reference. --File option will 
> distributed reference file to all node but  Kie session is not able 
>  to pickup it.
>
> Thanks,
> Abhishek
>
> On Fri, Sep 23, 2016 at 2:25 PM, Steve Loughran 
> <stevel@hortonworks.com <ma...@hortonworks.com>> wrote:
>
>
>>     On 23 Sep 2016, at 08:33, ABHISHEK <abhietc@gmail.com
>>     <ma...@gmail.com>> wrote:
>>
>>             at java.lang.Thread.run(Thread.java:745)
>>     Caused by: java.io.FileNotFoundException:
>>     hdfs:/abc.com:8020/user/abhietc/abc.drl (No such file or directory)
>>             at java.io.FileInputStream.open(Native Method)
>>             at java.io.FileInputStream.<init>(FileInputStream.java:146)
>>             at
>>     org.drools.core.io.impl.FileSystemResource.getInputStream(FileSystemResource.java:123)
>>             at
>>     org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:58)
>
>
>     Looks like this .KieFileSystemImpl class only works with local
>     files, so when it gets an HDFS path in it tries to open it and
>     gets confused.
>
>     you may need to write to a local FS temp file then copy it into HDFS
>
>




Re: Spark Yarn Cluster with Reference File

Posted by ABHISHEK <ab...@gmail.com>.
Thanks for your response Aditya and Steve.
Steve:
I have tried specifying both /tmp/filename in HDFS and a local path, but it
didn't work.
You may be right that the Kie session is configured to access files from a
local path.
I have attached the code here for your reference; if you find something
wrong, please help me correct it.

Aditya:
I have attached the code here for reference. The --files option will distribute
the reference file to all nodes, but the Kie session is not able to pick it up.

Thanks,
Abhishek

On Fri, Sep 23, 2016 at 2:25 PM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> On 23 Sep 2016, at 08:33, ABHISHEK <ab...@gmail.com> wrote:
>
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException: hdfs:/abc.com:8020/user/
> abhietc/abc.drl (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>         at org.drools.core.io.impl.FileSystemResource.getInputStream(
> FileSystemResource.java:123)
>         at org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(
> KieFileSystemImpl.java:58)
>
>
>
> Looks like this .KieFileSystemImpl class only works with local files, so
> when it gets an HDFS path in it tries to open it and gets confused.
>
> you may need to write to a local FS temp file then copy it into HDFS
>

Re: Spark Yarn Cluster with Reference File

Posted by Steve Loughran <st...@hortonworks.com>.
On 23 Sep 2016, at 08:33, ABHISHEK <ab...@gmail.com> wrote:

        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: hdfs:/abc.com:8020/user/abhietc/abc.drl (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at org.drools.core.io.impl.FileSystemResource.getInputStream(FileSystemResource.java:123)
        at org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:58)


Looks like this KieFileSystemImpl class only works with local files, so when it gets an HDFS path passed in, it tries to open it as a local file and gets confused.

You may need to write to a local FS temp file and then copy it into HDFS.
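
For the read side discussed in this thread (my addition, not something
spelled out above): if the .drl must stay in HDFS, one workaround is to copy
it down to a local temp file with the Hadoop FileSystem API and hand Drools
that local path. A hedged sketch:

import java.io.File
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsToLocal {
  // Copy an HDFS file to a local temp file and return the local path.
  def localize(hdfsUri: String): String = {
    val fs    = FileSystem.get(new URI(hdfsUri), new Configuration())
    val local = File.createTempFile("abc-", ".drl")
    local.delete() // keep only the unique name; copyToLocalFile recreates it
    // copyToLocalFile(delSrc, src, dst, useRawLocalFileSystem);
    // the raw local filesystem avoids writing .crc checksum files
    fs.copyToLocalFile(false, new Path(hdfsUri), new Path(local.getAbsolutePath), true)
    local.getAbsolutePath
  }
}

For example, val drlPath = HdfsToLocal.localize("hdfs://abc.com:8020/tmp/abc.drl")
before building the KieSession; although if --files already ships the file,
reading it by its bare name in the working directory is the simpler route.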