Posted to user@spark.apache.org by ankit tyagi <an...@gmail.com> on 2015/07/19 21:03:55 UTC
Exception while triggering spark job from remote jvm
Hi,
I am using the code below to trigger a Spark job from a remote JVM.
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;

/**
 * @version 1.0, 15-Jul-2015
 * @author ankit
 */
public class QueryEngineImpl implements IQueryEngine {

    SparkSqlEngine sqlEngine;

    public QueryEngineImpl(SparkSqlEngine sparkSqlEngine) {
        this.sqlEngine = sparkSqlEngine;
    }

    @Override
    public void executeQuery(String query, String resultLocation, String... parquetFileLocation) {
        String[] args = new String[] {
            // the name of your application
            "--name",
            "RemoteJVM",

            // memory for the driver (optional)
            "--driver-memory",
            "1000M",

            // path to your application's JAR file (required in yarn-cluster mode)
            "--jar",
            "hdfs://52.24.76.10:9000/ankit/SPARKSQLPOC-0.0.1-SNAPSHOT-jar-with-dependencies.jar",

            // name of your application's main class (required)
            "--class",
            "SparkSqlEngine",

            // argument 1 to your Spark program (SparkFriendRecommendation)
            "--arg",
            query,

            // argument 2 to your Spark program (SparkFriendRecommendation)
            "--arg",
            resultLocation,

            // argument 3 to your Spark program (SparkFriendRecommendation)
            "--arg",
            parquetFileLocation[0],

            "--arg",
            "yarn-cluster" };

        Configuration conf = new Configuration();
        conf.set("yarn.resourcemanager.address", "52.24.76.10:9022");
        conf.set("HADOOP_HOME", "/home/hadoop");

        System.setProperty("SPARK_YARN_MODE", "true");
        SparkConf sparkConf = new SparkConf();
        System.out.println("SPARK CONF" + sparkConf.toDebugString());

        // create ClientArguments, which will be passed to Client
        ClientArguments cArgs = new ClientArguments(args, sparkConf);

        // create an instance of the YARN client and submit the application
        Client client = new Client(cArgs, conf, sparkConf);
        client.run();
    }

    public static void main(String[] args) {
        QueryEngineImpl impl = new QueryEngineImpl(null);
        impl.executeQuery("select count(*) from parquetTable",
            "/tmp/ankit.txt",
            "s3n://AKIAJPLOFN3DM27DIIUQ:zKsFTopwgmu4zNdAfZ5Xe+Qe0XtbegHLTgy629VB@hadoop-poc-ashish/parquet");
    }
}
But I am getting the exception below.
23:08:09.268 [main] WARN org.apache.hadoop.hdfs.DFSClient - Failed to connect to /172.31.24.27:9200 for block, add to deadNodes and continue. org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.31.24.27:9200]
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/172.31.24.27:9200]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532) ~[hadoop-common-2.2.0.jar:na]
        at org.apache.hadoop.hdfs.DFSInputStream.newTcpPeer(DFSInputStream.java:955)

23:08:09.269 [main] WARN org.apache.hadoop.hdfs.DFSClient - DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-511626939-172.31.24.27-1436185102368:blk_1073741964_126634 file=/ankit/SPARKSQLPOC-0.0.1-SNAPSHOT-jar-with-dependencies.jar
        at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:838) [hadoop-hdfs-2.2.0.jar:na]
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:526) [hadoop-hdfs-2.2.0.jar:na]
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749) [hadoop-hdfs-2.2.0.jar:na]
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793) [hadoop-hdfs-2.2.0.jar:na]
        at java.io.DataInputStream.read(DataInputStream.java:100) [na:1.7.0_80]
Please suggest: is there any other way to trigger a job from a remote JVM?
Re: Exception while triggering spark job from remote jvm
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Just make sure there is no firewall/network rule blocking the requests, since the client is complaining about a connection timeout.
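For example, a quick way to confirm basic reachability from the machine running the remote JVM is to attempt a raw socket connection to the datanode address shown in your stack trace (a minimal sketch; the host, port and timeout are taken from the log, nothing Spark-specific):

import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal connectivity check: mimics the 60-second connect the HDFS client attempts
// against the address reported as "connection-pending remote=/172.31.24.27:9200".
public class ConnectivityCheck {
    public static void main(String[] args) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("172.31.24.27", 9200), 60000);
            System.out.println("Port is reachable from this machine");
        } catch (Exception e) {
            System.out.println("Connection failed: " + e);
        }
    }
}

If this also times out, the problem is at the network level (security groups, or the datanode advertising its private EC2 address to an external client) rather than in the Spark/YARN client code.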
Thanks
Best Regards
Re: Exception while triggering spark job from remote jvm
Posted by ankit tyagi <an...@gmail.com>.
Just to add more information: I have checked the status of this file, and not a single block is corrupted.
[hadoop@ip-172-31-24-27 ~]$ hadoop fsck /ankit -files -blocks
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://ip-172-31-24-27.us-west-2.compute.internal:9101
FSCK started by hadoop (auth:SIMPLE) from /172.31.24.27 for path /ankit at Sun Jul 19 19:11:37 UTC 2015
/ankit <dir>
/ankit/SPARKSQLPOC-0.0.1-SNAPSHOT-jar-with-dependencies.jar 103599417 bytes, 1 block(s): OK
0. BP-511626939-172.31.24.27-1436185102368:blk_1073741964_126634 len=103599417 repl=1

Status: HEALTHY
 Total size:                  103599417 B
 Total dirs:                  1
 Total files:                 1
 Total symlinks:              0
 Total blocks (validated):    1 (avg. block size 103599417 B)
 Minimally replicated blocks: 1 (100.0 %)
 Over-replicated blocks:      0 (0.0 %)
 Under-replicated blocks:     0 (0.0 %)
 Mis-replicated blocks:       0 (0.0 %)
 Default replication factor:  1
 Average block replication:   1.0
 Corrupt blocks:              0
 Missing replicas:            0 (0.0 %)
 Number of data-nodes:        4
 Number of racks:             1
FSCK ended at Sun Jul 19 19:11:37 UTC 2015 in 1 milliseconds
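Since fsck reports the block as healthy from inside the cluster, the failure looks specific to the read path from the remote client. A minimal diagnostic sketch (reusing the NameNode address and JAR path from the submission code above; this is not part of the job itself) that attempts the same read directly through the HDFS client API:

import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Diagnostic sketch: read the application JAR over HDFS from the remote machine,
// reproducing the block read outside of Spark/YARN.
public class RemoteHdfsReadCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://52.24.76.10:9000"), conf);
        Path jar = new Path("/ankit/SPARKSQLPOC-0.0.1-SNAPSHOT-jar-with-dependencies.jar");

        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = fs.open(jar)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
            }
        }
        System.out.println("Read " + total + " bytes");
    }
}

If this fails with the same ConnectTimeoutException against 172.31.24.27:9200, it would confirm that the remote client can reach the NameNode but not the datanode address it is redirected to.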