Posted to user@predictionio.apache.org by Carlos Vidal <ca...@beeva.com> on 2017/08/02 11:52:43 UTC

Error when importing data

Hello,

I have installed the pio + ur AMI in AWS, on an m4.2xlarge instance with
32GB of RAM and 8 vCPUs.

When I try to import a 20GB events file for my application, the system
crashes. The command I have used is:


pio import --appid 4 --input my_events.json

This command launches a Spark job that needs to perform 800 tasks. When the
process reaches task 211 it crashes. This is what I can see in my
pio.log file:

2017-08-02 11:16:17,101 WARN  org.apache.hadoop.hbase.
client.HConnectionManager$HConnectionImplementation [htable-pool230-t1] -
Encountered problems when prefetch hbase:meta table:
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=35, exceptions:
Wed Aug 02 11:07:06 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952,
org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952,
org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952,
org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:08 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952,
org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:10 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:07:14 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:07:24 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:07:34 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:07:44 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:07:54 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:08:15 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:08:35 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:08:55 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:09:15 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:09:35 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:09:55 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:10:15 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:10:35 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:10:55 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:11:15 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:11:35 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:11:55 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:12:15 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:12:35 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:12:56 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:13:16 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:13:36 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:13:56 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:14:16 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:14:36 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:14:56 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:15:16 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:15:36 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:15:56 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused
Wed Aug 02 11:16:17 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection
refused

at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
RpcRetryingCaller.java:129)
at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:714)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
at org.apache.hadoop.hbase.client.HConnectionManager$
HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1153)
at org.apache.hadoop.hbase.client.HConnectionManager$
HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1217)
at org.apache.hadoop.hbase.client.HConnectionManager$
HConnectionImplementation.locateRegion(HConnectionManager.java:1105)
at org.apache.hadoop.hbase.client.HConnectionManager$
HConnectionImplementation.locateRegion(HConnectionManager.java:1062)
at org.apache.hadoop.hbase.client.AsyncProcess.
findDestLocation(AsyncProcess.java:365)
at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:507)
at org.apache.hadoop.hbase.client.AsyncProcess.logAndResubmit(AsyncProcess.
java:717)
at org.apache.hadoop.hbase.client.AsyncProcess.receiveGlobalFailure(
AsyncProcess.java:664)
at org.apache.hadoop.hbase.client.AsyncProcess.access$
100(AsyncProcess.java:93)
at org.apache.hadoop.hbase.client.AsyncProcess$1.run(AsyncProcess.java:547)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(
SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.
setupConnection(RpcClient.java:578)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.
setupIOstreams(RpcClient.java:868)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(
RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementati
on.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$
BlockingStub.get(ClientProtos.java:29966)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.
getRowOrBefore(ProtobufUtil.java:1508)
at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:710)
at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:708)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
RpcRetryingCaller.java:114)
... 17 more
2017-08-02 11:21:04,430 ERROR org.apache.spark.scheduler.LiveListenerBus
[Thread-3] - SparkListenerBus has already stopped! Dropping event
SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@66c4a5d2)
2017-08-02 11:21:04,431 ERROR org.apache.spark.scheduler.LiveListenerBus
[Thread-3] - SparkListenerBus has already stopped! Dropping event
SparkListenerJobEnd(0,1501672864431,JobFailed(org.apache.spark.SparkException:
Job 0 cancelled because SparkContext was shut down))
2017-08-02 11:28:47,129 INFO
org.apache.predictionio.tools.commands.Management$
[main] - Inspecting PredictionIO...
2017-08-02 11:28:47,132 INFO
org.apache.predictionio.tools.commands.Management$
[main] - PredictionIO 0.11.0-incubating is installed at
/opt/data/PredictionIO-0.11.0-incubating
2017-08-02 11:28:47,132 INFO
org.apache.predictionio.tools.commands.Management$
[main] - Inspecting Apache Spark...
2017-08-02 11:28:47,142 INFO
org.apache.predictionio.tools.commands.Management$
[main] - Apache Spark is installed at /usr/local/spark
2017-08-02 11:28:47,175 INFO
org.apache.predictionio.tools.commands.Management$
[main] - Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
2017-08-02 11:28:47,175 INFO
org.apache.predictionio.tools.commands.Management$
[main] - Inspecting storage backend connections...
2017-08-02 11:28:47,195 INFO  org.apache.predictionio.data.storage.Storage$
[main] - Verifying Meta Data Backend (Source: ELASTICSEARCH)...
2017-08-02 11:28:48,225 INFO  org.apache.predictionio.data.storage.Storage$
[main] - Verifying Model Data Backend (Source: HDFS)...
2017-08-02 11:28:48,447 INFO  org.apache.predictionio.data.storage.Storage$
[main] - Verifying Event Data Backend (Source: HBASE)...
2017-08-02 11:28:48,979 INFO  org.apache.predictionio.data.storage.Storage$
[main] - Test writing to Event Store (App Id 0)...
2017-08-02 11:29:49,026 ERROR
org.apache.predictionio.tools.commands.Management$
[main] - Unable to connect to all storage backends successfully.

On the other hand, once this happens, if I run pio status this is what I
obtain:

aml@ip-10-41-11-227:~$ pio status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-
incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-
incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-
incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/
slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at
/opt/data/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /usr/local/spark
[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement
of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[ERROR] [Management$] Unable to connect to all storage backends
successfully.
The following shows the error message from the storage backend.

Failed after attempts=1, exceptions:
Wed Aug 02 11:45:04 UTC 2017, org.apache.hadoop.hbase.
client.RpcRetryingCaller@43045f9f, java.net.SocketTimeoutException: Call to
localhost/127.0.0.1:39562 failed because java.net.SocketTimeoutException:
60000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/127.0.0.1:51462
 remote=localhost/127.0.0.1:39562]
 (org.apache.hadoop.hbase.client.RetriesExhaustedException)

Dumping configuration of initialized storage backend sources.
Please make sure they are correct.

Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS ->
127.0.0.1, TYPE -> elasticsearch, CLUSTERNAME -> elasticsearch
Source Name: HBASE; Type: hbase; Configuration: TYPE -> hbase
Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> /models

Do you know what the problem is? How can I restart the services once the
system fails?

Thanks.

Carlos Vidal.

Re: Error when importing data

Posted by Mahesh Hegde <ma...@tracxn.com>.
Try importing it in batches: split the file into 20 1GB files and import
them one after the other.

The file can be split, provided each line holds exactly one event. Here is
how you can split it with simple commands. Let's say you have 12k lines in
total in your original 20GB file (use wc -l bigfile.json to check); then
the first chunk would be
head -1000 bigfile.json > chunk1.json
the second chunk would be
head -2000 bigfile.json | tail -1000 > chunk2.json
and the third chunk would be
head -3000 bigfile.json | tail -1000 > chunk3.json

I am sure you can write a tiny script to do this. Then you can easily
import chunk by chunk with the same script.

-Mahesh

On Wed, Aug 2, 2017 at 5:22 PM, Carlos Vidal <ca...@beeva.com> wrote:

> [...]
>

-- 
This email is subject to Tracxn's Email Policy 
<https://tracxn.com/emailpolicy>

Re: Error when importing data

Posted by Pat Ferrel <pa...@occamsmachete.com>.
BTW a single-machine installation is not likely to be good for production because of possible resource contention. So think of it as a way to experiment and get an idea of how things work, what the input and tuning look like, etc. Then move to a multi-machine cluster for production, if only because it limits resource contention. The cluster can use smaller machines than a single all-in-one machine.

If you want actual results with enough data to make good recommendations, the quickest way may be to get a bigger instance (vertical scaling), but consider splitting the services apart for production.


On Aug 3, 2017, at 8:32 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

It should be easy to try a smaller batch of data first, since we are just guessing.


On Aug 2, 2017, at 11:22 PM, Carlos Vidal <carlos.vidal@beeva.com> wrote:

Hello Mahesh, Pat

Thanks for your answers. I will try with a bigger EC2 instance.

Carlos.

2017-08-02 18:42 GMT+02:00 Pat Ferrel <pat@occamsmachete.com>:
Actually memory may be your problem. Mahesh Hegde may be right about trying smaller sets. Since it sounds like you have all services running on one machine, they may be in contention for resources.


On Aug 2, 2017, at 9:35 AM, Pat Ferrel <pat@occamsmachete.com> wrote:

Something is not configured correctly; pio import should work with any size of file, but this may be an undersized instance for that much data.

Spark needs memory: it keeps all the data needed for a particular calculation spread across all cluster machines in memory. That includes derived data, so a total of 32GB may not be enough. But that is not your current problem.

I would start by verifying that all components are working properly, starting with HDFS, then HBase, then Spark, then Elasticsearch. I see several storage backend errors below.
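One quick way to start that verification on a single machine is to probe each service's TCP port. This is only a sketch: the port numbers below are common defaults I am assuming, not values read from the AMI's config, so check hbase-site.xml, elasticsearch.yml, and core-site.xml before trusting a DOWN result.

```python
# Probe the TCP ports of the services PredictionIO depends on.
# Port numbers are common defaults (assumptions), not necessarily
# what this AMI uses.
import socket

def check_port(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

SERVICES = {
    "HDFS NameNode": 8020,   # sometimes 9000
    "HBase Master": 60000,   # HBase 0.9x default; 1.x+ uses 16000
    "ZooKeeper": 2181,
    "Elasticsearch": 9200,
}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if check_port("127.0.0.1", port) else "DOWN"
        print("%-14s port %-5d %s" % (name, port, state))
```

If a service shows DOWN, restart it with its own start script and re-run pio status before retrying the import.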



On Aug 2, 2017, at 4:52 AM, Carlos Vidal <carlos.vidal@beeva.com <ma...@beeva.com>> wrote:

[...]
Wed Aug 02 11:12:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:12:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:12:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:13:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:13:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:13:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:14:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:14:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:14:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:15:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:15:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:15:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:16:17 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused

	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:129)
	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:714)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1153)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1217)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1105)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1062)
	at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:365)
	at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:507)
	at org.apache.hadoop.hbase.client.AsyncProcess.logAndResubmit(AsyncProcess.java:717)
	at org.apache.hadoop.hbase.client.AsyncProcess.receiveGlobalFailure(AsyncProcess.java:664)
	at org.apache.hadoop.hbase.client.AsyncProcess.access$100(AsyncProcess.java:93)
	at org.apache.hadoop.hbase.client.AsyncProcess$1.run(AsyncProcess.java:547)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:578)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:868)
	at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:29966)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1508)
	at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:710)
	at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:708)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
	... 17 more
2017-08-02 11:21:04,430 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@66c4a5d2)
2017-08-02 11:21:04,431 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1501672864431,JobFailed(org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down))
2017-08-02 11:28:47,129 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting PredictionIO...
2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] - PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting Apache Spark...
2017-08-02 11:28:47,142 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Apache Spark is installed at /usr/local/spark
2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting storage backend connections...
2017-08-02 11:28:47,195 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Meta Data Backend (Source: ELASTICSEARCH)...
2017-08-02 11:28:48,225 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Model Data Backend (Source: HDFS)...
2017-08-02 11:28:48,447 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Event Data Backend (Source: HBASE)...
2017-08-02 11:28:48,979 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Test writing to Event Store (App Id 0)...
2017-08-02 11:29:49,026 ERROR org.apache.predictionio.tools.commands.Management$ [main] - Unable to connect to all storage backends successfully.






On the other hand, once this happens, if I run pio status this is what I obtain:

aml@ip-10-41-11-227:~$ pio status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /usr/local/spark
[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[ERROR] [Management$] Unable to connect to all storage backends successfully.
The following shows the error message from the storage backend.

Failed after attempts=1, exceptions:
Wed Aug 02 11:45:04 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@43045f9f, java.net.SocketTimeoutException: Call to localhost/127.0.0.1:39562 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:51462 remote=localhost/127.0.0.1:39562]
 (org.apache.hadoop.hbase.client.RetriesExhaustedException)

Dumping configuration of initialized storage backend sources.
Please make sure they are correct.

Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS -> 127.0.0.1, TYPE -> elasticsearch, CLUSTERNAME -> elasticsearch
Source Name: HBASE; Type: hbase; Configuration: TYPE -> hbase
Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> /models

Do you know what the problem is? How can I restart the services once the system fails?

Thanks.

Carlos Vidal.






Re: Error when importing data

Posted by Pat Ferrel <pa...@occamsmachete.com>.
It should be easy to try a smaller batch of data first, since we are just guessing.
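A sketch of one way to build smaller batches, assuming the events file has one JSON event per line (the format `pio import` reads); the helper name and chunk size below are illustrative only, not part of PredictionIO:

```python
# Hypothetical helper: split a large events file into smaller chunks
# so each chunk can be imported separately, e.g. with
#   pio import --appid 4 --input events_part_0000.json
import os

def split_events(path, lines_per_chunk, out_dir="."):
    """Write consecutive chunks of `path` and return the chunk file names."""
    chunks, buf, idx = [], [], 0

    def flush():
        # Write the buffered lines out as the next numbered chunk file.
        nonlocal buf, idx
        chunk = os.path.join(out_dir, "events_part_%04d.json" % idx)
        with open(chunk, "w") as dst:
            dst.writelines(buf)
        chunks.append(chunk)
        buf, idx = [], idx + 1

    with open(path) as src:
        for line in src:
            buf.append(line)
            if len(buf) == lines_per_chunk:
                flush()
        if buf:  # trailing partial chunk
            flush()
    return chunks
```

Importing each chunk in turn (and watching pio.log between batches) should show whether the failure is load-related or appears at a specific point in the data.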


On Aug 2, 2017, at 11:22 PM, Carlos Vidal <ca...@beeva.com> wrote:

Hello Mahesh, Pat

Thanks for your answers. I will try with a bigger EC2 instance.

Carlos.

2017-08-02 18:42 GMT+02:00 Pat Ferrel <pat@occamsmachete.com>:
Actually memory may be your problem. Mahesh Hegde may be right about trying smaller sets. Since it sounds like you have all services running on one machine, they may be in contention for resources.


On Aug 2, 2017, at 9:35 AM, Pat Ferrel <pat@occamsmachete.com> wrote:

Something is not configured correctly. `pio import` should work with any size of file, but this may be an undersized instance for that much data.

Spark needs memory: it keeps all the data it needs for a particular calculation spread across all cluster machines in memory. That includes derived data, so a total of 32GB may not be enough. But that is not your current problem.

I would start by verifying that all components are working properly, starting with HDFS, then HBase, then Spark, then Elasticsearch. I see several storage backend errors below.
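A quick reachability probe along those lines, sketched in Python; the port numbers are common defaults, not read from this AMI, so treat them as assumptions and substitute your actual configuration:

```python
# Rough check that each backend service is accepting TCP connections.
# Ports below are common defaults (ASSUMED, verify against your setup):
# HDFS NameNode 8020, ZooKeeper (used by HBase) 2181,
# Spark master 7077, Elasticsearch 9200.
import socket

def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

SERVICES = {
    "HDFS NameNode": ("localhost", 8020),
    "ZooKeeper (HBase)": ("localhost", 2181),
    "Spark master": ("localhost", 7077),
    "Elasticsearch": ("localhost", 9200),
}

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if is_listening(host, port) else "DOWN"
        print("%s (%s:%d): %s" % (name, host, port, status))
```

A port reported DOWN only shows the process is not accepting connections; check that service's own logs next, since your errors suggest HBase died partway through the import.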




Re: Error when importing data

Posted by Carlos Vidal <ca...@beeva.com>.
Hello Mahesh, Pat

Thanks for your answers. I will try with a bigger EC2 instance.

Carlos.

2017-08-02 18:42 GMT+02:00 Pat Ferrel <pa...@occamsmachete.com>:

> Actually memory may be your problem. Mahesh Hegde may be right about
> trying smaller sets. Since it sounds like you have all services running on
> one machine, they may be in contention for resources.
>
>
> On Aug 2, 2017, at 9:35 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
> Something is not configured correctly `pio import` should work with any
> size of file but this may be an undersized instance for that much data.
>
> Spark needs memory, it keeps all data that it needs for a particular
> calculation spread across all cluster machines in memory. That includes
> derived data so a total of 32g may not be enough. But that is not your
> current problem.
>
> I would start by verifying that all components are working properly,
> starting with HDFS, then HBase, then Spark, then Elasticsearch. I see
> several storage backend errors below.
>
>
>
> On Aug 2, 2017, at 4:52 AM, Carlos Vidal <ca...@beeva.com> wrote:
>
> Hello,
>
> I have installed the pio + ur AMI in AWS, in an m4.2xlarge instance with
> 32GB of RAM and 8 VCPU.
>
> When I try to import a 20GB events file por my application, the system
> crashes. The command I have used is:
>
>
> pio import --appid 4 --input my_events.json
>
> this command launch an spark job that needs to perform 800 task. When the
> process reaches the task 211 it crashes. This is what I can see in my
> pio.log file:
>
> 2017-08-02 11:16:17,101 WARN  org.apache.hadoop.hbase.clien
> t.HConnectionManager$HConnectionImplementation [htable-pool230-t1] -
> Encountered problems when prefetch hbase:meta table:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=35, exceptions:
> Wed Aug 02 11:07:06 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
> This server is in the failed servers list: localhost/127.0.0.1:44866
> Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
> This server is in the failed servers list: localhost/127.0.0.1:44866
> Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
> This server is in the failed servers list: localhost/127.0.0.1:44866
> Wed Aug 02 11:07:08 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException:
> This server is in the failed servers list: localhost/127.0.0.1:44866
> Wed Aug 02 11:07:10 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:14 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:24 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:34 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:44 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:07:54 UTC 2017, org.apache.hadoop.hbase.client
> .RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:08:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:08:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:08:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:09:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:09:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:09:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:10:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:10:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:10:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:11:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:11:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:11:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:12:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:12:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:12:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:13:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:13:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:13:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:14:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:14:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:14:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:15:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:15:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:15:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
> Wed Aug 02 11:16:17 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
>
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:129)
> at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:714)
> at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1153)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1217)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1105)
> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1062)
> at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:365)
> at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:507)
> at org.apache.hadoop.hbase.client.AsyncProcess.logAndResubmit(AsyncProcess.java:717)
> at org.apache.hadoop.hbase.client.AsyncProcess.receiveGlobalFailure(AsyncProcess.java:664)
> at org.apache.hadoop.hbase.client.AsyncProcess.access$100(AsyncProcess.java:93)
> at org.apache.hadoop.hbase.client.AsyncProcess$1.run(AsyncProcess.java:547)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:578)
> at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:868)
> at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:29966)
> at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1508)
> at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:710)
> at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:708)
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
> ... 17 more
> 2017-08-02 11:21:04,430 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@66c4a5d2)
> 2017-08-02 11:21:04,431 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1501672864431,JobFailed(org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down))
> 2017-08-02 11:28:47,129 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting PredictionIO...
> 2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] - PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
> 2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting Apache Spark...
> 2017-08-02 11:28:47,142 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Apache Spark is installed at /usr/local/spark
> 2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
> 2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting storage backend connections...
> 2017-08-02 11:28:47,195 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> 2017-08-02 11:28:48,225 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Model Data Backend (Source: HDFS)...
> 2017-08-02 11:28:48,447 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Event Data Backend (Source: HBASE)...
> 2017-08-02 11:28:48,979 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Test writing to Event Store (App Id 0)...
> 2017-08-02 11:29:49,026 ERROR org.apache.predictionio.tools.commands.Management$ [main] - Unable to connect to all storage backends successfully.
>
>
>
>
>
>
> On the other hand, once this happens, if I run pio status this is what I obtain:
>
> aml@ip-10-41-11-227:~$ pio status
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> [INFO] [Management$] Inspecting PredictionIO...
> [INFO] [Management$] PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
> [INFO] [Management$] Inspecting Apache Spark...
> [INFO] [Management$] Apache Spark is installed at /usr/local/spark
> [INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
> [INFO] [Management$] Inspecting storage backend connections...
> [INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
> [INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
> [INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
> [INFO] [Storage$] Test writing to Event Store (App Id 0)...
> [ERROR] [Management$] Unable to connect to all storage backends successfully.
> The following shows the error message from the storage backend.
>
> Failed after attempts=1, exceptions:
> Wed Aug 02 11:45:04 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@43045f9f, java.net.SocketTimeoutException: Call to localhost/127.0.0.1:39562 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:51462 remote=localhost/127.0.0.1:39562]
>  (org.apache.hadoop.hbase.client.RetriesExhaustedException)
>
> Dumping configuration of initialized storage backend sources.
> Please make sure they are correct.
>
> Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS -> 127.0.0.1, TYPE -> elasticsearch, CLUSTERNAME -> elasticsearch
> Source Name: HBASE; Type: hbase; Configuration: TYPE -> hbase
> Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> /models
>
> Do you know what the problem is? How can I restart the services once the system fails?
>
> Thanks.
>
> Carlos Vidal.
>
>
>

Re: Error when importing data

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Actually, memory may be your problem. Mahesh Hegde may be right about trying smaller sets. Since it sounds like you have all services running on one machine, they may be contending for resources.
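One way to sketch the smaller-batches idea: split the events file and import one piece at a time, so no single Spark job has to hold the whole 20 GB. This assumes the file has one JSON event per line (which is what `pio import` reads); the tiny sample file, chunk size, and app id here are illustrative stand-ins, not values from the thread.

```shell
# Stand-in for the real 20 GB my_events.json (assumption: one event per line).
printf '%s\n' \
  '{"event":"buy","entityId":"u1","targetEntityId":"i1"}' \
  '{"event":"buy","entityId":"u2","targetEntityId":"i2"}' \
  '{"event":"buy","entityId":"u3","targetEntityId":"i3"}' \
  '{"event":"buy","entityId":"u4","targetEntityId":"i4"}' > my_events.json

mkdir -p chunks
# 2 lines per chunk for this demo; something like 1000000 for a 20 GB file.
split -l 2 my_events.json chunks/events-

# Import the chunks one at a time so HBase and Spark are never loaded all at once:
for f in chunks/events-*; do
  echo "pio import --appid 4 --input $f"   # remove the echo to actually run it
done
```

If it turns out memory really is the bottleneck, it may also be worth raising the Spark driver/executor memory; `pio` commands such as `pio train` accept spark-submit arguments after a `--` separator, so check whether your version supports the same passthrough for `pio import`.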


On Aug 2, 2017, at 9:35 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

Something is not configured correctly. `pio import` should work with any size of file, but this may be an undersized instance for that much data.

Spark needs memory: it keeps all the data it needs for a particular calculation spread across all cluster machines in memory. That includes derived data, so a total of 32 GB may not be enough. But that is not your current problem.

I would start by verifying that all components are working properly, starting with HDFS, then HBase, then Spark, then Elasticsearch. I see several storage backend errors below.
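That verification pass can be scripted. The sketch below only reports each service's state and fixes nothing; the ports and commands assume the single-machine AMI defaults (HDFS and HBase CLIs on the PATH, Spark master UI on :8080, Elasticsearch on :9200), so adjust them to your install.

```shell
# Run each probe, print OK/FAIL, and never abort, so every service gets checked.
check() {
  desc=$1; shift
  if "$@" >/dev/null 2>&1; then echo "OK   $desc"; else echo "FAIL $desc"; fi
}

check "HDFS (namenode answers dfsadmin)"   hdfs dfsadmin -report
check "HBase (shell reaches the master)"   sh -c 'echo status | hbase shell'
check "Spark (master UI on :8080)"         curl -sf http://localhost:8080/
check "Elasticsearch (cluster health)"     curl -sf http://localhost:9200/_cluster/health
```

All four should print OK before retrying the import; the first FAIL line tells you which service to restart. How each service is restarted varies by install (for example, the stock Hadoop/HBase `start-dfs.sh` and `start-hbase.sh` scripts), so check how the AMI launches them.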



On Aug 2, 2017, at 4:52 AM, Carlos Vidal <carlos.vidal@beeva.com> wrote:

Hello,

I have installed the pio + ur AMI in AWS, in an m4.2xlarge instance with 32 GB of RAM and 8 vCPUs.

When I try to import a 20 GB events file for my application, the system crashes. The command I have used is:


pio import --appid 4 --input my_events.json

This command launches a Spark job that needs to perform 800 tasks. When the process reaches task 211 it crashes. This is what I can see in my pio.log file:


Do you know what the problem is? How can I restart the services once the system fails? 

Thanks.

Carlos Vidal.



Re: Error when importing data

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Something is not configured correctly. `pio import` should work with any size of file, but this may be an undersized instance for that much data.

Spark needs memory: it keeps all the data it needs for a particular calculation spread across all cluster machines in memory. That includes derived data, so a total of 32 GB may not be enough. But that is not your current problem.

I would start by verifying that all components are working properly, starting with HDFS, then HBase, then Spark, then Elasticsearch. I see several storage backend errors below.



On Aug 2, 2017, at 4:52 AM, Carlos Vidal <ca...@beeva.com> wrote:

Hello,

I have installed the pio + ur AMI in AWS, in an m4.2xlarge instance with 32 GB of RAM and 8 vCPUs.

When I try to import a 20 GB events file for my application, the system crashes. The command I have used is:


pio import --appid 4 --input my_events.json

This command launches a Spark job that needs to perform 800 tasks. When the process reaches task 211 it crashes. This is what I can see in my pio.log file:

2017-08-02 11:16:17,101 WARN  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation [htable-pool230-t1] - Encountered problems when prefetch hbase:meta table: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Wed Aug 02 11:07:06 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:07 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:08 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:44866
Wed Aug 02 11:07:10 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:07:14 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:07:24 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:07:34 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:07:44 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:07:54 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:08:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:08:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:08:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:09:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:09:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:09:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:10:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:10:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:10:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:11:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:11:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:11:55 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:12:15 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:12:35 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:12:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:13:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:13:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:13:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:14:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:14:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:14:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:15:16 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:15:36 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:15:56 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused
Wed Aug 02 11:16:17 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@475db952, java.net.ConnectException: Connection refused

	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:129)
	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:714)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:144)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1153)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1217)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1105)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1062)
	at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:365)
	at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:507)
	at org.apache.hadoop.hbase.client.AsyncProcess.logAndResubmit(AsyncProcess.java:717)
	at org.apache.hadoop.hbase.client.AsyncProcess.receiveGlobalFailure(AsyncProcess.java:664)
	at org.apache.hadoop.hbase.client.AsyncProcess.access$100(AsyncProcess.java:93)
	at org.apache.hadoop.hbase.client.AsyncProcess$1.run(AsyncProcess.java:547)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:578)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:868)
	at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:29966)
	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1508)
	at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:710)
	at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:708)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
	... 17 more
2017-08-02 11:21:04,430 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@66c4a5d2)
2017-08-02 11:21:04,431 ERROR org.apache.spark.scheduler.LiveListenerBus [Thread-3] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1501672864431,JobFailed(org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down))
2017-08-02 11:28:47,129 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting PredictionIO...
2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] - PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
2017-08-02 11:28:47,132 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting Apache Spark...
2017-08-02 11:28:47,142 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Apache Spark is installed at /usr/local/spark
2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
2017-08-02 11:28:47,175 INFO  org.apache.predictionio.tools.commands.Management$ [main] - Inspecting storage backend connections...
2017-08-02 11:28:47,195 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Meta Data Backend (Source: ELASTICSEARCH)...
2017-08-02 11:28:48,225 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Model Data Backend (Source: HDFS)...
2017-08-02 11:28:48,447 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Verifying Event Data Backend (Source: HBASE)...
2017-08-02 11:28:48,979 INFO  org.apache.predictionio.data.storage.Storage$ [main] - Test writing to Event Store (App Id 0)...
2017-08-02 11:29:49,026 ERROR org.apache.predictionio.tools.commands.Management$ [main] - Unable to connect to all storage backends successfully.

On the other hand, once this happens, running pio status produces the following output:

aml@ip-10-41-11-227:~$ pio status
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/spark/pio-data-hdfs-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/data/PredictionIO-0.11.0-incubating/lib/pio-assembly-0.11.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.11.0-incubating is installed at /opt/data/PredictionIO-0.11.0-incubating
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /usr/local/spark
[INFO] [Management$] Apache Spark 1.6.3 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: HDFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[ERROR] [Management$] Unable to connect to all storage backends successfully.
The following shows the error message from the storage backend.

Failed after attempts=1, exceptions:
Wed Aug 02 11:45:04 UTC 2017, org.apache.hadoop.hbase.client.RpcRetryingCaller@43045f9f, java.net.SocketTimeoutException: Call to localhost/127.0.0.1:39562 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:51462 remote=localhost/127.0.0.1:39562]
 (org.apache.hadoop.hbase.client.RetriesExhaustedException)

Dumping configuration of initialized storage backend sources.
Please make sure they are correct.

Source Name: ELASTICSEARCH; Type: elasticsearch; Configuration: HOSTS -> 127.0.0.1, TYPE -> elasticsearch, CLUSTERNAME -> elasticsearch
Source Name: HBASE; Type: hbase; Configuration: TYPE -> hbase
Source Name: HDFS; Type: hdfs; Configuration: TYPE -> hdfs, PATH -> /models

Do you know what the problem is? And how can I restart the services once the system fails like this?

Thanks.

Carlos Vidal.