Posted to user@spark.apache.org by qingyang li <li...@gmail.com> on 2014/03/12 05:11:40 UTC

how to config worker HA

I have one table in memory. When one worker dies, I cannot query
data from that table. Here is its storage status:

RDD Name  Storage Level                      Cached Partitions  Fraction Cached  Size in Memory  Size on Disk
table01   Memory Deserialized 1x Replicated  119                88%              697.0 MB        0.0 B

(RDD link: <http://192.168.1.101:4040/storage/rdd?id=47>)

So, my questions are:
1. What does "Memory Deserialized 1x Replicated" mean?
2. How do I configure worker HA so that I can still query the data even when
one worker is dead?

Re: how to config worker HA

Posted by qingyang li <li...@gmail.com>.

I think I found the answer:

apply(flags: Int, replication: Int): StorageLevel
<http://spark.incubator.apache.org/docs/latest/api/core/org/apache/spark/storage/StorageLevel.html>
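As a rough illustration of that answer: the label "Memory Deserialized 1x Replicated" reads as "kept in memory, as deserialized objects, one copy of each partition". The sketch below builds levels with more copies. It uses the boolean-flag overload of StorageLevel.apply rather than the (flags, replication) overload quoted above, because the boolean form is easier to read. The object name ReplicatedStorageLevels and the choice of three replicas are assumptions made for the example, not something from the thread.

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object, named only for this example.
object ReplicatedStorageLevels {
  // Built-in level: in memory, deserialized, two copies of each partition.
  val twoCopies: StorageLevel = StorageLevel.MEMORY_ONLY_2

  // Custom level with three copies (an assumption for illustration).
  // Positional arguments: useDisk, useMemory, deserialized, replication.
  val threeCopies: StorageLevel = StorageLevel(false, true, true, 3)

  def main(args: Array[String]): Unit = {
    // description is the human-readable label of a storage level,
    // in the same style as the label shown on the storage page.
    println(twoCopies.description)
    println(threeCopies.description)
  }
}
```

A level built this way can be passed to persist on an RDD in place of one of the predefined constants.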



Re: how to config worker HA

Posted by qingyang li <li...@gmail.com>.

Can someone help me?



Re: how to config worker HA

Posted by qingyang li <li...@gmail.com>.

In addition:
On this page:
https://spark.apache.org/docs/0.9.0/scala-programming-guide.html#hadoop-datasets
I found that an RDD can be stored using a different storage level, and I
also found the StorageLevel attribute MEMORY_ONLY_2:
"MEMORY_ONLY_2: Same as the levels above, but replicate each partition on
two cluster nodes."
1. Is this a form of fault tolerance?
2. Will replicating each partition on two cluster nodes help with worker node
HA?
3. Is there a MEMORY_ONLY_3 that would replicate each partition on three
cluster nodes?
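For readers trying out the MEMORY_ONLY_2 level mentioned above, here is a minimal sketch of persisting an RDD with it. The object name PersistReplicated, the helper cacheWithTwoReplicas, the sample data, and the "local[2]" master are all assumptions for illustration; a local master has no second node to hold a replica, so it only exercises the API. Whether this resolves the failure described in this thread is not confirmed here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical example object, not part of Spark.
object PersistReplicated {
  // Marks the RDD to be cached in memory with two copies of each partition.
  def cacheWithTwoReplicas[T](rdd: RDD[T]): RDD[T] =
    rdd.persist(StorageLevel.MEMORY_ONLY_2)

  def main(args: Array[String]): Unit = {
    // "local[2]" is an assumption for trying the API on one machine;
    // on a real cluster the master URL would point at the cluster manager.
    val conf = new SparkConf().setAppName("persist-replicated").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val table = cacheWithTwoReplicas(sc.parallelize(1 to 1000, 8))
      table.count() // the first action materializes the cache
      println(table.getStorageLevel.description)
    } finally {
      sc.stop()
    }
  }
}
```

Independently of replication, Spark can recompute lost partitions of an RDD from its lineage; a second copy mainly spares that recomputation when a node holding cached partitions goes away.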



