Posted to user@ignite.apache.org by Anil <an...@gmail.com> on 2017/02/24 06:05:59 UTC

Node failure

Hi,

I see the node is down with the following error while running a compute task:


# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007facd5cae561, pid=18543, tid=0x00007fab8a9ea700
#
# JRE version: OpenJDK Runtime Environment (8.0_111-b14) (build
1.8.0_111-8u111-b14-3~14.04.1-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.111-b14 mixed mode linux-amd64
compressed oops)
# Problematic frame:
# J 8676 C2
org.apache.ignite.internal.processors.query.h2.opt.GridH2KeyValueRowOffheap.getOffheapValue(I)Lorg/h2/value/Value;
(290 bytes) @ 0x00007facd5cae561 [0x00007facd5cae180+0x3e1]
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /opt/ignite-manager/api/hs_err_pid18543.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
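The report above notes that core dumps were disabled ("Failed to write core dump"), which makes the crash harder to analyze. Following the report's own suggestion, a minimal sketch of enabling them is to raise the limit in the shell that launches the node, so the JVM process inherits it:

```shell
# Raise the core-dump size limit for this shell and its children,
# then start Ignite from this same shell.
ulimit -c unlimited
ulimit -c   # should now print: unlimited
```

The hs_err_pid*.log file is still written either way; the core dump just adds the full process memory image for deeper debugging.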


I have two caches on a 4-node cluster; each cache is configured with 10 GB
of off-heap memory.

The ComputeTask performs the following execution, and it is broadcast to all
nodes.
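For reference, a minimal sketch of what such a cache configuration might look like in Ignite 1.x Spring XML. The cache name DETAIL_CACHE and the 10 GB figure come from this thread; the OFFHEAP_TIERED memory mode is an assumption, since the thread only says off-heap with swap was enabled:

```xml
<!-- Hypothetical sketch of one of the two caches (Ignite 1.x). -->
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="DETAIL_CACHE"/>
    <!-- assumed; entries are stored off-heap, overflowing to swap -->
    <property name="memoryMode" value="OFFHEAP_TIERED"/>
    <!-- 10 GB off-heap limit per node, as described in the thread -->
    <property name="offHeapMaxMemory" value="#{10L * 1024 * 1024 * 1024}"/>
    <!-- swap to disk when off-heap is full; the setting under suspicion below -->
    <property name="swapEnabled" value="true"/>
</bean>
```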

for (Integer part : parts) {
    ScanQuery<String, Person> scanQuery = new ScanQuery<>();
    scanQuery.setLocal(true);
    scanQuery.setPartition(part);

    Iterator<Cache.Entry<String, Person>> iterator =
        cache.query(scanQuery).iterator();

    while (iterator.hasNext()) {
        Cache.Entry<String, Person> row = iterator.next();
        String eqId = row.getValue().getEqId();

        // try-with-resources closes the cursor even if the loop throws
        try (QueryCursor<Entry<AffinityKey<String>, PersonDetail>> pdCursor =
                 detailsCache.query(new SqlQuery<AffinityKey<String>, PersonDetail>(
                     PersonDetail.class,
                     "select * from DETAIL_CACHE.PersonDetail where eqId = ? order by enddate desc")
                     .setLocal(true)
                     .setArgs(eqId))) {
            Long prev = null;

            for (Entry<AffinityKey<String>, PersonDetail> d : pdCursor) {
                // populate person info into person detail
                dataStreamer.addData(new AffinityKey<String>(detaildId, eqId), d);
            }
        } catch (Exception ex) {
            // swallowing exceptions here hides query failures; consider logging ex
        }
    }
}


Please let me know if you see any issues with this approach or any of the
configurations.

Thanks.

Re: Node failure

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Anil,

Unfortunately, I have had no luck reproducing it.



-- 
Best regards,
Andrey V. Mashenkov

Re: Node failure

Posted by Anil <an...@gmail.com>.
Hi Andrey,

Thanks for looking into it. Could you please share more details about the
bug? That would help us.

Thanks.


Re: Node failure

Posted by Andrey Mashenkov <am...@gridgain.com>.
Thanks, it was very helpful.

It seems the off-heap-with-swap-enabled functionality has a bug.


Re: Node failure

Posted by Anil <an...@gmail.com>.
Hi Andrey,

I set both the off-heap cache and swapEnabled = true.

Thanks


Re: Node failure

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Anil,

One more question: did you use the off-heap cache, or is swapEnabled=true
set?



Re: Node failure

Posted by Anil <an...@gmail.com>.
Thank you Andrey.

Re: Node failure

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Anil,

I've created a JIRA ticket [1] for the SIGSEGV JVM crash, so you can track it.

[1] https://issues.apache.org/jira/browse/IGNITE-4751


Re: Node failure

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Anil,

Partition processing time can be high due to non-uniform data distribution,
e.g. when data collocation or a bad affinity function is used. That is normal
for collocated data, and it shouldn't be an issue if the partition-size
variation is low enough and you iterate over the partitions in parallel.

Please check the partition sizes.

Your approach looks correct.
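A rough way to check the partition-size variation mentioned above is to compute the coefficient of variation (standard deviation divided by mean) of the per-partition entry counts. The sizes below are hypothetical; however you collect the real counts from your cluster, the same arithmetic applies:

```java
import java.util.Arrays;

public class PartitionSkew {
    // Coefficient of variation of partition sizes: ~0 means uniform,
    // values near or above 1 mean one or more "hot" partitions.
    static double skew(long[] sizes) {
        double mean = Arrays.stream(sizes).average().orElse(0);
        double var = Arrays.stream(sizes)
                .mapToDouble(s -> (s - mean) * (s - mean))
                .average().orElse(0);
        return mean == 0 ? 0 : Math.sqrt(var) / mean;
    }

    public static void main(String[] args) {
        long[] even = {1000, 1010, 990, 1005};   // well-distributed data
        long[] skewed = {100, 5000, 90, 120};    // one hot partition
        System.out.println(skew(even));          // small, well below 0.05
        System.out.println(skew(skewed));        // large, above 1.0
    }
}
```

If one partition dominates, a broadcast task that processes partitions in parallel still ends up waiting on the node that owns the hot partition, which would explain one partition taking much longer than the others.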


Re: Node failure

Posted by Anil <an...@gmail.com>.
Hi Andrey,

If you look at the log, the time taken to process a partition is high (> 15
sec). I'm not sure what is causing such a high query time.

In my case both caches are collocated, the eqId column is indexed, and
setLocal is true for the query.

I wonder whether my approach is correct. Please correct it if you see
anything suspicious.

Thanks.




Re: Node failure

Posted by Anil <an...@gmail.com>.
Hi Andrey,

I have attached the log.

Thanks.






Re: Node failure

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi Anil,

Would you please provide the Ignite logs as well?



Re: Node failure

Posted by Anil <an...@gmail.com>.
Hi Andrey,

I have attached the log.

The compute job ran without issues yesterday, without setLocal(true) on the
scan query and without the order by on detailsCache in the given code.

I am not sure whether adding these two caused the issue, but as per the
ignite-examples, setLocal(true) is required when a compute task is broadcast.

Thanks.



Re: Node failure

Posted by Andrey Gura <ag...@apache.org>.
Hi, Anil

Could you please provide the crash dump? In your case it is the
/opt/ignite-manager/api/hs_err_pid18543.log file.
