Posted to dev@carbondata.apache.org by kumarvishal09 <gi...@git.apache.org> on 2016/09/15 08:52:06 UTC

[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

GitHub user kumarvishal09 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/158

    [CARBONDATA-241]Fixed out of memory issue during query execution

    **Problem:** During long runs, query execution takes progressively longer and eventually throws an out-of-memory error.
    **Reason:** Compaction merges segments, and each segment's metadata is loaded in memory. After compaction the compacted segments are invalid, but their metadata is not removed from memory. The stale metadata piles up and consumes more and more memory, and after a few days query execution throws an OOM error.
    **Solution:** Remove the blocks of invalid segments from memory.
    
     


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata OOMIssue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #158
    
----
commit 3ff39301df586597eebf3a8d92ca3c60f5eba531
Author: kumarvishal <ku...@gmail.com>
Date:   2016-09-15T08:41:00Z

    Fixed out of memory issue during query execution

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290267
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -706,8 +725,9 @@ private String getUpdateExtension() {
       /**
        * @return updateExtension
        */
    -  private String[] getValidSegments(JobContext job) throws IOException {
    -    String segmentString = job.getConfiguration().get(INPUT_SEGMENT_NUMBERS, "");
    +  private String[] getSegmentsFromConfiguration(JobContext job, String segmentType)
    +      throws IOException {
    +    String segmentString = job.getConfiguration().get(segmentType, "");
    --- End diff --
    
    Change the signature back to the previous one.



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79109942
  
    --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
    @@ -102,6 +91,60 @@ public long getTableStatusLastModifiedTime() throws IOException {
     
       /**
        * get valid segment for given table
    +   *
    +   * @return
    +   * @throws IOException
    +   */
    +  public InvalidSegmentsInfo getInvalidSegments() throws IOException {
    --- End diff --
    
    OK, I will handle it.



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290374
  
    --- Diff: integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
    @@ -102,7 +115,7 @@ class CarbonScanRDD[V: ClassTag](
         val splits = carbonInputFormat.getSplits(job)
         if (!splits.isEmpty) {
           val carbonInputSplits = splits.asScala.map(_.asInstanceOf[CarbonInputSplit])
    -
    +      queryModel.setInvalidSegmentIds(validAndInvalidSegments.getInvalidSegments)
    --- End diff --
    
    Move this to the common getSplits; otherwise validAndInvalidSegments can be null if a parallel deletion happens.



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79290196
  
    --- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/ft/CarbonInputMapperTest.java ---
    @@ -129,6 +132,37 @@ private int countTheColumns(String outPath) throws Exception {
         return 0;
       }
     
    +  private void runJob(String outPath, CarbonProjection projection, Expression filter)
    --- End diff --
    
    Move the code back to its original place.



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79222167
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
    @@ -101,6 +106,8 @@
       //comma separated list of input segment numbers
       public static final String INPUT_SEGMENT_NUMBERS =
           "mapreduce.input.carboninputformat.segmentnumbers";
    +  public static final String INVALID_SEGMENT_NUMBERS =
    +      "mapreduce.input.carboninputformat.invalidsegmentnumbers";
    --- End diff --
    
    Invalid segment deletion need not go through CarbonInputFormat. When the invalid segments list is given to the Btree (both in the driver and in the executor), it should be able to delete the invalid blocks.



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79207294
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala ---
    @@ -20,20 +20,13 @@ package org.apache.spark.sql
     import java.text.SimpleDateFormat
     import java.util.Date
     
    -import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier
    --- End diff --
    
    done



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79008577
  
    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala ---
    @@ -20,20 +20,13 @@ package org.apache.spark.sql
     import java.text.SimpleDateFormat
     import java.util.Date
     
    -import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier
    --- End diff --
    
    The changes from this commit (compilation issue) were merged separately, so they can be taken out of this PR.



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79003748
  
    --- Diff: processing/src/main/java/org/apache/carbondata/lcm/status/SegmentStatusManager.java ---
    @@ -102,6 +91,60 @@ public long getTableStatusLastModifiedTime() throws IOException {
     
       /**
        * get valid segment for given table
    +   *
    +   * @return
    +   * @throws IOException
    +   */
    +  public InvalidSegmentsInfo getInvalidSegments() throws IOException {
    --- End diff --
    
    This requires reading SegmentInfo twice, once for valid blocks and once for invalid blocks. Instead, send a single class containing ValidAndInvalidBlocks.
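
The single-read idea in this comment could be sketched roughly as below. This is only an illustration under stated assumptions: `SegmentPartitionSketch`, `SegmentEntry`, the `Status` values, and `partition` are hypothetical names, not CarbonData classes, and the rule that only `SUCCESS` counts as valid is a simplification. Only the getter name `getInvalidSegments` is borrowed from the diff under review.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; names and status values are illustrative, not CarbonData's. */
public class SegmentPartitionSketch {
  /** Assumed segment states; real table-status values may differ. */
  public enum Status { SUCCESS, COMPACTED, MARKED_FOR_DELETE }

  /** One entry as it might be read from the table status. */
  public static final class SegmentEntry {
    final String id;
    final Status status;

    public SegmentEntry(String id, Status status) {
      this.id = id;
      this.status = status;
    }
  }

  /** Holder returned from a single read, so callers never read the status twice. */
  public static final class ValidAndInvalidSegments {
    private final List<String> valid;
    private final List<String> invalid;

    ValidAndInvalidSegments(List<String> valid, List<String> invalid) {
      this.valid = Collections.unmodifiableList(valid);
      this.invalid = Collections.unmodifiableList(invalid);
    }

    public List<String> getValidSegments() { return valid; }

    public List<String> getInvalidSegments() { return invalid; }
  }

  /** Splits the entries into valid and invalid segment ids in one pass. */
  public static ValidAndInvalidSegments partition(List<SegmentEntry> entries) {
    List<String> valid = new ArrayList<>();
    List<String> invalid = new ArrayList<>();
    for (SegmentEntry entry : entries) {
      // Assumption: only SUCCESS is queryable; everything else is treated as invalid.
      if (entry.status == Status.SUCCESS) {
        valid.add(entry.id);
      } else {
        invalid.add(entry.id);
      }
    }
    return new ValidAndInvalidSegments(valid, invalid);
  }
}
```

Because both lists come from the same pass, they describe one consistent snapshot of the status, which a pair of separate reads cannot guarantee if the status changes in between.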



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/158



[GitHub] incubator-carbondata pull request #158: [CARBONDATA-241]Fixed out of memory ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/158#discussion_r79288466
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/carbon/datastore/BlockIndexStore.java ---
    @@ -260,11 +295,29 @@ public void removeTableBlocks(List<TableBlockInfo> removeTableBlocksInfos,
         }
         Map<TableBlockInfo, AbstractIndex> map = tableBlocksMap.get(absoluteTableIdentifier);
         // if there is no loaded blocks then return
    -    if (null == map) {
    +    if (null == map || map.isEmpty()) {
    +      return;
    +    }
    +    Map<String, List<TableBlockInfo>> segmentIdToBlockInfoMap =
    +        segmentIdToBlockListMap.get(absoluteTableIdentifier);
    +    if (null == segmentIdToBlockInfoMap || segmentIdToBlockInfoMap.isEmpty()) {
           return;
         }
    -    for (TableBlockInfo blockInfos : removeTableBlocksInfos) {
    -      map.remove(blockInfos);
    +    synchronized (lockObject) {
    +      for (String segmentId : segmentsToBeRemoved) {
    +        List<TableBlockInfo> tableBlockInfoList = segmentIdToBlockInfoMap.get(segmentId);
    +        if (null == tableBlockInfoList) {
    +          continue;
    +        }
    +        Iterator<TableBlockInfo> tableBlockInfoIterator = tableBlockInfoList.iterator();
    +        while (tableBlockInfoIterator.hasNext()) {
    +          TableBlockInfo info = tableBlockInfoIterator.next();
    +          AbstractIndex remove = map.remove(info);
    +          if (null != remove) {
    --- End diff --
    
    tableBlockInfoIterator.remove() needs to be called irrespective of whether null != remove.
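
A minimal sketch of the point in this comment, assuming a much simplified cache: `SegmentBlockCacheSketch` and its methods are hypothetical, and plain `String` values stand in for `TableBlockInfo` and `AbstractIndex`. It is not the actual `BlockIndexStore` code. The part that matters is that `it.remove()` runs for every tracked block, whether or not an index was loaded for it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified block index cache; not CarbonData's BlockIndexStore. */
public class SegmentBlockCacheSketch {
  // blockId -> loaded index (a String placeholder for AbstractIndex)
  private final Map<String, String> blockToIndex = new HashMap<>();
  // segmentId -> ids of the blocks tracked for that segment
  private final Map<String, List<String>> segmentToBlocks = new HashMap<>();
  private final Object lock = new Object();

  /** Tracks a block under its segment; index may be null if nothing is loaded yet. */
  public void track(String segmentId, String blockId, String index) {
    synchronized (lock) {
      segmentToBlocks.computeIfAbsent(segmentId, k -> new ArrayList<>()).add(blockId);
      if (index != null) {
        blockToIndex.put(blockId, index);
      }
    }
  }

  /** Drops all blocks of the given segments; returns how many loaded indexes were evicted. */
  public int removeSegments(List<String> segmentsToBeRemoved) {
    int evicted = 0;
    synchronized (lock) {
      for (String segmentId : segmentsToBeRemoved) {
        List<String> blocks = segmentToBlocks.get(segmentId);
        if (blocks == null) {
          continue;
        }
        Iterator<String> it = blocks.iterator();
        while (it.hasNext()) {
          String blockId = it.next();
          if (blockToIndex.remove(blockId) != null) {
            evicted++;
          }
          // Unconditional: the tracking entry goes away even if no index was loaded.
          it.remove();
        }
        segmentToBlocks.remove(segmentId);
      }
    }
    return evicted;
  }

  /** Number of block ids still tracked across all segments. */
  public int trackedBlockCount() {
    synchronized (lock) {
      int count = 0;
      for (List<String> blocks : segmentToBlocks.values()) {
        count += blocks.size();
      }
      return count;
    }
  }

  /** Number of indexes still held in memory. */
  public int loadedIndexCount() {
    synchronized (lock) {
      return blockToIndex.size();
    }
  }
}
```

If the iterator removal were guarded by the null check, a block that was tracked but never loaded would stay in the per-segment list indefinitely, which is the same kind of slow leak this PR sets out to fix.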

