You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/01/03 22:56:04 UTC

[GitHub] [druid] jasonk000 opened a new pull request #12113: perf: cache row if it is a transformed row

jasonk000 opened a new pull request #12113:
URL: https://github.com/apache/druid/pull/12113


   Reduce the number of times a timestamp is parsed and read if the row is a transformed row.
   
   During ingestion the parsing and appending functionality requires repeated access to the timestamp information from the row. In the case of a TransformedInputRow, this could be an expensive step:
   
   ```
   {
     "type": "expression",
     "name": "__time",
     "expression": "(point_seconds * 1000.0) + (point_nanos / 1000000.0)"
   },
   ```
   
   By caching this row, a significant reduction in CPU time can be achieved, in this example from 14% to 4% of CPU involved in reading Timestamps, a 10% reduction. This change converts the processing from lazy load of timestamp during Transform to an eager transform, and then keeps the result.
   
   Before:
   ![image](https://user-images.githubusercontent.com/3196528/147988949-ed26c604-2c8a-478e-be13-9a476eff70b7.png)
   
   After:
   ![image](https://user-images.githubusercontent.com/3196528/147988927-922f14f5-2e97-4f0e-a105-8b0fe79e8912.png)
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] N/A using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] N/A added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] N/A added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] N/A added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12113: perf: cache row if it is a transformed row

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12113:
URL: https://github.com/apache/druid/pull/12113#discussion_r778231427



##########
File path: processing/src/main/java/org/apache/druid/segment/transform/Transformer.java
##########
@@ -170,15 +174,16 @@ public long getTimestampFromEpoch()
       }
     }
 
+    @Override
+    public long getTimestampFromEpoch()
+    {
+      return timestamp;
+    }
+
     @Override
     public DateTime getTimestamp()
     {
-      final RowFunction transform = transforms.get(ColumnHolder.TIME_COLUMN_NAME);
-      if (transform != null) {
-        return DateTimes.utc(getTimestampFromEpoch());
-      } else {
-        return row.getTimestamp();
-      }
+      return DateTimes.utc(getTimestampFromEpoch());

Review comment:
       Sure, fixed in f444ad28c6.
   
   It's not material to performance, but it also shouldn't hurt.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #12113: perf: cache row if it is a transformed row

Posted by GitBox <gi...@apache.org>.
suneet-s merged pull request #12113:
URL: https://github.com/apache/druid/pull/12113


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12113: perf: cache row if it is a transformed row

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #12113:
URL: https://github.com/apache/druid/pull/12113#issuecomment-1040609171


   Thanks for the optimization @jasonk000 ! The test failure appears unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12113: perf: cache row if it is a transformed row

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #12113:
URL: https://github.com/apache/druid/pull/12113#issuecomment-1040609171


   Thanks for the optimization @jasonk000 ! The test failure appears unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #12113: perf: cache row if it is a transformed row

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #12113:
URL: https://github.com/apache/druid/pull/12113#discussion_r777787125



##########
File path: processing/src/main/java/org/apache/druid/segment/transform/Transformer.java
##########
@@ -170,15 +174,16 @@ public long getTimestampFromEpoch()
       }
     }
 
+    @Override
+    public long getTimestampFromEpoch()
+    {
+      return timestamp;
+    }
+
     @Override
     public DateTime getTimestamp()
     {
-      final RowFunction transform = transforms.get(ColumnHolder.TIME_COLUMN_NAME);
-      if (transform != null) {
-        return DateTimes.utc(getTimestampFromEpoch());
-      } else {
-        return row.getTimestamp();
-      }
+      return DateTimes.utc(getTimestampFromEpoch());

Review comment:
       Why not cache the `DateTime` object instead of the `long`? By doing that we can save many temp objects creation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #12113: perf: cache row if it is a transformed row

Posted by GitBox <gi...@apache.org>.
suneet-s merged pull request #12113:
URL: https://github.com/apache/druid/pull/12113


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org