You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by parthchandra <gi...@git.apache.org> on 2015/12/11 22:12:47 UTC

[GitHub] drill pull request: DRILL-4152: Add trace logging to Parquet reade...

GitHub user parthchandra opened a pull request:

    https://github.com/apache/drill/pull/298

    DRILL-4152: Add trace logging to  Parquet reader for performance tuning.

    Added trace level logging to track time taken by the Parquet reader.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/parthchandra/incubator-drill DRILL-4152

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/298.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #298
    
----
commit edb772e4e7c1a1af325c2ae3da659dbf0b0c26a4
Author: Parth Chandra <pa...@apache.org>
Date:   2015-12-03T22:32:30Z

    DRILL-4152: Add trace logging to  Parquet reader for performance tuning.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-4152: Add trace logging to Parquet reade...

Posted by adeneche <gi...@git.apache.org>.
Github user adeneche commented on the pull request:

    https://github.com/apache/drill/pull/298#issuecomment-164589249
  
    I left some comments that should be easy fixes, other than that LGTM
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-4152: Add trace logging to Parquet reade...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/298


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-4152: Add trace logging to Parquet reade...

Posted by adeneche <gi...@git.apache.org>.
Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47535407
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java ---
    @@ -124,9 +131,16 @@
     
       private void loadDictionaryIfExists(final ColumnReader<?> parentStatus,
           final ColumnChunkMetaData columnChunkMetaData, final FSDataInputStream f) throws IOException {
    +    Stopwatch timer = new Stopwatch();
         if (columnChunkMetaData.getDictionaryPageOffset() > 0) {
           f.seek(columnChunkMetaData.getDictionaryPageOffset());
    +      long start=f.getPos();
    +      timer.start();
           final PageHeader pageHeader = Util.readPageHeader(f);
    +      long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +      timer.reset();
    --- End diff --
    
    no need to call `timer.reset()`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-4152: Add trace logging to Parquet reade...

Posted by adeneche <gi...@git.apache.org>.
Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47534819
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
    @@ -122,9 +124,14 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
           These fields will be added to the constructor below
           */
           try {
    +        Stopwatch timer = new Stopwatch();
             if ( ! footers.containsKey(e.getPath())){
    -          footers.put(e.getPath(),
    -              ParquetFileReader.readFooter(conf, new Path(e.getPath())));
    +          timer.start();
    +          ParquetMetadata footer = ParquetFileReader.readFooter(conf, new Path(e.getPath()));
    +          long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +          timer.reset();
    --- End diff --
    
    we don't really need to call `timer.reset()` unless we move the timer creation outside the for loop


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-4152: Add trace logging to Parquet reade...

Posted by adeneche <gi...@git.apache.org>.
Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47573483
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java ---
    @@ -105,13 +107,15 @@
       long totalRecordsRead;
       private final FragmentContext fragmentContext;
     
    +  public ParquetReaderStats parquetReaderStats = new ParquetReaderStats();
    --- End diff --
    
    should we mark this as `final` ? it makes it easier for other developers understand how it's supposed to be used. Of course, we'll have to remove the `set to null` in the `close()` method but I guess that's fine unless of course the close method can be called multiple times


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-4152: Add trace logging to Parquet reade...

Posted by adeneche <gi...@git.apache.org>.
Github user adeneche commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47572637
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/PageReader.java ---
    @@ -196,7 +220,15 @@ public boolean next() throws IOException {
         // TODO - figure out if we need multiple dictionary pages, I believe it may be limited to one
         // I think we are clobbering parts of the dictionary if there can be multiple pages of dictionary
         do {
    +      long start=inputStream.getPos();
    +      timer.start();
           pageHeader = dataReader.readPageHeader();
    +      long timeToRead = timer.elapsed(TimeUnit.MICROSECONDS);
    +      this.updateStats(pageHeader, "Page Header Read", start, timeToRead, 0,0);
    +      logger.trace("ParquetTrace,{},{},{},{},{},{},{},{}","Page Header Read","",
    +          this.parentColumnReader.parentReader.hadoopPath,
    +          this.parentColumnReader.columnDescriptor.toString(), start, 0, 0, timeToRead);
    +      timer.reset();
    --- End diff --
    
    same here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request: DRILL-4152: Add trace logging to Parquet reade...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/298#discussion_r47578156
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ParquetRecordReader.java ---
    @@ -105,13 +107,15 @@
       long totalRecordsRead;
       private final FragmentContext fragmentContext;
     
    +  public ParquetReaderStats parquetReaderStats = new ParquetReaderStats();
    --- End diff --
    
    I did have this as final originally, but close is called multiple times so I had to remove the final.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---