You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/01/12 20:21:18 UTC

[GitHub] [drill] pjfanning opened a new pull request #2426: [DRILL-8106] do not use cellIterator because it skips empty cells

pjfanning opened a new pull request #2426:
URL: https://github.com/apache/drill/pull/2426


   # [DRILL-8106](https://issues.apache.org/jira/browse/DRILL-DRILL-8106): do not use cellIterator because it skips empty cells
   
   ## Description
   
   @cgivre sent me an excel file that has missing cells and this throws off the existing code.
   
   ## Documentation
   None expected
   
   ## Testing
   Unit test - yet to be written
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre merged pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
cgivre merged pull request #2426:
URL: https://github.com/apache/drill/pull/2426


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on pull request #2426: [DRILL-8106] do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
pjfanning commented on pull request #2426:
URL: https://github.com/apache/drill/pull/2426#issuecomment-1011419554


   Current PR does not yet fully fix the code - I'm still getting issues with the sample file - but a different one from when I started.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre commented on a change in pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2426:
URL: https://github.com/apache/drill/pull/2426#discussion_r784235563



##########
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##########
@@ -312,41 +311,31 @@ private void getColumnHeaders(SchemaBuilder builder) {
     //If there are no headers, create columns names of field_n
     if (readerConfig.headerRow == -1) {
       String missingFieldName;
-      int i = 0;
 
-      for (Cell c : currentRow) {
-        missingFieldName = MISSING_FIELD_NAME_HEADER + (i + 1);
+      for (short colNum = 0; colNum < currentRow.getLastCellNum(); colNum++) {

Review comment:
       Gotcha.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] lgtm-com[bot] commented on pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #2426:
URL: https://github.com/apache/drill/pull/2426#issuecomment-1012157820


   This pull request **fixes 1 alert** when merging 138701c54110bec157907331ffd88f3631516cdc into e3150e3a33ebc6c1c1fe41029ffb077c11f445bc - [view on LGTM.com](https://lgtm.com/projects/g/apache/drill/rev/pr-13f74dd720fb2645575073d8a0bd7511b693dced)
   
   **fixed alerts:**
   
   * 1 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
pjfanning commented on pull request #2426:
URL: https://github.com/apache/drill/pull/2426#issuecomment-1012405515


   @cgivre I have the unit test passing locally now so hopefully this should be ready to merge after the CI build


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on pull request #2426: [DRILL-8106] do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
pjfanning commented on pull request #2426:
URL: https://github.com/apache/drill/pull/2426#issuecomment-1011448293


   I think the code is now basically fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
pjfanning commented on pull request #2426:
URL: https://github.com/apache/drill/pull/2426#issuecomment-1012134713


   the new test case is failing and I need to get back to it to debug it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] lgtm-com[bot] commented on pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #2426:
URL: https://github.com/apache/drill/pull/2426#issuecomment-1012463168


   This pull request **fixes 1 alert** when merging bcc5b1e50b0bdb4a35c1554a938a578bcc1f553e into e3150e3a33ebc6c1c1fe41029ffb077c11f445bc - [view on LGTM.com](https://lgtm.com/projects/g/apache/drill/rev/pr-6fbbd18970e93d6afbe2a6101db68f9bb63abd51)
   
   **fixed alerts:**
   
   * 1 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] pjfanning commented on a change in pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
pjfanning commented on a change in pull request #2426:
URL: https://github.com/apache/drill/pull/2426#discussion_r783954120



##########
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##########
@@ -312,41 +311,31 @@ private void getColumnHeaders(SchemaBuilder builder) {
     //If there are no headers, create columns names of field_n
     if (readerConfig.headerRow == -1) {
       String missingFieldName;
-      int i = 0;
 
-      for (Cell c : currentRow) {
-        missingFieldName = MISSING_FIELD_NAME_HEADER + (i + 1);
+      for (short colNum = 0; colNum < currentRow.getLastCellNum(); colNum++) {

Review comment:
       getLastCellNum returns a short so matching the type to that - excel only allows 16,384 columns and Short.MAX_VALUE = 32767




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre commented on pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
cgivre commented on pull request #2426:
URL: https://github.com/apache/drill/pull/2426#issuecomment-1011728735


   @pjfanning Here is a spreadsheet with anonymized data that you can use for a quick UT.
   [missing_data.xlsx](https://github.com/apache/drill/files/7859494/missing_data.xlsx)
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] lgtm-com[bot] commented on pull request #2426: [DRILL-8106] do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #2426:
URL: https://github.com/apache/drill/pull/2426#issuecomment-1011533498


   This pull request **fixes 1 alert** when merging e323da71239981a8a9dcd8de541913c4874a7774 into e3150e3a33ebc6c1c1fe41029ffb077c11f445bc - [view on LGTM.com](https://lgtm.com/projects/g/apache/drill/rev/pr-4a2378c5f765a5ba59e00929b40340ea3782f9f2)
   
   **fixed alerts:**
   
   * 1 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cgivre commented on a change in pull request #2426: DRILL-8106: do not use cellIterator because it skips empty cells

Posted by GitBox <gi...@apache.org>.
cgivre commented on a change in pull request #2426:
URL: https://github.com/apache/drill/pull/2426#discussion_r783463505



##########
File path: contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
##########
@@ -312,41 +311,31 @@ private void getColumnHeaders(SchemaBuilder builder) {
     //If there are no headers, create columns names of field_n
     if (readerConfig.headerRow == -1) {
       String missingFieldName;
-      int i = 0;
 
-      for (Cell c : currentRow) {
-        missingFieldName = MISSING_FIELD_NAME_HEADER + (i + 1);
+      for (short colNum = 0; colNum < currentRow.getLastCellNum(); colNum++) {

Review comment:
       Is there a reason why we're using a `short` here rather than an `int`?  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org