Posted to issues@tajo.apache.org by jinossy <gi...@git.apache.org> on 2014/12/16 06:55:11 UTC

[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

GitHub user jinossy opened a pull request:

    https://github.com/apache/tajo/pull/303

    TAJO-1250: RawFileAppender occasionally causes BufferOverflowException.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinossy/tajo TAJO-1250

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #303
    
----
commit bddf997031a6daf666fb3f491d8b64e10ef1d138
Author: jhkim <jh...@apache.org>
Date:   2014-12-16T05:54:21Z

    TAJO-1250: RawFileAppender occasionally causes BufferOverflowException.

----
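
The thread does not show the actual change to RawFileAppender, so the following is only a general illustration of how a `BufferOverflowException` can arise with `java.nio.ByteBuffer` and one common way to guard against it: flush before a write that would not fit, and bypass the buffer for records larger than its capacity. The class `FlushingAppenderSketch` and its in-memory sink are hypothetical and are not Tajo code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

/** Hypothetical sketch, not the Tajo RawFileAppender. */
final class FlushingAppenderSketch {
  private final ByteBuffer buffer;
  // In-memory stand-in for the real output file (assumption for illustration).
  private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

  FlushingAppenderSketch(int bufferSize) {
    this.buffer = ByteBuffer.allocate(bufferSize);
  }

  /** Appends a record, flushing first if it would not fit in the remaining space. */
  void append(byte[] record) {
    if (record.length > buffer.capacity()) {
      // The record can never fit: drain pending bytes, then write it directly.
      flush();
      sink.write(record, 0, record.length);
      return;
    }
    if (record.length > buffer.remaining()) {
      flush();
    }
    buffer.put(record);
  }

  /** Drains buffered bytes to the sink and resets the buffer. */
  void flush() {
    buffer.flip();
    sink.write(buffer.array(), 0, buffer.limit());
    buffer.clear();
  }

  /** Returns everything written so far, including bytes still buffered. */
  byte[] written() {
    flush();
    return sink.toByteArray();
  }

  /** Shows the unguarded failure: put() throws when the data exceeds remaining(). */
  static boolean overflowsWithoutGuard() {
    ByteBuffer small = ByteBuffer.allocate(4);
    try {
      small.put(new byte[8]);
      return false;
    } catch (BufferOverflowException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    FlushingAppenderSketch appender = new FlushingAppenderSketch(4);
    appender.append("abc".getBytes());
    appender.append("defgh".getBytes());
    System.out.println(new String(appender.written()));
    System.out.println(overflowsWithoutGuard());
  }
}
```

The guard checks `remaining()` rather than `capacity()` on the normal path, because an overflow of this kind depends on how full the buffer already is, which is consistent with a failure that only appears occasionally.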


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r21949532
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---
    @@ -272,6 +272,10 @@ public static int setDateOrder(int dateOrder) {
         // Geo IP
         GEOIP_DATA("tajo.function.geoip-database-location", ""),
     
    +    // Storage IO BUFFER
    +    STORAGE_IO_WRITE_BUFFER_SIZE("tajo.io.write.buffer.size", 128 * 1024),
    +    STORAGE_IO_READ_BUFFER_SIZE("tajo.io.read.buffer.size", 128 * 1024),
    --- End diff --
    
    These seem to be buffer sizes for HDFS.
    I think that we will support various types of storage.
    In addition, it would be more intuitive if the name included the size unit, such as KB.
    So, how about changing the property name to something like tajo.hdfs.write.buffer.size.KB?


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/303


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r21950515
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---
    @@ -272,6 +272,10 @@ public static int setDateOrder(int dateOrder) {
         // Geo IP
         GEOIP_DATA("tajo.function.geoip-database-location", ""),
     
    +    // Storage IO BUFFER
    +    STORAGE_IO_WRITE_BUFFER_SIZE("tajo.io.write.buffer.size", 128 * 1024),
    +    STORAGE_IO_READ_BUFFER_SIZE("tajo.io.read.buffer.size", 128 * 1024),
    --- End diff --
    
    @hyunsik  Good idea! 
    
    @jihoonson If you agree, I will update the patch


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r22088859
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/DelimitedLineReader.java ---
    @@ -60,12 +60,18 @@
       private AtomicInteger lineReadBytes = new AtomicInteger();
       private FileFragment fragment;
       private Configuration conf;
    +  private int bufferSize;
     
       public DelimitedLineReader(Configuration conf, final FileFragment fragment) throws IOException {
    +    this(conf, fragment, 128 * StorageUnit.KB);
    --- End diff --
    
    How about using a constant for the default block size?


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r21950805
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---
    @@ -272,6 +272,10 @@ public static int setDateOrder(int dateOrder) {
         // Geo IP
         GEOIP_DATA("tajo.function.geoip-database-location", ""),
     
    +    // Storage IO BUFFER
    +    STORAGE_IO_WRITE_BUFFER_SIZE("tajo.io.write.buffer.size", 128 * 1024),
    +    STORAGE_IO_READ_BUFFER_SIZE("tajo.io.read.buffer.size", 128 * 1024),
    --- End diff --
    
    @jinho, @hyunsik you are right.
    I confused it with RowFile.
    Thanks for your comment.
    Here is my +1.
    Ship it!


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r21949903
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---
    @@ -272,6 +272,10 @@ public static int setDateOrder(int dateOrder) {
         // Geo IP
         GEOIP_DATA("tajo.function.geoip-database-location", ""),
     
    +    // Storage IO BUFFER
    +    STORAGE_IO_WRITE_BUFFER_SIZE("tajo.io.write.buffer.size", 128 * 1024),
    +    STORAGE_IO_READ_BUFFER_SIZE("tajo.io.read.buffer.size", 128 * 1024),
    --- End diff --
    
    His change mainly affects RawFile, which is for the local file system. So, we need to use a word other than HDFS.


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on the pull request:

    https://github.com/apache/tajo/pull/303#issuecomment-67804313
  
    OK, I will do that


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r22088891
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/DelimitedTextFile.java ---
    @@ -165,8 +167,9 @@ public void init() throws IOException {
           serializer = getLineSerde().createSerializer(schema, meta);
           serializer.init();
     
    +      bufferSize = conf.getInt(WRITE_BUFFER_SIZE, 128 * StorageUnit.KB);
    --- End diff --
    
    How about using a constant for the default buffer size? I know that in most cases default values are in storage-default.xml. Nevertheless, a constant would be better.
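
The literal `128 * StorageUnit.KB` appears in both the DelimitedLineReader and DelimitedTextFile diffs. A minimal sketch of the suggestion, assuming a shared constant that both the delegating constructor and the config lookup refer to, could look like this. `BufferDefaults`, `LineReaderSketch`, and the use of `java.util.Properties` in place of the Hadoop `Configuration` are all hypothetical stand-ins, and `KB` here merely mirrors `StorageUnit.KB`.

```java
import java.util.Properties;

/** Hypothetical holder for the shared default; not an existing Tajo class. */
final class BufferDefaults {
  static final int KB = 1024;                       // stand-in for StorageUnit.KB
  static final int DEFAULT_BUFFER_SIZE = 128 * KB;  // single source for the default

  private BufferDefaults() {}
}

/** Hypothetical reader showing the constant used in two places. */
final class LineReaderSketch {
  private final int bufferSize;

  /** Delegates with the named default instead of repeating a literal. */
  LineReaderSketch() {
    this(BufferDefaults.DEFAULT_BUFFER_SIZE);
  }

  LineReaderSketch(int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be positive: " + bufferSize);
    }
    this.bufferSize = bufferSize;
  }

  int bufferSize() {
    return bufferSize;
  }

  /** Reads the size from properties, falling back to the same constant. */
  static LineReaderSketch fromProperties(Properties props, String key) {
    String raw = props.getProperty(key);
    return raw == null
        ? new LineReaderSketch()
        : new LineReaderSketch(Integer.parseInt(raw.trim()));
  }
}
```

With one constant, a later change to the default only has to be made in a single place, whether or not a value is also present in storage-default.xml.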


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on the pull request:

    https://github.com/apache/tajo/pull/303#issuecomment-67297042
  
    I've updated the config key.


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/303#issuecomment-67595871
  
    +1
    
    The patch looks good to me. I just left some trivial suggestions. Please reflect them if you agree.


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r21950196
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---
    @@ -272,6 +272,10 @@ public static int setDateOrder(int dateOrder) {
         // Geo IP
         GEOIP_DATA("tajo.function.geoip-database-location", ""),
     
    +    // Storage IO BUFFER
    +    STORAGE_IO_WRITE_BUFFER_SIZE("tajo.io.write.buffer.size", 128 * 1024),
    +    STORAGE_IO_READ_BUFFER_SIZE("tajo.io.read.buffer.size", 128 * 1024),
    --- End diff --
    
    Also, if we use the prefix ```tajo.storage.[storetype]```, we could use various properties, even ones specified for each file.


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on the pull request:

    https://github.com/apache/tajo/pull/303#issuecomment-67271729
  
    Except for the issue with the config key, the patch looks good to me.


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by jinossy <gi...@git.apache.org>.
Github user jinossy commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r21949956
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---
    @@ -272,6 +272,10 @@ public static int setDateOrder(int dateOrder) {
         // Geo IP
         GEOIP_DATA("tajo.function.geoip-database-location", ""),
     
    +    // Storage IO BUFFER
    +    STORAGE_IO_WRITE_BUFFER_SIZE("tajo.io.write.buffer.size", 128 * 1024),
    +    STORAGE_IO_READ_BUFFER_SIZE("tajo.io.read.buffer.size", 128 * 1024),
    --- End diff --
    
    Thank you for the review.
    The HDFS buffer property is {{io.file.buffer.size}}, and RawFile is not HDFS.
    So, {{tajo.io.write.buffer.bytes}} looks good to me. Any ideas?


[GitHub] tajo pull request: TAJO-1250: RawFileAppender occasionally causes ...

Posted by hyunsik <gi...@git.apache.org>.
Github user hyunsik commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/303#discussion_r21950158
  
    --- Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java ---
    @@ -272,6 +272,10 @@ public static int setDateOrder(int dateOrder) {
         // Geo IP
         GEOIP_DATA("tajo.function.geoip-database-location", ""),
     
    +    // Storage IO BUFFER
    +    STORAGE_IO_WRITE_BUFFER_SIZE("tajo.io.write.buffer.size", 128 * 1024),
    +    STORAGE_IO_READ_BUFFER_SIZE("tajo.io.read.buffer.size", 128 * 1024),
    --- End diff --
    
    BTW, I also have the same concern about the config key name. The properties may vary across file formats.
    
    So, I propose using ```storage-default.xml``` and ```tajo.storage.[storetype].io.read-buffer.bytes```, which can be specified per file format.
    
    Our config naming convention includes a unit like ```bytes``` or ```mb``` because it reduces misuse of values. So, I'd like to suggest ```io.read-buffer.bytes``` or ```io.write-buffer.bytes``` for table properties.
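
As a rough sketch of the proposed naming scheme, the helper below builds a `tajo.storage.[storetype].io.read-buffer.bytes` key and resolves it against a plain map, falling back to the 128 * 1024 default from the diff. `StorageBufferConf`, the `Map`-based lookup, and the store type names `raw` and `csv` are assumptions made for illustration; the real lookup would presumably go through TajoConf or table properties.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for per-store-type buffer keys; not part of Tajo. */
final class StorageBufferConf {
  /** Default in bytes, matching the 128 * 1024 value in the diff. */
  static final int DEFAULT_BUFFER_BYTES = 128 * 1024;

  private StorageBufferConf() {}

  /** Builds a key of the form tajo.storage.[storetype].io.read-buffer.bytes. */
  static String readBufferKey(String storeType) {
    return "tajo.storage." + storeType.toLowerCase(Locale.ROOT) + ".io.read-buffer.bytes";
  }

  /** Returns the configured size in bytes, or the default when the key is absent. */
  static int readBufferBytes(Map<String, String> props, String storeType) {
    String raw = props.get(readBufferKey(storeType));
    if (raw == null) {
      return DEFAULT_BUFFER_BYTES;
    }
    int value = Integer.parseInt(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(readBufferKey(storeType) + " must be positive: " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    // Only the (assumed) "raw" store type is overridden here.
    props.put(readBufferKey("raw"), "65536");
    System.out.println(readBufferBytes(props, "raw"));
    System.out.println(readBufferBytes(props, "csv"));
  }
}
```

Because the unit is part of the key name, a reader of the configuration does not have to guess whether `65536` means bytes or kilobytes.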

