Posted to mapreduce-user@hadoop.apache.org by Anna Guan <ag...@gmail.com> on 2016/03/15 18:34:25 UTC

How to increase buffer size so that hdfsRead() reads entire tif file into a buffer

Hello,
I am new to Hadoop, and I am trying to run a legacy C++ program that
processes a GeoTIFF file using Hadoop Streaming. The Hadoop version is
2.6.2.
I would like to open an image in HDFS, use hdfsRead() to read the file
into a memory buffer, and then use the GDAL library to create a virtual
in-memory file so that I can create a GDALDataset from it. I want to read
the whole file into the buffer, but hdfsRead() only returns 65536 bytes
per call. Is there a way to read the entire file into the buffer? I also
set dfs.image.transfer.chunksize in the config file, but it did not help.
When I run it I get: ERROR 4: `/vsimem/l1' not recognised as a supported
file format. I think this is because I did not fill the buffer properly.
Can anyone kindly tell me whether this is possible?
Many thanks!
Anna Guan

    // Open the HDFS file for reading (128 MB I/O buffer, default
    // replication and block size).
    hdfsFile lfs = hdfsOpenFile(fs, "/input/L1.tif", O_RDONLY, 134217728, 0, 0);

    // Take the file length from the path info rather than hdfsAvailable(),
    // which is not guaranteed to report the whole remaining file.
    hdfsFileInfo *info = hdfsGetPathInfo(fs, "/input/L1.tif");
    tOffset size = info->mSize;
    hdfsFreeFileInfo(info, 1);
    char *data_buffer = (char *)CPLMalloc(size);

    // hdfsRead() may return fewer bytes than requested (often 64 KB), so
    // append each chunk at the current offset until the file is in memory.
    tOffset total = 0;
    while (total < size) {
        tSize nread = hdfsRead(fs, lfs, data_buffer + total, (tSize)(size - total));
        if (nread <= 0)   // 0 = EOF, negative = error
            break;
        total += nread;
    }

    // Wrap the buffer as an in-memory GDAL file; FALSE means GDAL does not
    // take ownership, so data_buffer must outlive the dataset.
    VSIFCloseL(VSIFileFromMemBuffer("/vsimem/l1", (GByte *)data_buffer, total, FALSE));
    GDALAllRegister();
    GDALDataset *readDS = (GDALDataset *)GDALOpen("/vsimem/l1", GA_ReadOnly);
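
For completeness, a minimal sketch of the surrounding setup and teardown
the fragment assumes (connecting to the "default" NameNode from the client
configuration is an assumption, and the cleanup order shown is one
reasonable choice, not something from the snippet itself):

    // Assumed setup: connect to the NameNode named in the client config.
    hdfsFS fs = hdfsConnect("default", 0);

    /* ... read /input/L1.tif into data_buffer and open readDS as above ... */

    // Assumed teardown, in reverse order. Since bTakeOwnership was FALSE,
    // the caller still owns data_buffer and frees it only after the
    // dataset and the /vsimem file are released.
    GDALClose(readDS);
    VSIUnlink("/vsimem/l1");
    CPLFree(data_buffer);
    hdfsCloseFile(fs, lfs);
    hdfsDisconnect(fs);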

Re: How to increase buffer size so that hdfsRead() reads entire tif file into a buffer

Posted by Anna Guan <ag...@gmail.com>.
Hi Renjith,
I started by setting up a single node, following
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html,
then moved to multiple nodes, and then followed the streaming guide at
https://hadoop.apache.org/docs/r1.2.1/streaming.html, along with some
online tutorials.
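
A typical streaming run with a native binary as the mapper looks roughly
like this (the jar path, binary name, and HDFS paths below are
placeholders, not anything from this thread):

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.6.2.jar \
        -files process_tiff \
        -input /input \
        -output /output \
        -mapper process_tiff \
        -reducer NONE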

Thanks,
Anna

On Tue, Mar 15, 2016 at 10:34 AM, Anna Guan <ag...@gmail.com> wrote:

> Hello,
> I am new to Hadoop, and I am trying to run a legacy C++ program that
> processes a GeoTIFF file using Hadoop Streaming. [snip]
