Posted to user@accumulo.apache.org by Al Krinker <al...@gmail.com> on 2014/05/01 22:46:06 UTC

Issue with importDirectory

So I am trying to create my own RFile and write it to Accumulo...

In a nutshell:

I create my RFile and two directories: one that will contain the file and
one for failures, both required by importDirectory.

        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://blah:9000/");
        conf.set("fs.hdfs.impl",
"org.apache.hadoop.hdfs.DistributedFileSystem");
        FileSystem fs = FileSystem.get(conf);

        Path input = new Path("/accumulo/temp1/testing/");
        Path output = new Path("/accumulo/temp1/testing/my_output");
        fs.mkdirs(input);
        fs.mkdirs(output);

        String extension = conf.get(FILE_TYPE);
        if (extension == null || extension.isEmpty()) {
            extension = RFile.EXTENSION;
        }
        String filename = "/accumulo/temp1/testing/my_input/testFile." +
extension;
        Path file = new Path(filename);
        if (fs.exists(file)) {
            file.getFileSystem(conf).delete(file, false);
        }
        FileSKVWriter out =
RFileOperations.getInstance().openWriter(filename, fs, conf,
                AccumuloConfiguration.getDefaultConfiguration());
        out.startDefaultLocalityGroup();
        long timestamp = (new Date()).getTime();
        Key key = new Key(new Text("row_1"), new Text("cf"), new Text("cq"),
                new ColumnVisibility(), timestamp);
        Value value = new Value("".getBytes());
        out.append(key, value);
        out.close();

At this point I can ssh into my namenode and see the file and the two
directories.

Then I try to bulk import it:

        String instanceName = "blah";
        String zooServers = "blah:2181,blah:2181";
        String userName = ""; // provide username
        String password = ""; // provide password
        // Connect
        Instance inst = new ZooKeeperInstance(instanceName, zooServers);
        Connector conn = inst.getConnector(userName, password);
        TableOperations ops = conn.tableOperations();
        ops.delete("mynewtesttable");
        ops.create("mynewtesttable");
        ops.importDirectory("mynewtesttable", input.toString(),
output.toString(), false);

The exception that I am getting is:
SEVERE: null
org.apache.accumulo.core.client.AccumuloException: Bulk import directory
/accumulo/temp1/testing does not exist!

I tried to play around with the file/directory owner by manually setting it
to accumulo and then hadoop, but no luck.

I checked hdfs-site and I have
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>

Any ideas?

Any guesses of what might be wrong?

Re: Issue with importDirectory

Posted by David Medinets <da...@gmail.com>.
I'm heading home but I can play with this tomorrow. On the positive side, I
have D4M reading the data that I wrote from Java. So that's nice. :)



Re: Issue with importDirectory

Posted by Al Krinker <al...@gmail.com>.
Hey Josh,

I checked HDFS and it was there... I found the issue, and I have to thank
one of my friends who ran into it before.

When importDirectory runs it uses CachedConfiguration... so it was picking
up my local Hadoop configuration instead of the one pointing at HDFS...

All I did to solve it was to add CachedConfiguration.setInstance(conf);
right after I created conf and pointed it at my Hadoop HDFS...

Worked perfectly... I was able to create a new RFile and write it to a table
in Accumulo... The code that I posted works (plus the fix) for anyone
interested.
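
For anyone who hits the same thing, here is a minimal sketch of where the
fix slots into the code above. It assumes a 1.5-era Accumulo where
CachedConfiguration lives in org.apache.accumulo.core.util; adjust the
import for your version.

        // Same setup as in the original post, plus the fix.
        // import org.apache.accumulo.core.util.CachedConfiguration;  (assumed location)
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://blah:9000/");
        conf.set("fs.hdfs.impl",
                "org.apache.hadoop.hdfs.DistributedFileSystem");
        // The fix: point Accumulo's cached Hadoop configuration at this conf,
        // so importDirectory resolves the bulk import directory against HDFS
        // instead of the local filesystem.
        CachedConfiguration.setInstance(conf);
        FileSystem fs = FileSystem.get(conf);
        // ... write the RFile and call importDirectory as in the code above ...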

Anyway, that was it... and thank you Josh for your feedback! You are
awesome :)




Re: Issue with importDirectory

Posted by Josh Elser <jo...@gmail.com>.
Probably best to start in HDFS. Check to see if the directory you 
thought you made actually exists (/accumulo/temp1/testing).

It's possible that you wrote that file to local FS instead of HDFS.
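
A quick way to check that, sketched with plain Hadoop FileSystem calls (the
path and the fs.default.name value are taken from the original post):

        // Sanity check: which filesystem does this Configuration resolve to,
        // and does the bulk import directory exist there?
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://blah:9000/");
        conf.set("fs.hdfs.impl",
                "org.apache.hadoop.hdfs.DistributedFileSystem");
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default FS: " + fs.getUri()); // expect hdfs://blah:9000
        System.out.println("Exists: "
                + fs.exists(new Path("/accumulo/temp1/testing")));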
