Posted to commits@cassandra.apache.org by "Michael Kjellman (JIRA)" <ji...@apache.org> on 2012/11/05 17:12:12 UTC

[jira] [Created] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Michael Kjellman created CASSANDRA-4912:
-------------------------------------------

             Summary: BulkOutputFormat should support Hadoop MultipleOutput
                 Key: CASSANDRA-4912
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
             Project: Cassandra
          Issue Type: New Feature
          Components: Hadoop
    Affects Versions: 1.2.0 beta 1
            Reporter: Michael Kjellman


Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496392#comment-13496392 ] 

Brandon Williams commented on CASSANDRA-4912:
---------------------------------------------

I still get a slew of errors trying to compile this. An obvious one is in ReducerToCassandra.reduce where 'val' is never defined, but there are many others.
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Attachment: loaddata.pl
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494500#comment-13494500 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

It looks like OUTPUT_COLUMNFAMILY_CONFIG never gets set in ConfigHelper when a new BulkRecordWriter is created. It's difficult to figure out exactly what should be setting mapreduce.output.basename in the job config, and where.
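For illustration only, here is a minimal sketch of the lookup a writer would need: prefer an explicitly configured output column family, and otherwise fall back to mapreduce.output.basename, which, as far as I can tell, Hadoop's new-API MultipleOutputs sets to the named output's name (via FileOutputFormat.setOutputName()) on the per-output context it hands to the output format. The class name and the explicit key "cassandra.output.columnfamily" are hypothetical stand-ins, not ConfigHelper's real constants, and a plain Map stands in for Hadoop's Configuration.

```java
import java.util.Map;

// Hypothetical resolver: not Cassandra's ConfigHelper, just the lookup order it would need.
final class OutputColumnFamilyResolver {
    // Hypothetical stand-in for ConfigHelper's OUTPUT_COLUMNFAMILY_CONFIG key.
    static final String EXPLICIT_CF_KEY = "cassandra.output.columnfamily";
    // Key that MultipleOutputs is assumed to set to the named output's name.
    static final String BASENAME_KEY = "mapreduce.output.basename";

    private OutputColumnFamilyResolver() {}

    // Returns the explicit column family if set, else the per-output base name.
    static String resolve(Map<String, String> conf) {
        String cf = conf.get(EXPLICIT_CF_KEY);
        if (cf != null) {
            return cf;
        }
        cf = conf.get(BASENAME_KEY);
        if (cf != null) {
            return cf;
        }
        throw new UnsupportedOperationException(
            "No output column family: set one explicitly or add a named output with MultipleOutputs");
    }
}
```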
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>         Attachments: Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494513#comment-13494513 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

I think another difference in behavior between CFOF and BOF is that when a new BulkRecordWriter(Configuration conf) is created, it creates the directory for the sstables. It calls ConfigHelper at that point to get the name of the column family so it can create the directory. In CFOF, the only call to getOutputColumnFamily is in RangeClient.

Normally, without MultipleOutputs, the job config would include a call to setOutputColumnFamily(). I don't understand what calls setOutputColumnFamily() when you add a new named output with MultipleOutputs. I presume this is where the problem is.
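If each column family has to end up in its own sstable directory, one option is to key the directory on the column family name and create it the first time that name is seen, rather than once in the constructor. The sketch below assumes a hypothetical <root>/<keyspace>/<columnFamily> layout and a hypothetical PerColumnFamilyDirs helper; it is not how BulkRecordWriter actually lays out its files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: one staging directory per output column family,
// created the first time that column family is written to.
final class PerColumnFamilyDirs {
    private final Path root;
    private final String keyspace;
    private final Map<String, Path> dirs = new HashMap<>();

    PerColumnFamilyDirs(Path root, String keyspace) {
        this.root = root;
        this.keyspace = keyspace;
    }

    // Returns <root>/<keyspace>/<columnFamily> (assumed layout), creating it on first use.
    Path dirFor(String columnFamily) throws IOException {
        Path dir = dirs.get(columnFamily);
        if (dir == null) {
            dir = Files.createDirectories(root.resolve(keyspace).resolve(columnFamily));
            dirs.put(columnFamily, dir);
        }
        return dir;
    }

    // Number of distinct column families seen so far.
    int size() {
        return dirs.size();
    }
}
```

The explicit get/put is used instead of computeIfAbsent because Files.createDirectories throws a checked IOException.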
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>         Attachments: Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496589#comment-13496589 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

Okay, I attached a script to load data (really simple, but I wanted you to see what kind of data I was using to test that the Example job runs) and App.java, which will allow you to output to multiple column families with BOF.
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Comment Edited] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492993#comment-13492993 ] 

Michael Kjellman edited comment on CASSANDRA-4912 at 11/8/12 6:21 AM:
----------------------------------------------------------------------

So obviously this is due to the handling in the close() function in BulkRecordWriter. So far I've been unable to get BOF to work with MultipleOutputs in local mode through Eclipse. ConfigHelper is happy on the first check of the job config, but when the reducer is instantiated the column family output names don't seem to be set. close() is pretty simple in BulkRecordWriter, though: it looks like the sstable is first closed and then streamed to the nodes. My guess is that close() is only being called on one of the sstables/named outputs (in a fully distributed cluster I do see sstables being created for multiple column families).
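To make the expected behaviour concrete: every writer opened for a named output needs both steps (close the sstable, then stream it), and one failure shouldn't stop the others from being attempted. SSTableSink and NamedOutputWriters below are hypothetical stand-ins for BulkRecordWriter and for the bookkeeping MultipleOutputs.close() does; this is a sketch, not Cassandra code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a bulk writer: finish the sstable, then ship it.
interface SSTableSink {
    void closeSSTable() throws IOException; // finish writing the local sstable
    void stream() throws IOException;       // send the finished sstable to the nodes
}

// Hypothetical registry of one writer per named output (i.e. per column family).
final class NamedOutputWriters {
    private final Map<String, SSTableSink> writers = new LinkedHashMap<>();

    void register(String namedOutput, SSTableSink sink) {
        writers.putIfAbsent(namedOutput, sink);
    }

    // Closes and streams every registered writer, not just the first one.
    // A writer whose close fails is not streamed; the first failure is rethrown
    // at the end with any later ones attached as suppressed exceptions.
    void closeAll() throws IOException {
        List<IOException> failures = new ArrayList<>();
        for (Map.Entry<String, SSTableSink> e : writers.entrySet()) {
            try {
                e.getValue().closeSSTable();
                e.getValue().stream();
            } catch (IOException ex) {
                failures.add(ex);
            }
        }
        if (!failures.isEmpty()) {
            IOException first = failures.get(0);
            for (int i = 1; i < failures.size(); i++) {
                first.addSuppressed(failures.get(i));
            }
            throw first;
        }
    }
}
```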
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Attachment: pom.xml

[~brandon.williams] I'm also including a pom.xml in case you decide to use Maven for testing this.
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl, pom.xml
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Attachment: 4912.txt
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494471#comment-13494471 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

[~brandon.williams] If I patch BulkOutputFormat.java in a similar manner to CASSANDRA-4208 (line 40), that patch is what lets the initial check of the config pass, but the job still fails when the reducer is created. I'm still not sure why the behavior is different.
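One way to read the difference: the job-level check only has to confirm that some output column family will exist, while each reducer-side writer has to resolve a concrete name. A minimal sketch of such a setup-time check follows. It assumes MultipleOutputs records named outputs as a space-separated list under the internal key "mapreduce.multipleoutputs" (an implementation detail, so treat it as an assumption); the class name and the explicit column family key are hypothetical, and a plain Map stands in for Hadoop's Configuration.

```java
import java.util.Map;

// Hypothetical job-setup check: passes if an explicit output column family is set
// or at least one named output has been registered.
final class OutputSpecCheck {
    // Hypothetical stand-in for ConfigHelper's output column family key.
    static final String EXPLICIT_CF_KEY = "cassandra.output.columnfamily";
    // Assumed internal key where MultipleOutputs lists named outputs.
    static final String NAMED_OUTPUTS_KEY = "mapreduce.multipleoutputs";

    private OutputSpecCheck() {}

    static void check(Map<String, String> jobConf) {
        String explicit = jobConf.get(EXPLICIT_CF_KEY);
        String named = jobConf.getOrDefault(NAMED_OUTPUTS_KEY, "").trim();
        if (explicit == null && named.isEmpty()) {
            throw new UnsupportedOperationException(
                "Set an output column family or add at least one named output");
        }
    }
}
```

Passing a check like this says nothing about what each task's writer will see, so a writer that only looks at the explicit key can still fail once the reducer starts.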
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>         Attachments: Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495591#comment-13495591 ] 

Brandon Williams commented on CASSANDRA-4912:
---------------------------------------------

There is no particular reason that I recall, it was just a convenient place at the time.
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492993#comment-13492993 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

So obviously this is due to the handling in the close() function in BulkRecordWriter. So far I've been unable to get BOF to work with MultipleOutputs in local mode through Eclipse. ConfigHelper is happy on the first check, but when the reducer is created the column family output names don't seem to be set. close() is pretty simple: it looks like the sstable is first closed and then streamed to the nodes. I'm guessing that either close() is only being called on one of the sstables (in a fully distributed cluster I do see sstables being created for multiple column families), or we don't close them at all, so they never stream to the nodes?
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498315#comment-13498315 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

[~brandon.williams] did everything compile okay for you?
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl, pom.xml
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent and an exception being thrown when Hadoop is run in local mode, due to the call to ConfigHelper when a new BulkRecordWriter is created.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493129#comment-13493129 ] 

Brandon Williams commented on CASSANDRA-4912:
---------------------------------------------

Hmm, normally I find the opposite: local mode works, and then everything breaks in distributed mode :)  Can you post everything needed to test?
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Attachment:     (was: Example.java)
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Comment Edited] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493577#comment-13493577 ] 

Michael Kjellman edited comment on CASSANDRA-4912 at 11/8/12 10:44 PM:
-----------------------------------------------------------------------

So when checkOutputSpecs() runs in local mode during job setup, ConfigHelper doesn't throw any exceptions. When a reducer is created, however, org.apache.cassandra.hadoop.ConfigHelper.getOutputColumnFamily throws an UnsupportedOperationException saying that the output column family isn't set up. It looks like mapreduce.output.basename is null.

The job config is something along the lines of:

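// imports this excerpt relies on; Nashoba, TokenizerMapper, KEYSPACE, INPUT_COLUMN_FAMILY,
// OUTPUT_COLUMN_FAMILY1/2 and getMutation1/2 are defined elsewhere in the job class
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.List;

import org.apache.cassandra.hadoop.BulkOutputFormat;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.Mutation;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public int run(String[] args) throws Exception
	{
		Job job = new Job(getConf(), "MRJobName");

		job.setJarByClass(Nashoba.class);
		job.setMapperClass(TokenizerMapper.class);
		job.setReducerClass(ReducerToCassandra.class);
		job.setInputFormatClass(ColumnFamilyInputFormat.class);

		// set up 3 reducers
		job.setNumReduceTasks(3);

		// thrift input job settings
		ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
		ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");
		ConfigHelper.setInputPartitioner(job.getConfiguration(), "RandomPartitioner");

		// thrift output job settings
		ConfigHelper.setOutputRpcPort(job.getConfiguration(), "9160");
		ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "127.0.0.1");
		ConfigHelper.setOutputPartitioner(job.getConfiguration(), "RandomPartitioner");

		// set timeout to 1 hour for testing
		job.getConfiguration().set("mapreduce.task.timeout", "3600000");
		job.getConfiguration().set("mapred.task.timeout", "3600000");

		job.getConfiguration().set("mapreduce.output.bulkoutputformat.buffersize", "64");
		job.setOutputFormatClass(BulkOutputFormat.class);
		// Job copies the Configuration it was created from, so settings made on
		// getConf() after this point are not seen by the job; use job.getConfiguration()
		ConfigHelper.setRangeBatchSize(job.getConfiguration(), 99);

		// let ConfigHelper know what Column Family to get data from and where to output it
		ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, INPUT_COLUMN_FAMILY);

		ConfigHelper.setOutputKeyspace(job.getConfiguration(), KEYSPACE);
		MultipleOutputs.addNamedOutput(job, OUTPUT_COLUMN_FAMILY1, BulkOutputFormat.class, ByteBuffer.class, List.class);
		MultipleOutputs.addNamedOutput(job, OUTPUT_COLUMN_FAMILY2, BulkOutputFormat.class, ByteBuffer.class, List.class);

		// what classes the mapper will write and what the consumer should expect to receive
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(MapWritable.class);
		job.setOutputKeyClass(ByteBuffer.class);
		job.setOutputValueClass(List.class);

		// empty start/finish: slice over all columns
		SliceRange sliceRange = new SliceRange();
		sliceRange.setStart(new byte[0]);
		sliceRange.setFinish(new byte[0]);
		SlicePredicate predicate = new SlicePredicate();
		predicate.setSlice_range(sliceRange);
		ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

		// report failure through the exit status instead of always returning 0
		return job.waitForCompletion(true) ? 0 : 1;
	}

public static class ReducerToCassandra extends Reducer<Text, MapWritable, ByteBuffer, List<Mutation>>
	{
		private MultipleOutputs<ByteBuffer, List<Mutation>> output;

		@Override
		public void setup(Context context) {
			output = new MultipleOutputs<ByteBuffer, List<Mutation>>(context);
		}

		@Override
		public void reduce(Text word, Iterable<MapWritable> values, Context context) throws IOException, InterruptedException
		{
			// the real reducer logic ("do stuff in reducer...") is not shown; as a
			// hypothetical placeholder, count the values so that 'val' is defined
			int val = 0;
			for (MapWritable value : values)
				val++;

			// hypothetical row key: the word itself, as UTF-8 bytes
			ByteBuffer key = ByteBufferUtil.bytes(word.toString());

			// write out our result to Hadoop
			context.progress();
			// for writing to 2 column families (getMutation1/2 are the job's own helpers, not shown)
			output.write(OUTPUT_COLUMN_FAMILY1, key, Collections.singletonList(getMutation1(word, val)));
			output.write(OUTPUT_COLUMN_FAMILY2, key, Collections.singletonList(getMutation2(word, val)));
		}

		@Override
		public void cleanup(Context context) throws IOException, InterruptedException {
			output.close(); // closes all of the opened outputs
		}
	}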
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-4912:
----------------------------------------

    Assignee: Brandon Williams
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>            Assignee: Brandon Williams
>         Attachments: 4912.txt, App.java, loaddata.pl, pom.xml
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493577#comment-13493577 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

So when checkOutputSpecs() runs in local mode while the job is being set up, ConfigHelper doesn't throw any exceptions. However, when a reducer is created, org.apache.cassandra.hadoop.ConfigHelper.getOutputColumnFamily throws an UnsupportedOperationException saying that the output column family isn't set up. It looks like mapreduce.output.basename is null.
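To make the suspected failure mode concrete, here is a simplified model of the lookup in plain Java, with a Map standing in for Hadoop's Configuration. It assumes, as the comment above suggests, that the output column family falls back to mapreduce.output.basename (the key MultipleOutputs sets for each named output) when no column family was configured explicitly. The class name and the key example.output.columnfamily are hypothetical; this is not Cassandra's actual ConfigHelper code.

```java
import java.util.HashMap;
import java.util.Map;

public class OutputColumnFamilyLookup {
    // Hypothetical key standing in for an explicitly configured output column family.
    static final String EXPLICIT_CF_KEY = "example.output.columnfamily";
    // Key that MultipleOutputs sets to the name of the named output being written.
    static final String BASENAME_KEY = "mapreduce.output.basename";

    // Simplified model: prefer the explicit setting, otherwise fall back to the basename.
    static String getOutputColumnFamily(Map<String, String> conf) {
        String cf = conf.get(EXPLICIT_CF_KEY);
        if (cf == null) {
            cf = conf.get(BASENAME_KEY);
        }
        if (cf == null) {
            throw new UnsupportedOperationException("output column family is not set up");
        }
        return cf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        try {
            getOutputColumnFamily(conf); // neither key is set: models the reducer-time failure
        } catch (UnsupportedOperationException e) {
            System.out.println("failed: " + e.getMessage());
        }
        conf.put(BASENAME_KEY, "cf1"); // what a named output would provide
        System.out.println("resolved: " + getOutputColumnFamily(conf));
    }
}
```

Under this model, a writer that asks for the column family before the basename is in its configuration fails exactly the way the reducer does here.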

The job configuration is something along these lines:

public int run(String[] args) throws Exception
	{
		Job job = new Job(getConf(), "Nashoba");

		job.setJarByClass(Nashoba.class);
		job.setMapperClass(TokenizerMapper.class);
		job.setReducerClass(ReducerToCassandra.class);
		job.setInputFormatClass(ColumnFamilyInputFormat.class);

		// set up 3 reducers
		job.setNumReduceTasks(3);

		// thrift input job settings
		ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
		ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");
		ConfigHelper.setInputPartitioner(job.getConfiguration(), "RandomPartitioner");

		// thrift output job settings
		ConfigHelper.setOutputRpcPort(job.getConfiguration(), "9160");
		ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "127.0.0.1");
		ConfigHelper.setOutputPartitioner(job.getConfiguration(), "RandomPartitioner");

		// set timeout to 1 hour for testing
		job.getConfiguration().set("mapreduce.task.timeout", "3600000");
		job.getConfiguration().set("mapred.task.timeout", "3600000");

		job.getConfiguration().set("mapreduce.output.bulkoutputformat.buffersize", "64");
		job.setOutputFormatClass(BulkOutputFormat.class);

		// Job copies the Configuration it was created with, so later settings
		// must go on job.getConfiguration(), not getConf()
		ConfigHelper.setRangeBatchSize(job.getConfiguration(), 99);

		// let ConfigHelper know what Column Family to get data from and where to output it
		ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, INPUT_COLUMN_FAMILY);

		ConfigHelper.setOutputKeyspace(job.getConfiguration(), KEYSPACE);
		// one named output per target column family
		MultipleOutputs.addNamedOutput(job, OUTPUT_COLUMN_FAMILY1, BulkOutputFormat.class, ByteBuffer.class, List.class);
		MultipleOutputs.addNamedOutput(job, OUTPUT_COLUMN_FAMILY2, BulkOutputFormat.class, ByteBuffer.class, List.class);

		// what classes the mapper will write and what the reducer should expect to receive
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(MapWritable.class);
		job.setOutputKeyClass(ByteBuffer.class);
		job.setOutputValueClass(List.class);

		// empty start and finish: read all columns of each row
		SliceRange sliceRange = new SliceRange();
		sliceRange.setStart(new byte[0]);
		sliceRange.setFinish(new byte[0]);
		SlicePredicate predicate = new SlicePredicate();
		predicate.setSlice_range(sliceRange);
		ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

		// report failure to the caller instead of always returning 0
		return job.waitForCompletion(true) ? 0 : 1;
}

public static class ReducerToCassandra extends Reducer<Text, MapWritable, ByteBuffer, List<Mutation>>
	{
		private MultipleOutputs<ByteBuffer, List<Mutation>> output;

		@Override
		public void setup(Context context) {
			output = new MultipleOutputs<ByteBuffer, List<Mutation>>(context);
		}

		@Override
		public void reduce(Text word, Iterable<MapWritable> values, Context context) throws IOException, InterruptedException
		{
			// do stuff in reducer... (elided; this part must define the row key `key`
			// and the value `val` used by the writes below)

			// write out our result to Hadoop
			context.progress();
			// for writing to 2 column families
			output.write(OUTPUT_COLUMN_FAMILY1, key, Collections.singletonList(getMutation1(word, val)));
			output.write(OUTPUT_COLUMN_FAMILY2, key, Collections.singletonList(getMutation2(word, val)));
		}

		@Override
		public void cleanup(Context context) throws IOException, InterruptedException {
			output.close(); // closes all of the opened outputs
		}

	}
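The reducer above relies on MultipleOutputs opening one RecordWriter per named output and closing all of them in cleanup(). The plain-Java model below illustrates only that routing pattern; NamedOutputs and RecordingWriter are hypothetical stand-ins, not Hadoop classes, and the real MultipleOutputs also deals with output formats, counters and task contexts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamedOutputs {
    // Stand-in for a RecordWriter bound to a single column family.
    static class RecordingWriter {
        final String name;
        final List<String> rows = new ArrayList<>();
        boolean closed;

        RecordingWriter(String name) { this.name = name; }
        void write(String key, String value) { rows.add(key + "=" + value); }
        void close() { closed = true; }
    }

    private final Map<String, RecordingWriter> writers = new LinkedHashMap<>();

    // Lazily create one writer per named output, then route the record to it.
    void write(String namedOutput, String key, String value) {
        writers.computeIfAbsent(namedOutput, RecordingWriter::new).write(key, value);
    }

    // Mirrors output.close() in cleanup(): close every writer that was opened.
    void close() {
        for (RecordingWriter w : writers.values()) {
            w.close();
        }
    }

    Map<String, RecordingWriter> writers() { return writers; }

    public static void main(String[] args) {
        NamedOutputs output = new NamedOutputs();
        output.write("cf1", "word", "1");
        output.write("cf2", "word", "2");
        output.write("cf1", "other", "3");
        output.close();
        for (RecordingWriter w : output.writers().values()) {
            System.out.println(w.name + " " + w.rows + " closed=" + w.closed);
        }
    }
}
```

With BulkOutputFormat, each of those per-output writers is expected to produce its own sstables and stream; the problem reported in this issue is that only one stream was sent.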
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494542#comment-13494542 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

Okay, so it looks like setting outputdir when the object is created is causing the problem. I moved setting outputdir into prepareWriter(), and now both sstables appear to be created and streamed.

[~brandon.williams], is there any reason the outputdir is set when the BulkRecordWriter object is created?
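A minimal sketch of the change described above: defer resolving the output directory from the constructor to the first write. LazyBulkWriter, its prepareWriter() and the <keyspace>/<columnfamily> directory layout are hypothetical illustrations in plain Java (a Map stands in for the Hadoop Configuration, and "keyspace" is an assumed key), not the actual BulkRecordWriter code.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyBulkWriter {
    private final Map<String, String> conf;
    private String outputDir; // resolved on first write, not in the constructor

    LazyBulkWriter(Map<String, String> conf) {
        // Only keep a reference; resolving the column family here would fail
        // if the per-output settings have not been applied yet.
        this.conf = conf;
    }

    // Hypothetical counterpart of prepareWriter(): resolve the directory once, lazily.
    private void prepareWriter() {
        if (outputDir != null) {
            return;
        }
        String cf = conf.get("mapreduce.output.basename");
        if (cf == null) {
            throw new UnsupportedOperationException("output column family is not set up");
        }
        outputDir = conf.get("keyspace") + "/" + cf;
    }

    void write(String key) {
        prepareWriter();
        // A real writer would append the row to an sstable under outputDir here.
    }

    String outputDir() {
        return outputDir;
    }

    public static void main(String[] args) {
        Map<String, String> conf1 = new HashMap<>();
        LazyBulkWriter w1 = new LazyBulkWriter(conf1); // does not throw: nothing resolved yet
        conf1.put("keyspace", "ks");
        conf1.put("mapreduce.output.basename", "cf1");
        w1.write("row");

        Map<String, String> conf2 = new HashMap<>(conf1);
        conf2.put("mapreduce.output.basename", "cf2");
        LazyBulkWriter w2 = new LazyBulkWriter(conf2);
        w2.write("row");

        System.out.println(w1.outputDir() + " " + w2.outputDir()); // ks/cf1 ks/cf2
    }
}
```

Because each writer resolves its own directory from its own configuration at write time, two named outputs end up with two distinct directories, which matches the observation that both sstables are then created and streamed.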
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>         Attachments: Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Attachment: Example.java
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493577#comment-13493577 ] 

Michael Kjellman edited comment on CASSANDRA-4912 at 11/8/12 10:54 PM:
-----------------------------------------------------------------------

So when checkOutputSpecs() runs in local mode while the job is being set up, ConfigHelper doesn't throw any exceptions. However, when a reducer is created, org.apache.cassandra.hadoop.ConfigHelper.getOutputColumnFamily throws an UnsupportedOperationException saying that the output column family isn't set up. It looks like mapreduce.output.basename is null.

See Example.java attached as a stripped down example MR job.
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Assigned] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reassigned CASSANDRA-4912:
-------------------------------------------

    Assignee: Michael Kjellman
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl, pom.xml
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent and an exception being thrown when Hadoop is run in local mode due to the call to ConfigHelper when a new BulkRecordWriter is created.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Description: Much like CASSANDRA-4208 BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent and an exception being thrown when Hadoop is run in local mode due to the call to ConfigHelper when a new BulkRecordWriter is created.  (was: Much like CASSANDRA-4208 BOF should support outputting to Multiple Column Families. The current approach takken in the patch for COF results in only one stream being sent.)
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl, pom.xml
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent and an exception being thrown when Hadoop is run in local mode due to the call to ConfigHelper when a new BulkRecordWriter is created.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496382#comment-13496382 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

Updated example with imports.
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Attachment: Example.java
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Michael Kjellman
>         Attachments: Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496397#comment-13496397 ] 

Michael Kjellman commented on CASSANDRA-4912:
---------------------------------------------

Yeah, sorry, it wasn't originally intended as a functional example. I'll create one that actually does something now.
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Attachment: App.java
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Commented] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496379#comment-13496379 ] 

Brandon Williams commented on CASSANDRA-4912:
---------------------------------------------

Do you have an Example.java that contains all the imports?
                
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Michael Kjellman (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Kjellman updated CASSANDRA-4912:
----------------------------------------

    Attachment:     (was: Example.java)
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, Example.java
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.

[jira] [Updated] (CASSANDRA-4912) BulkOutputFormat should support Hadoop MultipleOutput

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-4912:
----------------------------------------

    Reviewer: brandon.williams
    Assignee:     (was: Brandon Williams)
    
> BulkOutputFormat should support Hadoop MultipleOutput
> -----------------------------------------------------
>
>                 Key: CASSANDRA-4912
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4912
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Hadoop
>    Affects Versions: 1.2.0 beta 1, 1.2.0 beta 2
>            Reporter: Michael Kjellman
>         Attachments: 4912.txt, App.java, loaddata.pl, pom.xml
>
>
> Much like CASSANDRA-4208, BOF should support outputting to Multiple Column Families. The current approach taken in the patch for COF results in only one stream being sent.
