Posted to dev@storm.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2015/05/19 19:46:00 UTC
[jira] [Created] (STORM-828) HdfsBolt takes a lot of configuration, need good defaults
Robert Joseph Evans created STORM-828:
-----------------------------------------
Summary: HdfsBolt takes a lot of configuration, need good defaults
Key: STORM-828
URL: https://issues.apache.org/jira/browse/STORM-828
Project: Apache Storm
Issue Type: Improvement
Reporter: Robert Joseph Evans
The following is code from https://github.com/apache/storm/blob/master/external/storm-hdfs/src/test/java/org/apache/storm/hdfs/bolt/HdfsFileTopology.java representing the amount of configuration required to use the HdfsBolt.
{code}
// sync the filesystem after every 1k tuples
SyncPolicy syncPolicy = new CountSyncPolicy(1000);
// rotate files every 1 min
FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, TimedRotationPolicy.TimeUnit.MINUTES);
FileNameFormat fileNameFormat = new DefaultFileNameFormat()
        .withPath("/tmp/foo/")
        .withExtension(".txt");
RecordFormat format = new DelimitedRecordFormat()
        .withFieldDelimiter("|");
Yaml yaml = new Yaml();
InputStream in = new FileInputStream(args[1]);
Map<String, Object> yamlConf = (Map<String, Object>) yaml.load(in);
in.close();
config.put("hdfs.config", yamlConf);
HdfsBolt bolt = new HdfsBolt()
        .withConfigKey("hdfs.config")
        .withFsUrl(args[0])
        .withFileNameFormat(fileNameFormat)
        .withRecordFormat(format)
        .withRotationPolicy(rotationPolicy)
        .withSyncPolicy(syncPolicy)
        .addRotationAction(new MoveFileAction().toDestination("/tmp/dest2/"));
{code}
This is way too much. If it were just an example showing all of the possibilities, that would be OK, but of the 8 lines used to construct the bolt, 5 are required or the bolt will blow up at run time. We should provide reasonable defaults for everything that can have one, and required parameters should be passed in through the constructor, not as builder arguments. I realize we need to maintain backwards compatibility, so we may need some new bolt definitions, for example:
{code}
TSVFileBolt bolt = new TSVFileBolt(outputDir);
{code}
If someone wanted to sync every 100 records instead of every 1000, we could do:
{code}
TSVFileBolt bolt = new TSVFileBolt(outputDir).withSyncPolicy(new CountSyncPolicy(100));
{code}
I would like to see a base HdfsFileBolt that requires a record format and an output directory, and has defaults for everything else. TSVFileBolt and CSVFileBolt could then subclass it, and ideally a SequenceFileBolt as well.
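To make the shape of this proposal concrete, here is a rough sketch of what such a hierarchy might look like. None of it is existing storm-hdfs code: RecordFormat, DelimitedRecordFormat and CountSyncPolicy below are simplified stand-ins for the real types of the same names, HdfsFileBolt, TSVFileBolt and CSVFileBolt are the hypothetical classes named above, and the default of syncing every 1000 tuples is just an assumption borrowed from the example topology. Rotation, file naming and the actual HDFS writing are left out.

```java
import java.util.Objects;

// Simplified stand-in for the storm-hdfs RecordFormat (not the real interface).
interface RecordFormat {
    String format(String... fields);
}

// Simplified stand-in: joins fields with a delimiter, with no quoting or escaping.
final class DelimitedRecordFormat implements RecordFormat {
    private final String delimiter;

    DelimitedRecordFormat(String delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public String format(String... fields) {
        return String.join(delimiter, fields);
    }
}

// Simplified stand-in: only records how many tuples to write between syncs.
final class CountSyncPolicy {
    final int count;

    CountSyncPolicy(int count) {
        this.count = count;
    }
}

// Hypothetical base bolt: required values go through the constructor,
// everything else starts with a default that can be overridden fluently.
abstract class HdfsFileBolt<T extends HdfsFileBolt<T>> {
    final String outputDir;
    final RecordFormat recordFormat;
    CountSyncPolicy syncPolicy = new CountSyncPolicy(1000); // assumed default

    HdfsFileBolt(String outputDir, RecordFormat recordFormat) {
        // Missing required values fail here, at construction, not at run time.
        this.outputDir = Objects.requireNonNull(outputDir, "outputDir");
        this.recordFormat = Objects.requireNonNull(recordFormat, "recordFormat");
    }

    // Lets the fluent setters return the concrete subclass type.
    abstract T self();

    T withSyncPolicy(CountSyncPolicy policy) {
        this.syncPolicy = Objects.requireNonNull(policy, "policy");
        return self();
    }
}

// Hypothetical subclass: only the output directory is needed.
final class TSVFileBolt extends HdfsFileBolt<TSVFileBolt> {
    TSVFileBolt(String outputDir) {
        super(outputDir, new DelimitedRecordFormat("\t"));
    }

    @Override
    TSVFileBolt self() {
        return this;
    }
}

// Hypothetical subclass: comma-delimited output.
final class CSVFileBolt extends HdfsFileBolt<CSVFileBolt> {
    CSVFileBolt(String outputDir) {
        super(outputDir, new DelimitedRecordFormat(","));
    }

    @Override
    CSVFileBolt self() {
        return this;
    }
}

class HdfsFileBoltSketch {
    public static void main(String[] args) {
        // Defaults only.
        TSVFileBolt plain = new TSVFileBolt("/tmp/foo/");
        // Override a single default.
        TSVFileBolt tuned = new TSVFileBolt("/tmp/foo/").withSyncPolicy(new CountSyncPolicy(100));
        System.out.println(plain.syncPolicy.count + " " + tuned.syncPolicy.count);
    }
}
```

The self-typed generic parameter is one way to let withSyncPolicy return the subclass, so the one-liner from the earlier example still compiles; covariant overrides in each subclass would work as well. The existing HdfsBolt would stay untouched, which keeps backwards compatibility.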