Posted to commits@gobblin.apache.org by ab...@apache.org on 2018/01/03 20:02:15 UTC

[06/10] incubator-gobblin git commit: [GOBBLIN-355] Updated CHANGELOG for 0.12.0 release

[GOBBLIN-355] Updated CHANGELOG for 0.12.0 release


Project: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/commit/50b0bcfb
Tree: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/tree/50b0bcfb
Diff: http://git-wip-us.apache.org/repos/asf/incubator-gobblin/diff/50b0bcfb

Branch: refs/heads/master
Commit: 50b0bcfb414b8355fe45ce5c0f5b151ca0024172
Parents: f2e7c06
Author: Abhishek Tiwari <ab...@gmail.com>
Authored: Wed Jan 3 19:27:04 2018 +0530
Committer: Abhishek Tiwari <ab...@gmail.com>
Committed: Wed Jan 3 19:27:04 2018 +0530

----------------------------------------------------------------------
 CHANGELOG.md | 220 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 220 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-gobblin/blob/50b0bcfb/CHANGELOG.md
----------------------------------------------------------------------
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e697552..c6c262f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,223 @@
+GOBBLIN 0.12.0
+-------------
+
+### Created Date: 1/03/2018
+
+## HIGHLIGHTS 
+
+* First Apache Release.
+* Improved Gobblin-as-a-Service. 
+* Improved Global Throttling. 
+* Improved Gobblin Cluster. 
+* Enhanced stream processing. 
+* New Converters: JsonToParquet, GrokToJson, JsonToAvro.
+* New Sources: RegexPartitionedAvroFileSource.
+* New Extractors: PostgresqlExtractor, EnvelopePayloadExtractor.
+* New Writers: ParquetHdfsDataWriter, SalesforceWriter, eventually consistent FS support.
+
+## NEW FEATURES 
+
+* [State Store] [GOBBLIN-199] GOBBLIN-56 Add state store entry listing API
+* [State Store] [GOBBLIN-200] GOBBLIN-56 State store dataset cleaner using state store listing API
+* [Extractor] [GOBBLIN-203] Postgresql Extractor
+* [Extractor] [GOBBLIN-238] Implement EnvelopePayloadExtractor and EnvelopePayloadDeserializer
+* [Converter] [GOBBLIN-248] Converter for Json to Parquet
+* [Converter] [GOBBLIN-231] Grok to Json Converter
+* [Converter] [GOBBLIN-221] Add Json to Avro converter
+* [Writer] [GOBBLIN-255] ParquetHdfsDataWriter
+* [Writer] [GOBBLIN-36] New salesforce writer
+* [Encryption] [GOBBLIN-224] Support keyring-based GPG file decryption
+* [Kafka] [GOBBLIN-190] Kafka Sink replication factor and partition creation.
+* [Avro-to-ORC] [GOBBLIN-181] Modify Avro2ORC flow to materialize Hive views
+
+## IMPROVEMENTS
+
+* [GaaS] [GOBBLIN-232] Create Azkaban Orchestrator for Gobblin-as-a-Service
+* [GaaS] [GOBBLIN-213] Add scheduler service to GobblinServiceManager
+* [GaaS] [GOBBLIN-3] Implementation of Flow compiler with multiple hops
+* [GaaS] [GOBBLIN-280] Add new SpecCompiler compatible constructor to AzkabanSpecExecutor
+* [GaaS] [GOBBLIN-299] Add deletion support to Azkaban Orchestrator
+* [GaaS] [GOBBLIN-262] Make multihopcompiler use the first user specified template
+* [GaaS] [GOBBLIN-204] Add a service that fetches GaaS flow configs from a git repository
+* [GaaS] [GOBBLIN-292] Add kafka09 support for service and cluster job spec communication
+* [GaaS] [GOBBLIN-281] Fix logging in gobblin-service
+* [GaaS] [GOBBLIN-273] Add failure monitoring
+* [GaaS] [GOBBLIN-304] Remove versioning from Gobblin-as-a-Service flow specs
+* [Global Throttling] [GOBBLIN-334] Implement SharedResourceFactory for LineageInfo
+* [Global Throttling] [GOBBLIN-287] Support service-level throttling quotas
+* [Global Throttling] [GOBBLIN-264] Add a SharedResourceFactory for creating shared DataPublishers
+* [Global Throttling] [GOBBLIN-251] Allow UpdateProviderFactory to instantiate a FileSystem with a URI
+* [Global Throttling] [GOBBLIN-236] Add a ControlMessage injector as a RecordStreamProcessor
+* [Global Throttling] [GOBBLIN-24] Allow disabling global throttling. Fix a race condition in BatchedPer…
+* [Cluster] [GOBBLIN-329] Add a basic cluster integration test
+* [Cluster] [GOBBLIN-325] Add a Source and Extractor for stress testing
+* [Cluster] [GOBBLIN-324] Add a configuration to configure the cluster working directory
+* [Cluster] [GOBBLIN-257] Remove old jobs' run data
+* [Cluster] [GOBBLIN-202] Add better metrics to gobblin to support AWS autoscaling
+* [Cluster] [GOBBLIN-320] Add metrics to GobblinHelixJobScheduler
+* [Cluster] [GOBBLIN-185] Design for Gobblin job-level graceful shutdown
+* [Cluster] [GOBBLIN-11] Fix for #1822 and #1823
+* [Cluster] [GOBBLIN-10] Fix for #1850 and #1851
+* [Cluster] [GOBBLIN-349] Add gauges for Gobblin cluster metrics
+* [Core] [GOBBLIN-177] Allow error limit to skip records which are not convertible
+* [Core] [GOBBLIN-333] Remove reference to log4j in WriterUtils
+* [Core] [GOBBLIN-332] Implement fetching hive tokens in tokenUtils
+* [Core] [GOBBLIN-330] Generate Kerberos Principal dynamically
+* [Core] [GOBBLIN-319] Add DatasetResolver to transform raw Gobblin dataset to application specific dataset
+* [Core] [GOBBLIN-317] Add dynamic configuration injection in the mappers
+* [Core] [GOBBLIN-310] Skip rerunning completed tasks on mapper reattempts
+* [Core] [GOBBLIN-300] Use 1.7.7 form of Schema.createUnion() API that takes in a list
+* [Core] [GOBBLIN-294] Change logging level of reflection utilities
+* [Core] [GOBBLIN-271] Move the grok converter to the gobblin-grok module
+* [Core] [GOBBLIN-252] Add some azkaban related constants
+* [Core] [GOBBLIN-240] Add three more Azkaban tags
+* [Core] [GOBBLIN-186] Add support for using the Kerberos authentication plugin without a GobblinDriverInstance
+* [Core] [GOBBLIN-179] Make migrated Gobblin code work with old state files
+* [Core] [GOBBLIN-178] Migrate Gobblin codebase from gobblin to org.apache.gobblin package
+* [State Store] [GOBBLIN-335] Increase blob size in MySQL state store
+* [State Store] [GOBBLIN-270] State Migration script
+* [State Store] [GOBBLIN-230] Convert old package name to new name in old states
+* [Source] [GOBBLIN-296] Kafka json source and writer
+* [Source] [GOBBLIN-245] Create topic specific extract of a WorkUnit in KafkaSource
+* [Source] [GOBBLIN-210] Implement a source based on Dataset Finder
+* [Extractor] [GOBBLIN-197] Modify JDBCExtractor to support reading clob columns as strings
+* [Converter] [GOBBLIN-228] Add config property to ignore fields in JsonRecordAvroSchemaToAvroConverter
+* [Converter] [GOBBLIN-226] Nested schema support in JsonStringToJsonIntermediateConverter and JsonIntermediateToAvroConverter
+* [Writer] [GOBBLIN-314] Validate filesize when copying in writer
+* [Writer] [GOBBLIN-171] Add a writer wrapper that closes the wrapped writer and creates a new one
+* [Writer] [GOBBLIN-6] Support eventually consistent filesystems like S3
+* [Compaction] [GOBBLIN-354] Support DynamicConfig in AzkabanCompactionJobLauncher
+* [Retention] [GOBBLIN-348] Hdfs Modified Time based Version Finder for Hive Tables
+* [Hive-Registration] [GOBBLIN-342] Option to set Hive metastore URI in HiveRegister
+* [Kafka] [GOBBLIN-331] Add sharedConfig support for the KafkaDataWriters
+* [Kafka] [GOBBLIN-312] Pass extra kafka configuration to the KafkaConsumer in KafkaSimpleStreamingSource
+* [Kafka] [GOBBLIN-198] Configuration to disable switching the Kafka topic's and Avro schema's names before registering schema
+* [Kafka] [GOBBLIN-195] Ability to switch Avro schema namespace before registering with Kafka Avro Schema registry
+* [Avro-to-ORC] [GOBBLIN-313] Option to explicitly set group name for staging and final destination directories for Avro-To-Orc conversion
+* [Avro-to-ORC] [GOBBLIN-297] Changing access modifier to Protected for HiveSource and Watermarker classes
+* [Metrics] [GOBBLIN-326] Gobblin metrics constructor only provides default constructor for Codahale metrics
+* [Metrics] [GOBBLIN-189] Add additional information in events for gobblintrackingevent_distcp_ng to show published dataset path
+* [Metrics] [GOBBLIN-307] Implement lineage event as LineageEventBuilder in gobblin
+* [Metrics] [GOBBLIN-261] Add kafka lineage event
+* [Metrics] [GOBBLIN-182] Emit Lineage Events for Query Based Sources
+* [Metrics] [GOBBLIN-22] Graphite prefix in configuration
+* [Salesforce] [GOBBLIN-288] Add finer-grain dynamic partition generation for Salesforce
+* [Salesforce] [GOBBLIN-265] Add support for PK chunking to gobblin-salesforce
+* [Compaction] [GOBBLIN-256] Improve logging for gobblin compaction
+* [Hive Registration] [GOBBLIN-266] Improve Hive Task setup
+* [Hive Registration] [GOBBLIN-253] Hive materializer enhancements
+* [Hive Registration] [GOBBLIN-172] Pipelined Hive Registration through TaskStateCollectorService
+* [Config] [GOBBLIN-209] Add support for HOCON global files
+* [DistcpNG] [GOBBLIN-173] Add pattern support for job-level blacklist in distcpNG/replication
+* [DistcpNG] [GOBBLIN-8] Add simple distcp job publishing to S3 as an example
+* [DistcpNG] [GOBBLIN-5] Make Watermark checking configurable in distcpNG-replication
+* [Documentation] [GOBBLIN-282] Support templates on Gobblin Azkaban launcher
+* [Documentation] [GOBBLIN-170] Updating documentation to include Apache with Gobblin
+* [Documentation] [GOBBLIN-25] Gobblin data-management run script and example configuration
+* [Documentation] [GOBBLIN-339] Example to illustrate how to build custom source and extractor in Gobblin.
+* [Documentation] [GOBBLIN-305] Add csv-kafka and kafka-hdfs template
+* [Apache] [GOBBLIN-169] Ability to curate licenses of all Gobblin dependencies
+* [Apache] [GOBBLIN-168] Standardize Github PR template for Gobblin
+* [Apache] [GOBBLIN-167] Add dev tooling for signing releases
+* [Apache] [GOBBLIN-166] Add dev tooling for simplifying the Github PR workflow
+* [Apache] [GOBBLIN-163] Setup Wiki for Gobblin
+* [Apache] [GOBBLIN-162] Setup new PR process for Gobblin
+* [Apache] [GOBBLIN-161] Migrate all Gobblin issues from Github to Apache
+* [Apache] [GOBBLIN-160] Move mailing lists to Apache
+* [Apache] [GOBBLIN-65] Add com.linkedin.gobblin to alias resolver
+* [Apache] [GOBBLIN-38] Create workunitstream for CompactionSource
+* [Apache] [GOBBLIN-2] Setup Apache Gobblin's website
+* [Apache] [GOBBLIN-1] Move Gobblin codebase to Apache
+* [AdminUI] [GOBBLIN-9] Improve AdminUI and RestService with better sorting, filtering, auto-updates, etc.
+* [Streaming] [GOBBLIN-4] Add control messages to Gobblin stream.
+
+## BUG FIXES
+
+* [Bug] [GOBBLIN-353] Fix low watermark overridden by high watermark in SalesforceSource
+* [Bug] [GOBBLIN-347] KafkaPusher is not closed when GobblinMetrics.stopReporting is called
+* [Bug] [GOBBLIN-344] Fix helper method getResolver in LineageInfo being private
+* [Bug] [GOBBLIN-343] Table and db regexps do not work in HiveRegistrationPolicyBase
+* [Bug] [GOBBLIN-341] Fix logger name to correct class prefix after apache package change
+* [Bug] [GOBBLIN-338] HiveAvroManagerSerde failed if external table was on different fs
+* [Bug] [GOBBLIN-337] HiveConf token signature bug
+* [Bug] [GOBBLIN-328] GobblinClusterKillTest failed. Not able to find expected output files.
+* [Bug] [GOBBLIN-322] Cluster mode failed to start. Failed to find a log4j config file
+* [Bug] [GOBBLIN-321] CSV to HDFS ISSUE
+* [Bug] [GOBBLIN-315] Fix shaded Avro being used in LineageEventBuilder
+* [Bug] [GOBBLIN-309] Fix contention when adding jar files to HDFS
+* [Bug] [GOBBLIN-308] Gobblin cluster bootup hangs
+* [Bug] [GOBBLIN-306] Exception when using fork followed by converters with EmbeddedGobblin
+* [Bug] [GOBBLIN-303] Compaction can generate zero sized output when MR is in speculative mode
+* [Bug] [GOBBLIN-301] Fix the key GOBBLIN_KAFKA_CONSUMER_CLIENT_FACTORY_CLASS
+* [Bug] [GOBBLIN-295] Make missing nullable fields default to null in json to avro converter
+* [Bug] [GOBBLIN-291] Remove unnecessary listing and reading of flowSpecs
+* [Bug] [GOBBLIN-289] Gobblin only partially decrypts the PGP file using keyring
+* [Bug] [GOBBLIN-286] Fix bug where non-Hive datasets throw an NPE during dataset publish
+* [Bug] [GOBBLIN-285] KafkaExtractor does not compute avgMillisPerRecord when partition pull is interrupted
+* [Bug] [GOBBLIN-284] Add retry in SalesforceExtractor to handle transient network errors
+* [Bug] [GOBBLIN-283] Refactor EnvelopePayloadConverter to support multi-field conversion
+* [Bug] [GOBBLIN-279] Pull file unable to reuse the JSON property
+* [Bug] [GOBBLIN-278] Fix sending lineage event for KafkaSource
+* [Bug] [GOBBLIN-276] Change setActive order to prevent flow spec loss
+* [Bug] [GOBBLIN-275] Use listStatus instead of globStatus for finding persisted files
+* [Bug] [GOBBLIN-274] Fix wait for salesforce batch completion
+* [Bug] [GOBBLIN-268] Unique job uri and job name generation for GaaS
+* [Bug] [GOBBLIN-267] HiveSource creates workunit even when update time is before maxLookBackDays
+* [Bug] [GOBBLIN-263] TaskExecutor metrics are calculated incorrectly
+* [Bug] [GOBBLIN-260] Salesforce dynamic partitioning bugs
+* [Bug] [GOBBLIN-259] Support writing Kafka messages to db/table file path
+* [Bug] [GOBBLIN-258] Try to remove the tmp output path from wrong fs before compaction
+* [Bug] [GOBBLIN-254] Add config key to update watermark when a partition is empty
+* [Bug] [GOBBLIN-247] avro-to-orc conversion validation job should fail only on data mismatch
+* [Bug] [GOBBLIN-244] Need additional info for gobblin tracking hourly-deduped
+* [Bug] [GOBBLIN-241] Allow multiple datasets to send different lineage events for Kafka
+* [Bug] [GOBBLIN-237] Update property names in JsonRecordAvroSchemaToAvroConverter
+* [Bug] [GOBBLIN-235] Prevent log warnings when TaskStateCollectorService has no task states detected
+* [Bug] [GOBBLIN-234] Add a ControlMessageInjector that generates metadata update control messages
+* [Bug] [GOBBLIN-233] Add concurrent map to avoid multiple job submission from GobblinHelixJobScheduler
+* [Bug] [GOBBLIN-229] Gobblin cluster doesn't clean up job state file upon job completion
+* [Bug] [GOBBLIN-225] Fix cloning of ControlMessages in PartitionDataWriterMessageHandler
+* [Bug] [GOBBLIN-223] CsvToJsonConverter should throw DataConversionException
+* [Bug] [GOBBLIN-222] Fix silent failure in loading incompatible state store
+* [Bug] [GOBBLIN-220] FileAwareInputDataStreamWriter only logs file names when a copy completes successfully
+* [Bug] [GOBBLIN-219] Check for copyright header
+* [Bug] [GOBBLIN-218] Ensure runImmediately is honored in Gobblin as a Service
+* [Bug] [GOBBLIN-217] Fix gobblin-admin module to use correct idString
+* [Bug] [GOBBLIN-215] hasJoinOperation failed when SQL statement has limit keyword
+* [Bug] [GOBBLIN-214] Filtering doesn't work in FileListUtils.listFilesRecursively
+* [Bug] [GOBBLIN-212] Exception handling of TaskStateCollectorServiceHandler
+* [Bug] [GOBBLIN-208] JobCatalogs should fallback to system configuration
+* [Bug] [GOBBLIN-206] Remove extra close of CloseOnFlushWriterWrapper
+* [Bug] [GOBBLIN-205] Fix Replication bug in Push Mode
+* [Bug] [GOBBLIN-194] NPE in BaseDataPublisher if writer partitions are enabled and metadata filename is not set
+* [Bug] [GOBBLIN-193] AbstractAvroToOrcConverter throws NoSuchObjectException when trying to fetch partition info from table when partition doesn't exist
+* [Bug] [GOBBLIN-192] Gobblin AWS hardcodes the log4j config
+* [Bug] [GOBBLIN-191] Make sure cron scheduler works and tune schedule period
+* [Bug] [GOBBLIN-184] Call the flush method of CloseOnFlushWriterWrapper when a FlushControlMessage is received
+* [Bug] [GOBBLIN-183] Gobblin data management copy empty directories
+* [Bug] [GOBBLIN-176] Gobblin build is failing with missing dependency jetty-http
+* [Bug] [GOBBLIN-175] String is not escaped while creating hive query for avro_to_orc conversion.
+* [Bug] [GOBBLIN-174] Fix distcp-ng so it does not remove existing target files
+* [Bug] [GOBBLIN-165] Fix URI is not absolute issue in SFTP
+* [Bug] [GOBBLIN-159] Gobblin Cluster graceful shutdown of master and workers
+* [Bug] [GOBBLIN-129] AdminUI performs too many requests when update is pressed
+* [Bug] [GOBBLIN-127] Admin UI duration chart is sorted incorrectly
+* [Bug] [GOBBLIN-109] Remove need for current.jst
+* [Bug] [GOBBLIN-87] Gobblin runOnce not working correctly
+* [Bug] [GOBBLIN-79] Add config to specify database for JDBC source
+* [Bug] [GOBBLIN-54] How to use oozie to schedule gobblin with mapreduce mode, not the local mode
+* [Bug] [GOBBLIN-48] java.lang.IllegalArgumentException when using extract.limit.enabled
+* [Bug] [GOBBLIN-40] Job History DB Schema had not been updated to reflect new LauncherType
+* [Bug] [GOBBLIN-39] JobHistoryDB migration files have been incorrectly modified
+* [Bug] [GOBBLIN-37] Gobblin-Master Build failed
+* [Bug] [GOBBLIN-33] StateStores persists Task and WorkUnit state to state.store.fs.uri
+* [Bug] [GOBBLIN-32] StateStores created with rootDir that is incompatible with state.store.type
+* [Bug] [GOBBLIN-31] Reflections concurrency issue
+* [Bug] [GOBBLIN-30] Reflections errors when scanning classpath and encountering missing/invalid file paths.
+* [Bug] [GOBBLIN-29] GobblinHelixJobScheduler should be able to be run without default configuration manager
+* [Bug] [GOBBLIN-27] SQL Server - incomplete JDBC URL
+
+
 GOBBLIN 0.11.0
 -------------