Posted to commits@samza.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2013/08/15 00:45:49 UTC

[jira] [Commented] (SAMZA-14) Fix misc bugs in QA jobs

    [ https://issues.apache.org/jira/browse/SAMZA-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13740334#comment-13740334 ] 

Chris Riccomini commented on SAMZA-14:
--------------------------------------

*samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala*

{code}
-          warn("Triggering a reconnect for %s because connection failed: %s" format (systemName, e.getMessage))
+          warn("Triggering a reconnect for %s because connection failed:".format(systemName), e)
           debug("Exception while producing to %s." format systemName, e)
{code}

The warn with e.getMessage here is intentional. A debug immediately follows that prints the full stack trace. I did this because a full stack trace scares people, and the failure is usually a timeout or similar, where the producer just reconnects. I'm trying to avoid verbose logs and stack traces that aren't actually relevant to the user. Recommend reverting this change.
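
For reference, the two-level pattern looks roughly like this in plain Java. This is only a sketch: it uses java.util.logging as a dependency-free stand-in for Samza's logging helpers, and the class and method names (ReconnectLogging, warnLine, onSendFailure) are made up for illustration.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the producer's failure path; not Samza code.
class ReconnectLogging {
    private static final Logger LOG = Logger.getLogger(ReconnectLogging.class.getName());

    // One-line summary for operators: the exception message only, no stack trace.
    static String warnLine(String systemName, Exception e) {
        return String.format("Triggering a reconnect for %s because connection failed: %s",
                systemName, e.getMessage());
    }

    static void onSendFailure(String systemName, Exception e) {
        // Short warning that is always visible at the default level.
        LOG.warning(warnLine(systemName, e));
        // Full stack trace, emitted only when debug-level (FINE) logging is enabled.
        LOG.log(Level.FINE, String.format("Exception while producing to %s.", systemName), e);
    }

    public static void main(String[] args) {
        onSendFailure("kafka", new IOException("connection timed out"));
    }
}
```

At the default level only the one-line warning shows up; turning on debug for the class brings the stack trace back when someone actually needs it.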

*samza-test/java.hprof.txt*

Should be removed. Was it accidentally left in, or was this intentional?

*samza-test/src/main/java/org/apache/samza/test/integration*

System.out is used in a bunch of places in the integration test tasks. Prefer using Grizzled here so we can roll logs over. STDOUT doesn't roll, and if we leave integration tests running for a long time, the files will get huge.
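
To make the rolling point concrete, here is a rough Java sketch of a size-capped, rotating task log. It deliberately swaps in java.util.logging's FileHandler so the example has no dependencies; it is not Grizzled, and in the real tasks the rolling would come from whatever appender sits behind the logging facade. The class name RollingTaskLog, the file pattern, and the size limits are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical helper; java.util.logging stands in for the project's logger.
class RollingTaskLog {
    // Builds a logger that writes to <dir>/task.<n>.log and rotates by size,
    // keeping at most maxFiles files of roughly maxBytesPerFile each.
    static Logger create(String name, Path dir, int maxBytesPerFile, int maxFiles)
            throws IOException {
        String pattern = dir.resolve("task.%g.log").toString();
        FileHandler handler = new FileHandler(pattern, maxBytesPerFile, maxFiles, true);
        handler.setFormatter(new SimpleFormatter());
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false); // keep task output off the console
        logger.addHandler(handler);
        return logger;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("task-logs");
        Logger log = create("integration-task", dir, 1024 * 1024, 5);
        log.info("processed message"); // instead of System.out.println(...)
    }
}
```

With a cap like this, disk usage stays bounded at about maxFiles times maxBytesPerFile no matter how long the test runs, which is the property STDOUT can't give us.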

*samza-test/src/main/resources/common.properties*

I don't think these configs are used, but they are listed in common.properties:

{noformat}
systems.kafka-checkpoints.serializer.class=samza.task.state.KafkaCheckpointEncoder
systems.kafka-checkpoints.partitioner.class=samza.task.state.KafkaCheckpointPartitioner
systems.kafka-checkpoints.key.serializer.class=kafka.serializer.NullEncoder
{noformat}

All Kafka config lives under one of these namespaces:

systems.<system name>.samza.*
systems.<system name>.consumer.*
systems.<system name>.producer.*

See KafkaConfig class for details. I think it's likely that you want the producer namespace.
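
To illustrate why a key outside those namespaces would be silently ignored, here is a minimal Java sketch of prefix-based lookup. It is not the actual KafkaConfig code: the class NamespaceSubset, the subset helper, and the com.example partitioner names are made up, and it assumes the producer is handed only the keys under systems.<system name>.producer. with that prefix stripped.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper illustrating prefix-based config lookup; not Samza's KafkaConfig.
class NamespaceSubset {
    // Returns the entries whose key starts with the prefix, with the prefix stripped.
    static Map<String, String> subset(Properties props, String prefix) {
        Map<String, String> out = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                out.put(key.substring(prefix.length()), props.getProperty(key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Outside the producer namespace: never reaches the producer.
        props.setProperty("systems.kafka.partitioner.class", "com.example.IgnoredPartitioner");
        // Inside the producer namespace: passed through as "partitioner.class".
        props.setProperty("systems.kafka.producer.partitioner.class", "com.example.UsedPartitioner");
        // Prints {partitioner.class=com.example.UsedPartitioner}
        System.out.println(subset(props, "systems.kafka.producer."));
    }
}
```

Under that assumption, a key written directly under systems.<system name>. matches none of the three prefixes and simply disappears, with no error to flag it.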

*.samza*

Kind of a cool extension for Samza job property files, but I'm not crazy about mixing the two (common.properties, hello-stateful-world.samza). Can you use .properties for everything, and then open a separate JIRA that includes updating hello-samza, if you want to switch to the .samza extension? I'm cool either way, but not both. :)

*samza-test/src/main/resources/join/checker.samza*

I don't think this is getting picked up any more:

{noformat}
systems.kafka.partitioner.class=org.apache.samza.test.integration.join.EpochPartitioner
{noformat}

Probably want producer namespace.

{noformat}
stores.checker-state.factory=org.apache.samza.storage.kv.KeyValueStorageEngineFactory
stores.checker-state.key.serde=string
stores.checker-state.msg.serde=string
stores.checker-state.changelog=kafka.checker-state
{noformat}

*samza-test/src/main/resources/join/reset.sh*

Some docs on how to use this would be helpful. I think the best list of changes is:

1. Remove Unit Tests link from nav.
2. Add Tests page to nav (docs/.../contribute/tests.md).
3. Add sections to the tests page for running unit tests (Gradle command) and integration tests, plus a link to the Hudson page (what is currently the nav link).

Also, why is reset.sh in the join folder?

*samza-test/src/main/resources/join/watcher.samza*

{noformat}
mail.from=gregor@incubator.apache.org
{noformat}

Prefer gregor-noreply@... here.

{noformat}
mail.smtp.host=email.corp.linkedin.com
{noformat}

Hmmm... :) Prefer mail.smtp.host=TODO.add.smtphost.here

*samza-test/src/main/resources/perf/counter.samza*

Docs on what exactly perf does would be useful too.

Also -- wacky idea -- what do you think about hooking up some sort of metrics to the integration tests, so we have a real-time dashboard of perf and whatnot? It would be useful for debugging when the integration tests break, and would also be useful for perf metrics. Alternatively, or in addition, it might be nice to have some perf suite that can execute during CI and generate a report. Not sure which is better, or both? Just thinking out loud.
                
> Fix misc bugs in QA jobs
> ------------------------
>
>                 Key: SAMZA-14
>                 URL: https://issues.apache.org/jira/browse/SAMZA-14
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>         Attachments: SAMZA-14-v1.patch
>
>
> Some bugs that have already been there and some that came with the various package refactorings.
