Posted to dev@kafka.apache.org by Jay Kreps <ja...@gmail.com> on 2011/11/05 21:34:53 UTC

sync() in java

Does anyone know if there is an equivalent to the sync() system call
available in Java? This is the system call that flushes all files. This
seems like it might be a good optimization for the time-based log flush. If
you are going to sequentially flush all the log partitions anyway it might
be better to just do a single sync() and let the I/O scheduler have more
leeway in the ordering of the writes. I know how to get the equivalent of
fsync() or fdatasync() using FileChannel.force(...) to flush a single file,
but I don't know how to get the equivalent of sync().
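
For concreteness, the per-file flush I mean looks roughly like this (a
minimal sketch using only standard JDK calls; the path and the force(false)
choice are just placeholders):

    import java.io.RandomAccessFile

    // Per-file flush via FileChannel.force():
    //   force(true)  ~ fsync()      (data + metadata)
    //   force(false) ~ fdatasync()  (data only, where the OS supports it)
    // As far as I know there is no JDK call that flushes *all* files the
    // way sync() does.
    val raf = new RandomAccessFile("/tmp/kafka-segment.log", "rw")
    raf.getChannel.force(false)
    raf.close()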

-Jay

Re: sync() in java

Posted by Joe Stein <cr...@gmail.com>.
Yup, willing to help, sounds good.  I kind of see this going as 3+ patches:

1) the initial LogHelper trait
2) scouring the existing code and refactoring it to the new logging
3) for anything patched but not yet on trunk, patching over it for logging
changes (if any are needed) after it makes it to trunk
4) a watchful eye moving forward, setting the coding guideline and fixing
anything that slips in

I will create a JIRA and am happy to get started on #1 & #2. I can help
with #3 after things show up in trunk (can just script a grep after svn
update for old-style logging).

For #3 and #4 the question is more one of maintenance: do we close the
ticket after #1 & #2 once a committer commits the patch to trunk, and then
open another JIRA ticket for #3 & #4 that we keep around until the next
release to keep up with the "lint"?

Not sure if any -1s are going to show up for this; if so, that would
definitely put a crimp in the effort, and +1s are always good here too.
Maybe after the initial trait is in (#1) and I scour the existing code (#2)
we can get +1s on changing the style guidelines, which will make #3 and #4
easier moving forward. Worst case, if enough -1s show up, it is just a
small amount of time wasted and the code is not affected.

On Sun, Nov 6, 2011 at 11:09 AM, Jay Kreps <ja...@gmail.com> wrote:

> Cool, I tried a similar test:
>  https://gist.github.com/1343069
>
> Here are the results I get for logging debug to stdout and redirecting to a
> file:
>  isDebugEnabled with logging = DEBUG: 9673.7 ns per invocation
>  Debug helper with logging = DEBUG: 10248.6 ns per invocation
>  isDebugEnabled with logging = INFO: 22.3 ns per invocation
>  Debug helper with logging = INFO: 17.2 ns per invocation
>
> isDebugEnabled is the explicit check and "debug helper" is the lazy value.
> The important case is the logging = INFO case, as this is the case you are
> trying to optimize with the isDebugEnabled check. So there really is no
> difference in the debug helper case here, it looks like at least for this
> microbenchmark the jvm can optimize it away. There does seem to be some
> overhead for the helper method, as that seems to be persistently around
> 500ns slower, but I don't think that is a problem since the logging itself
> is 10us.
>
> Thanks for this pointer! I recommend we definitely move to this more
> readable style. If you are willing to help with the conversion that would
> of course be fantastic.
>
> -Jay
>
> On Sat, Nov 5, 2011 at 10:44 PM, Joe Stein <cr...@gmail.com> wrote:
>
> > So, for the possible new log style and if it is performance hindering I
> > created some test code
> >
> >
> https://github.com/joestein/qetesh/blob/master/src/main/scala/logstyletest.scalato
> > simulate how it would act
> >
> > can be run as a standalone jar just clone the project and then in qetesh
> >
> > java -jar target/scala-2.9.1/logstyletest_2.9.1-1.0.0.0-one-jar.jar 1000
> > 100000 10
> >
> > first param is how many threads in the pool, second is how many pushes to
> > the pool and third is how many times to repeat the test (since it will be
> > bumping into things not related to the test that will slow it down since
> it
> > is contending for resources).
> >
> > running on real hardware and ec2 both on large numbers and very small I
> do
> > not see any overheard on the log style
> >
> > let me know if/how you want to proceed or any questions with the test put
> > together or if you want something else in the test also covered.   I
> would
> > be willing to sift through and make the changes for a patch if/when it
> gets
> > some +1
> >
> > /*
> > Joe Stein
> > http://www.linkedin.com/in/charmalloc
> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > */
> >
> > On Sat, Nov 5, 2011 at 8:02 PM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > Ah, a couple interesting ideas there.
> > >
> > > For the process builder solution, I think it is too dangerous for this
> > use
> > > to pay off as an optimization. Java, being multithreaded by default,
> > uses a
> > > fork implementation that doesn't seem to use copy-on-write. This means
> > that
> > > if you have a 2GB process, you will do a 2GB allocation every time you
> > > instantiate a process before execing your 40k sync binary. This can
> cause
> > > really weird problems. We have been bitten by it in Hadoop and our job
> > > scheduler, both of which do that extensively to work around the lack of
> > > access to posix utilities. :-(
> > >
> > > Here is one of the bugs discussing this:
> > > https://issues.apache.org/jira/browse/HADOOP-5059
> > >
> > > I like your approach to logging, and I think that because you have
> > declared
> > > the type for the log statements as msg: => String they are lazy, but
> what
> > > is the overhead of that closure? I would fear it might outweigh the
> cost
> > of
> > > the string concatenation it replaces. I wonder if there is any
> > benchmarking
> > > on this? If it isn't too expensive, I would love for us to switch to
> that
> > > style of logging.
> > >
> > > -Jay
> > >
> > > On Sat, Nov 5, 2011 at 2:57 PM, Joe Stein <cr...@gmail.com> wrote:
> > >
> > > > Yeah, right good point that won't work for you.
> > > >
> > > > How about using ProcessBuilder?  I do this for calling shell scripts
> > for
> > > > whenever I can't do something in an API (or it is already implemented
> > in
> > > > bash scripts) and just need to execute something on the system.
> > > >
> > > > some object like this
> > > >
> > > >    private val reader = actor {
> > > >        debug("created actor: " + Thread.currentThread)
> > > >        var continue = true
> > > >        loopWhile(continue){
> > > >            reactWithin(WAIT_TIME) {
> > > >                case TIMEOUT =>
> > > >                    caller ! "react timeout"
> > > >                case proc:Process =>
> > > >                    debug("entering first actor " +
> > Thread.currentThread)
> > > >                    val streamReader = new
> > > > java.io.InputStreamReader(proc.getInputStream)
> > > >                    val bufferedReader = new
> > > > java.io.BufferedReader(streamReader)
> > > >                    val stringBuilder = new java.lang.StringBuilder()
> > > >                    var line:String = null
> > > >                    while({line = bufferedReader.readLine; line !=
> > null}){
> > > >                        stringBuilder.append(line)
> > > >                        stringBuilder.append("\n")
> > > >                    }
> > > >                    bufferedReader.close
> > > >                    caller ! stringBuilder.toString
> > > >            }
> > > >        }
> > > >    }
> > > >
> > > >    def run(command: String, ran: (=> String)=>Unit) : Boolean = {
> > > >        debug("gonna runa a command: " + Thread.currentThread + " =
> > > > "+command)
> > > >        val args = command.split(" ")
> > > >        val processBuilder = new ProcessBuilder(args: _* )
> > > >        processBuilder.redirectErrorStream(true)
> > > >        val proc = processBuilder.start()
> > > >
> > > >        //Send the proc to the actor, to extract the console output.
> > > >        reader ! proc
> > > >
> > > >        //Receive the console output from the actor.
> > > >        receiveWithin(WAIT_TIME) {
> > > >                case TIMEOUT => ran("receiving Timeout")
> > > >                case result:String => ran(result)
> > > >        }
> > > >
> > > >                true
> > > >    }
> > > >
> > > >
> > > > note my debug statement is a logging trait so as to not have to if
> > > > (log.isDebugEnabled()) with by name param.  I was going to bring this
> > up
> > > > something I noticed while going through the code last night (e.g.
> > > >
> > > >
> > >
> >
> https://github.com/joestein/skeletor/blob/master/src/main/scala/util/Logging.scala
> > > > )
> > > > figure now is a good time.  Also don't have to ever do
> logger.getLogger
> > > > because that comes with the trait.
> > > >
> > > > Maybe a generic utility and then you can just call and use it for
> > sync()
> > > at
> > > > the OS level like you need?
> > > >
> > > > On Sat, Nov 5, 2011 at 5:26 PM, Jay Kreps <ja...@gmail.com>
> wrote:
> > > >
> > > > > But does that sync that file or all files, looking for the later...
> > > > >
> > > > > -jay
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > On Nov 5, 2011, at 1:44 PM, Joe Stein <cr...@gmail.com> wrote:
> > > > >
> > > > > > On the fileoutputstream you can get the filedescriptor using
> > getFD()
> > > > and
> > > > > then on that object you can sync()
> > > > > >
> > > > > > /*
> > > > > > Joe Stein
> > > > > > http://www.medialets.com
> > > > > > Twitter: @allthingshadoop
> > > > > > */
> > > > > >
> > > > > > On Nov 5, 2011, at 4:34 PM, Jay Kreps <ja...@gmail.com>
> wrote:
> > > > > >
> > > > > >> Does anyone know if there is an equivalent to the sync() system
> > call
> > > > > >> available in Java? This is the system call that flushes all
> files.
> > > > This
> > > > > >> seems like it might be a good optimization for the time-based
> log
> > > > > flush. If
> > > > > >> you are going to sequentially flush all the log partitions
> anyway
> > it
> > > > > might
> > > > > >> be better to just do a single sync() and let the I/O scheduler
> > have
> > > > more
> > > > > >> leeway in the ordering of he writes. I know who to get the
> > > equivalent
> > > > of
> > > > > >> fsync() or fdatasync() using FileChannel.force(...) to flush a
> > > single
> > > > > file
> > > > > >> but I don't know how to get the equivalent of sync().
> > > > > >>
> > > > > >> -Jay
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > /*
> > > > Joe Stein
> > > > http://www.linkedin.com/in/charmalloc
> > > > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > > */
> > > >
> > >
> >
>



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/

Re: sync() in java

Posted by Jay Kreps <ja...@gmail.com>.
Cool, I tried a similar test:
  https://gist.github.com/1343069

Here are the results I get for logging debug to stdout and redirecting to a
file:
  isDebugEnabled with logging = DEBUG: 9673.7 ns per invocation
  Debug helper with logging = DEBUG: 10248.6 ns per invocation
  isDebugEnabled with logging = INFO: 22.3 ns per invocation
  Debug helper with logging = INFO: 17.2 ns per invocation

isDebugEnabled is the explicit check and "debug helper" is the lazy value.
The important case is the logging = INFO case, as this is the case you are
trying to optimize with the isDebugEnabled check. There really is no
difference in the debug helper case here; it looks like, at least for this
microbenchmark, the JVM can optimize it away. There does seem to be some
overhead for the helper method when logging is enabled, as it is
consistently around 500ns slower, but I don't think that is a problem since
the logging itself is 10us.
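
For reference, the two measured styles look roughly like this (a sketch;
the logger name and messages are placeholders, and debug() is the by-name
helper under discussion):

    import org.apache.log4j.Logger

    object Styles {
      val logger = Logger.getLogger("test")

      // "Debug helper": by-name argument, only evaluated inside debug()
      // when debug logging is actually enabled.
      def debug(msg: => String) { if (logger.isDebugEnabled) logger.debug(msg) }

      def explicitCheck(i: Int) {
        // "isDebugEnabled": guard by hand, build the string only if enabled
        if (logger.isDebugEnabled) logger.debug("message " + i)
      }

      def helper(i: Int) {
        debug("message " + i)
      }
    }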

Thanks for this pointer! I recommend we definitely move to this more
readable style. If you are willing to help with the conversion that would
of course be fantastic.

-Jay

On Sat, Nov 5, 2011 at 10:44 PM, Joe Stein <cr...@gmail.com> wrote:

> So, for the possible new log style and if it is performance hindering I
> created some test code
>
> https://github.com/joestein/qetesh/blob/master/src/main/scala/logstyletest.scalato
> simulate how it would act
>
> can be run as a standalone jar just clone the project and then in qetesh
>
> java -jar target/scala-2.9.1/logstyletest_2.9.1-1.0.0.0-one-jar.jar 1000
> 100000 10
>
> first param is how many threads in the pool, second is how many pushes to
> the pool and third is how many times to repeat the test (since it will be
> bumping into things not related to the test that will slow it down since it
> is contending for resources).
>
> running on real hardware and ec2 both on large numbers and very small I do
> not see any overheard on the log style
>
> let me know if/how you want to proceed or any questions with the test put
> together or if you want something else in the test also covered.   I would
> be willing to sift through and make the changes for a patch if/when it gets
> some +1
>
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> */
>
> On Sat, Nov 5, 2011 at 8:02 PM, Jay Kreps <ja...@gmail.com> wrote:
>
> > Ah, a couple interesting ideas there.
> >
> > For the process builder solution, I think it is too dangerous for this
> use
> > to pay off as an optimization. Java, being multithreaded by default,
> uses a
> > fork implementation that doesn't seem to use copy-on-write. This means
> that
> > if you have a 2GB process, you will do a 2GB allocation every time you
> > instantiate a process before execing your 40k sync binary. This can cause
> > really weird problems. We have been bitten by it in Hadoop and our job
> > scheduler, both of which do that extensively to work around the lack of
> > access to posix utilities. :-(
> >
> > Here is one of the bugs discussing this:
> > https://issues.apache.org/jira/browse/HADOOP-5059
> >
> > I like your approach to logging, and I think that because you have
> declared
> > the type for the log statements as msg: => String they are lazy, but what
> > is the overhead of that closure? I would fear it might outweigh the cost
> of
> > the string concatenation it replaces. I wonder if there is any
> benchmarking
> > on this? If it isn't too expensive, I would love for us to switch to that
> > style of logging.
> >
> > -Jay
> >
> > On Sat, Nov 5, 2011 at 2:57 PM, Joe Stein <cr...@gmail.com> wrote:
> >
> > > Yeah, right good point that won't work for you.
> > >
> > > How about using ProcessBuilder?  I do this for calling shell scripts
> for
> > > whenever I can't do something in an API (or it is already implemented
> in
> > > bash scripts) and just need to execute something on the system.
> > >
> > > some object like this
> > >
> > >    private val reader = actor {
> > >        debug("created actor: " + Thread.currentThread)
> > >        var continue = true
> > >        loopWhile(continue){
> > >            reactWithin(WAIT_TIME) {
> > >                case TIMEOUT =>
> > >                    caller ! "react timeout"
> > >                case proc:Process =>
> > >                    debug("entering first actor " +
> Thread.currentThread)
> > >                    val streamReader = new
> > > java.io.InputStreamReader(proc.getInputStream)
> > >                    val bufferedReader = new
> > > java.io.BufferedReader(streamReader)
> > >                    val stringBuilder = new java.lang.StringBuilder()
> > >                    var line:String = null
> > >                    while({line = bufferedReader.readLine; line !=
> null}){
> > >                        stringBuilder.append(line)
> > >                        stringBuilder.append("\n")
> > >                    }
> > >                    bufferedReader.close
> > >                    caller ! stringBuilder.toString
> > >            }
> > >        }
> > >    }
> > >
> > >    def run(command: String, ran: (=> String)=>Unit) : Boolean = {
> > >        debug("gonna runa a command: " + Thread.currentThread + " =
> > > "+command)
> > >        val args = command.split(" ")
> > >        val processBuilder = new ProcessBuilder(args: _* )
> > >        processBuilder.redirectErrorStream(true)
> > >        val proc = processBuilder.start()
> > >
> > >        //Send the proc to the actor, to extract the console output.
> > >        reader ! proc
> > >
> > >        //Receive the console output from the actor.
> > >        receiveWithin(WAIT_TIME) {
> > >                case TIMEOUT => ran("receiving Timeout")
> > >                case result:String => ran(result)
> > >        }
> > >
> > >                true
> > >    }
> > >
> > >
> > > note my debug statement is a logging trait so as to not have to if
> > > (log.isDebugEnabled()) with by name param.  I was going to bring this
> up
> > > something I noticed while going through the code last night (e.g.
> > >
> > >
> >
> https://github.com/joestein/skeletor/blob/master/src/main/scala/util/Logging.scala
> > > )
> > > figure now is a good time.  Also don't have to ever do logger.getLogger
> > > because that comes with the trait.
> > >
> > > Maybe a generic utility and then you can just call and use it for
> sync()
> > at
> > > the OS level like you need?
> > >
> > > On Sat, Nov 5, 2011 at 5:26 PM, Jay Kreps <ja...@gmail.com> wrote:
> > >
> > > > But does that sync that file or all files, looking for the later...
> > > >
> > > > -jay
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On Nov 5, 2011, at 1:44 PM, Joe Stein <cr...@gmail.com> wrote:
> > > >
> > > > > On the fileoutputstream you can get the filedescriptor using
> getFD()
> > > and
> > > > then on that object you can sync()
> > > > >
> > > > > /*
> > > > > Joe Stein
> > > > > http://www.medialets.com
> > > > > Twitter: @allthingshadoop
> > > > > */
> > > > >
> > > > > On Nov 5, 2011, at 4:34 PM, Jay Kreps <ja...@gmail.com> wrote:
> > > > >
> > > > >> Does anyone know if there is an equivalent to the sync() system
> call
> > > > >> available in Java? This is the system call that flushes all files.
> > > This
> > > > >> seems like it might be a good optimization for the time-based log
> > > > flush. If
> > > > >> you are going to sequentially flush all the log partitions anyway
> it
> > > > might
> > > > >> be better to just do a single sync() and let the I/O scheduler
> have
> > > more
> > > > >> leeway in the ordering of he writes. I know who to get the
> > equivalent
> > > of
> > > > >> fsync() or fdatasync() using FileChannel.force(...) to flush a
> > single
> > > > file
> > > > >> but I don't know how to get the equivalent of sync().
> > > > >>
> > > > >> -Jay
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > /*
> > > Joe Stein
> > > http://www.linkedin.com/in/charmalloc
> > > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > > */
> > >
> >
>

Re: sync() in java

Posted by Joe Stein <cr...@gmail.com>.
So, to check whether the possible new log style is performance hindering, I
created some test code at
https://github.com/joestein/qetesh/blob/master/src/main/scala/logstyletest.scala
to simulate how it would act.

It can be run as a standalone jar: just clone the project and then, in qetesh, run

java -jar target/scala-2.9.1/logstyletest_2.9.1-1.0.0.0-one-jar.jar 1000
100000 10

The first param is how many threads are in the pool, the second is how many
pushes to the pool, and the third is how many times to repeat the test
(since the test will bump into things unrelated to it that slow it down as
it contends for resources).

Running on real hardware and EC2, with both large numbers and very small
ones, I do not see any overhead from the log style.

Let me know if/how you want to proceed, any questions about how the test is
put together, or if you want something else covered in the test as well.  I
would be willing to sift through and make the changes for a patch if/when
it gets some +1s.

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/

On Sat, Nov 5, 2011 at 8:02 PM, Jay Kreps <ja...@gmail.com> wrote:

> Ah, a couple interesting ideas there.
>
> For the process builder solution, I think it is too dangerous for this use
> to pay off as an optimization. Java, being multithreaded by default, uses a
> fork implementation that doesn't seem to use copy-on-write. This means that
> if you have a 2GB process, you will do a 2GB allocation every time you
> instantiate a process before execing your 40k sync binary. This can cause
> really weird problems. We have been bitten by it in Hadoop and our job
> scheduler, both of which do that extensively to work around the lack of
> access to posix utilities. :-(
>
> Here is one of the bugs discussing this:
> https://issues.apache.org/jira/browse/HADOOP-5059
>
> I like your approach to logging, and I think that because you have declared
> the type for the log statements as msg: => String they are lazy, but what
> is the overhead of that closure? I would fear it might outweigh the cost of
> the string concatenation it replaces. I wonder if there is any benchmarking
> on this? If it isn't too expensive, I would love for us to switch to that
> style of logging.
>
> -Jay
>
> On Sat, Nov 5, 2011 at 2:57 PM, Joe Stein <cr...@gmail.com> wrote:
>
> > Yeah, right good point that won't work for you.
> >
> > How about using ProcessBuilder?  I do this for calling shell scripts for
> > whenever I can't do something in an API (or it is already implemented in
> > bash scripts) and just need to execute something on the system.
> >
> > some object like this
> >
> >    private val reader = actor {
> >        debug("created actor: " + Thread.currentThread)
> >        var continue = true
> >        loopWhile(continue){
> >            reactWithin(WAIT_TIME) {
> >                case TIMEOUT =>
> >                    caller ! "react timeout"
> >                case proc:Process =>
> >                    debug("entering first actor " + Thread.currentThread)
> >                    val streamReader = new
> > java.io.InputStreamReader(proc.getInputStream)
> >                    val bufferedReader = new
> > java.io.BufferedReader(streamReader)
> >                    val stringBuilder = new java.lang.StringBuilder()
> >                    var line:String = null
> >                    while({line = bufferedReader.readLine; line != null}){
> >                        stringBuilder.append(line)
> >                        stringBuilder.append("\n")
> >                    }
> >                    bufferedReader.close
> >                    caller ! stringBuilder.toString
> >            }
> >        }
> >    }
> >
> >    def run(command: String, ran: (=> String)=>Unit) : Boolean = {
> >        debug("gonna runa a command: " + Thread.currentThread + " =
> > "+command)
> >        val args = command.split(" ")
> >        val processBuilder = new ProcessBuilder(args: _* )
> >        processBuilder.redirectErrorStream(true)
> >        val proc = processBuilder.start()
> >
> >        //Send the proc to the actor, to extract the console output.
> >        reader ! proc
> >
> >        //Receive the console output from the actor.
> >        receiveWithin(WAIT_TIME) {
> >                case TIMEOUT => ran("receiving Timeout")
> >                case result:String => ran(result)
> >        }
> >
> >                true
> >    }
> >
> >
> > note my debug statement is a logging trait so as to not have to if
> > (log.isDebugEnabled()) with by name param.  I was going to bring this up
> > something I noticed while going through the code last night (e.g.
> >
> >
> https://github.com/joestein/skeletor/blob/master/src/main/scala/util/Logging.scala
> > )
> > figure now is a good time.  Also don't have to ever do logger.getLogger
> > because that comes with the trait.
> >
> > Maybe a generic utility and then you can just call and use it for sync()
> at
> > the OS level like you need?
> >
> > On Sat, Nov 5, 2011 at 5:26 PM, Jay Kreps <ja...@gmail.com> wrote:
> >
> > > But does that sync that file or all files, looking for the later...
> > >
> > > -jay
> > >
> > > Sent from my iPhone
> > >
> > > On Nov 5, 2011, at 1:44 PM, Joe Stein <cr...@gmail.com> wrote:
> > >
> > > > On the fileoutputstream you can get the filedescriptor using getFD()
> > and
> > > then on that object you can sync()
> > > >
> > > > /*
> > > > Joe Stein
> > > > http://www.medialets.com
> > > > Twitter: @allthingshadoop
> > > > */
> > > >
> > > > On Nov 5, 2011, at 4:34 PM, Jay Kreps <ja...@gmail.com> wrote:
> > > >
> > > >> Does anyone know if there is an equivalent to the sync() system call
> > > >> available in Java? This is the system call that flushes all files.
> > This
> > > >> seems like it might be a good optimization for the time-based log
> > > flush. If
> > > >> you are going to sequentially flush all the log partitions anyway it
> > > might
> > > >> be better to just do a single sync() and let the I/O scheduler have
> > more
> > > >> leeway in the ordering of he writes. I know who to get the
> equivalent
> > of
> > > >> fsync() or fdatasync() using FileChannel.force(...) to flush a
> single
> > > file
> > > >> but I don't know how to get the equivalent of sync().
> > > >>
> > > >> -Jay
> > >
> >
> >
> >
> > --
> >
> > /*
> > Joe Stein
> > http://www.linkedin.com/in/charmalloc
> > Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> > */
> >
>

Re: sync() in java

Posted by Jay Kreps <ja...@gmail.com>.
Ah, a couple interesting ideas there.

For the ProcessBuilder solution, I think it is too dangerous for this use
to pay off as an optimization. Java, being multithreaded by default, uses a
fork implementation that doesn't seem to use copy-on-write. This means that
if you have a 2GB process, you will do a 2GB allocation every time you
instantiate a process before exec'ing your 40k sync binary. This can cause
really weird problems. We have been bitten by it in Hadoop and our job
scheduler, both of which do that extensively to work around the lack of
access to posix utilities. :-(

Here is one of the bugs discussing this:
https://issues.apache.org/jira/browse/HADOOP-5059

I like your approach to logging, and I think that because you have declared
the type for the log statements as msg: => String they are lazy, but what
is the overhead of that closure? I would fear it might outweigh the cost of
the string concatenation it replaces. I wonder if there is any benchmarking
on this? If it isn't too expensive, I would love for us to switch to that
style of logging.

-Jay

On Sat, Nov 5, 2011 at 2:57 PM, Joe Stein <cr...@gmail.com> wrote:

> Yeah, right good point that won't work for you.
>
> How about using ProcessBuilder?  I do this for calling shell scripts for
> whenever I can't do something in an API (or it is already implemented in
> bash scripts) and just need to execute something on the system.
>
> some object like this
>
>    private val reader = actor {
>        debug("created actor: " + Thread.currentThread)
>        var continue = true
>        loopWhile(continue){
>            reactWithin(WAIT_TIME) {
>                case TIMEOUT =>
>                    caller ! "react timeout"
>                case proc:Process =>
>                    debug("entering first actor " + Thread.currentThread)
>                    val streamReader = new
> java.io.InputStreamReader(proc.getInputStream)
>                    val bufferedReader = new
> java.io.BufferedReader(streamReader)
>                    val stringBuilder = new java.lang.StringBuilder()
>                    var line:String = null
>                    while({line = bufferedReader.readLine; line != null}){
>                        stringBuilder.append(line)
>                        stringBuilder.append("\n")
>                    }
>                    bufferedReader.close
>                    caller ! stringBuilder.toString
>            }
>        }
>    }
>
>    def run(command: String, ran: (=> String)=>Unit) : Boolean = {
>        debug("gonna runa a command: " + Thread.currentThread + " =
> "+command)
>        val args = command.split(" ")
>        val processBuilder = new ProcessBuilder(args: _* )
>        processBuilder.redirectErrorStream(true)
>        val proc = processBuilder.start()
>
>        //Send the proc to the actor, to extract the console output.
>        reader ! proc
>
>        //Receive the console output from the actor.
>        receiveWithin(WAIT_TIME) {
>                case TIMEOUT => ran("receiving Timeout")
>                case result:String => ran(result)
>        }
>
>                true
>    }
>
>
> note my debug statement is a logging trait so as to not have to if
> (log.isDebugEnabled()) with by name param.  I was going to bring this up
> something I noticed while going through the code last night (e.g.
>
> https://github.com/joestein/skeletor/blob/master/src/main/scala/util/Logging.scala
> )
> figure now is a good time.  Also don't have to ever do logger.getLogger
> because that comes with the trait.
>
> Maybe a generic utility and then you can just call and use it for sync() at
> the OS level like you need?
>
> On Sat, Nov 5, 2011 at 5:26 PM, Jay Kreps <ja...@gmail.com> wrote:
>
> > But does that sync that file or all files, looking for the later...
> >
> > -jay
> >
> > Sent from my iPhone
> >
> > On Nov 5, 2011, at 1:44 PM, Joe Stein <cr...@gmail.com> wrote:
> >
> > > On the fileoutputstream you can get the filedescriptor using getFD()
> and
> > then on that object you can sync()
> > >
> > > /*
> > > Joe Stein
> > > http://www.medialets.com
> > > Twitter: @allthingshadoop
> > > */
> > >
> > > On Nov 5, 2011, at 4:34 PM, Jay Kreps <ja...@gmail.com> wrote:
> > >
> > >> Does anyone know if there is an equivalent to the sync() system call
> > >> available in Java? This is the system call that flushes all files.
> This
> > >> seems like it might be a good optimization for the time-based log
> > flush. If
> > >> you are going to sequentially flush all the log partitions anyway it
> > might
> > >> be better to just do a single sync() and let the I/O scheduler have
> more
> > >> leeway in the ordering of he writes. I know who to get the equivalent
> of
> > >> fsync() or fdatasync() using FileChannel.force(...) to flush a single
> > file
> > >> but I don't know how to get the equivalent of sync().
> > >>
> > >> -Jay
> >
>
>
>
> --
>
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> */
>

Re: sync() in java

Posted by Joe Stein <cr...@gmail.com>.
Yeah, right, good point, that won't work for you.

How about using ProcessBuilder?  I do this for calling shell scripts
whenever I can't do something in an API (or it is already implemented in
bash scripts) and just need to execute something on the system.

Some object like this:

    import scala.actors.Actor._
    import scala.actors.TIMEOUT

    // Assumes the enclosing object also defines something like
    // val WAIT_TIME = 10000, val caller = self (the calling actor/thread),
    // and the debug() helper from the Logging trait mentioned below.

    private val reader = actor {
        debug("created actor: " + Thread.currentThread)
        var continue = true
        loopWhile(continue) {
            reactWithin(WAIT_TIME) {
                case TIMEOUT =>
                    caller ! "react timeout"
                case proc: Process =>
                    debug("entering first actor " + Thread.currentThread)
                    // Drain the process's stdout (stderr is merged in below)
                    val streamReader =
                        new java.io.InputStreamReader(proc.getInputStream)
                    val bufferedReader =
                        new java.io.BufferedReader(streamReader)
                    val stringBuilder = new java.lang.StringBuilder()
                    var line: String = null
                    while ({line = bufferedReader.readLine; line != null}) {
                        stringBuilder.append(line)
                        stringBuilder.append("\n")
                    }
                    bufferedReader.close
                    caller ! stringBuilder.toString
            }
        }
    }

    def run(command: String, ran: (=> String) => Unit): Boolean = {
        debug("going to run a command: " + Thread.currentThread + " = " + command)
        val args = command.split(" ")
        val processBuilder = new ProcessBuilder(args: _*)
        processBuilder.redirectErrorStream(true)
        val proc = processBuilder.start()

        // Send the proc to the actor, to extract the console output.
        reader ! proc

        // Receive the console output from the actor.
        receiveWithin(WAIT_TIME) {
            case TIMEOUT => ran("receiving Timeout")
            case result: String => ran(result)
        }

        true
    }


Note that my debug statement comes from a logging trait that takes a
by-name param, so you don't have to write if (log.isDebugEnabled()).  This
is something I noticed while going through the code last night and was
going to bring up (e.g.
https://github.com/joestein/skeletor/blob/master/src/main/scala/util/Logging.scala
), so I figure now is a good time.  You also never have to call
Logger.getLogger because that comes with the trait.
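
A minimal sketch of the idea, assuming log4j (the trait and class names
here are illustrative; the real thing is in the link above): mix the trait
in and you get a logger for the class plus by-name debug/info, so there are
no Logger.getLogger calls and no isDebugEnabled guards at the call sites.

    import org.apache.log4j.Logger

    trait Logging {
      lazy val logger = Logger.getLogger(getClass)
      def debug(msg: => String) { if (logger.isDebugEnabled) logger.debug(msg) }
      def info(msg: => String) { if (logger.isInfoEnabled) logger.info(msg) }
    }

    class LogFlusher extends Logging {
      def flush() { debug("flushing at " + System.currentTimeMillis) }
    }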

Maybe make this a generic utility, and then you can just call it to run
sync() at the OS level like you need?
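
For the sync() case specifically, even without the actor plumbing, the
ProcessBuilder call itself would be roughly this (a sketch with no error
handling; debug() is the trait helper from above):

    // Shell out to the system sync(1) binary, which asks the kernel to
    // flush all dirty filesystem buffers, then wait for it to finish.
    val proc = new ProcessBuilder("sync").start()
    val exit = proc.waitFor()
    debug("sync exited with code " + exit)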

On Sat, Nov 5, 2011 at 5:26 PM, Jay Kreps <ja...@gmail.com> wrote:

> But does that sync that file or all files, looking for the later...
>
> -jay
>
> Sent from my iPhone
>
> On Nov 5, 2011, at 1:44 PM, Joe Stein <cr...@gmail.com> wrote:
>
> > On the fileoutputstream you can get the filedescriptor using getFD() and
> then on that object you can sync()
> >
> > /*
> > Joe Stein
> > http://www.medialets.com
> > Twitter: @allthingshadoop
> > */
> >
> > On Nov 5, 2011, at 4:34 PM, Jay Kreps <ja...@gmail.com> wrote:
> >
> >> Does anyone know if there is an equivalent to the sync() system call
> >> available in Java? This is the system call that flushes all files. This
> >> seems like it might be a good optimization for the time-based log
> flush. If
> >> you are going to sequentially flush all the log partitions anyway it
> might
> >> be better to just do a single sync() and let the I/O scheduler have more
> >> leeway in the ordering of he writes. I know who to get the equivalent of
> >> fsync() or fdatasync() using FileChannel.force(...) to flush a single
> file
> >> but I don't know how to get the equivalent of sync().
> >>
> >> -Jay
>



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
*/

Re: sync() in java

Posted by Jay Kreps <ja...@gmail.com>.
But does that sync that file or all files? Looking for the latter...

-jay

Sent from my iPhone

On Nov 5, 2011, at 1:44 PM, Joe Stein <cr...@gmail.com> wrote:

> On the fileoutputstream you can get the filedescriptor using getFD() and then on that object you can sync()
> 
> /*
> Joe Stein
> http://www.medialets.com
> Twitter: @allthingshadoop
> */
> 
> On Nov 5, 2011, at 4:34 PM, Jay Kreps <ja...@gmail.com> wrote:
> 
>> Does anyone know if there is an equivalent to the sync() system call
>> available in Java? This is the system call that flushes all files. This
>> seems like it might be a good optimization for the time-based log flush. If
>> you are going to sequentially flush all the log partitions anyway it might
>> be better to just do a single sync() and let the I/O scheduler have more
>> leeway in the ordering of he writes. I know who to get the equivalent of
>> fsync() or fdatasync() using FileChannel.force(...) to flush a single file
>> but I don't know how to get the equivalent of sync().
>> 
>> -Jay

Re: sync() in java

Posted by Joe Stein <cr...@gmail.com>.
On the FileOutputStream you can get the FileDescriptor using getFD(), and then on that object you can call sync().
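
Roughly like this (a sketch; the path is just a placeholder):

    import java.io.FileOutputStream

    val out = new FileOutputStream("/tmp/example.log", true) // append mode
    out.write("hello\n".getBytes)
    // getFD().sync() forces this one file's data (and metadata) to disk,
    // like fsync() on that descriptor; it does not flush other files.
    out.getFD.sync()
    out.close()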

/*
Joe Stein
http://www.medialets.com
Twitter: @allthingshadoop
*/

On Nov 5, 2011, at 4:34 PM, Jay Kreps <ja...@gmail.com> wrote:

> Does anyone know if there is an equivalent to the sync() system call
> available in Java? This is the system call that flushes all files. This
> seems like it might be a good optimization for the time-based log flush. If
> you are going to sequentially flush all the log partitions anyway it might
> be better to just do a single sync() and let the I/O scheduler have more
> leeway in the ordering of he writes. I know who to get the equivalent of
> fsync() or fdatasync() using FileChannel.force(...) to flush a single file
> but I don't know how to get the equivalent of sync().
> 
> -Jay
