Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2023/06/07 21:05:00 UTC

[jira] [Created] (CALCITE-5764) Puffin, an Awk for Java

Julian Hyde created CALCITE-5764:
------------------------------------

             Summary: Puffin, an Awk for Java
                 Key: CALCITE-5764
                 URL: https://issues.apache.org/jira/browse/CALCITE-5764
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde


Create Puffin, which allows a programming model similar to the {{awk}} scripting language.

An {{awk}} program is a collection of rules, each of which is a pair: a predicate and an action. For each line in a file, the rules are applied in sequence, and whenever a rule's predicate evaluates to true, its action is executed. Then {{awk}} goes on to the next line.

In {{Puffin}}, each predicate is a {{Predicate<Line>}}, and each action is a {{Consumer<Line>}}. {{Line}} is a data structure that gives access to the text of the line, regular expression matching, and file-local and global state.
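
As a rough illustration of this rule model, here is a self-contained sketch. It is not the proposed Puffin API: the {{Line}} and {{Rule}} classes and the {{run}} and {{countNonComments}} methods are hypothetical stand-ins, and state is kept in a local array rather than in {{Line}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical sketch of awk-style rules in Java; not the Puffin API. */
class RuleSketch {
  /** Minimal stand-in for Line: the text of a line plus a regex helper. */
  static final class Line {
    private final String text;

    Line(String text) {
      this.text = text;
    }

    String text() {
      return text;
    }

    /** Returns whether the regex matches anywhere in the line, as in awk. */
    boolean matches(String regex) {
      return Pattern.compile(regex).matcher(text).find();
    }
  }

  /** A rule pairs a predicate with an action. */
  static final class Rule {
    final Predicate<Line> predicate;
    final Consumer<Line> action;

    Rule(Predicate<Line> predicate, Consumer<Line> action) {
      this.predicate = predicate;
      this.action = action;
    }
  }

  /** Applies every rule, in order, to each line, then moves to the next line. */
  static void run(List<Rule> rules, List<String> lines) {
    for (String text : lines) {
      Line line = new Line(text);
      for (Rule rule : rules) {
        if (rule.predicate.test(line)) {
          rule.action.accept(line);
        }
      }
    }
  }

  /** Counts the lines that do not start with '#'. */
  static int countNonComments(List<String> lines) {
    int[] n = {0};
    List<Rule> rules = new ArrayList<>();
    rules.add(new Rule(line -> !line.matches("^#"), line -> n[0]++));
    run(rules, lines);
    return n[0];
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("# header", "a", "b", "# note", "c");
    System.out.println("counter: " + countNonComments(lines));
  }
}
```

Every rule sees every line, so two rules whose predicates both match will both fire on that line, in the order they were added.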

File-local state is allocated by a factory, and each file is processed in a single thread. This allows {{Puffin}} to be invoked on multiple files (or, more generally, sources, including URLs) and to process them in parallel. Global state is shared across all files, so rules must coordinate when they access it.
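
The division between file-local and global state might look something like the sketch below. Everything in it is an assumption made for illustration: {{Totals}}, {{FileCounter}} and {{countAll}} are hypothetical names, sources are modeled as in-memory lists of lines, and a parallel stream stands in for whatever threading {{Puffin}} would actually use.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of file-local versus global state; not the Puffin API. */
class StateSketch {
  /** Global state, shared by all sources, so it must be thread-safe. */
  static final class Totals {
    final AtomicInteger lines = new AtomicInteger();
  }

  /** File-local state, used by one thread only, so a plain int suffices. */
  static final class FileCounter {
    int lines;
  }

  /** Counts non-comment lines across all sources, one source per task. */
  static int countAll(List<List<String>> sources) {
    Totals global = new Totals(); // created once
    Function<Totals, FileCounter> fileStateFactory = g -> new FileCounter();

    sources.parallelStream().forEach(source -> {
      // The factory allocates fresh local state for each source.
      FileCounter local = fileStateFactory.apply(global);
      for (String text : source) {
        if (!text.startsWith("#")) {
          local.lines++; // no synchronization needed here
        }
      }
      // Access to shared state is coordinated through an atomic update.
      global.lines.addAndGet(local.lines);
    });
    return global.lines.get();
  }

  public static void main(String[] args) {
    List<List<String>> sources =
        Arrays.asList(
            Arrays.asList("# a", "x", "y"),
            Arrays.asList("z", "# b"));
    System.out.println("total: " + countAll(sources));
  }
}
```

Because each source gets its own {{FileCounter}}, the per-line work needs no locking; only the final merge into {{Totals}} touches shared state.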

Here is a simple {{awk}} script that counts the number of non-comment lines in a file:

{code}
!/^#/ { ++n; }
END { printf("counter: %d\n", n); }
{code}

Here is the equivalent Puffin program:
{code}
    Puffin.Program<Unit> program =
        Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
            .add(line -> !line.startsWith("#"),
                line -> line.state().incrementAndGet())
            .after(context ->
                context.println("counter: " + context.state().get()))
            .build();
{code}
