Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2023/06/07 21:05:00 UTC
[jira] [Created] (CALCITE-5764) Puffin, an Awk for Java
Julian Hyde created CALCITE-5764:
------------------------------------
Summary: Puffin, an Awk for Java
Key: CALCITE-5764
URL: https://issues.apache.org/jira/browse/CALCITE-5764
Project: Calcite
Issue Type: Bug
Reporter: Julian Hyde
Create Puffin, which supports a programming model similar to that of the {{awk}} scripting language.
An {{awk}} program is a collection of rules, each of which is a pair: a predicate and an action. For each line in a file, the rules are applied in sequence, and if a rule's predicate evaluates to true, its action is executed. When every line of the file has been processed, {{awk}} goes on to the next file.
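To make the rule loop concrete, here is a minimal sketch in plain Java. The class {{RuleLoopSketch}} and its methods are hypothetical illustrations, not part of Calcite's Puffin API, and each file is assumed to be already read into a list of lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of awk's rule loop; not Calcite's Puffin API. */
final class RuleLoopSketch {
  /** A rule pairs a predicate with an action, like an awk pattern-action pair. */
  private static final class Rule {
    final Predicate<String> predicate;
    final Consumer<String> action;

    Rule(Predicate<String> predicate, Consumer<String> action) {
      this.predicate = predicate;
      this.action = action;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  /** Appends a rule; rules are later applied in the order they were added. */
  RuleLoopSketch add(Predicate<String> predicate, Consumer<String> action) {
    rules.add(new Rule(predicate, action));
    return this;
  }

  /** Processes each file in turn; each file is given as its list of lines. */
  void run(List<List<String>> files) {
    for (List<String> lines : files) {
      for (String line : lines) {
        // Apply every rule to this line, in sequence.
        for (Rule rule : rules) {
          if (rule.predicate.test(line)) {
            rule.action.accept(line);
          }
        }
      }
    }
  }
}
```

Unlike real {{awk}}, this sketch has no {{next}} statement and no {{BEGIN}}/{{END}} blocks; it only shows the order in which predicates and actions are evaluated.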
In {{Puffin}}, each predicate is a {{Predicate<Line>}}, and each action is a {{Consumer<Line>}}. {{Line}} is a data structure that gives access to the text of the line, regular expression matching, and file-local and global state.
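The following sketch shows one way such a {{Line}} could be shaped. The class {{LineSketch}}, its type parameters {{G}} (global state) and {{F}} (file-local state), and its methods are assumptions for illustration only, not the signature of Puffin's actual {{Line}}.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in for Puffin's Line, not its real signature.
 * G is the type of global state, F the type of file-local state.
 */
final class LineSketch<G, F> {
  private final String text;
  private final G globalState;
  private final F fileState;
  private Matcher matcher; // set by the most recent call to matches()

  LineSketch(String text, G globalState, F fileState) {
    this.text = text;
    this.globalState = globalState;
    this.fileState = fileState;
  }

  String text() {
    return text;
  }

  boolean startsWith(String prefix) {
    return text.startsWith(prefix);
  }

  /** Tests whether the whole line matches; remembers the matcher for group(). */
  boolean matches(Pattern pattern) {
    matcher = pattern.matcher(text);
    return matcher.matches();
  }

  /** Returns capture group i of the most recent match; valid only if it succeeded. */
  String group(int i) {
    return matcher.group(i);
  }

  F state() {
    return fileState;
  }

  G globalState() {
    return globalState;
  }
}
```

Keeping the matcher on the line lets a predicate such as {{line.matches(p)}} be followed by an action that reads {{line.group(1)}}, much as {{awk}} actions read fields of the current record.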
File-local state is allocated by a factory, and each file is processed in a single thread. This allows {{Puffin}} to be invoked on multiple files (or, more generally, sources such as URLs) and to process them in parallel. Global state is shared, and rules must coordinate when they access it.
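A minimal sketch of this threading model, assuming files are already loaded into memory: the hypothetical {{ParallelFilesSketch.countNonComment}} takes a factory for per-file state, processes files on a parallel stream, and records results in a thread-safe map that stands in for global state. It is an illustration of the idea, not Puffin's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch of per-file state plus parallel processing; not Puffin's code. */
final class ParallelFilesSketch {
  /**
   * Counts the non-comment lines of each file. Files are processed in
   * parallel; the returned map plays the role of shared global state.
   */
  static Map<String, Integer> countNonComment(
      Map<String, List<String>> files, Supplier<AtomicInteger> fileStateFactory) {
    Map<String, Integer> global = new ConcurrentHashMap<>();
    files.entrySet().parallelStream().forEach(file -> {
      // Fresh state from the factory; only the thread handling this file touches it.
      AtomicInteger fileState = fileStateFactory.get();
      for (String line : file.getValue()) {
        if (!line.startsWith("#")) {
          fileState.incrementAndGet();
        }
      }
      // Global state is shared by all threads, so it must be thread-safe.
      global.put(file.getKey(), fileState.get());
    });
    return global;
  }
}
```

Because each file's state is created fresh and used by one thread, it needs no locking; only the shared map does. {{ConcurrentHashMap}} is one way to coordinate that access; a lock or an atomic counter would also work.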
Here is a simple {{awk}} script that counts the number of non-comment lines in a file:
{code}
!/^#/ { ++n; }
END { printf("counter: %d\n", n); }
{code}
Here is the equivalent Puffin program:
{code}
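// Global state is Unit (nothing shared); each file gets its own AtomicInteger.
Puffin.Program<Unit> program =
    Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
        // Count each line that does not start with "#".
        .add(line -> !line.startsWith("#"),
            line -> line.state().incrementAndGet())
        // When a file is done, print that file's count.
        .after(context ->
            context.println("counter: " + context.state().get()))
        .build();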
{code}