Imagine you want to split a file on lines matching a pattern, such as empty lines or the section headings of a Markdown file (## Section).

How do you do that on the command line?

1. Unix/Linux split

split, like most Unix/Linux tools, reads its input line by line, but it can only cut after a fixed number of lines or bytes, not on lines matching a pattern.

Therefore, it is not usable here.
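To make the limitation concrete, here is a minimal sketch. The file name sample.txt, its contents, and the chunk_ prefix are made up for the illustration:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Hypothetical sample: two paragraphs separated by an empty line.
printf 'a\nb\n\nc\n' > sample.txt

# split cuts after every 2 lines, wherever the empty line happens to fall.
split -l 2 sample.txt chunk_

# chunk_aa holds "a" and "b"; chunk_ab holds the empty line and "c".
ls chunk_*
```

The cut points depend only on the line count, so the empty line ends up inside a chunk instead of delimiting one.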

2. csplit

GNU coreutils comes with csplit for this purpose, although its syntax is peculiar:

csplit --quiet --elide-empty-files --suppress-matched <input_file> '/^$/' '{*}'

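This will split input_file on empty lines (/^$/), as many times as possible ({*}), and save the outputs to files named xx00, xx01, etc. --suppress-matched drops the empty lines themselves, and --elide-empty-files avoids creating empty output files.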

It’s a terminal operation: the result lands in files, so it cannot be piped to something else.
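The same approach should cover the Markdown case from the introduction. The following is a sketch assuming GNU csplit; the file notes.md, its contents, and the section_ prefix are hypothetical:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Hypothetical Markdown file with a preamble and two sections.
printf '# Title\nintro\n## One\nfoo\n## Two\nbar\n' > notes.md

# Cut before every line starting with "## ". --suppress-matched is left out
# on purpose, so each heading stays at the top of its own file.
csplit --quiet --elide-empty-files --prefix=section_ notes.md '/^## /' '{*}'

# section_00 holds the preamble, section_01 and section_02 one section each.
head section_*
```

Here --prefix only replaces the default xx in the output names.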

3. awk

Similarly, we can use awk to do the same:

awk 'BEGIN {x="xx0"} /^$/{x="xx"++i;} {print > x}' <input_file>

This is:

  1. creating a variable x set to xx0 at the very beginning

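  2. updating it to the next name (xx1, xx2, etc.) each time the pattern (/^$/) matches

  3. in all cases, redirecting the output of print to the file named by the variable x. Note that, unlike csplit with --suppress-matched, the matching empty line is kept: it becomes the first line of the new file.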

This operation is also terminal.
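If the separator lines should be dropped, as csplit does with --suppress-matched, one possible variant is sketched below. The sample file is made up, and the close(x) call is an addition meant to avoid running out of open files on inputs with many chunks:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Hypothetical sample: three chunks separated by empty lines.
printf 'a\nb\n\nc\n\nd\n' > sample.txt

# On an empty line: close the current file, switch to the next name,
# and skip the line (next) so it is not written anywhere.
awk 'BEGIN { x = "xx0" }
     /^$/  { close(x); x = "xx" (++i); next }
           { print > x }' sample.txt

# xx0 holds "a" and "b", xx1 holds "c", xx2 holds "d".
head xx*
```

With this variant, two consecutive empty lines skip a number instead of producing an empty file.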

4. jq

Doing it with jq gives us the output in a structured format (JSON), so we can pipe it to another command:

jq -nR '[inputs] | reduce .[] as $item ([[]]; if $item | test("^$") then . += [[]] else .[-1] += [$item] end)' <input_file>

This reads the input as a list of lines and splits it on empty lines, returning a list of lists of lines.
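As an illustration of the piping, here is a small sketch that reuses the reduce filter from the command above and feeds its result to a second jq that counts the lines in each chunk. The sample file and the counts variable are made up for the example:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Hypothetical sample: two chunks separated by an empty line.
printf 'a\nb\n\nc\n' > sample.txt

# First jq: split the lines into chunks. Second jq: length of each chunk.
counts=$(jq -nR '[inputs] | reduce .[] as $item ([[]]; if $item | test("^$") then . += [[]] else .[-1] += [$item] end)' sample.txt \
  | jq -c 'map(length)')

echo "$counts"  # prints [2,1]
```

Any tool that reads JSON on standard input could take the place of the second jq.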

5. Java

For fun, we can try to implement it in Java (11+):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.regex.Pattern;

public class SplitBySeparators {
  public static void main(final String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("Usage: SplitBySeparators <filename> <separator>");
      System.exit(1);
    }
    final String filename = args[0];
    // The separator is a regex that must match a whole line, e.g. '^$'
    final Pattern pattern = Pattern.compile(args[1]);

    int counter = -1;
    // readAllLines works on Java 11, unlike Stream.toList() (Java 16+)
    for (final String line : Files.readAllLines(Paths.get(filename))) {
      if (pattern.matcher(line).matches() || counter == -1) {
        // First line or separator: start the output file <filename><counter>.
        // Without options, Files.write creates the file or truncates an old one.
        counter += 1;
        Files.write(Paths.get(filename + counter),
                    line.getBytes(StandardCharsets.UTF_8));
      } else {
        // Any other line is appended to the current output file
        Files.write(Paths.get(filename + counter),
                    ("\n" + line).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
      }
    }
  }
}

Compile it once:

javac SplitBySeparators.java

And then use it (the pieces are written next to the input as <input_file>0, <input_file>1, etc.):

java SplitBySeparators <input_file> '^$'

Optionally, you can even compile it to a native binary with GraalVM's native-image:

native-image --no-server --static SplitBySeparators SplitBySeparators

and use it:

./SplitBySeparators <input_file> '^$'