Monitoring a file until a string is found

--

Full question
https://superuser.com/questions/270529/m...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#bash #grep #tail #logfiles

ACCEPTED ANSWER

Score 81

Here is a simple one-liner. It doesn't need bash-specific or non-POSIX tricks, or even a named pipe. All you really need is to decouple the termination of tail from grep. That way, once grep ends, the script can continue even if tail hasn't ended yet. So this simple method will get you there:

( tail -f -n0 logfile.log & ) | grep -q "Server Started"

grep blocks until it finds the string, then exits. Because tail runs in its own sub-shell, it can be placed in the background so that it runs independently. Meanwhile, the main shell is free to continue executing the script as soon as grep exits. tail lingers in its sub-shell until the next line is written to the logfile, and then exits when its write to the closed pipe fails with SIGPIPE (possibly even after the main script has terminated). The main point is that the pipeline no longer waits for tail to terminate, so the pipeline exits as soon as grep exits.

Some minor tweaks:

The option -n0 makes tail start reading from the current end of the logfile, so that an occurrence of the string earlier in the logfile is ignored.
You might want to give tail -F rather than -f. It is not POSIX, but it allows tail to work even if the log is rotated while waiting.
Option -q rather than -m1 makes grep quit after the first occurrence, but without printing out the trigger line. Also it is POSIX, which -m1 isn't.
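As a rough sketch of how the one-liner above might be wrapped for reuse in a script: the function name wait_for_line and its two arguments are illustrative assumptions, not part of the answer.

```shell
#!/bin/sh
# wait_for_line FILE PATTERN
# Hypothetical helper: blocks until a line matching PATTERN is
# appended to FILE, then returns grep's exit status.
wait_for_line() {
    # tail is backgrounded inside its own sub-shell, so the pipeline
    # finishes as soon as grep does. -n0 skips lines already in FILE.
    ( tail -f -n0 "$1" & ) | grep -q -- "$2"
}

# Example use (file name and message are placeholders):
#   wait_for_line logfile.log "Server Started" && echo "server is up"
```

As described above, the backgrounded tail may linger after the function returns, until the next line is written to the file.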

ANSWER 2

Score 68

The accepted answer isn't working for me, plus it's confusing and it changes the log file.

I'm using something like this:

tail -f logfile.log | while IFS= read -r LOGLINE
do
   [[ "${LOGLINE}" == *"Server Started"* ]] && pkill -P $$ tail
done

If the log line matches the pattern, kill the tail started by this script.

Note: if you want to also view the output on the screen, either | tee /dev/tty or echo the line before testing in the while loop.

ANSWER 3

Score 21

If you're using Bash, you can use process substitution. It is not defined by POSIX, so it may be missing in some shells. The syntax is:

grep -m 1 "Server Started" <(tail -f logfile.log)

It works pretty much like the FIFO solutions already mentioned, but much simpler to write.

ANSWER 4

Score 20

There are a few ways to get tail to exit:

Poor Approach: Force `tail` to write another line

You can force tail to write another line of output immediately after grep has found a match and exited. This will cause tail to get a SIGPIPE, causing it to exit. One way to do this is to modify the file being monitored by tail after grep exits.

Here is some example code:

tail -f logfile.log | grep -m 1 "Server Started" | { cat; echo >>logfile.log; }

In this example, cat won't exit until grep has closed its stdout, so tail is not likely to be able to write to the pipe before grep has had a chance to close its stdin. cat is used to propagate the standard output of grep unmodified.

This approach is relatively simple, but there are several downsides:

If grep closes stdout before closing stdin, there will always be a race condition: grep closes stdout, triggering cat to exit, triggering echo, triggering tail to output a line. If this line is sent to grep before grep has had a chance to close stdin, tail won't get the SIGPIPE until it writes another line.
It requires write access to the log file.
You must be OK with modifying the log file.
You may corrupt the log file if you happen to write at the same time as another process (the writes may be interleaved, causing a newline to appear in the middle of a log message).
This approach is specific to tail—it won't work with other programs.
The third pipeline stage makes it hard to get access to the return code of the second pipeline stage (unless you're using an extension to POSIX such as bash's PIPESTATUS array). This is not a big deal in this case because grep will always return 0, but in general the middle stage might be replaced with a different command whose return code you care about (e.g., something that returns 0 when "server started" is detected, 1 when "server failed to start" is detected).

The next approaches avoid these limitations.

A Better Approach: Avoid Pipelines

You can use a FIFO to avoid the pipeline altogether, allowing execution to continue once grep returns. For example:

fifo=/tmp/tmpfifo.$$
mkfifo "${fifo}" || exit 1
tail -f logfile.log >"${fifo}" &
tailpid=$! # optional
grep -m 1 "Server Started" "${fifo}"
kill "${tailpid}" # optional
rm "${fifo}"

The lines marked with the comment # optional can be removed and the program will still work; tail will just linger until it reads another line of input or is killed by some other process.

The advantages to this approach are:

you don't need to modify the log file
the approach works for other utilities besides tail
it does not suffer from a race condition
you can easily get the return value of grep (or whatever alternative command you're using)

The downside to this approach is complexity, especially managing the FIFO: You'll need to securely generate a temporary file name, and you'll need to ensure that the temporary FIFO is deleted even if the user hits Ctrl-C in the middle of the script. This can be done using a trap.
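A minimal sketch of the FIFO approach with that cleanup in place, under some assumptions: mktemp -d is available (it is widespread, but check your system), and the helper name wait_for_line_fifo and its arguments are made up for illustration. The FIFO lives in a private temporary directory, and a trap removes it and kills tail on exit or on Ctrl-C.

```shell
#!/bin/sh
# wait_for_line_fifo FILE PATTERN   (hypothetical helper name)
# Prints the first line appended to FILE that matches PATTERN.
# The body runs in a sub-shell so the traps stay local to the function.
wait_for_line_fifo() (
    # Assumption: mktemp -d exists and creates a private directory.
    tmpdir=$(mktemp -d) || exit 1
    fifo=$tmpdir/fifo
    tailpid=

    cleanup() {
        # Stop tail if it was started, then remove the FIFO directory.
        [ -n "$tailpid" ] && kill "$tailpid" 2>/dev/null
        rm -rf "$tmpdir"
    }
    trap cleanup EXIT
    # Turn Ctrl-C and SIGTERM into a normal exit so the EXIT trap runs.
    trap 'exit 130' INT
    trap 'exit 143' TERM

    mkfifo "$fifo" || exit 1
    tail -f -n0 "$1" >"$fifo" &
    tailpid=$!

    grep -m 1 -- "$2" "$fifo"
    status=$?
    # Exit with grep's status; cleanup runs from the EXIT trap.
    exit "$status"
)
```

Calling it as wait_for_line_fifo logfile.log "Server Started" should print the matching line and return grep's exit status, leaving neither a FIFO nor a tail process behind.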

Alternative Approach: Send a Message to Kill `tail`

You can get the tail pipeline stage to exit by sending it a signal like SIGTERM. The challenge is reliably knowing two things in the same place in code: tail's PID and whether grep has exited.

With a pipeline like tail -f ... | grep ..., it's easy to modify the first pipeline stage to save tail's PID in a variable by backgrounding tail and reading $!. It's also easy to modify the second pipeline stage to run kill when grep exits. The problem is that the two stages of the pipeline run in separate "execution environments" (in the terminology of the POSIX standard) so the second pipeline stage can't read any variables set by the first pipeline stage. Without using shell variables, either the second stage must somehow figure out tail's PID so that it can kill tail when grep returns, or the first stage must somehow be notified when grep returns.

The second stage could use pgrep to get tail's PID, but that would be unreliable (you might match the wrong process) and non-portable (pgrep is not specified by the POSIX standard).

The first stage could send the PID to the second stage via the pipe by echoing the PID, but this string will get mixed with tail's output. Demultiplexing the two may require a complex escaping scheme, depending on the output of tail.

You can use a FIFO to have the second pipeline stage notify the first pipeline stage when grep exits. Then the first stage can kill tail. Here is some example code:

fifo=/tmp/notifyfifo.$$
mkfifo "${fifo}" || exit 1
{
    # run tail in the background so that the shell can
    # kill tail when notified that grep has exited
    tail -f logfile.log &
    # remember tail's PID
    tailpid=$!
    # wait for notification that grep has exited
    read -r foo <"${fifo}"
    # grep has exited, time to go
    kill "${tailpid}"
} | {
    grep -m 1 "Server Started"
    # notify the first pipeline stage that grep is done
    echo >"${fifo}"
}
# clean up
rm "${fifo}"

This approach has all the pros and cons of the previous approach, except it's more complicated.

A Warning About Buffering

POSIX allows the stdin and stdout streams to be fully buffered, which means that tail's output might not be processed by grep for an arbitrarily long time. There shouldn't be any problems on GNU systems: GNU grep uses read(), which avoids all buffering, and GNU tail -f makes regular calls to fflush() when writing to stdout. Non-GNU systems may have to do something special to disable or regularly flush buffers.