Replacing SmokePing with a very small shell script

For a while, I ran a small IRC bot for keeping the members of a local hackerspace informed about when the space is open for visitors. This bot needed to check a status page every minute or so.

The problem was that for a few days there were regular connection problems. The server running the bot couldn’t connect to the server hosting the status page. The error messages weren’t helpful – just connect() timeouts. To try to better understand the intermittent problem, I wanted to measure the latency and connectivity between these hosts on an ongoing basis.

The natural choice for long-term latency measurement is, of course, SmokePing.

The only trouble is the amount of effort involved: installing SmokePing, configuring it, configuring an httpd to serve the content and run a Perl CGI, keeping the content private, and so on. None of that is hard, but it is certainly annoying and I was not interested. I wanted to measure latency between two machines, not spend a good half hour, if lucky, setting all that up.

So I used Metricfire instead. I wrote a one-line snippet of shell script to make 20 ICMP ping requests, format the latency results, and send them to Metricfire’s HTTP API.

I handed that off to cron to run once a minute. Here is the complete recipe (wrapped for readability; a real crontab entry has to sit on a single line):

* * * * * curl -s https://my-api-key-redacted@api.metricfire.com/v1/m\
etric/smokecheap/tog-dot-ie --data-binary "[$(ping -c 20 tog.ie | awk \
'/bytes from/{print $8}' | cut -b6-10 | xargs | tr ' ' ',')]"
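For anyone who finds the one-liner hard to read, here is a rough sketch of the same idea split into two small shell functions. It assumes Linux iputils-style ping output, where each reply line contains a "time=<ms>" field. The function names and the API_KEY variable are hypothetical, made up for illustration; only the URL shape comes from the recipe above.

```shell
#!/bin/sh
# Turn ping output (read from stdin) into a JSON array of round-trip times.
# Assumption: reply lines look like
#   64 bytes from 192.0.2.1: icmp_seq=1 ttl=55 time=12.3 ms
rtts_to_json() {
    awk '
        /bytes from/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^time=/) {
                    out = out sep substr($i, 6)   # strip the "time=" prefix
                    sep = ","                     # comma before every later value
                }
        }
        END { print "[" out "]" }
    '
}

# Hypothetical wrapper: ping a host 20 times and POST the array.
# API_KEY is a placeholder environment variable, not a real credential.
post_latency() {
    ping -c 20 "$1" | rtts_to_json |
        curl -s "https://${API_KEY}@api.metricfire.com/v1/metric/smokecheap/tog-dot-ie" \
            --data-binary @-
}
```

Saved as a script that ends with a call such as post_latency tog.ie, this would let the crontab entry shrink to a single short command. Looking for the "time=" field by name, rather than by position and byte offset, is a little more forgiving if the ping output format shifts.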

After a little clicking around, I had a graph of the min/avg/max latency between these two machines.

The whole thing took less than five minutes. It’s obviously not as complete as SmokePing, but it gave me the result I wanted with a fraction of the effort. 80% of the functionality with much less work. I’m pretty satisfied with that.

The twist to the story is that the intermittent connectivity problem disappeared before I got a chance to observe it! Definitely glad I didn’t set up SmokePing just for this. If it happens again I’ll be ready with this historical data to help me spot patterns.

- Charlie

2 thoughts on “Replacing SmokePing with a very small shell script”

  1. Great hack! I stopped looking at smokeping after reading the manpage – it’s actually *longer* than lsof’s, if you can believe it …

    I notice that you’re posting 20 results to metricfire which, according to your docs, will all be timestamped with the (single) POST’s arrival time. Well, using your rather nice list-of-lists data format, I came up with the following (slightly!) more complex pipeline that timestamps each RTT:

    ping -n -c 20 -W 2 10.0.10.1 | sed --unbuffered 's/=/ /g' | awk -W interactive 'BEGIN { printf "%s", "[" } /bytes from/ { printf "%s%s,%s%s", "[", srand(), $10, "]," } END { print "]" } ' | sed 's/],]/]]/'

    It looks like there are some optimisations you could make, but there are some annoying parts that make it tricky. You /could/ replace the first sed with a gsub() in the central awk. You /could/ replace the last sed with a flag inside the central awk. Probably both worth doing if you’re at scale, but would only obfuscate it further :-)

    I’ll give it a go with metricfire as soon as you OK my account request (hint, hint!) :-)
    It’s an @7digital.com address, FWIW …

    Cheers!
    Jonathan

    • That’s a pretty cool improvement, Jonathan! Nice work.

      Generating valid JSON isn’t much fun in shell. :)

      I’ve already sent your beta invite email. Thanks!
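The two optimisations Jonathan mentions (a gsub() in place of the first sed, and a flag in place of the last one) can be sketched as a single awk program. This is only an illustration, not his pipeline: it swaps the srand() timestamp for the per-line timestamp that iputils ping prints with -D, which is an assumption about the ping implementation in use, and the function name is made up.

```shell
#!/bin/sh
# Read "ping -D" output on stdin and print [[timestamp,rtt_ms],...] as JSON.
# Assumption: iputils ping, whose -D flag prefixes each line with
# "[epoch.microseconds]", for example
#   [1700000000.123456] 64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=0.321 ms
timestamped_rtts() {
    awk '
        /bytes from/ {
            gsub(/=/, " ")            # stands in for the first sed; $0 is re-split
            ts = substr($1, 2)        # drop the leading "["
            sub(/\..*$/, "", ts)      # keep whole seconds only
            for (i = 2; i < NF; i++)
                if ($i == "time") {
                    out = out sep "[" ts "," $(i + 1) "]"
                    sep = ","         # separator flag stands in for the last sed
                }
        }
        END { print "[" out "]" }
    '
}
```

It would be used as ping -D -n -c 20 -W 2 10.0.10.1 | timestamped_rtts. Because the output is only printed in the END block, nothing needs to be unbuffered along the way.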