Channel: Johannes Weber – Weberblog.net

Stats from Participating in the NTP Pool Project

I am participating in the NTP Pool Project with at least one NTP server at a time. Of course, I am monitoring the number of NTP clients that access my servers with some RRDtool graphs. ;) I was totally surprised to see quite high peaks for a couple of minutes whenever one of the servers was listed in the DNS, while the overall rate grew only very slowly. I am still not quite sure why this is the case.

For one month I also logged all source IP addresses to gain some more details about the usage of my servers. Let’s have a look at some stats:

This article is one of many blogposts within this NTP series. Please have a look!

Prenotes

  • For this blogpost I took the stats from March 2019. At this time, I had four servers online:
    • ntp2: Stratum 1, Raspberry Pi 1 B+ with GPS
    • ntp3: Stratum 1, Meinberg M200
    • ntp4: Stratum 2, Raspberry Pi 3 B Rev 1.2
    • ntp5: Stratum 2, Dell PowerEdge R200, Intel(R) Pentium(R) Dual CPU E2200 @ 2.20GHz, 4 GiB DDR2 Memory
  • All were listed with a net speed of 10 Mbit/s. My actual ISP speed was 100 Mbit/s.
  • Since all servers are IPv6 only, it is quite easy to count NTP clients. Every single source IPv6 address is a single client.

Scoring

NTP servers are only used by the round-robin DNS of the pool if they have a score higher than 10. But there are some concerns about this scoring. “Points are deducted if the server can’t be reached or if the time offset is more than 100ms (as measured from the monitoring systems).” More specifically: “The monitoring system works roughly like an SNTP (RFC 2030) client, so it is more susceptible by random network latencies between the server and the monitoring system than a regular ntpd server would be.”
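
To see why such a client is sensitive to path latency, here is a minimal sketch of the standard SNTP offset and round-trip delay calculation from the four timestamps (T1 client send, T2 server receive, T3 server send, T4 client receive). The function name and the sample timestamps are made up for illustration; they are not taken from the pool's monitoring code.

```shell
# Sketch: SNTP-style offset and round-trip delay from four timestamps (in ms).
# sntp_offset_delay is a hypothetical helper name, not part of any NTP tool.
sntp_offset_delay() {
  # $1=T1 client send, $2=T2 server receive, $3=T3 server send, $4=T4 client receive
  awk -v t1="$1" -v t2="$2" -v t3="$3" -v t4="$4" 'BEGIN {
    offset = ((t2 - t1) + (t3 - t4)) / 2   # assumes both directions take equally long
    delay  = (t4 - t1) - (t3 - t2)         # round trip minus server processing time
    printf "offset=%.1fms delay=%.1fms\n", offset, delay
  }'
}

# Invented example: perfectly synchronized clocks, but 250 ms towards the
# server and only 10 ms back (1 ms processing time on the server).
sntp_offset_delay 0 250 251 261
```

With these invented numbers both clocks agree perfectly, yet the calculation reports an offset of 120 ms, which is already above the 100 ms threshold. Half of any asymmetry between the two directions ends up in the measured offset.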

In fact, almost once a day my scores drop dramatically, sometimes even below a score of 0, while my NTP servers are fully functional from my point of view. Here’s an example from my server ntp5. The yellow dots (offset) are increasing regularly, while the score dropped between 14:00 and 23:00 on April 1st, 2019:

Now, this was my point of view from my monitoring station. [Ref: Basic NTP Server Monitoring] Neither the jitter (measured in µs rather than ms!) nor the offset (in ms) had any issues. April 1st, 2019 on the left-hand side of the graphs:

This seems to be related to some routing behavior from Los Angeles (the location of the NTP monitoring station) to my network (DTAG, AS3320). Or generic network congestion. I don’t know.

In short: My overall experience with this score is mixed. To my mind, it is not reliable and should be replaced by a more robust one. Note that this discussion is not new; refer to some threads on the pool mailing list: “Why is my server score suddenly so poor?” or “Decentralised monitoring?”.

NTP Client Stats

To begin with, here is a weekly graph from one of my servers (ntp5) that shows the typical pattern over time. That is: high peaks (up to 30 k), but only for a very short time:

I am wondering why so many NTP clients query the servers only *once*, at the time the IP address is listed in the DNS. I expected NTP clients to resolve the DNS name and stay on those IP addresses until the next service restart or system reboot. But obviously, they don’t. Any ideas? Bad implementations such as those explained here: “How to NOT use the NTP Pool”?

Here is the summary graph of my NTP servers (ntp2, 3, 4, 5) from March 2019. It shows the maximum number of unique clients (= IPv6 source addresses) per 20 minutes. [Ref: Counting NTP Clients] That is: The maximum number of clients per 20 minutes is about 30-40 k. Wow.
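
A "unique clients per 20 minutes" figure can also be approximated straight from firewall logs. Here is a minimal sketch, assuming each log line carries `date=`, `time=` and `srcip=` key/value fields like the FortiGate lines used further below. The function name is made up, and this is not the method behind the RRDtool graphs.

```shell
# Sketch: count unique source addresses per 20-minute bucket.
# unique_per_20min is a hypothetical helper; it reads log lines on stdin and
# looks fields up by key (date=, time=, srcip=) instead of by column position.
unique_per_20min() {
  awk '{
    d = ""; t = ""; s = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^date=/)  d = substr($i, 6)
      if ($i ~ /^time=/)  t = substr($i, 6)
      if ($i ~ /^srcip=/) s = substr($i, 7)
    }
    if (d == "" || t == "" || s == "") next      # skip lines without all three keys
    split(t, hms, ":")
    bucket = sprintf("%s %s:%02d", d, hms[1], int(hms[2] / 20) * 20)
    if (!((bucket, s) in seen)) {                # first time this address shows up in this bucket
      seen[bucket, s] = 1
      count[bucket]++
    }
  }
  END { for (b in count) print count[b], b }' | sort -k2,3
}
```

Something like `cat * | grep "dstport=123" | unique_per_20min | sort -rn | head -1` would then show the busiest 20-minute window of the month.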

I have logged all incoming connections through my FortiGate FG-100D firewall to an external syslog-ng server. Hence I could cat ‘n grep through the raw logfiles from this whole month.

TL;DR: Four servers listed with 10 Mbit/s each on the NTP Pool. For one month, each server got 91 k requests per day (avg) = 1.05 requests per second on average. The absolute max requests per second was 1794. Unique IPv6 source addresses over all four servers: 3.8 M.

Some more details with the values per month:

Server          NTP Requests  Unique Sources  Max Requests per Second
ntp2                 2556621          973931                     1794
ntp3                 3091628         1022346                     1223
ntp4                 3077001          998343                     1110
ntp5                 2555322          864274                     1280
AVG                  2820143          964723                        -
AVG per Day            90972           31120                        -
AVG per Second          1.05            0.36                        -
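
The two lower average rows follow from the monthly average, taking March as 31 days of 86,400 seconds each. A quick check of the arithmetic, using only the figures from the AVG row (the function name is made up):

```shell
# Recompute "AVG per Day" and "AVG per Second" from a monthly average.
# avg_rates is a hypothetical helper; $1 is the monthly figure for March (31 days).
avg_rates() {
  awk -v month="$1" 'BEGIN {
    per_day = month / 31
    printf "%.0f per day, %.2f per second\n", per_day, per_day / 86400
  }'
}

avg_rates 2820143   # NTP requests:   90972 per day, 1.05 per second
avg_rates 964723    # unique sources: 31120 per day, 0.36 per second
```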

Here are the top 10 timestamps by queries per second from one server (ntp2). As you can see, only these top 10 seconds exceed a rate of 1000 queries/s:

weberjoh@jw-nb10-syslog-mirror:/var/log/firewalls/2003:de:2016::3/2019/03$ cat * | grep "dstip=2003:de:2016:330::6b5:123 dstport=123" | awk '{print $5,$6}' | uniq -c | sort -rg | head
   1794 date=2019-03-08 time=00:04:01
   1702 date=2019-03-08 time=00:03:12
   1547 date=2019-03-08 time=00:03:15
   1528 date=2019-03-08 time=00:03:13
   1444 date=2019-03-08 time=00:03:19
   1280 date=2019-03-08 time=00:03:17
   1266 date=2019-03-08 time=00:04:03
   1130 date=2019-03-08 time=00:03:10
   1064 date=2019-03-08 time=00:03:11
   1061 date=2019-03-08 time=00:04:00

Here’s another analysis: How often do I see how many requests per second? First column: number of seconds in the month, second column: queries/s. Listing from ntp2. That is: The vast majority is below 10 queries/s. For example, the line “829 10” reads: In 829 seconds of the month the server got 10 requests per second. To my mind, that’s not that much.

weberjoh@jw-nb10-syslog-mirror:/var/log/firewalls/2003:de:2016::3/2019/03$ cat * | grep "dstip=2003:de:2016:330::6b5:123 dstport=123" | awk '{print $5,$6}' | uniq -c | sort -rg | awk '{print $1}' | uniq -c | sort -rg | head -20
 705352 1
 158096 2
  34087 3
  10946 4
   6838 5
   3664 6
   2054 7
   1369 8
   1076 9
    829 10
    724 11
    636 12
    573 13
    501 14
    460 15
    435 16
    412 17
    351 18
    349 19
    316 21
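
Note that the first `uniq -c` in this pipeline only groups neighbouring lines, so it relies on the log lines arriving in chronological order. A minimal alternative sketch counts per timestamp in an awk array instead and therefore does not care about the input order. The function name is made up; the field positions `$5` and `$6` for date and time are taken from the command above.

```shell
# Sketch: histogram of queries per second without relying on input order.
# qps_histogram is a hypothetical helper; it reads pre-filtered log lines on
# stdin and expects date= in field 5 and time= in field 6, as in the command above.
qps_histogram() {
  awk '{ per_second[$5 " " $6]++ }                          # requests seen in each second
       END {
         for (ts in per_second) seconds[per_second[ts]]++   # number of seconds that saw N requests
         for (n in seconds) print seconds[n], n
       }' | sort -rn
}
```

Used as `cat * | grep "dstip=2003:de:2016:330::6b5:123 dstport=123" | qps_histogram | head -20`, it should print the same two columns: number of seconds first, queries per second next.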

This is how I grepped:

### NTP requests per server address:
cat * | grep "dstip=2003:de:2016:333:221:9bff:fefc:8fe1 dstport=123" | awk '{print $15}' | sed s/srcip=// | wc -l

### Unique source IPv6 addresses per server address:
cat * | grep "dstip=2003:de:2016:333:221:9bff:fefc:8fe1 dstport=123" | awk '{print $15}' | sed s/srcip=// | sort | uniq | wc -l

### Queries per second, top 10 per server address:
cat * | grep "dstip=2003:de:2016:333:221:9bff:fefc:8fe1 dstport=123" | awk '{print $5,$6}' | uniq -c | sort -rg | head

### How often each queries-per-second value occurs, top 20 per server address:
cat * | grep "dstip=2003:de:2016:333:221:9bff:fefc:8fe1 dstport=123" | awk '{print $5,$6}' | uniq -c | sort -rg | awk '{print $1}' | uniq -c | sort -rg | head -20

### Unique source addresses over all servers for one month:
cat * | grep policy6 | grep "dstport=123" | awk '{print $15}' | sed s/srcip=// | sort | uniq | wc -l
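
Since the first two commands have to be repeated for every server address, here is a minimal sketch that collects requests and unique sources for all destination addresses in a single pass. It assumes the log lines carry `srcip=`, `dstip=` and `dstport=` key/value fields as in the grep patterns above; the function name is made up.

```shell
# Sketch: requests and unique sources per destination address in one pass.
# per_server_summary is a hypothetical helper; it reads log lines on stdin and
# looks up dstip=, srcip= and dstport= by key instead of by column position.
per_server_summary() {
  awk '{
    dst = ""; src = ""; port = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^dstip=/)   dst  = substr($i, 7)
      if ($i ~ /^srcip=/)   src  = substr($i, 7)
      if ($i ~ /^dstport=/) port = substr($i, 9)
    }
    if (port != "123" || dst == "" || src == "") next   # keep NTP requests only
    requests[dst]++
    if (!((dst, src) in seen)) { seen[dst, src] = 1; sources[dst]++ }
  }
  END { for (d in requests) print d, requests[d], sources[d] }' | sort
}
```

Running `cat * | per_server_summary` prints one line per server address: the address, the number of NTP requests, and the number of unique source addresses.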

Requests after Leaving the Pool

I had to move my lab to another location with new IPv6 addresses. Hence I had to remove my servers from the pool, since the pool references them by IP address rather than by DNS name.

This is the NTP clients graph (zoomed in) for the first six days after I left the pool:

The number of clients is decreasing, while a couple of hundred are still constantly using my server. Again, from a technical perspective I was expecting many more clients to keep using it. I thought that once an NTP client has resolved the DNS name of the pool, it stays on those resolved IP addresses until a reboot of the system. But obviously, this isn’t the case for the majority of clients. One idea: Maybe these clients use ntpdate, called every hour via a cronjob, rather than ntpd? In this case, each run would initiate a new DNS query rather than staying on the same NTP server. But that’s just an idea. I have no clue what’s going on there.

Featured image “Hourglass with coins” by Marco Verch is licensed under CC BY 2.0.

