IEEE 802.11 Wi-Fi is a complicated beast, and the intricacies of the protocol, coupled with the unpredictability of the radio frequency (RF) environment, make for some strange issues that we at Uplevel Systems face in the field.
Recently we stumbled across one that was stranger than most.
The initial symptoms were rather baffling. A customer reported poor Wi-Fi performance from some newly installed access points (APs). Customer mobile devices were frequently disconnecting, and even when they were connected, the performance was spotty. Rebooting the access points cleared things up for a short period, then the problem started again.
All this was occurring despite the fact that the signal strengths of the mobile devices at the APs (and vice versa) were quite good. A quick check of the surroundings showed no interference on the channels. So what could be going wrong?
The first clue came when we attempted to dial into one of the APs through our diagnostic connection and run some Wi-Fi diagnostics. The diagnostics ran, all right, but they were extremely slow. Not only that, CPU and memory utilization on the APs were very high. The data returned by the diagnostics was also very peculiar (for example, a routine channel check claimed that the AP radios were set to unknown channels, which is patently impossible, because clients were associated with the APs!).
A check of the system logs on one of the APs showed something that we had never seen before: a rapidly repeating pattern of failed Wi-Fi associations, hundreds of times a second. This prompted our engineers to run a low-level radio packet capture utility and see what was going on 'under the hood'. They were amazed to discover a couple of Wi-Fi client devices that were hammering the APs with connection requests as fast as possible! Here was the source of all the trouble.
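For readers who want to spot this kind of behavior in their own captures, here is a minimal sketch of the idea: count association requests per client MAC address in a sliding time window and flag anything far above a normal rate. The function name, the (timestamp, MAC) event format, the one-second window, and the threshold of 50 are all assumptions made for illustration; this is not the tool our engineers used.

```python
from collections import defaultdict, deque


def find_flooders(events, window_s=1.0, threshold=50):
    """Return the set of MACs that sent more than `threshold` association
    requests within any `window_s`-second sliding window.

    `events` is an iterable of (timestamp_seconds, mac) tuples sorted by time.
    The event format and the default limits are illustrative assumptions.
    """
    recent = defaultdict(deque)  # mac -> timestamps still inside the window
    flooders = set()
    for ts, mac in events:
        q = recent[mac]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > window_s:
            q.popleft()
        if len(q) > threshold:
            flooders.add(mac)
    return flooders


# Hypothetical data: one client associating every 5 ms, another associating once.
events = [(i * 0.005, "aa:bb:cc:00:00:01") for i in range(400)]
events.append((1.0, "aa:bb:cc:00:00:02"))
events.sort()

flooders = find_flooders(events)
print(flooders)
```

A well-behaved client associates once and stays put, so even a generous threshold separates it cleanly from a device that reconnects hundreds of times a second.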
Wi-Fi clients are required to associate with (connect to) an AP and establish security keys prior to starting data traffic. Normally, associations are fast and efficient: four packets (Authentication Request, Authentication Response, Association Request, Association Response) are all that is needed to set up a connection context, followed by an exchange of security keys to encrypt the actual data.
A successful association causes both the client and AP to allocate connection state and get ready to process data. Since it could take some time between setting up the connection state and the subsequent security keys, APs hold the state for a considerable time.
In this case, however, we found client devices from a manufacturer called Tuya Smart, Inc. that were doing something quite abnormal. These devices would perform the complete four-packet association handshake correctly; but then they would immediately abandon the connection context established by the handshake, and start a new one right away. They would repeat this over and over again, without stopping.
The poor AP, of course, had no idea that this was going on; so the available database of connection contexts was being completely filled up by these fake associations. Not only that, the processing of the association requests and the flushing of the connection table was eating up all the available CPU and RAM. The result? The rest of the (well-behaved) clients were left out in the cold, and everyone saw poor Wi-Fi performance.
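To make the mechanism concrete, here is a toy model of a fixed-size connection table whose entries are held for a while after each association. Everything in it is an assumption for illustration: the class name, the capacity of 64, the 30-second hold time, and the simplification that every repeated association consumes a fresh entry. Real AP firmware is far more involved.

```python
class ConnectionTable:
    """Toy model of an AP's connection-context table (illustrative only)."""

    def __init__(self, capacity, hold_s):
        self.capacity = capacity  # maximum number of contexts held at once
        self.hold_s = hold_s      # how long a context is kept without data
        self.entries = {}         # context id -> expiry time in seconds

    def expire(self, now):
        # Flush contexts whose hold time has run out.
        self.entries = {k: exp for k, exp in self.entries.items() if exp > now}

    def associate(self, ctx_id, now):
        """Try to allocate a context; return False if the table is full."""
        self.expire(now)
        if len(self.entries) >= self.capacity:
            return False
        self.entries[ctx_id] = now + self.hold_s
        return True


# A misbehaving client associates 100 times in one second, abandoning each context.
table = ConnectionTable(capacity=64, hold_s=30.0)
for i in range(100):
    table.associate(("flooder", i), now=i * 0.01)

# A well-behaved laptop then tries to connect and finds no room.
legit_ok = table.associate(("laptop", 0), now=1.0)
print(len(table.entries), legit_ok)
```

In this model the abandoned contexts do not expire for another 30 seconds, so the table stays full and the legitimate client is turned away, which matches what we observed in the field.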
Once we discovered this, the remedy was simple. The Uplevel system allows the blacklisting of MAC addresses on the Wi-Fi and Ethernet LAN, so we simply added a MAC blacklist of the Tuya Smart devices. Presto! The whole situation cleared up like magic, because every packet arriving from the offending devices was dropped and the association handshakes couldn't even start. The APs and Wi-Fi returned to normal operation.
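The principle behind the fix can be sketched in a few lines: check the source MAC address against a blocklist before doing any other work, so that a blocked device never gets as far as allocating connection state. The names below (normalize_mac, MacBlacklist, handle_frame) and the example address are hypothetical and do not reflect the actual Uplevel implementation.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and use ':' separators for comparison."""
    return mac.strip().lower().replace("-", ":")


class MacBlacklist:
    """Hypothetical blocklist keyed on normalized source MAC addresses."""

    def __init__(self, macs=()):
        self._blocked = {normalize_mac(m) for m in macs}

    def allows(self, src_mac):
        return normalize_mac(src_mac) not in self._blocked


def handle_frame(blacklist, src_mac, frame_type):
    # Drop frames from blocked devices before any association processing.
    if not blacklist.allows(src_mac):
        return "dropped"
    return f"processed {frame_type}"


blacklist = MacBlacklist(["aa:bb:cc:00:00:01"])  # example address, not a real device
blocked_result = handle_frame(blacklist, "AA-BB-CC-00-00-01", "auth_request")
allowed_result = handle_frame(blacklist, "aa:bb:cc:00:00:02", "auth_request")
print(blocked_result, allowed_result)
```

Because the check happens first, the cost of a blocked frame is a single set lookup rather than a full association handshake.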
Normally, such behavior would be classified as a Denial of Service attack (one of many possible DoS attacks that we're familiar with). But did the manufacturer actually intend to create such a DoS attack? We looked them up, and discovered that Tuya Smart is a Chinese vendor of Wi-Fi modules used in smart lighting systems. Our best guess is that they didn't really plan to create a DoS device, but instead this is a simple firmware bug. Maybe if the customer had known that they were installing a smart lighting device, and had actually connected the device to the Wi-Fi SSID, it would not have spammed the AP with unceasing association requests. Who knows?
Just goes to show: Wi-Fi is complex, and the strangest things happen!