Monday, February 15, 2016

Unusually high load average times on my Raspberry Pi?

I run a few Raspberry Pi 2s on my network. They handle basic tasks: network and device monitoring via Cacti and Smokeping, a local email and chat server, and an ad-blocking DNS server à la Pi-hole. None of these services really taxes its machine, so when the 1-minute load average on the Cacti/Smokeping machine started creeping up from 0.05 to past 1.06, I got a little concerned. It didn't seem to be affecting the machine adversely, and top didn't show me anything unusual, so I let it go, as the machine seemed stable.

That is, until a bootloader update came along recently. Attempting to install it threw I/O errors while the installer tried to copy the new files to /boot.

I boot my devices from an NFS share that sits on the network, but due to the way the Pi is designed, you still need a single FAT32 partition on an SD card that contains the bootloader's binary blob. This partition gets mounted as /boot at runtime, and is the only partition residing on the SD card in my particular case. Everything else sits on the network share, safe from the corrupting influences of the SD card.

Attempting to ls the /boot directory revealed an empty directory!
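An empty /boot can mean the partition has dropped out of the mount table, or that it is still mounted but unreadable (a FAT volume may also get remounted read-only after errors). A minimal Python sketch for telling these apart, assuming a Linux system with /proc/mounts; the function names are made up for illustration:

```python
def find_mount(mounts_text, mount_point):
    """Return (device, fstype, options) for mount_point, or None if absent.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields (device, mount point, type, options, ...).
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry shadows an earlier one at the same mount point.
            result = (fields[0], fields[2], fields[3].split(","))
    return result


def describe_mount(mount_point="/boot"):
    """Print whether mount_point is mounted, and whether it is read-only."""
    with open("/proc/mounts") as f:
        entry = find_mount(f.read(), mount_point)
    if entry is None:
        print(mount_point, "is not mounted")
        return
    device, fstype, options = entry
    mode = "read-only" if "ro" in options else "read-write"
    print(mount_point, "is", fstype, "on", device, "mounted", mode)
```

Calling describe_mount() on the Pi would show which of the cases applies. It only reads the mount table, so it should not hang even if the card itself has stopped responding.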

I was really happy I hadn't tried to reboot the machine remotely at this point. I shut it down and pulled the card. The files that belonged there were still on it, but Windows considered the volume dirty and wanted to check it on mounting. I was, however, able to copy the files from the /boot partition onto a new card formatted with a single FAT32 partition.

I inserted the new card, and the Pi rebooted normally. Updates installed, everyone cheered, and I forgot about it until today. Logging in, I noticed that the load averages were now back to what I considered normal for the light load on the machine! I'm not sure why having a bad /boot directory caused my load averages to skyrocket, but the cause and effect seem to be there. The machine is running happily again, and my original thought of having the actual OS offloaded in case of SD card failure proved to be a sane one. The whole operation took about 10 minutes to correct, most of that time spent trying to find an SD card that I could erase.

I've had problems with SD cards simply failing, but that was with the entire OS on the card. Having only part of the system on the card allowed it to continue running, just...strangely.

The happy Pi, chirping once more...

I'm sure there's an explanation for why the load averages were high with a bad /boot partition, but I'm unable to explain it. Perhaps you can?
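One thing worth checking in a situation like this: on Linux, the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones. A single task stuck waiting on a failing card could in principle add about 1.0 to the load without showing up as CPU usage in top. Whether that is what happened here was not verified. A minimal Python sketch for looking, assuming a Linux /proc filesystem; the function names are made up for illustration:

```python
import os


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    return tuple(float(x) for x in text.split()[:3])


def parse_stat(text):
    """Return (pid, command, state) from the content of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the last ')' rather than on spaces.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def uninterruptible_tasks(proc="/proc"):
    """List (pid, command) for every process currently in state D."""
    found = []
    for entry in sorted(os.listdir(proc)):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were looking
        if state == "D":
            found.append((pid, comm))
    return found


def report():
    """Print the current load averages and any tasks in state D."""
    with open("/proc/loadavg") as f:
        print("load averages:", parse_loadavg(f.read()))
    for pid, comm in uninterruptible_tasks():
        print("state D:", pid, comm)
```

Running report() a few times while the load is elevated would show whether the same task keeps turning up in state D. A task that appears there only once is normal; one that never leaves is the interesting case.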