
Raspberry Pi WDT (WatchDog Timer)

*** UNDER CONSTRUCTION ***

Background

The Raspberry Pi cannot be relied on to run 24x7x52 without 'locking up', whether due to some 'memory leak' or a one in a million combination of circumstances leading to a hardware 'race' condition that defeats the software drivers etc. In fact, you would be hard pressed to find any computer (and for sure not a Windows computer) that would keep going indefinitely.

So, we just have to accept that the Pi needs to be rebooted every so often - and whilst we can run a background task that performs a reboot (say) every 24 hours (to 'head off' gradual memory leaks etc), sooner or later it's going to 'lock up' completely, to the point where background tasks stop running.

The only way to ensure the Pi 'recovers' from such a lock up is to perform a 'hard reset' (or a power-cycle). Fortunately, there is a way to do this without needing anyone to 'push the button'.

To 'hard reset' the Pi, you pull pin1 (the 'round' hole) of the P6 (A/B) or 'RUN' (A+/B+/B2/Zero) header holes Lo (i.e. connect it to Gnd, p0) :-

On the A/B, P6 is the two holes near the board edge, between the power socket and HDMI socket.

On the A+/B+ and B2, RUN is the two holes near the board edge, between the DSI socket and the I/O 40pin header.

On the Pi Zero, RUN is the two holes near the board edge, between the 'TV' holes and the I/O 40pin header.

Enter the 'watch dog timer'

A 'watch dog timer' is simply a 'counter' that has to be 'reset' every so often to prevent it timing out and applying a 'hard reset' to the Pi.

The 'simplest' approach is to use an NE555 'timer' and have your task 'poke' (control) it using one of the Pi i/o lines. Many designs exist and it should be a simple matter to adapt one for the Pi (visit here for an example).

This sort of approach is ideal if you want to reset when your 'task' (script) has 'crashed' i.e. when a script that should be 'running all the time' has stopped working. To use the WDT in this mode, the task 'restarts' the WDT on each 'loop'. If the script 'crashes' (and stops sending WDT 'restart' signals), the WDT will time out and apply the reset. The only drawback is that you have to know, in advance, the 'worst case' loop time (plainly the WDT has to be set so it does not time-out during normal operation).
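The 'poke on each loop' idea can be sketched in a few lines of Python. This is only a minimal sketch, not a finished design: `run_with_wdt`, `do_work` and `poke_wdt` are hypothetical names made up for illustration, and on a real Pi `poke_wdt` would be whatever pulses the i/o line wired to your timer.

```python
import time


def run_with_wdt(do_work, poke_wdt, loops=None, pause_s=0.0):
    """Run do_work() over and over, 'poking' the WDT after each good pass.

    do_work  -- one pass of 'your' task (hypothetical, supplied by you)
    poke_wdt -- whatever restarts the external timer, e.g. pulsing an i/o
                line (hypothetical, supplied by you)
    loops    -- None = run for ever; a number = stop after that many passes
    """
    done = 0
    while loops is None or done < loops:
        do_work()        # if this hangs or crashes, the poke never happens
        poke_wdt()       # restart the timer before it can time out
        done += 1
        time.sleep(pause_s)
    return done


# Stand-in demo: count the pokes instead of driving a real i/o line
pokes = []
run_with_wdt(lambda: None, lambda: pokes.append(time.time()), loops=3)
print(len(pokes), "pokes sent")
```

Because the poke only happens after `do_work()` returns, the timer period has to be longer than the slowest pass through the loop (plus `pause_s`), which is the 'worst case' loop time mentioned above.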

A safer WDT

The danger with any WDT is, of course, that whilst 'your' task may have 'crashed', the system itself (or other task(s)) might still be running. If the SD card (or a USB device) is being written when the 'hard reset' is applied, this will trash the file or the system partition folder/directory structure.

So, rather than wire your NE555 timer based WDT straight to the Pi 'reset' pin, wire it to one of the i/o pins. Then write a simple background task that monitors this pin and issues a 'sudo shutdown' command when it 'spots' the i/o pin 'time-out'.

Whilst this does not protect against a total chip 'lock up', it will protect you from your 'main script' crashing (for whatever reason) - and will do so in a way that avoids (most) file system corruption.
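A background task of this kind might look something like the Python sketch below. It is an illustration under assumptions, not a tested design: `watch_pin`, `read_pin` and `clean_shutdown` are hypothetical names, `read_pin` stands in for however you read the i/o pin, and the choice of 'shutdown -r now' (a clean reboot) is an assumption, since the text above only says 'sudo shutdown'.

```python
import subprocess
import time


def watch_pin(read_pin, on_timeout, poll_s=1.0, max_polls=None):
    """Poll the WDT output pin; call on_timeout() once when it shows 'timed out'.

    read_pin   -- returns True when the timer has timed out (hypothetical)
    on_timeout -- what to do about it, e.g. clean_shutdown below
    max_polls  -- None = poll for ever; a number = give up after that many
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if read_pin():
            on_timeout()
            return True
        polls += 1
        time.sleep(poll_s)
    return False


def clean_shutdown():
    # Assumption: a clean reboot is wanted, so file systems get unmounted first
    subprocess.call(["sudo", "shutdown", "-r", "now"])


# Stand-in demo: fake pin readings, and print instead of really shutting down
readings = iter([False, False, True])
watch_pin(lambda: next(readings), lambda: print("would shut down"), poll_s=0)
```

On a real Pi you would call `watch_pin(your_pin_reader, clean_shutdown)` and leave it running in the background.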

Using the Pi's built in WDT

The Pi's Broadcom BCM2835 comes with a hardware-based watchdog timer already built in! Needless to say, getting it to work is another story, especially as it monitors 'task completion' (so, no good if you are running a simple photoframe, because neither the 'foreground' task (fbi = display photos) nor the 'background' task (your script looking for new photos) will ever complete).

Another 'problem' is that, unlike your 'own' design, the Pi WDT comes with a pre-defined 'poke' function along with many other (hard to understand) 'clever functions'. This means it is almost inevitable you will end up 'triggering' the WDT in error ... so, DON'T set it to 'run at power-on' until you are SURE you have defined the 'trigger conditions' correctly.

The final issue is that each o/s change seems to 'break' the WDT in some way. First it was the B+/B2, then, after upgrading to Jessie, the watchdog failed to start at boot (although starting it manually using "sudo service watchdog start" did work). The information below is 'vintage 2013', so you will need to check the web for the latest information.

Start by installing the bcm2708_wdog kernel module. To load the watchdog kernel module right now, issue the following command :-

$ sudo modprobe bcm2708_wdog

If you are running Raspbian

1) Add the bcm2708_wdog module to the /etc/modules file (so it loads on system boot-up). You can edit the file (sudo nano /etc/modules) or just 'echo' the text to 'tee' (-a means 'add' rather than 'overwrite' :-) ) :-

$ echo "bcm2708_wdog" | sudo tee -a /etc/modules

2) Install the software watchdog 'daemon' by running the following command :-

$ sudo apt-get install watchdog

3) Install the run on reboot command :-

$ sudo update-rc.d watchdog defaults

4) Configure the watchdog daemon parameters in /etc/watchdog.conf using nano :-

$ sudo nano /etc/watchdog.conf

Uncomment the line that starts with #watchdog-device by removing the hash (#) to enable the watchdog daemon to use the watchdog device.
Uncomment the line that says #max-load-1 = 24 by removing the hash symbol to reboot the device if the 1 minute load average goes over 24.
A load of 25 over one minute means that you would have needed 25 Raspberry Pis to complete that task in 1 minute. You may tweak this value to your liking.

5) Enable the watchdog daemon, then start it :-

$ sudo chkconfig watchdog on
$ sudo service watchdog start

If you are running Arch Linux

1) Create a file called "bcm2708_wdog.conf" with the text "bcm2708_wdog" in /etc/modules-load.d/. You can do this using 'echo' with the 'tee' command:

$ echo "bcm2708_wdog" | sudo tee /etc/modules-load.d/bcm2708_wdog.conf

2) Install the software watchdog 'daemon' - in Arch, we use pacman :-

$ sudo pacman -S watchdog

3) Install the run on reboot commands :-

$ sudo chkconfig --add watchdog
$ sudo systemctl enable watchdog

4) Configure the watchdog daemon parameters in /etc/watchdog.conf using nano :-

$ sudo nano /etc/watchdog.conf

Uncomment the line that starts with #watchdog-device by removing the hash (#) to enable the watchdog daemon to use the watchdog device.
Uncomment the line that says #max-load-1 = 24 by removing the hash symbol to reboot the device if the 1 minute load average goes over 24.
A load of 25 over one minute means that you would have needed 25 Raspberry Pis to complete that task in 1 minute. You may tweak this value to your liking.

5) Start the watchdog daemon :-

$ sudo systemctl start watchdog.service (or $ sudo /etc/init.d/watchdog start)

In all systems

From then on, any task that fails to complete within the allotted time will cause the WDT to time out and the Pi will 'hard reset' itself.
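To get a feel for what the max-load-1 setting compares against, you can look at the load averages yourself. The Python sketch below is illustrative only: `load_exceeded` is a hypothetical helper, and it simply mirrors the 'goes over 24' wording of the configuration step rather than the daemon's actual code.

```python
import os


def load_exceeded(max_load_1=24.0, loadavg=None):
    """Return True if the 1 minute load average is over max_load_1.

    loadavg -- a (1 min, 5 min, 15 min) tuple; None = ask the system
    """
    if loadavg is None:
        loadavg = os.getloadavg()   # same three figures that 'uptime' shows
    return loadavg[0] > max_load_1


# Stand-in figures: a 1 minute load of 25 is over the limit of 24
print(load_exceeded(24, loadavg=(25.0, 3.0, 1.0)))
```

Calling `load_exceeded()` with no arguments on the Pi itself uses the live figures, which is a quick way to see how far normal running is from the limit before you enable it.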

Check the .conf file for other WDT functionality
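If you would rather script the 'uncomment' edits than do them by hand in nano, something like the Python sketch below would do it. It is a sketch under assumptions: `uncomment` is a hypothetical helper, and the sample text (including the '= /dev/watchdog' value and the max-load-15 line) is made up to look like a typical watchdog.conf, so check it against your own file.

```python
def uncomment(conf_text, keys):
    """Remove the leading '#' from lines whose setting name is in keys."""
    out = []
    for line in conf_text.splitlines():
        if line.startswith("#"):
            body = line[1:]
            # The name is everything before '=', so that 'max-load-1'
            # does not also match 'max-load-15'
            name = body.split("=", 1)[0].strip()
            if name in keys:
                line = body.lstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed sample lines, not copied from a real watchdog.conf
sample = "#watchdog-device\t= /dev/watchdog\n#max-load-1 = 24\n#max-load-15 = 12\n"
print(uncomment(sample, {"watchdog-device", "max-load-1"}))
```

The function only works on text, so try it on a copy of the file first and write the result back (as root) once you are happy with it.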

The danger with any WDT is, of course, that whilst 'your' task may have 'crashed', if the 'system' (or some other task(s)) is still running, there is always the chance that it might be writing to the SDHC card (or to a USB device) when the 'hard reset' is applied. This will likely trash the file system/directory.

Preventing system corruption

Every Pi system 'distro' (i.e. every 'flavour' of Linux) will, by default, write reams and reams of 'status' data to the SDHC card every few milliseconds (just like Windows). Even a 'read' operation is 'unsafe' as Linux updates the 'last read date/time' in the file system directory. It is thus more or less inevitable that, sooner or later, you will suffer from the dreaded Pi 'SDHC corruption' if you ever have to pull the power or reset a 'locked up' / 'non-responsive' Pi.

One way to minimise system corruption is to make the whole system partition 'read only'

Some 'guides' will suggest that you only need to make your system files 'read only', however this is not good enough - the main cause of a 'corrupted system SDHC' is not the corruption of a few status/logging files but corruption of the system directory structure. To isolate the system directory structure, you have to make the whole system partition read only.

Note that the 'lock/unlock' switch on the SDHC card is irrelevant (it's just a 'status line' that the Pi SD card reader doesn't even 'track' to the Pi SOC chip) and even if it 'worked', the Pi system would immediately crash when it found itself unable to write the 'boot-up' status ...

Instead you have to tell the Pi system to stop updating every file's 'last read' date/time and 'move' the 'log files' to a RAM disk (and find another partition for a number of other folders/files - for example the current date/time 'tick count'). Finally, once you have the system partition 'read only' you need to be aware that you need to make it 'writable' again before doing an 'update' (or GET'ing new software).
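As a rough illustration of what those changes amount to, the Python sketch below rewrites the '/' entry of an /etc/fstab line to add the 'ro' (read only) and 'noatime' (no 'last read' updates) mount options. Everything here is an assumption for illustration: `make_root_read_only` is a hypothetical helper, the device names are typical Pi ones rather than taken from this page, and the tmpfs line and remount commands are just one common way of providing a RAM disk for logs and flipping the partition back to writable.

```python
def make_root_read_only(fstab_line):
    """Add 'ro' and 'noatime' to the options of the '/' entry; leave others alone."""
    fields = fstab_line.split()
    if len(fields) < 4 or fields[0].startswith("#") or fields[1] != "/":
        return fstab_line
    opts = [o for o in fields[3].split(",") if o != "rw"]
    for wanted in ("ro", "noatime"):
        if wanted not in opts:
            opts.append(wanted)
    fields[3] = ",".join(opts)
    return " ".join(fields)


# Assumed fstab entry that keeps the log files on a RAM disk
LOG_RAMDISK = "tmpfs /var/log tmpfs nodev,nosuid 0 0"

# Commands to run before and after an update (shown here, not executed)
REMOUNT_RW = ["sudo", "mount", "-o", "remount,rw", "/"]
REMOUNT_RO = ["sudo", "mount", "-o", "remount,ro", "/"]

print(make_root_read_only("/dev/mmcblk0p2 / ext4 defaults,noatime 0 1"))
print(LOG_RAMDISK)
```

Only the '/' line is touched, so the /boot entry and any comment lines pass through unchanged. Keep a spare copy of the card before trying anything like this, because a mistake in /etc/fstab can stop the Pi booting.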


