Using scripts to preserve your data

Scripts to save data

What are scripts ?

A script is similar to an MS Word or Excel 'macro'. It is a series of COMMANDS that are 'recorded' in a file and can be 'played back' (i.e. 'run') to instruct Windows to perform some repetitive task.

The most common type is a simple 'command file' (.cmd) or 'batch file' (.bat) that runs in a 'DOS Box' aka 'command window'. Other types are based on BASIC (QBasic, QuickBasic etc) or more exotic languages such as Python or Java.

How do I create a script ?

The simplest type of script consists of a series of 'DOS' commands typed into a text file using Notepad or similar. When the file is given the extension '.cmd', Windows recognises it as a 'command file' (which is essentially** similar to the old DOS / Windows 9x '.bat' batch file) which it will run in a 'command window'.

**NOTE that in Windows 2000 / XP, both '.bat' and '.cmd' scripts are run by the cmd.exe 'interpreter' (the old command.com 'interpreter' is what DOS / Windows 9x use to run '.bat' files). Some of the commands I use in the script can only be understood by cmd.exe, so the script will not work as a '.bat' file on those older systems.

How do I 'run' a script ?

When you 'double click' a .cmd file Windows will automatically open a 'command window' and start executing the instructions in the file. You can also 'add' it to the Task Scheduler which will automatically run the .cmd file at the time(s) you have chosen.

If you expect your computer to 'go to sleep' when it's not being used, you need to be very careful when setting 'run' times in the Task Scheduler. Whilst Windows should 'wake up' OK from 'sleep' to run a task, chances are it won't from 'hibernate'. Needless to say, if you set the cmd to run every 5 minutes, your computer will never 'sleep' at all !

Are there any drawbacks to a .cmd script ?

Yes, plenty.

a1. The main drawback is that whilst the 'cmd language' provides quite good support for file manipulation (cut, copy, paste, attributes etc) it is rather poor at dealing with 'text', especially anything 'html', as many characters (< > etc) are interpreted as 'commands'.

To manipulate (search & replace) the contents of html files, a '.bas' (Basic) script file can be used. To 'run' the Basic commands, the QBasic.exe (or similar) interpreter is required. This can be found on any Windows NT4 System disc.

a2. Among the other potential issues, .cmd files are designed for 'unattended' use, so Windows will generally NOT ask 'Are you sure ?' when it encounters a command like 'Delete everything' (or something else that will trash your files/system).

Whilst error messages etc. will appear in the 'Command Window', when the script completes the Window is closed and any messages are lost. So it's a 'good idea' to direct the error messages to a 'log file' that can be examined later.
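As a rough illustration of 'logging as you go', here is a small sketch in Python (one of the 'exotic' languages mentioned above). It is not part of my .cmd script, and the helper name run_logged and the log file name are made up for the example. Each command's output and error messages are appended to the log as soon as that command finishes, so they are still there to read after the window has closed.

```python
import subprocess


def run_logged(command, log_path):
    """Run one command, appending its output and errors to log_path.

    Returns the command's exit code (0 normally means success).
    """
    result = subprocess.run(command, capture_output=True, text=True)
    # Append mode, so entries from earlier commands are never overwritten.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"COMMAND: {command}\n")
        log.write(f"EXIT CODE: {result.returncode}\n")
        if result.stdout:
            log.write(f"OUTPUT: {result.stdout}\n")
        if result.stderr:
            log.write(f"ERROR: {result.stderr}\n")
    return result.returncode
```

Calling run_logged once per step, rather than collecting everything for one write at the end, means the log is complete up to the point of any failure.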

Are there any drawbacks to a .bas script ?

Yes, plenty.

The main problem is that the QBasic 'interpreter' uses '8.3' DOS file and folder names - and when the .bas script is run, the 'current folder' is the location (path/folder) of QBasic.exe and NOT the location of the script.

Whilst it is possible to get QBasic to work with long file names this is not at all easy !

The other main issue is that whilst it is possible to 'call' a .cmd script from within a .bas file, the QBasic interpreter will stop after the .cmd completes and wait for 'user input' before continuing with the rest of the .bas script.

The only practical approach is to use a .cmd script to copy the file to be processed 'in' to the QBasic.exe path/folder/ (and give the file an 8.3 name eg 'INPUT.DAT'), then 'call' the .bas script to process the file (eg. into OUTPUT.DAT) and then, after the .bas 'returns' (with the SYSTEM command) to the .cmd script, copy the processed file 'out' again (see here for an example of using .cmd and .bas together)

Note that whilst a .cmd script can make use of the 'Environment Variables', QBasic can't (even if it could, Environment Variables modified by a .bas script will 'revert' to their original values on exit from the .bas script)

How can scripts help me prevent unintended file deletion ?

You can create a script that will change the 'Attributes' of all the files in some specific location (for example 'My Music') to 'Read Only'. You can then set the script to run at regular intervals so that any new files in that location are 'protected'. So long as you don't have 'too many' files, the script can set every file to 'Read Only' (even those already set to Read only) and still take only a few seconds to run.

When you try to modify / overwrite / delete any file set to 'Read Only' Windows will 'fail' the operation. Only after you remove the Read Only attribute manually can you change or delete the file - and when the script next runs the file will become 'read only' again. This should prevent you accidentally deleting (or updating) files.
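The idea can be sketched in a few lines of Python (an illustration only, not my .cmd script; the function name make_read_only is made up). It clears the 'write' permission on every file below a starting folder, which on Windows is how Python sets the 'Read Only' attribute.

```python
import os
import stat


def make_read_only(top_folder):
    """Set every file under top_folder to Read Only; return how many were set."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    count = 0
    for folder, _subfolders, files in os.walk(top_folder):
        for name in files:
            path = os.path.join(folder, name)
            mode = os.stat(path).st_mode
            # Clearing the write bits is what marks the file Read Only.
            # Files that are already Read Only are simply set again.
            os.chmod(path, mode & ~write_bits)
            count += 1
    return count
```

Note that this sketch visits every file on every run, which is fine for a modest number of files but is exactly what the 'skip unmodified folders' approach described later is meant to avoid.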

How can I allow other users to 'write' files into a folder but not delete them ?

You can prevent other users with 'Write' access to one of your 'archive' folders from deleting Read Only protected files by removing their permission to 'Change File Attributes'.

The 'originator' of any new file will have until the script next runs to make changes to the file name / contents before it becomes Read Only.

When the user tries to delete or replace a file with a 'Read Only' Attribute, they will get 'operation denied' due to their lack of 'permission'

What if I have tens of thousands of photos ?

When you first run the script it will take some time changing all the attributes of all the photos. However, you will have organised your photos into folders, so, next time the script is run, it can check a folder's 'modified' date and 'skip over' those folders that have not been modified since the last 'run' (so can have no new files (photos) in them).

You must 'restrict' the script to the folders where you place your documents / music / photos etc. If you run the script on 'C:' it will try to set 'open' Windows files to Read Only and fail.

When a command within a running script fails, the remaining commands may never be executed as intended (a syntax error, for example, 'aborts' the entire script). This means you must be very careful about 'cleaning up' or 'logging results' as you go along (and not leave it all to the end, which might never be reached).

Does changing a file's 'Attributes' change its 'Last Modified' date/time ?

No. In fact, changing a file's Attributes has no effect on the 'Last Modified' date/time of the 'containing' folder either, so you will need to find some way to 'flag' when a folder has been 'processed'.

One method would be to write a 'log file' into each folder. Comparing the folder Modified date against the log file date will show if the folder was modified AFTER the log was written.

However, as will be seen later, a better way is to simply have the script generate a 'copy' of itself into each folder that needs to be processed (if no copy exists, this is a new folder that needs to be processed).
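Whichever file is used as the 'flag' (a log file or the script's own copy), the date comparison is the same. A minimal Python sketch, with a made-up function name and flag file name, and using the full date and time rather than the date alone:

```python
import os


def folder_changed_since_flag(folder, flag_name="protect.log"):
    """Return True if the folder needs processing.

    True when no flag file exists yet (a new folder), or when the
    folder's Modified time is later than the flag file's.
    """
    flag_path = os.path.join(folder, flag_name)
    if not os.path.exists(flag_path):
        return True  # never processed
    return os.path.getmtime(folder) > os.path.getmtime(flag_path)
```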

What is the Modified date of a copied file ?

Same as the original :-) which is why the script has to 'generate' (write into a new file) a copy of itself, rather than simply use Windows to 'make a copy'.
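The difference is easy to demonstrate. The Python sketch below (illustrative only; the function name is made up) makes one date-preserving copy with shutil.copy2, which behaves much like a Windows copy in this respect, and one 'generated' file whose contents are written afresh.

```python
import os
import shutil


def copy_and_generate(original, copied, generated):
    """Make one date-preserving copy and one freshly written file.

    Returns (original_mtime, copied_mtime, generated_mtime).
    """
    # shutil.copy2 keeps the original's Modified time.
    shutil.copy2(original, copied)
    # Reading the contents and writing them into a new file gives the
    # new file a Modified time of 'now'.
    with open(original, "rb") as src, open(generated, "wb") as dst:
        dst.write(src.read())
    return (
        os.path.getmtime(original),
        os.path.getmtime(copied),
        os.path.getmtime(generated),
    )
```

Only the 'generated' file carries a date that says when the folder was last processed.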

How can a script run across multiple folders & set all the files to Read Only ?

One step at a time, and 'recursively', of course :-)

First, when the script is 'run' in a 'top-level' folder it must have a means of 'writing' itself into any 'first level' sub-folder(s). The top level script then calls its sub-folder copy - and the copy will generate further copies of itself in the next level (of sub-folders) down and so on until the end of the 'tree' is reached.

So, when first run, the script will immediately 'propagate' itself to all existing sub-folders 'below' the current one, no matter how 'deep' the chain.

'Control' will thus be quickly passed 'down' to the lowest sub-folder copy. When that point is reached (i.e. when the script finds no sub-folder below itself), the copy at that point will set the attributes of all the files there to 'Read-Only' and then 'exit' back up to the 'calling' script at the next higher level.

The script at the higher level will first call any other sub-folder at that level not yet visited (which will call any further down that 'chain'), then set its 'own level' files to Read Only before 'exiting' back up to the next higher level - and so on.

Once we have such a script, all we need to do is modify it to avoid it wasting time visiting sub-folders that have not been Modified since the script was last run (which also means it can discover folders that have 'not been called yet')

When a file is added to a folder, that folder's 'modified' date is updated. The script relies on the modified dates of any 'parent' folder(s) being updated as well, all the way 'up' to the root of the drive, so that it is possible to follow the 'chain' from a top level 'parent' all the way down to the modified folder, no matter how many levels below (check this on your own system, as NTFS normally updates only the folder that directly contains the new file).

In order to achieve this, when the script is run, it must first find the modified folders (by comparing the dates of the folder with that of the script in it & 'calling' the script when it finds the folder has been modified 'more recently' than the script) and, after setting all the (other) files in the folder to Read Only, 'write' itself again (so it gets the 'just done now' date).

The 'calling stack' will 'limit' at the number of folder LEVELS, not the number of actual folders (this is key since a 'call stack' of hundreds of folders is likely to 'fall over' pretty fast as the script runs out of storage space :-) )

The final 'tweak' to the script will be to ensure it copes correctly with any folders created since it was last run (such folders will not contain a copy of the script so no date compare can be made).

Pseudo code

For (each sub-folder)
.. if the script does not exist in the sub-folder, copy script to sub-folder & CALL the copy (without performing a date check)
.. otherwise Call the 'check modified date' Subroutine
End

'check modified date' Subroutine
.. if sub-folder has not been Modified since its script was last run, 'return' (to the For loop)
.. otherwise switch to that sub-folder and Call the script
.. on return here, set all files here to read-only, and 're-generate' the (just run) script
.. go back 'up' to parent folder, and then 'return' (to the For loop)
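For anyone who finds real code easier to follow, the pseudo code can be sketched in Python as below. This is an illustration under stated assumptions, not the .cmd script itself: a marker file (the name protect_marker.txt is made up) stands in for the script's own copy, the comparison uses the full date and time rather than the date alone, and the function names are hypothetical. Like the pseudo code, it skips a sub-folder whose own date has not changed, so it only finds a change 'deeper down' if that change has also updated the dates of the folders above it.

```python
import os
import stat

# Hypothetical name: this marker file stands in for the script's own copy.
MARKER = "protect_marker.txt"
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def needs_visit(folder):
    """True for a new folder (no marker) or one modified after its marker."""
    marker = os.path.join(folder, MARKER)
    if not os.path.exists(marker):
        return True
    return os.path.getmtime(folder) > os.path.getmtime(marker)


def protect(folder, visited=None):
    """Process folder and any sub-folders that need it; return those visited."""
    if visited is None:
        visited = []
    with os.scandir(folder) as listing:
        entries = sorted(listing, key=lambda entry: entry.name)
    # Sub-folders first, so the call depth equals the folder depth.
    for entry in entries:
        if entry.is_dir(follow_symlinks=False) and needs_visit(entry.path):
            protect(entry.path, visited)
    # Then this folder's own files (the marker itself stays writable).
    for entry in entries:
        if entry.is_file(follow_symlinks=False) and entry.name != MARKER:
            os.chmod(entry.path, entry.stat().st_mode & ~WRITE_BITS)
    # 'Re-generate' the marker so it carries the 'just done now' date.
    with open(os.path.join(folder, MARKER), "w", encoding="utf-8") as marker:
        marker.write("processed\n")
    visited.append(folder)
    return visited
```

Calling protect on a top-level folder returns the list of folders it actually processed, deepest first, which is handy for writing to a log.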

Can I have a copy of the actual script ?

Yes, but please READ THE WARNING first ...

WARNING - when you run this script it will set to 'Read Only' ALL files in the 'starting' folder and ALL the sub-folders !!!

If the 'top level' folder is 'My Documents' or Desktop (or C:), you will set to Read Only many files that both Windows and your applications rely on to save 'user settings' etc. and THIS WILL CRASH YOUR APPS AND TRASH YOUR SYSTEM.

You should ONLY place it in the 'root' of folders that ONLY contain files (and sub-folder files) that you want to protect - eg My Music, My Pictures, My Webs etc.

For this reason I have 'zipped up' the file (since I can guarantee someone will 'double click' on the link then wonder why their entire C: drive has been set to 'Read only' and Windows won't start anymore).

If you agree not to contact me after running this script and trashing your system use RIGHT CLICK and 'SAVE AS' (or 'SAVE LINK AS') HERE. The 'recursive' part of the script you download is 'non-functional' = those who know what they are doing will be able to change the one line needed to make it fully recursive

Script Kiddies please note - my web site notes the IP address of visitors and records the details of those who download stuff.

How do I automate the script ?

Place 'top level' copies of the script in the 'root' of each "C:\Documents and Settings\{your user name}\My Documents\My {folders}" that you want to 'protect' (My Music, My Pictures, My Webs and any other you decide to create)

FOR EACH 'top level' script, in Start, Settings, Control panel, Scheduled tasks, click 'Add Scheduled Task' & follow the 'wizard' (Browse to select the '.cmd' script in its folder).

Note that the script presented here only checks the Modified DATE (and not the time). This means that there is no point in running the script more than once a day (and you should do so at the end of the day, not the start).
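To see why the time of day matters, compare a 'date only' check with a 'date and time' check. The Python sketch below is illustrative only (the function names are made up, and the exact comparison my script performs is not reproduced here).

```python
from datetime import datetime


def modified_since_by_date(folder_modified, script_written):
    """Date-only comparison: the time of day is ignored."""
    return folder_modified.date() > script_written.date()


def modified_since_by_time(folder_modified, script_written):
    """Full date-and-time comparison, shown for contrast."""
    return folder_modified > script_written
```

With a script that ran at 09:00 and a photo added at 18:00 the same day, the 'date only' check reports 'not modified', so the new photo is missed on any later run that day. Running once, at the end of the day, avoids most of that gap.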

Can you explain how the script works ?

Yes .... let's look at the code one part at a time ...

The main FOR command 'loop' :-

for /f "tokens=1-6 delims=/<> skip=5" %%G in ('dir /ad /tw') Do (Call :s_dirchk %%G %%H %%I %%J %%K %%L)

'for' means perform this command multiple times. '/f' means process, one line at a time, the lines of text produced by the 'in' ('dir ..') command, 'tokens' defines which 'bits' to extract from each line, 'delims' defines the 'bits' separators (/ separates the DD/MM/YYYY date into 3 parts) and 'skip=5' tells it not to process the first 5 lines of the 'in' ('dir ..') list.

'%%G' defines the first variable name to use for the extracted bits (others are assigned automatically, one for each 'bit' extracted, starting with the one you specify, however BEWARE of the 'reserved' %% codes (such as %%F)).

The 'dir /ad /tw' command generates a 'directory' listing, '/ad' means list only 'directories' (i.e. folders) and '/tw' means list the 'write' (i.e. last modified) date.

'Call :s_dirchk' means control will switch to the code at label 's_dirchk' taking along with it the first 6 'bits' (%%G to %%L) that were extracted from the dir line (other extracted bits are ignored). %%G to %%L will be copied into the 'parameters' (%1 to %6) for the code at the 'Call' location.

To 'return' from the 'call', the "goto :eof" command is used. On return from a call, control is passed back to the For command, which then extracts the next line. This loop continues to execute until every line has been extracted from the dir list (which is why the Call'd code has to deal with 'no date found' in the trailing lines of the listing).

NOTE When developing the script, I discovered that trying to 'extract' just the date and folder name using 'tokens=1,2,3,6' doesn't work, as the parameters that reach the Call'd code end up broken at spaces and commas as well as at any 'delims' (separators) you specify.

Since the separator characters are all discarded (i.e. not passed to the 'DO'), specifying < and > as separators means they will be 'stripped out' (and not passed on to the Call code where they would be interpreted as 'redirect' commands :-) )

Finally, although the 'FOR' command is SUPPOSED to ignore 'blank' lines, in fact it doesn't ! So to skip the 3 lines of text & 2 blank lines at top of the 'dir' listing, we must specify 'skip=5'. After the first 5 are skipped, all lines are then processed, including the last 3 - 'n File(s)', 'x Dir(s)' and the final 'blank' line (so :s_dirchk has to cope with 'null' date parameters).

NB a 'brief' dir listing (Dir /b) removes the unwanted header and trailer lines but it also suppresses the dates/time fields (/b means 'only show names'). It also fails to remove the final blank line at the end of the listing
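The way a 'dir' line ends up as six parameters can be mimicked in Python. This is only a sketch of the end result, not of how cmd.exe works internally: the function names are made up, it assumes a DD/MM/YYYY date with a 24-hour time (other regional settings lay the line out differently), and a folder name containing spaces keeps only its first word.

```python
import re

# Characters that end up separating the parameters: the 'delims'
# (/ < >) plus spaces, tabs and commas.
SEPARATORS = re.compile(r"[/<>\s,]+")


def split_dir_line(line):
    """Split one line of a 'dir /ad /tw' listing into at most six parts."""
    parts = [part for part in SEPARATORS.split(line) if part]
    return parts[:6]


def dated_folder(parts):
    """Return (day, month, year, name) for a sub-folder line, else None."""
    if len(parts) < 6 or parts[4] != "DIR":
        return None  # trailer lines and blank lines end up here
    day, month, year, _time, _dir, name = parts
    if name in (".", ".."):
        return None  # the folder itself and its parent
    return (day, month, year, name)
```

For example, the line '25/12/2009  10:30    <DIR>          Photos' splits into 25, 12, 2009, 10:30, DIR and Photos, whilst the 'n File(s)' and 'x Dir(s)' trailer lines give no usable date, which is the case :s_dirchk has to cope with.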

The rest of the code is more or less self-explanatory.

Where can I find a full explanation of .CMD script 'commands' ?

You can find a good list and explanation here.

Where can I get more 'tools' to help me manage my computer and files ?

Microsoft makes available a free download called the Resource Kit for Windows 2003 / XP.
