Website lock-down

Note that this ONLY applies to websites run by the Open Source 'Apache' web hosting software (with php). Those using Microsoft hosting (with asp) should look elsewhere.

NB lock-down is only possible if you have control over your .htaccess file - many 'off the shelf' free website packages (e.g. a free hosted WordPress site) won't allow this

Below is one way to 'structure' a web site. However, I don't want to 'give away' too much information that might help 'Mr Hacker' access this, my own, web site. So below is a 'starting point' = hopefully you will find some ideas and techniques you can 'adapt' and apply in slightly different ways (as, indeed, have I)

Hack-free page uploading

First, I always use SFTP to upload new web pages, rather than going in via my web browser (e.g. via 'cPanel') and 'dragging and dropping' files

The usual caution applies = if I can 'upload' using some web browser from 'anywhere in the world', so can someone else

You have to assume that sooner (rather than later) some hacker will get access to your SFTP user name/password

So you MUST restrict FTP access to 'non-live' directories.
 
The 'non-live' structure will be a 'ghost' of the real directory structure and should have a 'root' named 'something expected' (for example, 'htdocs').
 
This lets you (or the hacker) upload new pages just fine, however they will then just 'sit there' doing nothing until they are copied into the 'live' directories - and that will be done by Apache itself (using some PHP code)
 
Needless to say, once you have your PHP 'copy' code 'up and running', you will have set your (sub-)folder 'permissions' to prevent ANY (even FTP) access to the live web site files = i.e. restrict access to System (Apache) only.
 
NB. I make extensive use of file names starting '.'. This is the *nix 'do not list' flag and has the advantage that such files give most Windows code (and thus most script kiddies) headaches :-)

The ghost folder structure has to be 'within' the website itself (because you are going to use a web page to 'trip' the PHP script that copies and 'activates' the new pages), but not 'accessible' from the web EXCEPT to you = i.e. the Ghost root will have a .htaccess that imposes a password AND limits access to your own IP address (and 127.0.0.1, for local testing)

We thus have an upload folder structure something like :-
.FTP (in root, '.' prefix is a 'fall back' meaning 'don't list')
.FTP/htdocs/(replicated structure for FTP uploading)
.FTP/.htaccess (password, IP lock, no index, with (say) '.golive' set to a php file type)

The .golive file is your php script that copies new uploaded pages into the 'live' site.
 
When you launch the .golive script it 'scans' the 'nn_' directories for 'nn_' page changes, shows you the list (so you can check nothing else has been added), asks for 'confirmation' and only then generates the one-way-hash and copies the new files across.
 
Needless to say, the script NEVER copies anything other than nn_ pages in nn_ directories (so no '.' files (.htaccess .htpasswd), no 'root' files) nor will it create new directories in Live (although if it finds a new directory in live it should replicate that into ghost).
 
To create new Live directories, add new .htaccess / .htpasswd or copy anything into root, you will have to use cPanel (if the hacker gets into cPanel permissions, 'all is lost' anyway)
 
Note that, since '.' files will never be copied, it's quite OK to fill the ghost directory structure with 'dummy' .htaccess files for hackers to examine ..
 
Better is to have your 'page generator' check if .htaccess is 'missing' from the 'ghost' (which may indicate that Mr Hacker has managed to obtain FTP access and delete it) and then copy the 'dummy' back in from the live site !

NOTE THAT the '.golive' php is also responsible for setting the 'correct' access permissions (chmod)
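
The scan, confirm and copy steps described above can be sketched as follows. This is only an illustration of the logic, written in Python rather than the PHP the real '.golive' script would use. The 'nn_' prefix is taken as a literal placeholder, the hash is used here simply to spot changed files (an assumption, the text does not say what the one-way-hash is for), and the 0o640 permission is a made-up example value.

```python
import hashlib
import os
import shutil

PREFIX = "nn_"  # placeholder for the page / directory prefix used in the text


def file_hash(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def find_changed_pages(ghost_root, live_root, prefix=PREFIX):
    """List relative paths of prefixed pages that are new or changed in ghost."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(ghost_root):
        # Only descend into prefixed directories (this also skips '.' directories).
        dirnames[:] = sorted(d for d in dirnames if d.startswith(prefix))
        rel_dir = os.path.relpath(dirpath, ghost_root)
        if rel_dir == ".":
            continue  # never copy files sitting in the root
        if not os.path.isdir(os.path.join(live_root, rel_dir)):
            continue  # never create new directories in live
        for name in sorted(filenames):
            if not name.startswith(prefix):
                continue  # skips .htaccess, .htpasswd and anything unexpected
            ghost_file = os.path.join(dirpath, name)
            live_file = os.path.join(live_root, rel_dir, name)
            if not os.path.isfile(live_file) or file_hash(ghost_file) != file_hash(live_file):
                changed.append(os.path.join(rel_dir, name))
    return changed


def go_live(ghost_root, live_root, confirmed, mode=0o640):
    """Copy the confirmed pages across and set their permissions."""
    for rel in confirmed:
        target = os.path.join(live_root, rel)
        shutil.copyfile(os.path.join(ghost_root, rel), target)
        os.chmod(target, mode)  # hypothetical mode; pick what suits your host
```

The idea is that the list from find_changed_pages is shown to you first, and go_live is only called with the entries you have confirmed.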

Additional protection for page definitions

The scheme outlined above is fine for pages - where it's not totally disastrous if Mr Hacker manages to read the pages. However if Mr Hacker gets access to the page definition php files he can examine them for flaws that allow him access to the site. So, 'as a rule' you should only ever use cPanel to upload new page definition php files

Basic page design

1) The first thing .htaccess does is 'dump' banned IPs .. then it filters out unwanted TLDs (with a polite 'Sorry, this site is limited to UK residents browsing from their own home')

2) Next, it checks for speeding script kiddies and 'bad bots' - and directs them to the 'rabbit hole' script

3) Finally, the requested URL is passed on to the page generation php. 'Known' IPs go to 'knownIP.php', the rest go to 'guestIP.php'.

How the site is structured

a) All the actual files on the entire website will be in sub-directories that are 'un-readable' from the web, so the only way for a visitor to get a page is to have the site php generate it.

You might think that means your website can't function - not so - Apache will always 'run' the .htaccess EVEN IF that .htaccess then makes the directory 'password protected' or 'deny all' (i.e. accessible only by user name / password or not readable at all)
 
Further, Apache is quite happy to 'run' php script from a directory that can't be accessed from the web. So, to make your site 'unreadable', just 'deny all' and then 'run' some php that takes the user's request (incoming URL) and generates the page for them

b) All 'links' on your pages that refer to OTHER pages (or files) on your site will NEVER specify the 'real' directory names (or real file names)

This is easy to achieve using php to create random directory / file names and 'symlink' them to the 'real' path/file. To allow the user to 'bookmark' a page, the symlink will need to be 'preserved' = however this defeats the aim of 'hiding' page locations.
 
Fortunately, .htaccess allows you to specify that a page has 'permanently moved' = and to do so in such a way that the user's browser becomes aware of that.
Thus, every time a bookmark to a symlink'd members page is used, we can 'move' the page to another (random) place (and delete the old symlink). This will be 'transparent' to users but ensures that any attempt to 'index' the page will fail

We can 'be nice' to Google etc. by redirecting 'failed' member page symlink URL's to the 'Please register' page
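
The 'move it every time it is used' idea can be sketched like this (in Python rather than the site's PHP, just to show the logic). The '/please-register' URL, the 16-character random names and the 302 status for failed links are all assumptions of mine, not details from the scheme above.

```python
import os
import secrets

REGISTER_PAGE = "/please-register"  # hypothetical URL of the 'Please register' page


def new_link(link_dir, real_path):
    """Create a randomly named symlink to real_path and return its name."""
    name = secrets.token_hex(8)
    os.symlink(real_path, os.path.join(link_dir, name))
    return name


def rotate_link(link_dir, old_name):
    """Return (status, location) for a request that arrived via old_name."""
    old_path = os.path.join(link_dir, old_name)
    # Anything that is not the plain name of an existing symlink has 'failed'.
    if os.path.basename(old_name) != old_name or not os.path.islink(old_path):
        return 302, REGISTER_PAGE
    real_path = os.readlink(old_path)
    fresh = new_link(link_dir, real_path)  # the page's new (random) location
    os.remove(old_path)                    # the old bookmark now leads nowhere
    return 301, "/" + fresh
```

The returned pair would be sent back as the redirect status and 'Location', so the user's browser follows the move while any stored copy of the old URL goes stale.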

c) Ideally, everything required to display the page should be 'included' by the page generation php. The problem here is the limit on embedded image data - so you will have to add some sort of 'link' to the image and (allow) the user's browser to 'fetch' it - and then 'lock it down' so the user can't 'reuse' the link

When the user's browser tries to fetch the linked image, it won't be able to 'go direct' because the image will be in a non-web-accessible directory. So, the page generation php will have to 'read' the image and 'pass it on' to the visitor - and then delete the symlink ..
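
A minimal sketch of that 'read it, pass it on, delete the link' step, in Python rather than PHP and with made-up function names. It assumes the page generator created a one-time symlink when it wrote the image link into the page.

```python
import os
import secrets


def link_image(link_dir, image_path):
    """Create a one-time, randomly named symlink to an image; return the name."""
    name = secrets.token_hex(8)
    os.symlink(image_path, os.path.join(link_dir, name))
    return name


def serve_once(link_dir, name):
    """Return the image bytes for a one-time link, or None if it is not valid."""
    link = os.path.join(link_dir, name)
    # Only accept the plain name of a symlink that still exists.
    if os.path.basename(name) != name or not os.path.islink(link):
        return None
    try:
        with open(link, "rb") as fh:
            return fh.read()
    except OSError:
        return None  # e.g. the real image has gone missing
    finally:
        os.remove(link)  # the link can't be 'reused', whatever happened
```

A second request for the same name gets nothing, which is the 'lock it down' behaviour described above.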

Page generation

Since 'your' site will never have more than a few hundred 'real' visitors, we can 'afford' the time (and file space) to 'track' each one. This makes it easy to 'spot' any who are 'over speeding' (or repeat offenders looking for CMS vulnerabilities etc.)

When a visitor 'arrives' at the page generator, it first checks the request URL for '?' (GET). If found, then the requested page name is checked = if it's NOT one that supports GET then we have a script kiddie and the IP goes to immediate ban (and nothing is returned).
If the name is OK, then we look for a match in the 'outstanding GET list' - if found then the visitor is passed on to the page generator (if not, then the '?' component is dropped)
The php code then looks for the visitor's IP file - if none exists, then this is a 'new' visitor, otherwise the file's last access date/time will be read.
The page generator imposes a 1.1s delay before completing a page = so if the file date/time found is within the last second the visitor is attempting multiple accesses without waiting for a page to be returned. This would normally be enough to get them 'marked' as a 'script kiddie' and banned
However it's not (quite) that simple - Google (and other search engines) will 'over speed' - so we need to tell the 'bad' from the 'good' ..
So the next filter checks if the request is for a non-existent directory/page. If the page COULD have existed (i.e. its name structure is 'correct') a 404 is returned. If the name could never have existed, an over-speeder is sent straight down the rabbit hole (a visitor who is not over-speeding is sent a single warning instead - and then gets dumped down the hole if they try it again)
Next we check if the visitor is actually requesting one of the (symlink'd) 'rabbit hole' directories listed in robots.txt. Bad bots will read the list and go looking for the pages; good bots won't, especially as none of these 'rabbit holes' will be referenced by any 'normal' page. If so, a rabbit hole is what they will get (a few GB of garbage before they are banned)
Now we check the HTTP_REFERER contents. If it's an existing visitor and the REFERER is 'ourselves' (or empty = likely a search engine) the page generator creates the page (or sends the file) requested without delay. Any REFERER from a non-UK (or .com) domain is ignored. The rest are passed on.
This is the final step - new visitors are given 10s to click on an 'I'm human' button before the requested page is delivered (and an IP file generated for them), all others are delayed 1.1s and then passed to the page generator. Those who fail to click within 10s are ignored
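
Two of the checks above, the per-visitor IP file and the 'could this page ever have existed' filter, can be sketched as below. This is Python rather than the site's PHP, the function names and return strings are mine, and the file's modification time stands in for 'last access' (an assumption).

```python
import os
import time


def ip_file(ip_dir, ip):
    """Path of the tracking file for one visitor (':' replaced for IPv6)."""
    return os.path.join(ip_dir, ip.replace(":", "_"))


def register_visitor(ip_dir, ip, now=None):
    """Create the IP file, e.g. after the 'I'm human' button was clicked."""
    now = time.time() if now is None else now
    path = ip_file(ip_dir, ip)
    with open(path, "w"):
        pass
    os.utime(path, (now, now))


def visitor_state(ip_dir, ip, now=None):
    """Return 'new', 'ok' or 'overspeed', and stamp this access on the file."""
    now = time.time() if now is None else now
    path = ip_file(ip_dir, ip)
    if not os.path.exists(path):
        return "new"
    last = os.path.getmtime(path)
    os.utime(path, (now, now))
    # Pages take 1.1s to complete, so a repeat within 1s did not wait for one.
    return "overspeed" if now - last < 1.0 else "ok"


def missing_page_action(name_could_exist, overspeeding, already_warned):
    """Decide what a request for a non-existent page gets."""
    if name_could_exist:
        return "404"
    if overspeeding or already_warned:
        return "rabbit_hole"
    return "warning"
```

Whether an over-speeder is then banned still depends on the later search engine checks, which this sketch leaves out.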

'Members' pages

These are real pages that are 'marked' in robots.txt as 'don't index'. In theory, Google and other search engines should ignore them, however sometimes Google etc will follow links direct from other sites (rather than going via our own root).

So extra care has to be taken with members pages - we can't just ban the visitor (since it might be Google). Instead we send a 'rabbit hole' page with a 'don't index' header. The bad bots will try to follow the links and will get dumped down the hole - 'good spiders' will respect the setting and ignore the contents (and thus avoid following the links that will get the others banned)

Access to the members pages is via a 'please register' page. This is only shown to visitors who follow the 'link' to 'members only' and are not already 'registered' (i.e. those from 'unknown' IP addresses)

How we protect against :-

The unexpected POST/GET

When a visitor is asked to submit data, save a 'key', derived from the one-way-hash of the visitor's IP, somewhere in your site's file system. When the visitor returns the requested data, check the sender's IP (i.e. hash it and compare to the list of outstanding requests). If a match is found, accept the data (and delete the 'hashed IP' key). If no match is found, just drop the visitor down the rabbit hole (after taking the precaution of deleting the GET data).

To impose a time limit, check the date/time when the 'key' file was created against the current date/time.
 
If the time elapsed is too long, the visitor is returned to the 'please (re)submit' page (with the usual 1.1s delay).
 
If the time is 'too short' (< 1s) it's rabbit hole time.

Note - the length of the returned data should always be checked first (if it exceeds the field limits, rabbit hole) then 'sanitised' before the actual values are checked - if OK the visitor is passed on, if 'incorrect' the visitor is returned to 'please (re)submit'
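
Put together, the 'hashed IP key' check might look something like this (Python rather than PHP, to show the logic only). SHA-256 is my choice of one-way-hash, the 600s upper limit is a hypothetical value as no figure is given above, and the key file's modification time stands in for its creation time.

```python
import hashlib
import os
import time

MIN_AGE = 1.0    # a reply faster than this is 'too short'
MAX_AGE = 600.0  # hypothetical time limit; choose your own


def key_path(key_dir, ip):
    """Key files are named after the one-way-hash of the visitor's IP."""
    return os.path.join(key_dir, hashlib.sha256(ip.encode()).hexdigest())


def issue_key(key_dir, ip, now=None):
    """Call this when the visitor is sent a form to fill in."""
    now = time.time() if now is None else now
    path = key_path(key_dir, ip)
    with open(path, "w"):
        pass
    os.utime(path, (now, now))


def check_reply(key_dir, ip, fields, limits, now=None):
    """Return 'accept', 'resubmit' or 'rabbit_hole' for returned form data."""
    now = time.time() if now is None else now
    path = key_path(key_dir, ip)
    if not os.path.exists(path):
        return "rabbit_hole"  # nobody asked this IP for data
    age = now - os.path.getmtime(path)
    os.remove(path)           # each key is good for one reply only
    if age < MIN_AGE:
        return "rabbit_hole"
    if age > MAX_AGE:
        return "resubmit"
    # Lengths are checked before any value is sanitised or inspected.
    if any(len(fields.get(name, "")) > limit for name, limit in limits.items()):
        return "rabbit_hole"
    return "accept"
```

An 'accept' here only means the reply was expected and of a sane size; the values still have to be sanitised and checked before the visitor is passed on.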

Detect and prevent 'over-speeding'

Script kiddies will often hammer your site with 'log-in' attempts (often to PHP 'admin' pages that don't exist). Even worse is the kiddie who stumbles across your real members 'log-in' page and starts sending it the national phone directory as 'user name' and the extended Dictionary of English as the 'password' - and all without bothering to await a response from your web site

So, before delivering any page, note the visitor's IP address and 'save' it in a file called '.generating-page'. Then 'sleep' for 1.1s before delivering the page.
 
If, during that time, the same visitor returns, you will discover the visitor's address already in the '.generating-page' file.
 
You can then decide to send the visitor a 'rabbit hole' page (rather than a 'real page') - or a 'moved' / 'off-line' response (before adding his IP address to the 'banned' list).
 
NB. Whilst it's obvious that the visitor's IP should be removed when the page has been sent, it's also a 'good idea' to remove any addresses older than, say, 2 seconds = otherwise, should Apache fail to send the page for some reason, you will end up with a file full of 'ghosts' :-)
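
A rough sketch of the '.generating-page' bookkeeping, including the 2 second 'ghost' clean-up. It is in Python rather than PHP, the 'one IP and timestamp per line' file format is my own assumption, and it has no file locking (which a real server handling simultaneous requests would need).

```python
import os
import time

STALE_AFTER = 2.0  # entries older than this are 'ghosts'


def load(path):
    """Read the tracking file into a dict of ip -> time the page was started."""
    entries = {}
    if os.path.exists(path):
        with open(path) as fh:
            for line in fh:
                ip, stamp = line.split()
                entries[ip] = float(stamp)
    return entries


def save(path, entries):
    """Write the dict back, one 'ip timestamp' pair per line."""
    with open(path, "w") as fh:
        for ip, stamp in entries.items():
            fh.write(f"{ip} {stamp!r}\n")


def begin_page(path, ip, now=None):
    """Note the visitor's IP; return False if a page was already in progress."""
    now = time.time() if now is None else now
    entries = {k: t for k, t in load(path).items() if now - t <= STALE_AFTER}
    already_waiting = ip in entries
    entries[ip] = now
    save(path, entries)
    return not already_waiting


def end_page(path, ip):
    """Remove the visitor's IP once the page has been sent."""
    entries = load(path)
    entries.pop(ip, None)
    save(path, entries)
```

The page generator would call begin_page, sleep for 1.1s, deliver the page and then call end_page; a False from begin_page is the cue for the rabbit hole.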

The 'moved / off-line' response

Personally I just 'silently drop' unwanted visitors = it has the advantage of making the script kiddies handle their own 'time outs' (with a bit of luck their Router will quickly fill up with 'pending' requests and fall over). However if you just want to 'put off' someone who already 'knows' you 'exist', you might want to send them a 'this site/page permanently moved' (or just a 'site off-line') response

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /</title>
</head>
<body>
<h1>Index of /</h1>
<ul></ul>
<address>Apache/2.2.26 (Unix) mod_ssl/2.2.26 OpenSSL/1.0.1e-fips DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 Server at www.{my-domain}.com Port 80</address>
</body></html>
