
Securing your website

Website security
Note that this ONLY applies to websites hosted on the Open Source 'Apache' web server software (with PHP). Those using Microsoft hosting (IIS, with ASP) should look elsewhere.

NB. Many 'off the shelf' website packages (eg WordPress) limit your ability to change things like '.htaccess'. Whilst this means they have 'basic' security 'built in' already, it also limits your ability to do things like 'ban' visitors by their IP address.

How to make life harder for the website 'hijacker'

Almost all 'hijacks' are performed via the web and achieved by using your web-hosting software (Apache) to do the hijacker's bidding (in much the same way as you use Apache to administer your website via 'cPanel' etc). The first step is thus to instruct Apache 'not to let them in' (easy) and the second is to restrict the files that can be 'accessed' by Apache (much harder).

You can limit where the visitor is 'coming from' by checking their IP address (or the 'domain' their IP address is registered at). You can limit the 'content' they can 'see' by setting access 'rights' on your directories (and by careful directory structure layout) - you can even set a 'user name & password' on a directory or file

Since web users can be as 'anonymous' as they wish (using Proxy Servers and the TOR network), banning visitors based on 'who they appear to be' will never 'catch' the determined hacker (but may well prevent legitimate visitors accessing your pages). So the next step is to monitor what the visitor is doing and ban those who appear to be 'up to no good'

For example, those attempting to access the 'admin' pages (and especially those attempting to 'probe' for admin pages that don't exist) are plainly 'up to no good' and should be 'blocked' without warning. Blocking 'Mr Hacker' on his first (failed) attempt will prevent thousands of other attempts, one of which may well succeed !

Most hackers run 'automated' tools (scripts) that scan for hundreds of vulnerabilities at a time. By noting the exact time of each visit you can monitor each visitor's 'page request rate' - any visitor asking for pages faster than (say) 5 a second can't be human, so you could slow them down by adding a delay before responding - or even ban them outright (this, to some extent, even makes it possible to resist a 'denial of service' attack).

Of course many hackers use 'bot nets' (hundreds or thousands of individual PC's whose 'owners' have no idea that their PC has been 'taken over' by criminals), so if your site really attracts the interest of a criminal you may soon discover yourself blocking hundreds (if not thousands) of individual IP addresses

Needless to say, if you want search engine 'spiders' (eg Google) to 'index' your pages, you can't just 'blanket ban' everyone who asks for a page that doesn't exist (or asks for more than X pages a second)

Using .htaccess to control access

You can place a .htaccess file in each directory and this will control what Apache does when a web user visits that directory. To reach a 'sub-directory', a visitor has to 'get past' each .htaccess in the path to that directory.

So you start in the domain 'root' with a 'generic' .htaccess blocking unwanted visitors by IP address, by CIDR (= a block of IP addresses) or even by TLD (Top Level Domain = the country 'code', like '.cn' for China or '.kp' for N Korea) - however the average individual's website is a lot more likely to attract, as its 'hacker', some bored US University script-kiddie (and not some state-sponsored cyber-warrior :-) ).

If the visitor 'passes' the root and wants to visit a specific sub-directory - for example your 'members pages' - you can impose more specific limitations = for example if the visitor is not coming from a known members home address you could send them a 'registration' page (or ask for their user name/password)

It takes time to process .htaccess with a long list of 'banned' addresses using 'if (a or b or c or d or e ... or z) then banned' since Apache has to go through the entire 'or' list from a to z for every visitor

So 'structure' .htaccess to 'deal with the most common case first' = 'everyone is banned EXCEPT those on a white-list of known good IP addresses' (i.e. your members, Google and the other good search spiders etc). Now Apache can 'exit' the test as soon as a 'good' address match is found.

If the visitor is not white-listed, your next check is against your own generated 'black-list' (yes, you can add some of the most 'recently active' known abusers, plus some of the 20,000+ known proxy servers and 2,700+ TOR nodes, and even a few foreign domains (eg .cn, .pk perhaps)). With known internet abusers using almost a quarter of a million different Domain Names and IP Addresses, there is just no way to 'catch them all' (even if you ban whole hosting domains and country TLD's).

Whilst it is tempting to impose a 'UK only' approach, some ISP's operating in the UK use non-UK IP addresses (typically .com) plus, of course, you will always want to grant access to the US 'search engines' (Google etc.)

Because most web sites will allow search engine 'spiders' access, the clever hacker will try to 'disguise' their 'script' to 'look like Google' etc.

Of course you can check if the 'Google spider' is coming from a Google IP address, however Google often adds new IP addresses (and there are dozens of other search engines you may wish to 'cater for').

Trying to check for search engine IP address is thus another 'never ending' task - a more 'generic' approach would be "if you claim to be a 'known good' spider, then we will allow you in, but only so long as you behave like a 'good' spider" (see 'robots.txt' below)

Limit access to known member 'home' addresses

Most Club/Society websites will want to limit access to individuals browsing from their own computers at home - and block anyone coming in from a 'commercial' i.e. 'company' source (which would also include most Proxy Servers).

In theory, this can be done by performing a 'who-is look-up' on the incoming visitors IP address. If that address is a single 'static' address (assigned to an individual), rather than part of an assigned CIDR 'range', then your visitor is coming from a single computer.

Whilst this blocks 'unwanted' (proxy server, TOR net) traffic, it also blocks users of 'pooled' sources (such as Universities, Internet cafes, WiFi 'hot spots' etc). Of course, many popular ISP's never 'register' the IP address to the individual anyway (due, no doubt, to the speed of customer turn-over) - and 'bot nets' are single private computers that have been hijacked, so these will be 'allowed in'.

Finally, performing a look-up introduces a delay - so you should only perform the look-up on addresses that you can't 'judge' by other means (i.e. those not in the white-list or black-list, that are asking for a valid page and are waiting for you to respond - as we will see later, most 'script kiddies' can be detected when they fail to wait for your site to deliver the first page before demanding another).

Limit access to the 'registration' page

Why limit access to the 'registration' page ? Well ANY page that supports the 'return' of data to the Server is a potential 'hole' in your defences that can be exploited ! Further, any moronic 'script kiddie' that finds this page may well start 'feeding' it millions of 'known exploit strings' in an attempt to get access

So the 'registration' page should be especially well protected - ideally, place it in its own folder that is reached from a 'button click' on some other page on your site. It can then have a 'private' .htaccess that blocks all except those coming from the 'button press' page (see HTTP_REFERER, below).

Blocking Proxy Servers

Whilst you want to block the 'bad guys' hiding behind (commercial) proxy servers, the 'problem' is that many 'good guys' who are simply using 'ad blockers' or 'parental controls' (or other intermediate content controls) can also appear to be 'browsing via a proxy'

One (almost foolproof) way to detect commercial proxy servers is to send a web access request to the visitors IP address. If you get a response, then the source is running a web-server ! Plainly no home computer has any business running web services .. so you can assume they are coming from a commercial source (eg Proxy server)

The problem is, of course, that you have to decide 'how long to wait' for a private computer (one that is not running a webserver) to 'not respond'. Since you still need to respond to the legitimate visitor (the one that fails to respond to the web request), the time-out can't be too long - and inevitably that means you will 'miss' Proxy Servers that are 'busy' when you send the web query. Plus, of course, if there is a web server at that address and it tries to detect whether YOUR IP is running a webserver, you could get into an 'infinite loop' :-) [of course that can't happen if you impose a 'one page at a time' blocking rule].
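As a rough illustration of the probe described above, the sketch below simply tries to open a TCP connection to the web port at the visitor's address with a short time-out (the half-second value, and the idea of treating any answer as 'commercial', are assumptions to tune rather than a tested recipe) :-

<?php
// minimal sketch: probe port 80 at the visitor's own address - a home PC should not answer
$ip = $_SERVER['REMOTE_ADDR'] ?? '';
$conn = @fsockopen($ip, 80, $errno, $errstr, 0.5);   // half second time-out
if ($conn !== false) {
    fclose($conn);
    // something at that address is serving web requests - treat it as a proxy / commercial source
}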

White listing

For a Club or Society website, the 'white-list' is simply a list of members IP addresses that you pass through without further checking. Plainly (to save time) this should include the 'known good' Search Engine spiders (such as Google).

To confirm the visitor is a real human (rather than a script running on a home computer that has been hijacked by some criminal), non-members (who have not been 'blocked' already) might be directed to a 'registration' page (or 'keyhole page' - see later), irrespective of the page they may have 'asked' for, where you ask them to 'register' (usually by entering an email address, completing a 'Captcha' and picking up some password from the eMail you send them).

NB. Whilst you can confirm the email address they gave is 'real' (by sending them some sort of 'validation' code which they would need to enter to complete the registration process) any scammer with their own domain has an almost unlimited number of 'real' throw-away email addresses they can use. Further, due to the volume of advertising spam from some sites - and the fact that some sites actually sell your email address to advertising companies - even 'legitimate' ISP's are now providing 'free throw away (single use) email addresses' for the express purpose of allowing their customers to 'sign up' and then discard that address (to avoid the spam).

Of course you could ask your real members to provide a real address when joining the Club/Society 'in person' - and then check for that address (and only that one) when they 'register' on the website

Start .htaccess in root with 'deny all'

The 'default' condition should be to 'deny' access to everyone. You then 'allow' those in the 'white-list', then 'allow' those NOT in the black-list

# By default, block everything (order deny first means deny unless specifically allowed)
Order Deny,Allow
Deny from all
# above applies to root and, by default, all sub-directories unless over-ridden with their own .htaccess
# so each sub-dir that a visitor is ALLOWED to access must have a .htaccess set to 'allow from (who-ever)'.
# NOTE this is vital, because it's all too easy to forget to create a separate .htaccess for a newly created folder !
#
# always allow local access (for testing)
Allow from 127.0.0.1
# always allow from your own home address
Allow from xxx.xxx.xxx.xxx

To reach a sub-directory, a visitor must first be 'allowed' to access the root. In theory, sub-directories can be set to 'allow all' (which means, "allow all those who get past the root .htaccess") - which is fine ONLY so long as 'Mr Hacker' fails to find some way to delete the root .htaccess.

Every folder on your website MUST have its own .htaccess
 
Each .htaccess should be coded by making NO assumptions about what a previous .htaccess has 'blocked'.
 
This provides 'defence in depth' against both hackers (bypassing or deleting an 'up stream' .htaccess) and your own mistakes (mis-coding, accidental deletion etc.)

Of course, if you administer your web site 'from the web' it is VITAL that all the sub-directories containing 'admin' scripts are not only be 'password' protected but also be 'locked' to allow access only from your home IP address !

Block country domains (TLD's)

You can block visitors using an IP address registered in a specific country (eg .cn = China), however Apache has to do a 'reverse DNS lookup' to determine the 'source' country (which adds to the response time). Of course, anyone who is blocked this way may then move on to use a UK (or US) Proxy Server (so they will appear to be registered in the UK (or US)). So whilst blocking by Country is a good first step, you also have to block Proxy Servers.

# block those from an IP registered in China, N Korea, Russia and a few other 'unwanted' country TLD's
# (place in 'most common first' order - the first to 'hit' will fail the test)
# NB. a negative 'Require' is only valid inside a <RequireAll> block (Apache 2.4 syntax)
<RequireAll>
Require all granted
Require not host .cn .kp .ru .by .su .ua .ee .al .am .ao .kz .in
</RequireAll>

Blocking on HTTP_REFERER

The HTTP_REFERER parameter (should) contain a value indicating 'how the visitor got here' (so, for example, if you clicked on a 'link' from a Google 'search results' page, HTTP_REFERER will state that you came from Google).

All 'good' websites 'fill in' this field - so if (for example) a link to your site address has been 'posted' on YouTube, YouTube will 'fill in' the HTTP_REFERER parameter - however there is nothing to stop the user's browser from 'stripping it off' due to some 'privacy control' they have set. Also, search engine 'spiders' typically don't bother to set this field (it's especially annoying that the Google spider fails to say it's from Google). So, rather than 'pass on good HTTP_REFERER', we have to 'block on bad'.

Block visitors directed to you from unwanted / 'dubious' sites

Why (you might ask) would a 'dubious' website 'publish' a link to your site (and fill in the HTTP_REFERER field) ? Well, for one, by linking to lots of 'honest' sites they increase their own 'trustworthiness' in the eyes of the Google 'search engine' rankings etc. - and second, they may be planning to hijack your site (and use the link to fool people into parting with log-in details).

The problem is, you can't just 'ban all REFERER traffic' - for sure you DO want visitors to be sent to you from Google (and the other major search engines - although perhaps not from baidu.cn).
 
# Dump (F = 403 Forbidden) all visitors referred from China and other 'unwanted' country TLD's
RewriteEngine On
RewriteCond %{HTTP_REFERER} \.(cn|ru|by|su|ua|ee|al|am|ao|kz|in)(/|$) [NC]
RewriteRule .* - [F]

Of course, like all other 'header' checking, it only 'works' whilst Mr Scammer isn't taking a copy of the Google etc. headers and using them in his own 'faked' header

Use HTTP_REFERER to limit access to Registration pages

Your 'Registration' page (and any page that can be used to 'up link' data to your site) MUST be in its own directory that is reached by manually clicking on a link elsewhere in your site.

If a 'script kiddie' finds a page that allows 'uploading', he will bombard it with millions of 'mal-formed data packets' in an effort to find some weakness. The more 'robust' your 'check' code is, the longer it will take to process each access attempt - so whilst he may never get in, your site may well collapse under the weight of the 'hack attack'.
 
So add a .htaccess to the Registration directory with a "block all except when HTTP_REFERER = (your own site)".
 
However be aware that this is not 'fool proof' (you still need to impose timing limits (see at end below) and other checks) because the clever hacker, once 'denied', will then just set HTTP_REFERER to whatever you allow in ..
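A minimal .htaccess sketch for the Registration directory might look like this (the 'example.co.uk' domain and the 'gateway.php' page name are placeholders for your own site and 'button press' page) :-

# refuse anything not referred from our own 'button press' page
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.co\.uk/gateway\.php [NC]
RewriteRule .* - [F]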

Only allow your own pages to 'download stuff'

When Apache sends a page to a visitor, the visitor's browser will automatically 'fetch' the .jpg images etc. that 'make up' the rest of the page. In doing so, the browser will insert your page URL into the HTTP_REFERER header. The same happens when they click on a 'link' on a page to download a text (or other) file. The HTTP_REFERER field thus allows you to identify the page that is supporting the download (see here for an explanation of how this works).

# only permit images etc. to be downloaded by our own http/https pages at our own HOST name
RewriteEngine On
RewriteCond %{HTTP_HOST}@@%{HTTP_REFERER} !^([^@]*)@@https?://\1/.*
RewriteRule \.(jpg|jpeg|gif|txt)$ - [NC,F]

NOTE. The above will block those who browse from behind a firewall etc. that strips the HTTP_REFERER from outgoing web requests. Too bad (you can 'enable' those with a blank HTTP_REFERER ( !^$ ), however I would like to bet that 99% of those using a blank HTTP_REFERER are up to no good). NB. search engine spiders typically send a blank HTTP_REFERER, but then they should not be poking around in your .txt file directories anyway.

The alternative below allows 'blank' HTTP_REFERER fields - you have to insert your own 'domain' name.

 RewriteEngine on
RewriteCond %{HTTP_REFERER}     !^$
RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{REQUEST_FILENAME} \.(gif|jpe?g?|png)$ [NC]
RewriteCond %{HTTP_REFERER}     !^https?://([^.]+\.)?domain\. [NC]
RewriteRule \.(gif|jpe?g?|png|txt)$ - [F,NC,L]

Read your logs !

Your access log will 'give away' the IP address of anyone who visits your site, along with the page they asked for. Anyone who keeps asking for pages that do not exist is most likely a 'script kiddie' trying to 'probe' for a weakness (to see 'who they are', use one of the 'who-is' sites - for example, closetnoc.org). If you don't like what you see, add them (or the entire CIDR range used by their ISP) to your IP block list

The problem with trying to ban the IP being used by an unwelcome visitor is that they can simply switch to (another) Proxy Server - and there are over 20,000 active Proxy Servers in existence, with hundreds of new ones appearing (and old ones disappearing as they get blocked) every day (so it's a pointless task trying to generate an 'all inclusive' proxy 'IP block list').
 
Instead you will have to adopt a 'country' blocking strategy (it seems that few 'foreign' hackers bother to use an 'appropriate' Proxy =  for example to 'hack' a .co.uk website you might think they would choose a UK Proxy .. however most do not, so banning '.cn' actually 'works') along with code to detect (and ban) based on 'inappropriate behaviour'.
 
Even so, you must limit the length of the 'banned IP' list (I suggest no more than 100 or so), otherwise your web site will slow to a crawl. For some of the techniques I use to detect 'inappropriate behaviour' see "Auto-ban script kiddies" below (many of these tricks will also catch unwanted spiders)

Don't tell the bad guys anything

You should never return '403 forbidden' to banned visitors = that just tells them that 'something interesting' actually exists at that URL .. instead you should simply ignore them (if you feel you must respond, always return '404 not found')

In .htaccess, you can 'override' the standard error messages with your own, eg to use your own 404.php (you must specify the path - a leading '/' means 'relative to the domain root', which here is the same folder as the root .htaccess) :-
 
# use a custom 404 - Page not found msg.
ErrorDocument 404 /404.php
 
# feed the hacker the same for all other errors
# 400 - Bad request, 403 - Forbidden directory, 500 - Internal Server Error
ErrorDocument 400 /404.php
ErrorDocument 403 /404.php
ErrorDocument 500 /404.php
# 401 - Authorization Required, is not an error unless the visitor has strayed into a 'blocked' folder
 
Whilst you are at it, you should disable the Apache 'ID' string = you have no control over the version of Apache your Host site runs so the last thing you want to do is give away the 'version' (which will allow the hacker to 'target' that versions specific vulnerabilities)
 
# disable the server signature & header info (just say Prod = Apache)
ServerSignature Off
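# NB. ServerTokens is normally only accepted in the main server configuration, so on shared
# hosting the line below may have to go there instead (or be left out of .htaccess)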
ServerTokens Prod

Note, it's good policy to add some small delay (eg 1.1 seconds) to every page. This more or less guarantees that the average impatient script kiddie will request a second page without waiting for the first .. and whilst he is 'marking' the first page as 'not responding', your code is 'marking' the script kiddie as 'banned' (see 'Ban high speed access requests' below).

An alternative to dumping the request or issuing a 404 would be to direct the hacker to one of the 'anti-hacker' site 'honey pot' registration pages (for example, http://anti-hacker-alliance.com/registerspext.php). Whilst the 'anti-hacker' site URL is obviously 'not yours', many hackers use automated tools that will 'fill in' the 'registration page' automatically without bothering to actually check the URL.

Preventing cross-infection

The 'local LAN' address ranges should all be 'banned' (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) - why ? well should some hacker succeed in getting into some other website running on some other computer at your hosting site, the last thing you want is YOUR website being 'cross infected'

Note that you need to add '127.0.0.1' to the 'white-list' (otherwise you won't be able to perform local testing on your own computer before 'uploading' the website)
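A minimal sketch of the corresponding .htaccess entries (using the same 2.2-style syntax as the root example above) :-

# ban the 'local LAN' ranges, but keep localhost for your own testing
Order Deny,Allow
Deny from 10.0.0.0/8
Deny from 172.16.0.0/12
Deny from 192.168.0.0/16
Allow from 127.0.0.1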

Preventing access to the 'white-list'

It's vital that the 'white-list' is not only 'inaccessible' but also 'encrypted' in some way. Should 'Mr Hacker' discover the IP addresses of computers that can access your website unhindered he may well decide it's easier to 'hijack' one of those computers and launch his attack from there instead !

Of course you place the list in a directory that isn't 'referenced' (linked) in any web page and can't be accessed from the web anyway (this is easy - just use  .htaccess to 'require' a user name & password, then set the name and password to some pair of random 64 character strings and forget about it)
 
You then 'one-way hash' the contents = so if Mr Hacker ever does get access, all he will find is the 'one-way hash' of each IP address and NOT the actual IP address at all !
 
He will also get access to your hash algorithm, so it had better be a 'real' one-way hash with a decent 'result' length :-)

What's a 'one way hash' ?

A 'one way hash' is an algorithm that generates a more-or-less unique 'signature' of some data in such a way that it's impossible to recover the data from the signature.

The simplest example of a 'one way hash' would be to keep 'adding up' all the digits of a number until you end up with a single digit (so, for example, 127.0.0.1 is 127+0+0+1 = 128; keep going, 1+2+8 = 11, then 1+1 = 2; OK, we've reached 1 digit, so the 'hash' of 127.0.0.1 is 2).
 
But so is the 'hash' of 127.0.0.10 (and 127.0.0.100, 127.0.1.0 etc etc)... and so knowing the hash (2) doesn't help much, even if you knew that original data consisted of 4 groups of 3 digits (and that's without 'counting' the '.' (which is ascii code 46))
 
Of course this simple 'add until 1 digit' example does not generate a very unique signature - in fact there is a 1 in 10 chance that any random string of digits will have the same 'hash'. So to use a 'hash' as means of checking that the user has entered the 'correct' password (or is coming from a white-listed IP address) we need something a lot more unique than 'add until 1 digit'.
 
Fortunately, many standard algorithms exist that let us take a (32-bit) IP address string, 'concatenate' it with a 'salt' (i.e. pad it out with some other specially selected data) and use that to generate a multi-digit hash value that is unique to at least 1 in 2^48.

The fantastic thing about this 'one way' process is that, EVEN IF YOU KNOW HOW THE HASH IS GENERATED there is no way to 'recover' the original data.

If you know the algorithm, the best Mr Hacker can do is to keep 'feeding' it with 'random' values until he hits on one that gives the 'right answer'.
 
This may be OK when asking for a password, however if the hash is of a white-listed IP address the hacker still has no way of knowing if the 'random' one he found is the 'actual' one (if he finds the 'actual' IP then he can get access to your site by 'impersonating' (or taking over) the PC at that address).
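As a minimal php sketch of the idea (the salt value, the file name ip_check.php and the 'documentation range' address 203.0.113.42 are all placeholders), the white-list can hold sha256 hashes rather than the raw IP addresses :-

<?php
// ip_check.php - store and check one-way hashes of white-listed IPs, never the addresses themselves
$salt = 'replace-with-your-own-long-random-salt';    // keep this out of any web-readable folder

function ip_hash(string $ip, string $salt): string {
    return hash('sha256', $salt . $ip);              // 64 hex characters; the IP cannot be recovered from it
}

// building the list (run once per member IP)
$whitelist = [ ip_hash('203.0.113.42', $salt) ];

// checking a visitor
$visitor = $_SERVER['REMOTE_ADDR'] ?? '';
$allowed = in_array(ip_hash($visitor, $salt), $whitelist, true);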

Confirming a club/society members identity

The advantage that a 'real world' Club/Society has, is that members will attend physical meetings and pay real subscriptions. The club can then ask the member for their email address 'in person' and use this in the 'member validation' process on the website.

When a new member joins, you add a one-way-hash of their email address to your website 'validation' list

When the member visits the club site and is asked to register, they enter the email address they gave when joining. Then all you have to do is 'match' the hash of the just entered address against the 'validation' list

For new members, the above should be all that is necessary. If, however, you are asking your existing membership to register on the website for the first time, it's not impossible for some hacker to discover a members email address and register in their place (i.e. before the real member manages to 'get around to it').
 
To prevent this, after confirming the address hash, the website could auto-send an email to that member's address containing a unique confirmation code (and store a hash of the code sent). When the member receives the email, they submit the confirmation code to the website and, if the code hash matches, the website adds a hash of their IP address to the 'white list'.

To make detection of ID thieves a little easier, the website could ask the member to choose their first name + initial letter of their surname from a list of those members not yet registered (before asking them to enter their address). If a member discovers their name is 'missing' from the list, the website could provide a "Help, my ID's been stolen" button for them to click

Hiding your pages from the script kiddie

Since they know that a website will often not respond, ever, to an 'invalid' page name, the average 'script kiddie' typically uses a very short 'time-out' before 'marking' a page as 'not found' (and going on to request some other typical page name)

So, to encourage the script kiddie to both 'over-speed' (ask for too many pages per second) and 'miss' a valid page, you can introduce a short 'human' delay (say, 1.1 seconds) before delivering any page - and especially those with 'easily guessable' names (such as 'index').
 
If you do send a 'not found' error, always ensure this has a built-in delay (a human will wait a couple of seconds before starting to reach for the 'page refresh' button / F5 key = a script running on a GHz CPU will not)
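A minimal php sketch of that 'human' delay (with the white-list check reduced to a single hard-coded localhost entry purely for illustration) :-

<?php
// wait 1.1 seconds before the page (or the 'not found' message) goes out to an unknown visitor
$whitelist = ['127.0.0.1'];
if (!in_array($_SERVER['REMOTE_ADDR'] ?? '', $whitelist, true)) {
    usleep(1100000);   // 1.1 seconds
}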

Auto-ban script kiddies

Search engine 'spiders' are also 'scripts'. So you need to 'exempt' known 'good' search engine spiders from the 'script detect' code (or you will soon discover that you have banned Google from your site :-) ) - see also 'ban bad search engine spiders' below

Ban on 'invalid page' access

Script kiddies have a long list of default Content Management System and 'administration tool' directory names and php function pages that your site could be 'hosting' - and especially those containing php code which is 'known to be vulnerable'. Since these directories and pages are not 'linked' to the rest of your site, they have to 'ask' for them 'blind' (typically by using a 'HEAD' request to save time). Should they find such a page, they can then send it various 'malformed' parameters in an attempt to poke holes in your site. So any visitor that tries to access anything in a directory with a name starting, for example, Joomla, e107 and especially 'phpMyAdmin...' is plainly 'up to no good'.

Keeping your own list of 'admin tool directory names' and 'known vulnerabilities' is a pain. Instead, you can just ban anyone trying direct access to anything in any directory that does not exist (or exists but is 'locked' to your own IP address).
 
On my own site, I go one step further. All the 'private' directories start with a unique two character code. So, when .htaccess checks the visitor 'request' (URI), if it finds the unique two character code in the requested path, that visitor will quickly find themselves banned (actually, they won't 'find' anything .. because my website does not waste time responding to these types of hacker at all)

If you feel 'auto-ban on invalid' is a bit harsh, you can create a few 'trap' directories (with names that 'match' the 'defaults' = like "phpMyAdmin-2.11.10.1-english") and just ban anyone who tries to access them ...

RewriteCond can be used to detect if the visitor is asking for a directory (-d) that does not (!-d) exist, or if a (non-zero, -s) file does not (!-s) exist. You can treat each case separately (test the directory first, of course) :-

# Catch request for a non-existing directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule  ^/$  /noDir.php  [L]

# catch non-existing files (note, with !-f, 0 byte files will pass)
#RewriteCond %{REQUEST_FILENAME} !-f
# catch non-existing files, including 0 byte files
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule  ^/$  /noFile.php  [L]

Ban on invalid 'GET'

If you have any pages that use the GET method (data in URL after ?), some script kiddies will attempt to 'feed' that page (or just any valid page they find) with 'escaped/encoded instructions' or overly long data strings designed to cause a 'buffer-overflow' etc. (anyone who sends 'embedded instructions' should be banned permanently)

Needless to say, as a rule, you should only use POST, never GET.
 
If you never use GET and detect a '?' in the incoming URL at the .htaccess stage you can call php to add the visitor to the banned list immediately.
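If your site really never uses GET parameters, a minimal .htaccess sketch would be to treat any query string as hostile (the 'ban_visitor.php' script name is a placeholder for whatever adds the IP to your banned list) :-

# any query string at all is treated as an attack attempt
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule .* /ban_visitor.php [L]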
 
Since I want to allow users to 'bookmark' a 'text expanded' page (where I use GET to specify what text to expand), my 'ban on invalid GET' code is slightly more complex (essentially anyone who sends anything after the ? other than expected 'text file name format' in a directory that supports 'text files' will get themselves banned = remember this can be done at the .htaccess level, so you don't need to run any code)

Ban 'high speed' access requests

Script kiddies want to get through the boring 'find a hole' part of their attack as fast as possible. So anyone clever enough to stick to probing 'known to exist' directories will often still try to 'scan' these (by asking for 'typical' page names - such as 'index.php') as fast as possible. Their 'clever' scripts don't normally 'waste time' waiting for your site to respond (or, if it's clever, not respond at all) with a 'not found' before 'asking' for the 'next typical' page name

Your site will typically impose a 'wait' before delivering a page to an 'unknown' visitor (eg 1.1 seconds). This has minimal effect on human visitors but more or less guarantees that the script kiddies code will 'run on' to asking for the 'next' page
 
If you make a note of a visitor's IP address (and the exact time of their visit), it's a relatively easy task to automatically ban the IP address of anyone who, for example, attempts to access more than 10 pages in one second.
 
Visits by 'known good' search engine 'spiders' - such as Google - must not be speed limited (or they might give up and assume your site is 'down') nor blocked (of course), as 'good' spiders are likely to ask for 'all' your pages one after the other in a very short time = and you don't want to ban them (but see 'bad spiders impersonate good' below).
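A minimal php sketch of the 'page request rate' check (the log folder name, the 10-pages-per-second threshold and the silent 404 response are all assumptions to tune for your own site) :-

<?php
// log the time of each request per IP in a small file and report anyone asking too fast
function too_fast(string $ip, int $max = 10, int $window = 1): bool {
    $dir = __DIR__ . '/../visit_log';                 // assumed to be a non web-readable folder
    if (!is_dir($dir)) { mkdir($dir, 0750, true); }
    $file = $dir . '/' . hash('sha256', $ip);         // one small file per visitor, named by IP hash
    $now = microtime(true);
    $times = is_file($file) ? array_map('floatval', file($file, FILE_IGNORE_NEW_LINES)) : [];
    $times[] = $now;
    $times = array_values(array_filter($times, function ($t) use ($now, $window) {
        return ($now - $t) <= $window;                // keep only the hits inside the time window
    }));
    file_put_contents($file, implode("\n", $times), LOCK_EX);
    return count($times) > $max;
}

if (too_fast($_SERVER['REMOTE_ADDR'] ?? '')) {
    // at this point the IP would also be added to the banned list; here we just go silent
    http_response_code(404);
    exit;
}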

Auto ban 'bad' search engine spiders

Whilst it is tempting to create a 'white-list' of search engine spiders, there are just too many 'legitimate' ones to keep track of - and trying to keep a 'bad bot black-list' up to date is even worse ! So whilst you should ban some specific ones (do you really want to be listed in China ?) a rather better way is to lay a 'spider trap'. 'Good' search engines will check the contents of your 'robots.txt' file (which will be in the 'root' of your site) and NOT visit directories that you list as "don't visit" even (especially) if your pages contain actual 'links' to these directories

The problem with 'allow good' is that the bad boys can impersonate the 'good' spiders (it's a trivial matter for Mr Hacker to copy, for example, Googlebot user-Agent string and paste it into their 'bad' spider heading)
 
So create a directory called, for example, 'admin' (or 'members_email'), add this to the robots.txt with a "Disallow:" setting, sprinkle a few hidden references to the rabbit-hole folders in your 'real' pages and wait.
 
Any spider that attempts to access '/admin' (or 'members_email') can then be banned, no matter 'who' they claim to be. The beauty of this is that to avoid the trap, the 'bad bot' has to 'obey' the robots.txt 'Disallow:' entries !
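In .htaccess terms the trap can be as small as this (again, 'members_email' and 'ban_visitor.php' are placeholder names) :-

# anyone poking the robots.txt 'trap' folder gets handed to the banning script,
# whatever their User-Agent claims to be
RewriteEngine On
RewriteRule ^members_email(/|$) /ban_visitor.php [L]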

Controlling what's 'visible'


If 'Mr Hacker' can get a listing of your php code, he can work out how to 'drive through it' - for example, knowing what proxy server IP addresses are 'banned' (they should ALL be) means he can choose one that is not (yet) banned - and knowing what 'defences' you have against eg. multiple password retry attempts allows him to set his attack 'just below' the threshold

Note also it's a common mistake to 'allow' Googlebot into your 'sensitive' directories .. and then wonder how Mr hacker found them ! (yes, all he needed to do was search on Google :-) )

The first thing you should prevent anyone seeing (i.e. reading) is the contents of your .htaccess file !

# Block visitors from reading the htaccess file
<Files .htaccess>
order deny,allow
deny from all
</Files>

Use .htaccess to prevent directory name and content 'listing'.

# prevent web visitors indexing this site
Options -Indexes
# prevent directory content listing
IndexIgnore *

Divide your site into 'directories' with restrictive access settings

You start by having a 'clean' (mainly empty) 'root' - if the only files in the root are .htaccess, robots.txt and the (default) index.php, it's a lot easier to 'spot' unwanted files 'inserted' by a hijacker. Indeed, since you can set .htaccess to run php code before checking blacklists etc., you can even 'auto delete' any unwanted files from the root every time some-one visits your site (see at end, below).

It goes without saying that every page on your site must be a .php page. Further the actual php code that 'constructs' the page should be kept separate to the 'raw' page text

The 'raw page text' has to be in a directory that is 'accessible' (i.e. at least 'read only') by the web visitor. However, your 'page construction' php can be kept in a directory that is NOT readable (in fact, not 'user' accessible at all - this can be achieved by setting its .htaccess to 'deny all' - or, if you want to waste the intruder's time, you can 'require' a specific user name/password (eg 'require user Fred') without actually specifying 'Fred' in the password file).
 
Note that directories containing files that need to be 'fetched' by the users own web browser (eg .css (page style definition) files, 'raw' html text and any 'included' images (anything referenced using href="..") or anything you want to offer as a download (eg .txt, .pdf) files must be accessible (readable) from the web.
 
This means, for example, that if you want to 'hide' your .css directory / file names (and hold them in an inaccessible directory), then instead of using an 'include' command (which means the directory has to be readable by the visitor) your page generation php could 'read' the .css from a 'deny all' directory and then 'write' (echo) the contents directly into the page returned to the user
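A minimal php sketch of that 'read and echo' approach (the '../hidden_css/site.css' path is a placeholder for a file in a 'deny all' directory) :-

<?php
// the style sheet lives in a folder web visitors cannot read, so read it server-side
// and write it straight into the page instead of letting the browser fetch it
echo '<style>' . file_get_contents(__DIR__ . '/../hidden_css/site.css') . '</style>';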

Since all your 'normal' web-page directories will be 'readable' from the web, you need to keep sensitive files (such as the banned list) in a totally separate directory (if any of the files are writable, the directory needs to be 'group' writable i.e. writable by Apache, never by the normal 'user')

Use robots.txt to trap bad spiders

Legitimate search engine spiders will 'honour' the contents of your robots.txt (so make sure you have one in the 'root' of your domain, which is where the spiders will expect to find it).

Robots.txt can be a 'two edged sword' .. clever bad bots will always check the contents to find the names of directories you want to keep hidden ! However, as mentioned later, if your robots.txt contains lots of 'trap' paths, the 'clever' bad bots will quickly get themselves banned.
 
# Robots.txt contents
# 'User-agent: *' means this applies to all spiders (remember - faked User-Agent strings mean you can't tell good from bad)
User-agent: *
Disallow: /my_admin/
Disallow: /my_accessControl/
# the folders below are (also) 'traps', just in case the bad bot guys get really clever
# (and spot how all my 'real' folder names are constructed)
Disallow: /siteAdmin/
Disallow: /pageAdmin/
Disallow: /pageArt/
Disallow: /pageCSS/
Disallow: /pageDefinitions/
Disallow: /pageImages/
Disallow: /mailLists/
Disallow: /mailLists_old/
Disallow: /mailLists_new/
Disallow: /whiteLists/
Disallow: /blackLists/

You can add any number of 'tempting' folder names, however the more you add, the more likely the (clever) script-kiddie will realise what you are up to :-)

Your 'trap' folders (if they exist) should always have a 'default' page (index.php, or 00_index.php) in them so the bad spiders have something to chew on (if you don't ban them instantly, you could send them some random mix of text from elsewhere in your site .. sprinkled with a few fake links to non-existent pages in the same folder - or some fake email addresses perhaps = see 'Misdirecting bad spiders' below)

Note that if you implement a 'keyhole' approach (see below), none of the 'trap' folders need actually exist - alternatively, just create the names and 'link' (symlink) them all to the same real folder.

Use php to generate lots of dummy folders

Your 'trap generation' script can create lots of tempting directory names for the bad bots to follow and 'link' them all to the same physical folder.

In the php, use the command "symlink ($targetFile, $dummyFile )" to link a dummy file name to an actual target file name
 
You can detect incoming requests for symbolic-linked directories using .htaccess and redirect them. If .htaccess is deleted, the link test will no longer exist and the visitor will 'drop through' to the real directory.
 
'-l' (is symbolic link)
Treats the TestString as a pathname and tests whether or not it exists, and is a symbolic link.
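A minimal php sketch of the 'many names, one folder' trick (the folder names are illustrative only) :-

<?php
// create a handful of tempting 'trap' directory names, all of which are just
// symlinks to one real trap folder
$realTrap = __DIR__ . '/trap_pages';
foreach (['siteAdmin', 'pageAdmin', 'mailLists', 'whiteLists'] as $dummy) {
    $link = __DIR__ . '/' . $dummy;
    if (!file_exists($link)) {
        symlink($realTrap, $link);   // dummy name -> real trap folder
    }
}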

Members pages should never be 'indexed'

As Google point out, even if they 'obey' your robots.txt, Google can still end up listing your "don't index" pages because some OTHER site has grabbed them and displays them

All 'member only' web pages should contain the 'don't index' meta-tag in the < head > section :-


<meta name="robots" content="noindex" >

Directory and file access permissions (chmod)

On your web server, use 'chmod' to control basic directory and file access permissions. Maximum security is set by using the minimum permissions

Apache (and php) needs read access to your web content files (so they can be set to 644). Note that php scripting and .htaccess / .htpasswd are 'interpreted' (i.e these files are READ not EXECUTED) so are also set chmod 644

Apache (and php) needs read and execute rights to folders that it has to 'traverse' (traverse == execute), so they can be set to 750

Finally, Apache needs write permission on any folder that contains files you intend to write (so this directory can be set to 660 - though remember that a directory also needs the execute bit set before it can be 'traversed').

Using .htaccess enables (or, more typically, denies) access "by visitor IP" to the folder (and all subfolders) in which the .htaccess is found.

In order to reach a sub-directory, the visitor (and Apache) must have permission to access all of the higher directories in the 'chain'.

A folder containing files that will only ever be 'written' by your php code can have a .htaccess that 'denies' all visitor access.
 
In fact, by default, all folders should be 'deny to all' - only those containing files that need to be 'referenced' directly by the visitor need be readable. In theory, your whole site (with the exception of the domain 'root' directory) could be 'deny all', so long as at least one php 'page' exists (in the root) that CAN be 'accessed' by the visitor.
 
That 'keyhole' page (see below) would have to contain php code that 'constructs' all the other pages 'on the fly' (php can 'instruct' Apache to read the contents of files in directories that the visitor is denied access to)

Directory (and file) passwords (.htaccess)

The .htaccess file, which controls what visitors can do via the web (i.e. via Apache), allows you to set a 'user name / password' for a directory (and any below it). The 'valid' user IDs are held in the .htpasswd file (.htaccess chooses which ID is required for a directory - one (or more) specific names, or 'any valid').
If the visitor requests a URL that includes a password protected directory, Apache 'asks' the users browser for their name/password. If the browser doesn't already have that information, it will 'pop-up' a box requesting the user enter the name/password. If the name/password is invalid, the visitor is denied access

For example, your site could consist of 3 directory layers :-
'public'
'members' - where a visitor has to enter a generic ID name and password
'committee' - where a committee member has to enter their own name and individual password.
 
To avoid the need to issue hundreds of user names and passwords, members would all use a 'common' ID, however each of the dozen or so committee members would have their own ID.
 
For maximum security, two separate .htpasswd files would be generated.
 
Note that the 'one way hash' of user passwords held in the .htpasswd file is not 'locked' to the folder = so, to allow the committee to 'log in' to the members section using their committee ID (rather than the member ID), all you have to do is add the committee .htpasswd entries into the members .htpasswd file (both members and committee directory .htaccess would be 'any valid', however committee .htpasswd would ONLY contain the committee IDs)
# .htaccess for protecting this directory (i.e. the one it's in) and all sub-directories
AuthType Basic
AuthName "This directory is Password Protected"
AuthUserFile /path/to/.htpasswd
Require valid-user
# if a specific user is required, specify their name, rather than (any) 'valid-user'

You can password protect single files (the .htaccess goes into the directory where the file exists). The user name/password is requested when the visitors attempts to access (view, download) the protected file

# .htaccess entry for protected file
AuthType Basic
AuthName "This file is Password Protected"
AuthUserFile /full/path/to/.htpasswd
<Files "mypage_protected_file.xxx">
Require valid-user
</Files>

Always remember - if the directory can be 'written', Mr hacker may be able to find a way in (for example, by 'fooling' some other software running on your webserver or on another computer at the hosting site (eg. mySQL Server) to 'do the deed') and delete or overwrite your .htaccess file (and thus remove the web access protection). So you still need to set the directory and file 'chmod' permissions (.htaccess only controls what web visitors can do via Apache, not what other software on other computers at the host can do)

Access via a 'keyhole'

Your root folder .htaccess can call a php file to generate the requested page 'on the fly'. The php being called can be anywhere (even in a directory to which all web users are denied access), however any page content that is 'referenced' (such as images) must be in a web visitor readable directory (or root, since that must be readable to reach ANYTHING)

Data (content text etc) that is 'fetched' by the php and "echo'ed" into the page returned to the visitor does not have to be accessible by the visitor.
 
Running php to 'override' the requested URL doesn't even require use of the .htaccess 'Rewrite engine' (the 'keyhole' php script can end with a 'die' which terminates the page generating process)
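One way to achieve this (an assumption on my part, not necessarily how any particular host sets it up) is php's auto_prepend_file setting, which can be set from .htaccess where PHP runs as an Apache module :-

# run the 'keyhole' script before any php page is generated - the script can build the
# whole page itself and finish with die()
php_value auto_prepend_file /full/path/to/keyhole.php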
 
Note that not all requests will be for pages - if your page contains .jpg (or you want to allow the user to download .txt files) you need to 'pass on through' those URL's .... however the power of the 'keyhole' is that you can simply ignore (die) any URL you don't like the look of - in other words, unlike the redirection of specific pages, you can redirect everything, process what you want and ignore the rest.
 
Of course the 'main' advantage of the 'keyhole' approach is that EVERYTHING goes via a single php - and it's a lot easier to ensure there are 'no holes' in a single script than it is to test and debug dozens of individual pages. NB. Typically you should still 'code' each page 'as if' it was accessed directly - this aids debugging a lot :-)
(also, should Mr Hacker manage to delete your root .htaccess, it would be nice if your site could 'repair itself' (see later))

Hiding debug messages

Debugging text messages can give the hacker all sorts of 'clues' (such as the name of the directory where you hold the 'banned IP' list) - so whilst debug text is vital when you are testing new pages on your local server (so enable debug from 127.0.0.1) or trying to debug 'live' pages (enable debug when your own home IP is seen) there is no reason at all for anyone else to get ANY debug text.

Set up a 'global' control variable called eg '$dBug'. Initialise this to 'false'. Check the visitor's IP - if it's you (127.0.0.1 or your home IP), set it to 'true'. Make sure all debug 'echo' statements start with "if ($dBug) .." (eg "IF ($dBug) ECHO 'debug is on';").
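In php that control amounts to just a couple of lines (the 203.0.113.42 address stands in for your own home IP) :-

<?php
// $dBug is true only for localhost testing or your own home address
$dBug = in_array($_SERVER['REMOTE_ADDR'] ?? '', ['127.0.0.1', '203.0.113.42'], true);

if ($dBug) echo 'debug is on';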

No public access to SQL query code

The potential for 'break in' via SQL Server (MySQL) 'injection' is just too great to use SQL 'for anything' on your 'public' pages.

If, for example, you want to display the 'latest' images posted by members to your (3rd party) 'photo gallery' (the 'code' over which you have no control), then add php code to the members 'post a new photo' page that will copy the latest postings to some accessible directory on the public side

If you REALLY want to show members of the public information from a SQL database, then ALWAYS 'front' the 'query' with your own php code (you must NEVER pass parameters direct from the user to a 'SQL Query'). Of course, your php code should only use a 'read only' database user account
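A minimal php sketch of 'fronting' a query with your own code (the database name, table and read-only account are placeholders, and the connection details would normally live in an include held in a non web-readable folder) :-

<?php
// always use a prepared statement and a read-only account - never paste user input into SQL
$db = new PDO('mysql:host=localhost;dbname=club', 'readonly_user', 'password');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $db->prepare('SELECT title, posted FROM photos WHERE member_id = ? ORDER BY posted DESC LIMIT 10');
$stmt->execute([(int)($_GET['member'] ?? 0)]);   // cast / validate before it ever reaches SQL
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);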

Needless to say, the actual SQL log-in should only be performed by php code in one of the non-accessible read-only directories. However, in many cases there is no need to allow any sort of public user input - for example, if you want a page to display the 'latest photos', you could call php direct from the .htaccess in that page's directory (and have the php check for the latest photos and fetch any new ones into that page's own directory), so the actual page need make no reference to the existence of the SQL at all.

Of course any code that will actually 'write' into the SQL database needs to be accessible ONLY from a password protected directory which in turn can only be reached by members whose IP addresses are 'white listed'

Hiding from eMail 'harvesting' spiders

Spiders (both 'good search engine spiders' and 'bad email harvesting spiders') start by 'landing' on your 'home' page (index.php or whatever you decide to name it) and then follow all the 'links' from that page to all the other pages on your site - and from those pages to any others (not yet seen) and so on until every page has been visited.

From this it's obvious that any 'unlinked' page will not be found - so, for example, if there is no 'button' or link on any 'public' page to '/members/members_area.php', the 'good' spider will never try to visit the members area.

NB. you COULD (also) set /members/ as 'do not visit' in robots.txt, however the 'bad' spiders are now clever enough to 'harvest' robots.txt for a list of 'interesting' directories to 'target' !

NOTE. If you structure your site to prevent any 'direct' (i.e. user 'bookmarked') access to your pages, this may 'hide' your content from the 'script kiddie' trying to 'spider' your site - but it also hides your content from Google and other search engines !

However that's exactly what you require for 'members only' pages = you don't want these to be 'indexed' by anything

Misdirecting bad spiders

To 'misdirect' spiders (and any script that blindly follows links looking for other pages), all you have to do is add a few 'dummy links' in your real pages that the human user will never follow - eg because they can't be seen (black text on a black image background, or white text on a white image) or because they can't be 'selected' (eg a 1x1 pixel, or transparent, 'button' placed somewhere unexpected on the page).

Of course the 'dummy' pages should actually (or at least appear to) 'exist' (in a directory named something 'interesting' eg 'members_emails' that you have 'marked' in robots.txt as 'do not visit' - so the 'good' spiders don't get misdirected) - and be filled with lots of tempting fake email addresses .. The 'easy' way to do this is to have the 'dummy' page generated totally by php .. which opens the door to dropping the 'bad spiders' into an 'infinite loop' aka 'down the rabbit hole'

Dropping the bad spider down the rabbit hole

When your .htaccess 'spots' a request for ANY page in your 'rabbit hole' folder (eg '/members_emails/'), it runs some php that not only feeds the bad spider lots of fake email addresses but ALSO adds in some tempting (but actually randomly named) page 'links' and 'tempting' file links (for example, 'all_email.txt', 'addmails.php', 'old_emails.htm', 'new_emails.php', 'verified.lst', 'whitelisted.dat', with some random 'ver' number prefixed or postfixed etc).

My own site already ignores requests for folders and pages that do not start 'nn_' (and bans the visitors IP).

However a URL that 'passes' this test then gets checked to see if it references a 'real' page (or is fetching a '.jpg' image or downloading a '.txt' file) - and if it does not, the user is sent to the 'rabbit hole' page generator or is simply ignored (my site delivers only three things to the outside world - php generated pages, (jpg) images and text files)

The 'rabbit hole' php generates pages with lots of fake email addresses and with more 'links' to folder and page names that start 'nn_'. This is designed to 'protect' my real folders and real pages from both bad spiders and script kiddies trying to 'guess' hidden page names.

The 'fake' folder names are all referenced in robots.txt = so a 'good' spider will never ask for anything at that path - and fake 'nn_' pages in real 'nn_' folders are NEVER referenced in any real page (the only way a visitor can be asking for a fake 'nn_' page reference in a real folder is because he is following a fake link on a previous 'rabbit hole' page).

The bad spider will gobble up these links and request more (fake) pages - and the 'rabbit hole' php will respond with more fake emails and even more random (fake page) links ... the php will thus 'feed the spider' an infinite series of fake pages filled with fake email addresses and fake links. Whilst this is all good fun, you should always add a 'delay' before returning each fake page (otherwise your Hosting service might get a bit upset when it discovers its web server is 100% loaded feeding an infinite series of fake pages to some US script kiddie at the maximum bandwidth supported by your site).

Needless to say, having noted the IP address of the 'bad spider', when they 'give up' asking for pages (or after some time-out = I think feeding the spider every few seconds for a day or two should be enough to fill their hard drive) you should just add them to your 'banned IP' list.


NOTE that in the <head> section of all fake pages, the string <meta name="robots" content="noindex"> must appear (this tells Google and other 'good' bots that might 'stumble across' the page not to index it .. whilst at the same time telling the bad bots "here is something I want to keep hidden, so it must be really interesting !").
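A minimal php sketch of such a 'rabbit hole' page generator (the 'nn_' prefix follows the naming scheme described above; everything else - the delay, the counts, the @example.com addresses - is purely illustrative) :-

<?php
// rabbit_hole.php - feed a bad spider fake email addresses and fake links, slowly
sleep(3);                                   // never feed the spider at full speed
$rand = function (): string { return substr(md5(mt_rand()), 0, 8); };

echo '<html><head><meta name="robots" content="noindex"></head><body>';
for ($i = 0; $i < 20; $i++) {
    echo '<p>' . $rand() . '@example.com</p>';          // fake email address
}
for ($i = 0; $i < 5; $i++) {
    $page = 'nn_' . $rand() . '.php';                   // fake link for the spider to follow
    echo '<a href="' . $page . '">' . $page . '</a> ';
}
echo '</body></html>';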

Detecting and killing 'phishing' files and folders placed on your site

One recent variation in the 'hijack a website' war (in addition to the usual placing of 'spam' up-loaders (such as 'mailer.php') in your root) is to add fake 'payment' code onto your site = plainly, anyone visiting (or referred to) your site silly enough to enter their eBay / iTunes etc. account details will soon find some criminal purchasing all sorts of unwelcome products (porn etc.) on their behalf

Fortunately, the fake iTunes folders all contain the string 'apple.com' .. so they are easy to 'spot' (as are those containing 'eBay.com' etc) - and if you have kept your site 'root' clutter free, spotting 'mailer.php' (or similar) should also be easy.

The 'first line of defence' is to invoke php code from within .htaccess that will 'auto-delete' any 'unknown' root file or 'unknown' directory. Of course, chances are, if 'Mr Hacker' has gained access to the extent that he is able to write his own files and create his own directories, he will have deleted your root .htaccess (or overwritten it with his own 'empty' file) even if you made it 'system/group read access only'.

So you should invoke the same 'delete unknown files/directories' code from each and every page - this way, any user visiting your site and dropping through to access a real page directly (which will be possible for anyone, if your root .htaccess has been deleted) will help make life difficult for Mr Hacker

On my site I auto-delete anything named '.php' in the root EXCEPT the file named 'index.php' - and I also try to auto-delete any folder with a name that contains a '.' in it.
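A minimal php sketch of that root clean-up (the paths and rules are assumptions - test carefully before letting anything auto-delete files on a live site) :-

<?php
// delete any .php file in the domain root other than index.php
$root = $_SERVER['DOCUMENT_ROOT'];
foreach (glob($root . '/*.php') as $file) {
    if (basename($file) !== 'index.php') {
        @unlink($file);                      // quietly remove anything that should not be there
    }
}
// also remove any root folder with a '.' in its name (eg a fake 'apple.com' folder)
foreach (glob($root . '/*', GLOB_ONLYDIR) as $dir) {
    if (strpos(basename($dir), '.') !== false) {
        @rmdir($dir);                        // only removes an empty folder; a recursive delete needs more care
    }
}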

Auto-repair methods

If you code your site so it can 'keep functioning' without a 'root .htaccess', you can add 'anti-hacker' php code to run on each page that checks for the .htaccess file = and if the .htaccess no longer exists (or its one-way-hash is found to be invalid - or does not match the hash of the back-up held in some hidden (i.e. unlinked) read-only sub-directory), your php could (attempt to) replace the root .htaccess with the back-up (always assuming that the back-up's one-way-hash is as expected :-) ).

One place to put your 'auto-repair' code is in the normal 'index.php' in the root. If your .htaccess still exists, it will re-direct users to some other default (eg 'my_index.php') = so 'index.php' will only ever 'run' when your own .htaccess is 'lost'
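A minimal php sketch of the check-and-restore step (the back-up path 'nn_backup/htaccess.bak' is a placeholder for your own hidden, read-only copy) :-

<?php
// compare the live root .htaccess against a hidden back-up and restore it if it has been tampered with
$live   = $_SERVER['DOCUMENT_ROOT'] . '/.htaccess';
$backup = $_SERVER['DOCUMENT_ROOT'] . '/nn_backup/htaccess.bak';
if (!is_file($live) || hash_file('sha256', $live) !== hash_file('sha256', $backup)) {
    @copy($backup, $live);
}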

Preventing others using your php

Every 'supporting' php script should check where it is being 'included' from (page and path name). If the names do not 'match' what is expected (in my case, all paths and pages start with two digits and an underscore), the 'die' command is invoked (this kills the page generation at that point and ensures that the script returns nothing (extra) to the visitor).
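A minimal php sketch of that check, assuming the 'two digits and an underscore' naming scheme described above :-

<?php
// a supporting script refuses to run unless the requested page name matches the expected scheme
$caller = $_SERVER['SCRIPT_NAME'] ?? '';
if (!preg_match('#/\d\d_[^/]+\.php$#', $caller)) {
    die();   // return nothing (extra) to the visitor
}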

Add a 'time limit' to Registration pages

Whilst it (should) be obvious that any IP sending 'completed' Registration forms faster than (say) 1 a second has to be a 'script kiddie', you should also impose a 'expiry' limit (in order to prevent the script kiddie performing 'trickle' attacks) and a 'max re-try' count within that time

My Registration page (and all other 'pages' that return data to the Server) is created by php code in a private folder that (at least in theory) can't be accessed directly from the web at all. Rather, each page is 'generated' by manually clicking a link on a 'gateway' page.

Because the gateway page generates the 'link' using php, instead of 'giving away' the Registration directory name (by returning its URL), the gateway returns its own URL with a '?' string. The '?' string is the one-way-hash of the visitor's IP address, which will have been used to create a file hidden elsewhere in the directory tree.

The .htaccess in the 'gateway' directory looks for the '?' and, when it's found, directly 'runs' the Registration php which handles the registration process (this lets you put the registration php code in a directory that can't be accessed from the web at all).

The registration php starts by generating a one-way-hash of the visitors actual IP address - if this fails to exactly 'match' the '?' value, the page is dropped (with a 'die'). Next the 'hidden file' is checked and the current date/time compared to the file date time.

If the time elapsed is 'way too short', the page is dropped, the user is added to the 'banned by IP' list and the 'hidden file' deleted. If the time is 'too long' (say > 15 mins), the user is sent back to the 'gateway' page with an 'expired' status (and the 'hidden file' deleted).

If the time is neither too short, nor too long, the hidden file is checked for the number of 'failed attempts' - if this exceeds N, the user is sent back to the 'gateway' page with a 'too many failed attempts' status (and the 'hidden file' deleted).

If none of the above apply, the 'hidden file' count of 'failed attempts' is updated and only then is the POST data processed (the file is updated first in case some mal-formed POST data causes the script to crash). If the POST is OK, the user is 'allowed in' (and the 'hidden file' deleted); if not, and an 'honest' error is detected (data not entered, incorrect email address format), an appropriate Error msg. is returned.

If the POST contains 'script kiddie' inserted data (strings exceeding the max. char count (or otherwise 'breaking' input field restrictions), strings containing 'escaped' characters or 'instruction' code etc.), the page is dropped (die) and the user's IP is added to the banned list
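
Continuing the sketch above - the thresholds, the 'gateway.php' return URL and the add_to_banned_ip_list() helper are all placeholders for whatever your own site uses :-

<?php
// apply the 'too fast / expired / too many tries' rules before touching the POST data
$fails = (int) file_get_contents($hidden);   // failed-attempt count stored in the hidden file
if ($age < 2) {                              // far too fast to be human
    add_to_banned_ip_list($_SERVER['REMOTE_ADDR']);   // placeholder helper
    unlink($hidden);
    die();
}
if ($age > 15 * 60) {                        // expired - send back to the gateway
    unlink($hidden);
    header('Location: /gateway.php?status=expired');
    exit;
}
if ($fails >= 3) {                           // too many failed attempts
    unlink($hidden);
    header('Location: /gateway.php?status=too_many');
    exit;
}
// bump the count BEFORE processing, in case mal-formed POST data crashes the script
file_put_contents($hidden, $fails + 1);
// ... now validate the POST fields (ban on anything that breaks the field restrictions)
?>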

Some final words

1) Don't assume that .htaccess is a 'cure all' - the average hacker will always try to 'read' your .htaccess files (so they can 'work around' the restrictions) and then, if the restrictions can't be 'worked around', they will focus on deleting (or replacing) your .htaccess file(s)

So structure your site in a way that will still 'keep the bad guys out' EVEN IF they have a complete copy of your domain root .htaccess and EVEN IF they manage to delete it

2) Remember that EVERYTHING you send to the visitor (and ESPECIALLY everything in a Registration page etc) can be 'captured' and examined by 'Mr Hacker' !

Don't give away directory names / paths in comments etc. Avoid giving away full details of anything (eg in a pull-down list)

3) Remember that all images etc. in a page have to be in directories that are accessible from the web !

NEVER be tempted to place 'control data' files in web accessible directories. ALWAYS add a .htaccess 'deny all' to a directory that should never be accessed from the web (you may think they can't reach it, but remember that .htaccess files can be deleted - so make it as difficult as possible to reach your hidden sub-directories)

4) Finally - be aware that visitors can make mistakes (click on the 'wrong' link, select the 'wrong' item from a list, misread a Captcha) and that knowledgeable visitors may be 'curious' - so construct your 'rabbit holes' and 'script kiddie' detection 'traps' in a way that allows a HUMAN visitor a chance to 'exit' before being 'banned'

Q & A

Can I make any directory (even the domain root) totally inaccessible from the web ?

Yes. You can set up a .htaccess in any directory (including the domain root) that will 'deny all' access but still deliver pages 'from that directory'. This is done by having the .htaccess call a php file (which can be in any directory, especially one that can't even be 'read' from the web) which generates a complete 'web page' and sends (ECHO's) it to the visitor before 'kicking the visitor out' with the 'deny all'.

The drawback to 'blocking' your domain root in this way is that your entire site becomes 'inaccessible' to 'normal' web browsing (everything sent to the visitor has to be generated by the php, which takes time). Since a page normally contains many image elements needing to be 'fetched' from a readable directory (using the standard 'file reference' <img src="path/to/image.png"> construct), these requests would also have to be 'intercepted' and handled by the php code (see the php commands :-
 
imagetypes() - Return the image types supported by this PHP build
imagepng() - Output PNG image to either the browser or a file
imagegif() - Output GIF image to browser or file
imagewbmp() - Output WBMP image to browser or file
imagejpeg() - Output JPEG image to browser or file)
 
If all you have are "buttons", "icons" or small thumbnails etc., you could instead embed the actual raw image data directly into the page (<img src="data:image/png;base64,lotsOfAsciiCodedData ..."> for a .png). Due to data size limitations, only small images / simple graphics (.png, .gif, .bmp and small jpg) would be possible (your php would use "base64_encode(file_get_contents())" to generate the embedded data, which will be up to 1/3rd larger than the image ('binary') file size). Opera (and, I would suspect, most Mobile browsers) limits the data to about 4,100 characters. Firefox supports data up to 100Kb (enough for some very decent JPG thumbnails)
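
The php to do the embedding is short - something like this sketch (the image path is a placeholder) :-

<?php
// return an <img> tag with a (small) png embedded as a base64 data URI
function embed_png($path) {
    $data = base64_encode(file_get_contents($path));
    return '<img src="data:image/png;base64,' . $data . '" alt="">';
}
echo embed_png('/home/account/private/images/button.png');   // placeholder path
?>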

Can I serve files (jpg, text etc) from a non-readable directory ?

Yes. This can be done by having your page generation php use symbolic links (symlink)

The 'secret' is that whilst the directly requested path/file has to exist in a 'web-readable' directory, the requested file name can be 'symbolically linked' to a real file in a non-web-readable directory (because it's Apache that follows the sym link, NOT the web visitor)

Of course the symlink has to be 'deleted' after the visitor has 'used it' (otherwise anyone could use the same symlinked URL). For some 'how to' tips, see here
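
A sketch of the idea (the paths are placeholders, and the host must allow symlink() from php) :-

<?php
// serve a file from a non-web-readable directory via a short-lived symbolic link
$real = '/home/account/private/docs/report.pdf';            // the real file (not web-readable)
$name = sha1($_SERVER['REMOTE_ADDR'] . microtime()) . '.pdf';
$link = $_SERVER['DOCUMENT_ROOT'] . '/tmp/' . $name;         // web-readable folder (placeholder)
if (symlink($real, $link)) {
    // hand this one-off URL to the visitor ...
    echo '<a href="/tmp/' . $name . '">Download</a>';
    // ... and delete the link once it has been used (or after a short time-out)
}
?>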

Whilst it's said that this can only be locally tested on Windows Vista / Server 2008 or later, see here for sym links on XP (you will need the 'Symbolic Link Driver' to support symlink in php on XP)

How to detect when a symlink is used

It is also possible to detect the use of a symbolic link in .htaccess and re-direct the visitor to some other (web readable) file. This opens yet another possibility to 'detect when .htaccess is compromised'

# detect sym links and send visitor elsewhere
RewriteCond %{REQUEST_FILENAME} -l
RewriteRule .* symlinked.php [L]
# if .htaccess is deleted, visitor will drop through to the symlink and not be redirected

Can I set-up a web page to process its own data ?

Yes. Such a web page has to first check for data returned by the visitor. If none is found, it then 'asks' (GET, POST) for the data giving itself as the return URL

The danger is that you can too easily drop the visitor into an 'infinite loop' (by asking for data that you then 'reject' for some reason, thus 'dropping them through' to asking for the same data yet again ..)
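
A sketch of such a 'self-processing' page, with a simple 'max tries' escape route to avoid the loop (the field names and limit are just examples) :-

<?php
// process our own POST data; bail out after a few rejected attempts
$tries = isset($_POST['tries']) ? (int) $_POST['tries'] : 0;
if (isset($_POST['email']) && filter_var($_POST['email'], FILTER_VALIDATE_EMAIL)) {
    echo 'Thanks - registered.';
    exit;
}
if ($tries >= 3) {
    echo 'Too many attempts - please contact us instead.';
    exit;
}
?>
<form method="post" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
  <input type="hidden" name="tries" value="<?php echo $tries + 1; ?>">
  Email: <input type="text" name="email">
  <input type="submit" value="Register">
</form>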

Can I hide the .php file extension ?

Yes. You can set any .ext (or none) to be 'interpreted' as a php file. This allows you to 'hide' the fact that a directory does not exist (and that all other elements in the path URL are parameters to the php file) by forcing the file type

To set up a fake 'directory', eg 'members', generate a (php) file called 'members' (without the .php extension). Then in the root .htaccess add the code :-

<Files members>
ForceType application/x-httpd-php5
</Files>

When a user requests the URL www.domainRoot.co.uk/members/emails/verify.php, Apache finds 'members' in the root and executes the file as php (and the php code then processes '/emails' and '/verify.php' as two parameters). No actual emails folder need exist (and no verify.php need exist either)

On the other hand, when Apache sees www.domainRoot.co.uk/public/members/emails/verify.php it will navigate down the path /public/members/emails/ and look for the file verify.php (unless, of course, a .htaccess in public or members or emails tells it to do something else)
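
Back to the fake 'members' case - a sketch of how the 'members' php file might pick apart the rest of the URL (this assumes Apache passes the trailing path segments in PATH_INFO, the usual behaviour for php files) :-

<?php
// the part of the URL after 'members' arrives in PATH_INFO,
// so '/members/emails/verify.php' gives '/emails/verify.php'
$info  = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
$parts = array_values(array_filter(explode('/', $info)));
// $parts[0] would be 'emails', $parts[1] would be 'verify.php'
if ($parts === array('emails', 'verify.php')) {
    // ... run the real verification code (which can live in a non-readable directory)
} else {
    die();   // unknown 'path' - treat it as a probe
}
?>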

Can I have .htaccess tell robots not to index a (or this) directory ?

Normally the 'don't index' directory list goes into the 'robots.txt' in the root. However if lots of other pages 'out there' refer to one of your pages in a directory you don't want indexed, Google may still contain references to it. You can have .htaccess in that directory automatically add a 'do not index' header to every page issued from that directory as follows :-

# add a do not index header to every page in this directory
Header set X-Robots-Tag "noindex, noarchive, nosnippet"

Can you tell Google that (all) pages vary when accessed from a tablet etc. ?

Yes - just get .htaccess to add the required header to every page (visit here for a good explanation)

Header append Vary User-Agent
