    How to configure the ideal stSoftware server cluster?

    Network layout for a fully redundant, fault-tolerant stSoftware server cluster.

    Overview

    Best practice network design for a highly scalable, distributed web system has the following layout:

    Network Layout Design 

    [Diagram: ideal server cluster network layout (source diagram)]

    DNS setup

    The DNS record for yoursite.com will have two (or more) IP addresses, one for each web server. This is known as DNS round robin.

    We also define a direct-access host name per server for health monitoring.

    Example configuration for yoursite.com

      • www1 -> 10.0.0.1
      • www2 -> 10.0.0.2
      • www -> 10.0.0.1, 10.0.0.2
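
    For illustration, the records above might look like the following fragment of the yoursite.com zone file (a sketch for BIND-style DNS; the $TTL value and formatting are our assumptions, not part of the original configuration):

        $TTL 300    ; short TTL so a failed server drops out of rotation quickly
        www1    IN  A   10.0.0.1
        www2    IN  A   10.0.0.2
        www     IN  A   10.0.0.1
        www     IN  A   10.0.0.2    ; two A records for "www" give DNS round robin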

    Ideally, the static IP of one server would come from one network provider and the other server's IP from a second provider, so that we are not reliant on any one network provider.

    DMZ Firewall

    The firewall allows only HTTP and HTTPS (ports 80 and 443) through to the web servers in the DMZ; all other ports are closed.

    The firewall is the first line of defence against a DoS attack: it is configured to drop concurrent requests from one IP address once a threshold is exceeded. We recommend 50 concurrent requests from one IP as a reasonable limit. A single browser will only make 3 or 4 concurrent requests, but users behind a proxy are all seen as one IP address; a limit of 50 allows for a staff meeting at your local Starbucks while still protecting against a simple single-node DoS attack.

    See the sample iptables configuration below for preventing DoS attacks.
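
    A minimal sketch of such a rule, using the standard iptables "connlimit" module (the 50-connection threshold is the limit recommended above; the rule itself is our illustration, not the exact production script):

        # Drop new connections from any single IP that already has more than
        # 50 connections open to the web ports.
        iptables -A INPUT -p tcp --syn -m multiport --dports 80,443 \
                 -m connlimit --connlimit-above 50 --connlimit-mask 32 -j DROP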

    We install the package "fail2ban" with a custom configuration script which monitors the web server access logs for well-known hack attempts. When a hack attempt is detected, the IP address of the hacker's system is automatically blocked for 10 minutes.
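
    As a hedged sketch, a jail along these lines could implement the 10-minute ban (the jail name is hypothetical, and the stock "apache-noscript" filter stands in for the custom stSoftware filter script, which is not shown here):

        [webapps-probes]
        enabled  = true
        port     = http,https
        filter   = apache-noscript
        logpath  = /var/log/httpd/access_log
        maxretry = 2
        # bantime is in seconds: 600 = the 10-minute block described above
        bantime  = 600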

    Please note: the "fail2ban" module will need to be disabled when/if penetration testing (PENTEST) is being performed, as the tester will be locked out as soon as they run a script looking for well-known issues.

     

    DMZ

    Web server

    The web server runs as a low-privileged user "webapps", whose group is "nobody". The DMZ firewall redirects the privileged HTTP/HTTPS ports (80 and 443) to unprivileged ports, for example 8080 and 8443.
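
    A sketch of that redirect as iptables NAT rules (8080 and 8443 are the example ports from above):

        # Redirect the privileged web ports to the unprivileged ports that the
        # "webapps" user is allowed to bind to.
        iptables -t nat -A PREROUTING -p tcp --dport 80  -j REDIRECT --to-ports 8080
        iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8443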

    The web server can only access the data, file and message servers through the "green zone firewall".

    The servers in the DMZ do not store ANY client data or files; they can be restored from backup or completely rebuilt without the loss of any client data. The DMZ servers are considered "disposable": the web servers have as much CPU and cache as possible, and their disk space is used only for caching.

    All Linux servers are locked down to industry best practice.

    Recommended Specifications
      • 300 GB of disk space
      • 32 GB of RAM
      • 8 CPU cores

    Green zone Firewall

    The "green zone firewall" will be configured to open the database, message server ( port 61616) and SFTP (22 port) from the DMZ to the "green zone" storage servers.

    Green zone

    Storage Server

    The storage servers are where the data is stored; they must be backed up as regularly as possible, and their disk drives must be as reliable as possible. The web servers cache the files as required, so disk speed isn't a large concern.

    JMS

    The JMS servers are configured with a failover transport and a bridged connection between the two JMS servers. The definition of the JMS server is entered into the aspc_master database in the aspc_server table.
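
    Port 61616 is the ActiveMQ default, so assuming ActiveMQ, the connection URL stored in aspc_server could look like the following sketch (the host names jms1/jms2 are illustrative):

        # The client fails over to the second broker if the first is down.
        failover:(tcp://jms1.yoursite.com:61616,tcp://jms2.yoursite.com:61616)?randomize=false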

    Worker

    The application worker process handles all background event processing.

    Database

    The actual database storage. The database is set up either with "multi-master replication" with ZERO latency, or with traditional master-slave replication.

    To date, the only database tested and supported for multi-master replication is Oracle RAC. The system relies heavily on optimistic locking to handle multi-server bidding and processing, so it is very important that the database is 100% ACID with no latency conditions. If the link between data centres goes down (the last remaining single point of failure), we need to take one of the data centres offline and tell Oracle RAC not to sync. When the link between the data centres is back up and running, we then need to do a full backup restore to the database that was offline.

    Our system relies heavily on the consistency of the database. We ask questions like "what is the next invoice number?", and the same invoice number must never be given to two application servers, no matter what. This is quite a complex requirement for multi-master database replication.
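
    A minimal SQL sketch of the optimistic-locking pattern this relies on (the table and column names here are hypothetical, not the actual schema):

        -- Claim the next invoice number only if nobody else changed the row.
        UPDATE invoice_counter
           SET next_number = next_number + 1,
               row_version = row_version + 1
         WHERE counter_id = 1
           AND row_version = :expected_version;
        -- If zero rows were updated, another server won the race:
        -- re-read the row and retry.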

    For the other supported databases (PostgreSQL, MSSQL or MySQL), traditional master-slave replication is supported. This configuration does mean that manual intervention is needed to swap from the master database to the slave database in case of an outage.

    Note: there are cheaper database solutions that claim multi-master replication, but with qualifiers around the word "ACID". If there are any qualifiers on the word ACID, we don't support it.

    File Server (SFTP)

    All raw files, which are compressed and encrypted, are stored on a series of SFTP servers. The default file server, defined in the aspc_server table of the aspc_master database, holds the connection details for both file servers. When a new file is uploaded to one of the web servers, the web server tries to write to both file servers; as long as the write to one file server succeeds, the client's file upload is treated as successful.

    A periodic task syncs any file that was successfully uploaded to one file server across to the other. The system will automatically heal a file server that has missing raw files as they are discovered. This allows a file server to be recovered from backup, as long as the redundant server holds the full set of files written since the backup being restored.
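
    The sync step could be as simple as the following sketch (the host name and the /data/files path are illustrative; --ignore-existing copies only files the peer lacks, so existing files are never overwritten):

        # Heal sftp2 with any raw files that only made it to sftp1.
        rsync -a --ignore-existing /data/files/ webapps@sftp2.yoursite.com:/data/files/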

    Recommended Specifications
      • 2 TB of disk space
      • RAID 10
      • 16 GB of RAM
      • 4 CPU cores