NginX and Rate Limiting Search Bots

This article demonstrates how to rate limit search bots with NginX. Our objective is to keep the site fast for real visitors while slowing down search bots. We’ll limit both kinds of traffic, but our priority is to ensure real people have a nicer experience. With the configuration shown in this article, search bots are limited to 1 request per minute with a burst of up to 2 additional requests, and every client IP address (bot or not) is limited to 10 requests per second with a burst of 15.

We’re using the following software versions:

  • CentOS 7 64bit version “7.2.1511 (Core)”
  • NginX version “nginx-1.10.2-2.el7.x86_64”
  • PHP-FPM version “php-fpm-5.4.16-42.el7.x86_64”

You can install the above (versions will likely differ) using the following:

yum install nginx php-fpm php

Copy the following into your vhost file, for example “/etc/nginx/conf.d/example.com.conf”.

server {
        listen 80;
        server_name example.com;

        access_log   /var/log/nginx/example.com.access.log;
        error_log    /var/log/nginx/example.com.error.log;

        root /var/www/html/example.com;
        index index.php;

        location / {
                try_files $uri $uri/ /index.php?$args;
        }

        # Escape the dot so only URIs ending in ".php" match
        location ~ \.php$ {
                #try_files $uri =404;
                include fastcgi_params;
                include fastcgi.conf;
                fastcgi_index  index.php;
                fastcgi_pass   127.0.0.1:9000;
        }
}

Create the Document Root directory and put a test file in it:

mkdir -p /var/www/html/example.com
echo "test" > /var/www/html/example.com/index.php
chown -R apache:apache /var/www/html/example.com

Make sure your firewall rules are correct. Here we’re using IPTables “/etc/sysconfig/iptables”:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 443 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
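
As a quick sanity check, the rules file can be searched for an ACCEPT rule on a given port. The following is a minimal sketch: the function name `port_allowed` is hypothetical, it only recognises rules written in the same `--dport N -j ACCEPT` form used above, and it reads the file rather than querying the live firewall.

```shell
# port_allowed FILE PORT
# Succeeds (exit status 0) if FILE contains an INPUT rule that accepts
# traffic on PORT, written in the "--dport PORT -j ACCEPT" style used above.
# Hypothetical helper for illustration only.
port_allowed() {
    grep -Eq -- "^-A INPUT .*--dport $2 -j ACCEPT" "$1"
}

# Example (on the server):
#   port_allowed /etc/sysconfig/iptables 80 && echo "port 80 is allowed"
```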

Add the following inside the “http” block of your “/etc/nginx/nginx.conf” file.

map $http_user_agent $isbot_ua {
        default 0;
        ~*(GoogleBot|bingbot|YandexBot|mj12bot) 1;
}
map $isbot_ua $limit_bot {
        0       "";
        1       $binary_remote_addr;
}

limit_req_zone $limit_bot zone=bots:10m rate=1r/m;
limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;

limit_req zone=bots burst=2 nodelay;
limit_req zone=one burst=15 nodelay;

The above was sourced from “http://alex.mamchenkov.net/2017/05/17/nginx-rate-limit-user-agent-control-bots/”
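
The first map sets `$isbot_ua` to 1 when the User-Agent matches the case-insensitive (`~*`) pattern. The second map turns that flag into a zone key: the client address for bots, and an empty string for everyone else. NginX does not account requests whose key is empty, so the “bots” zone only ever applies to matching user agents. To check by hand which User-Agent strings the pattern would catch, a rough shell equivalent can help. This is only a sketch: `is_bot_ua` and `bot_label` are hypothetical names, and `grep -Ei` approximates the NginX regex match rather than reproducing it exactly.

```shell
# is_bot_ua USER_AGENT
# Succeeds if USER_AGENT matches the same alternation used in the map block.
# grep -i stands in for the case-insensitive "~*" match (an approximation).
is_bot_ua() {
    printf '%s\n' "$1" | grep -Eiq 'GoogleBot|bingbot|YandexBot|mj12bot'
}

# bot_label USER_AGENT
# Prints 1 for a match and 0 otherwise, mirroring the values of $isbot_ua.
bot_label() {
    if is_bot_ua "$1"; then echo 1; else echo 0; fi
}

# Example:
#   bot_label "Mozilla/5.0 (compatible; bingbot/2.0)"
```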

My complete “/etc/nginx/nginx.conf” file looks like this:

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
events {
    worker_connections 1024;
}
http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    access_log  /var/log/nginx/access.log  main;
    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;
    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;
    include /etc/nginx/conf.d/*.conf;
    server_tokens off;
    map $http_user_agent $isbot_ua {
            default 0;
            ~*(GoogleBot|bingbot|YandexBot|mj12bot) 1;
    }
    map $isbot_ua $limit_bot {
            0       "";     
            1       $binary_remote_addr;
    }
    limit_req_zone $limit_bot zone=bots:10m rate=1r/m;
    limit_req_zone $binary_remote_addr zone=one:10m rate=10r/s;
    limit_req zone=bots burst=2 nodelay;
    limit_req zone=one burst=15 nodelay;
}
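
To see what `rate=1r/m` with `burst=2 nodelay` means in practice, the limiter can be modelled as a leaky bucket: each accepted request adds one unit, the bucket drains at the configured rate, and a request is rejected when it would push the bucket above the burst size. The script below is a simplified model for a single client, not NginX's actual implementation; the function name `simulate_limit` and its argument layout are assumptions made for this sketch.

```shell
# simulate_limit RATE_PER_MIN BURST T1 [T2 ...]
# Simplified leaky-bucket model of "limit_req ... burst=N nodelay" for one
# client. Times are whole seconds in ascending order. Prints "<time> pass"
# or "<time> reject" per request. The bucket level ("excess") is tracked in
# thousandths of a request. Illustrative only.
simulate_limit() {
    rate=$1
    burst=$2
    shift 2
    excess=0
    last=""
    for t in "$@"; do
        if [ -z "$last" ]; then
            new=0                          # first request finds an empty bucket
        else
            drained=$(( ($t - $last) * $rate * 1000 / 60 ))
            new=$(( $excess - $drained + 1000 ))
            if [ "$new" -lt 0 ]; then new=0; fi
        fi
        if [ "$new" -gt $(( $burst * 1000 )) ]; then
            echo "$t reject"               # rejected requests leave the bucket unchanged
        else
            echo "$t pass"
            excess=$new
            last=$t
        fi
    done
}
```

For example, `simulate_limit 1 2 0 0 0 0 60 61` models four requests arriving at once, one more a minute later, and another a second after that. Under this model the first three pass, the fourth is rejected, the request at 60 seconds passes, and the one at 61 seconds is rejected. By default NginX answers rejected requests with a 503 status.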

Restart NginX and PHP-FPM and test it:

systemctl restart nginx
systemctl restart php-fpm
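
Before restarting, it can help to validate the configuration first, since a syntax error would otherwise stop NginX from coming back up. A minimal sketch, assuming it is run as root; the wrapper name `restart_if_valid` is hypothetical, while `nginx -t` is NginX's built-in configuration test.

```shell
# restart_if_valid
# Runs "nginx -t" to test the configuration and restarts both services only
# if the test succeeds. Hypothetical wrapper for illustration.
restart_if_valid() {
    if nginx -t; then
        systemctl restart nginx php-fpm
    else
        echo "nginx configuration test failed; services left untouched" >&2
        return 1
    fi
}

# Example:
#   restart_if_valid
```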

Either configure DNS (ideal) or add an entry to your local “hosts” file to make sure you can browse to your new web server.
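
Once the site resolves, the bot limit can be exercised from the command line by sending a few requests with a bot-like User-Agent and watching the status codes. The helper below is a sketch only: `probe_rate_limit` is a hypothetical name, it requires curl, and the URL and User-Agent in the example are placeholders.

```shell
# probe_rate_limit URL USER_AGENT COUNT
# Sends COUNT requests to URL with the given User-Agent and prints one HTTP
# status code per line. Hypothetical helper; requires curl.
probe_rate_limit() {
    i=0
    while [ "$i" -lt "$3" ]; do
        curl -s -o /dev/null -w '%{http_code}\n' -A "$2" "$1"
        i=$(( $i + 1 ))
    done
}

# Example: five requests that claim to be Googlebot
#   probe_rate_limit http://example.com/ "Googlebot/2.1" 5
```

If the limits behave as configured, the first few requests should return 200 and the later ones 503, the default status NginX uses for rate-limited requests. Repeating the run with an ordinary browser User-Agent should return 200 throughout, since five requests stay well inside the general limit.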

Resources:

  • http://alex.mamchenkov.net/2017/05/17/nginx-rate-limit-user-agent-control-bots/
