Night Hour

Reading under a cool night sky ... a quiet night of contemplation ...

Developing an Nginx URL Whitelisting Module


Premature optimization is the root of all evil.
Donald Knuth


29 Oct 2019


Introduction

One of the challenges in securing web applications and websites is preventing the accidental exposure of sensitive parts, such as administrative interfaces. A common technique is to blacklist an application path and block access to resources that start with or match that path. Other techniques include disabling unneeded administrative interfaces or removing unwanted features. This article shows how to develop an Nginx module that allows access only to whitelisted URLs (web resources). Any URL that is not in the whitelist is blocked.

A whitelist approach offers good protection because access to every resource is denied by default. A security administrator or web developer has to explicitly whitelist each web resource to enable access. Whitelisting can also mitigate application frameworks that expose sensitive interfaces by mistake, as in the case of vulnerable Spring Boot actuators. It can also mitigate the accidental upload of sensitive files to a website.

Whitelisting may sound tedious, but for a web application or web-based API, the developers know exactly which web resources should be accessible to normal end users. Quality assurance testers will also be testing these known endpoints and functionalities.

It is therefore relatively easy to compile a list of the URLs to whitelist. A whitelisting approach may not suit every use case, but it is useful where security is essential, and it can greatly reduce the attack surface of a website or web application.

Design and Approach

The diagram below shows the URLs for a web application. It can be seen that the URLs that should be accessible by end users are a small subset of all the available URLs.

Fig 1. URLs of an Application

Some traditional enterprise middleware and application frameworks are complex and expose a fair number of interfaces. Many of these interfaces are not meant for user access.

Applications can also have administrative consoles, management interfaces, status monitoring features and internal APIs that should not be publicly accessible. It is prudent not to expose URLs that users should not access.

Nginx can be configured as a reverse proxy with the URL whitelisting module enabled. Any URL that is not explicitly whitelisted is blocked. The following diagram illustrates this.

Fig 2. Nginx Reverse Proxy with URL whitelisting

The root "/" and "/store" on www.myapp.com are whitelisted, so access to these two URLs is granted. An access attempt for /admin is blocked and HTTP 404 (Not Found) is returned.

One consideration when building the whitelisting module is the data structure for the whitelist. Nginx exposes an HTTP request structure (ngx_http_request_t) to modules. It contains a uri field (ngx_str_t) holding the URL path starting from the web root. The whitelisted URLs could be stored as an array of Nginx strings (ngx_str_t) and compared with the uri field in a loop.
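For comparison, the array approach can be sketched in plain C, outside Nginx. This is an illustrative sketch only: `struct wl_str` is a hypothetical stand-in for the `data`/`len` pair of `ngx_str_t`, and `wl_array_contains()` is not part of the module.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ngx_str_t: a length-delimited byte string. */
struct wl_str {
    const char *data;
    size_t      len;
};

/*
 * Naive whitelist check: compare the request URI with every entry.
 * The cost grows linearly with the number of whitelisted URLs, since a
 * URL that is not whitelisted is compared with all n entries.
 */
static int
wl_array_contains(const struct wl_str *list, size_t n,
                  const char *uri, size_t uri_len)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (list[i].len == uri_len
            && memcmp(list[i].data, uri, uri_len) == 0)
        {
            return 1;   /* exact match: whitelisted */
        }
    }

    return 0;           /* no match after n comparisons */
}
```

A URL that is not whitelisted, such as /admin, has to be compared with every entry before it is rejected.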

The problem with an array is that when the number of whitelisted URLs is large, many comparisons are required for each request. A hash table gives faster lookups, but its memory requirements and hashing function have to be considered. Here we use a tree-like data structure (a trie), similar to a file directory tree. Each lookup only compares a URL part with the children of one node at each level of the path, rather than with every whitelisted URL, so it should be faster than scanning an array of URL strings. We also don't have to choose a hashing function or allocate memory for hash tables.

The following diagram illustrates this.

Fig 3. Directory like Structure

A URI or URL is broken into parts, starting from the root '/'. The root has children, which are either subdirectories or files. Each subdirectory in turn has its own children, again either files or subdirectories. A subdirectory part ends with a "/"; for example, "scripts/".

To whitelist http://www.nighthour.sg/articles/index.html, leave out the hostname: the syntax starts from the web root "/", followed by "articles/" and then "index.html". The module requires the URL in this form:

/articles/index.html

This will be further broken down into the following parts in the tree structure.

/ articles/ index.html

If a URL ends with a subdirectory, a trailing forward slash is required to tell the module that the subdirectory should be whitelisted and accessible. For example, for https://www.nighthour.sg/articles/ the URL syntax required by the module is

/articles/

This will give the following parts in the directory tree

/ articles/

If a web resource ends with a file, the trailing slash is not required. For example, for https://www.nighthour.sg/myapi/myapplication the URL syntax for the module is

/myapi/myapplication

This will give the following parts in the tree.

/ myapi/ myapplication
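The splitting rule above can be sketched as a small standalone C function. This is a sketch under stated assumptions, not the module's parser: `split_segments()`, `SEG_MAX` and `SEG_LEN` are hypothetical names and limits chosen for illustration. Each directory part keeps its trailing "/", and a final file part has none.

```c
#include <stddef.h>
#include <string.h>

#define SEG_MAX 16    /* hypothetical cap on the number of parts */
#define SEG_LEN 256   /* hypothetical cap on one part, incl. NUL */

/*
 * Split a whitelist path such as "/articles/index.html" into the parts
 * stored in the tree: "/", "articles/", "index.html". Returns the number
 * of parts written to out, or -1 if a limit is exceeded.
 */
static int
split_segments(const char *path, char out[][SEG_LEN], int max)
{
    size_t idx = 0;
    int    n = 0;
    char   c;

    while ((c = *path++) != '\0') {
        if (idx == 0 && n >= max) {
            return -1;              /* no room for another part */
        }
        if (idx + 1 >= SEG_LEN) {
            return -1;              /* part too long */
        }
        out[n][idx++] = c;
        if (c == '/') {             /* a '/' closes a directory part */
            out[n][idx] = '\0';
            n++;
            idx = 0;
        }
    }

    if (idx > 0) {                  /* trailing file part without '/' */
        out[n][idx] = '\0';
        n++;
    }

    return n;
}
```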

We use a node structure to represent each part of the URL. A node may have child nodes, and it contains a string holding its path segment, for example "/" or "scripts/". With this, we can build a tree that represents all the whitelisted URLs of a website or web application.

For each HTTP request, the module compares the uri string against the tree, part by part. As soon as a part doesn't match, the URL is known not to be in the whitelist, so the directory-tree structure keeps the number of comparisons small. Conversely, if all parts match, the URL is in the whitelist and access is granted. The module returns an HTTP 404 (Not Found) error for URLs that are not whitelisted.
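A minimal, self-contained sketch of such a tree follows, with one walk function used both to whitelist a URL (creating missing parts) and to check one (rejecting at the first missing part). It is a simplification, not the module's code: `struct pnode`, `pnode_child()`, `trie_walk()`, the fixed fan-out `MAX_KIDS` and the part size `SEG_CAP` are hypothetical; children are searched linearly, memory comes from `calloc()` instead of an Nginx pool, and nodes are never freed. It also models the rule that a URL ending in "/" must be whitelisted explicitly.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_KIDS 8    /* hypothetical fixed fan-out for this sketch */
#define SEG_CAP  64   /* hypothetical cap on one path part, incl. NUL */

/* Simplified node, loosely modelled on ngx_whl_pnode_t. */
struct pnode {
    char          seg[SEG_CAP];       /* "/", "dir/" or "file" */
    struct pnode *kids[MAX_KIDS];
    size_t        nkids;
    int           end_slash_allowed;  /* "dir/" URL whitelisted explicitly */
};

/* Find the child matching seg[0..len); create it if asked to. */
static struct pnode *
pnode_child(struct pnode *p, const char *seg, size_t len, int create)
{
    size_t        i;
    struct pnode *c;

    for (i = 0; i < p->nkids; i++) {
        c = p->kids[i];
        if (strlen(c->seg) == len && memcmp(c->seg, seg, len) == 0) {
            return c;                 /* part already present */
        }
    }
    if (!create || p->nkids == MAX_KIDS || len >= SEG_CAP) {
        return NULL;                  /* not found, or no room */
    }
    c = calloc(1, sizeof(*c));
    if (c != NULL) {
        memcpy(c->seg, seg, len);     /* calloc keeps it NUL-terminated */
        p->kids[p->nkids++] = c;
    }
    return c;
}

/*
 * Walk path part by part from root. With add != 0 missing parts are
 * created (whitelisting); with add == 0 the first missing part rejects
 * the URL (lookup). A URL ending in '/' is accepted only if that node's
 * end_slash_allowed flag was set by an explicit whitelist entry.
 */
static int
trie_walk(struct pnode *root, const char *path, int add)
{
    struct pnode *node = root;
    const char   *p, *start;
    size_t        len;

    if (root == NULL || path[0] != '/') {
        return 0;                     /* must start at the root "/" */
    }
    start = p = path + 1;             /* root already stands for "/" */

    while (*p != '\0') {
        if (*p++ != '/') {
            continue;
        }
        len = (size_t)(p - start);    /* directory part incl. its '/' */
        if (len > 1) {                /* len 1 is a repeated '/': skip */
            node = pnode_child(node, start, len, add);
            if (node == NULL) {
                return 0;
            }
        }
        start = p;
    }

    len = (size_t)(p - start);        /* trailing file part, if any */
    if (len > 0) {
        return pnode_child(node, start, len, add) != NULL;
    }
    if (add) {
        node->end_slash_allowed = 1;  /* URL ends with '/' */
    }
    return node->end_slash_allowed;
}
```

With "/articles/index.html" and "/store/" whitelisted, a lookup of "/articles/" fails because the "articles/" node was only created implicitly, while "/store/" succeeds. In this sketch a file part never gets children, so matching one as the last part always corresponds to an explicit whitelist entry.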

Note that the module doesn't compare against the URL query string or query parameters. For example,

https://www.nighthour.sg/myapi/myapplication?queryid=338899&type=abc

The portion starting from the question mark is the query string. The module does not use it when checking the whitelist, so do not include the query string when specifying a URL to whitelist.
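In Nginx the query string is already separated from the path: `r->uri` holds the path and `r->args` holds the arguments, so a module that only looks at `r->uri` never compares the query string. Purely to illustrate which portion is compared, a hypothetical helper `path_len_without_query()` could compute the length of the path part of a raw request target:

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the path portion of a request target, i.e. the
 * bytes before the first '?'. Only this portion is relevant to the
 * whitelist; the query string is ignored.
 */
static size_t
path_len_without_query(const char *target)
{
    const char *q = strchr(target, '?');

    return q ? (size_t)(q - target) : strlen(target);
}
```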

The URI whitelisting module can be used on a site hosted directly by Nginx or with Nginx configured as a reverse proxy. The reverse proxy option is particularly useful as an additional layer of protection for web applications or API endpoints.

Extensions Bypass Feature

A web application or website may have many static assets, such as images, that are meant for public access. If no sensitive images are present on the application or website, it may be convenient to grant access to all images, or to all files that end with a specific extension. The URL whitelist module has a directive for this.

The wh_list_bypass directive takes a list of file extensions such as "jpg", "png", "svg", "gif" and "webp" as arguments. Any web resource that ends with one of the specified extensions is granted access by the module.

Use this directive carefully. Although it is a convenient way to grant access to URLs ending with specific extensions, it runs contrary to the strict whitelisting approach: any URL ending with a bypassed extension is allowed, even if it was never whitelisted.
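The bypass check can be approximated in plain C. The sketch assumes, as the module's handler does with `r->exten`, that the extension is the text after the last "." in the last path segment, and that matching is exact and case-sensitive. `uri_extension()` and `ext_bypassed()` are hypothetical helpers, not Nginx's own URI parser.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the extension of the last path segment ("jpg" for
 * "/img/logo.jpg"), or "" if there is none. This approximates what
 * Nginx exposes as r->exten.
 */
static const char *
uri_extension(const char *uri)
{
    const char *slash = strrchr(uri, '/');
    const char *dot   = strrchr(slash ? slash : uri, '.');

    return dot ? dot + 1 : "";
}

/* Return 1 if the URI's extension exactly matches a bypass entry. */
static int
ext_bypassed(const char *uri, const char **bypass, size_t n)
{
    const char *ext = uri_extension(uri);
    size_t      i;

    if (*ext == '\0') {
        return 0;                    /* no extension: never bypassed */
    }
    for (i = 0; i < n; i++) {
        if (strcmp(ext, bypass[i]) == 0) {   /* case-sensitive */
            return 1;
        }
    }
    return 0;
}
```

It also shows why care is needed: with "jpg" in the bypass list, a URL such as /admin/backup.jpg is allowed without ever being whitelisted.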

Implementation

This section walks through the source code of the Nginx URL whitelisting module. It does not explain the basics of writing Nginx modules; refer to the Nginx Development Guide for details. Another good beginner resource is Emiller's Guide to Nginx Module Development.

The full source code of the module is available at the GitHub link at the end of the article.

The code snippet below shows a few macro constants and the node data structure for building the URI tree. NGX_WHL_INIT_CHIDREN_SZ is the initial number of child slots for each node. The children array can be expanded when necessary, up to the maximum defined in NGX_WHL_MAX_CHILDREN.

NGX_WHL_MAXPATHSZ sets the maximum length of a URL. It is currently defined as 2048. A web administrator may want to reduce this number if he or she is sure that the web application or website has no URLs this long. For example, with a value of 100, any URL that is 100 characters or longer is blocked by the module with an HTTP 404 error, since the length check rejects lengths greater than or equal to NGX_WHL_MAXPATHSZ.

NGX_WHL_MAX_CHILDREN defines the maximum number of child nodes that a parent node can have. NGX_WHL_TH_BSEARCH determines when binary search is used to find a child node: if the number of children exceeds NGX_WHL_TH_BSEARCH, binary search is used; otherwise the module loops through all the children. The child nodes are sorted with qsort when the module loads the configuration.

NGX_WHL_MAX_NEST defines the maximum number of nested path segments. For example, /myapp/dir1/dir2/ has 3 path segments.

#define  NGX_WHL_INIT_CHIDREN_SZ  8
#define  NGX_WHL_MAXPATHSZ  2048
#define  NGX_WHL_MAX_CHILDREN  65536
#define  NGX_WHL_TH_BSEARCH  6
#define  NGX_WHL_MAX_NEST  10

typedef struct ngx_whl_pnode_s  ngx_whl_pnode_t;

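struct ngx_whl_pnode_s
{
    ngx_str_t *segment;          /* path part, e.g. "/", "scripts/" or "index.html" */
    size_t num_child;            /* number of children in use */
    ngx_whl_pnode_t **children;  /* array of pointers to child nodes */
    size_t maxchild;             /* capacity of the children array */
    size_t end_slash_allowed;    /* 1 if the URL ending at this "dir/" node is whitelisted */
};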
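The threshold-based child search described for NGX_WHL_TH_BSEARCH can be sketched with the C library's `qsort()` and `bsearch()`. The module's own comparator and search routine are not shown in this article, so the comparator `seg_cmp()`, the helpers `sort_children()` and `find_child()`, and the use of plain C strings instead of `ngx_str_t` are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define TH_BSEARCH 6   /* mirrors NGX_WHL_TH_BSEARCH in the module */

/* Order segments by plain byte comparison (hypothetical comparator). */
static int
seg_cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort a parent's children once, at configuration time. */
static void
sort_children(const char **segs, size_t n)
{
    qsort(segs, n, sizeof(*segs), seg_cmp);
}

/*
 * Look up a segment among sorted children: a linear scan for a small
 * fan-out, binary search once n exceeds the threshold.
 */
static int
find_child(const char **segs, size_t n, const char *seg)
{
    size_t i;

    if (n > TH_BSEARCH) {
        return bsearch(&seg, segs, n, sizeof(*segs), seg_cmp) != NULL;
    }
    for (i = 0; i < n; i++) {
        if (strcmp(segs[i], seg) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Sorting happens once when the configuration is loaded, so each request then pays either a short linear scan or an O(log n) binary search per path part.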

The following shows the module's configuration structure, which Nginx uses to store the configuration options. The uri_tree field holds the URL tree, which is built as Nginx reads the configuration.

bp_extens is an array of file extensions that the module bypasses. A list of extensions such as jpg, gif etc. can be provided with the bypass directive; the module then allows access to URLs with those extensions without checking the whitelist. The enabled flag turns the module on or off.

/* Configuration struct */
typedef struct {
    ngx_flag_t enabled;
    ngx_array_t *bp_extens;
    ngx_whl_pnode_t *uri_tree; 
} ngx_http_uri_whitelist_loc_conf_t; 

The following code snippet shows the module's configuration directives. The wh_list directive can be set to on|off to enable or disable the module. The wh_list_uri directive takes a URL string starting with "/"; these are the URLs that are whitelisted. wh_list_bypass specifies the extensions that the module bypasses.

The functions that handle each directive are also specified in this ngx_command_t array. ngx_http_wh_list_cfg() processes each wh_list_uri directive and builds up the URI tree. ngx_http_wh_list_bypass_cfg() populates the bypass array with the file extensions that the module skips.

/* Module Directives */
static ngx_command_t  ngx_http_uri_whitelist_commands[] = {

    { ngx_string("wh_list"),
      NGX_HTTP_LOC_CONF | NGX_CONF_FLAG,
      ngx_conf_set_flag_slot,
      NGX_HTTP_LOC_CONF_OFFSET,
      offsetof(ngx_http_uri_whitelist_loc_conf_t, enabled),
      NULL },
      
    { ngx_string("wh_list_uri"),
      NGX_HTTP_LOC_CONF | NGX_CONF_TAKE1,
      ngx_http_wh_list_cfg,
      NGX_HTTP_LOC_CONF_OFFSET,
      0,
      NULL },
      
    { ngx_string("wh_list_bypass"),
      NGX_HTTP_LOC_CONF | NGX_CONF_1MORE,
      ngx_http_wh_list_bypass_cfg,
      NGX_HTTP_LOC_CONF_OFFSET,
      0,
      NULL },
      
    ngx_null_command
};

The following code snippets show the module context and module definition. This article does not go into detail on these; refer to the earlier links on Nginx development for more information.

The ngx_http_uri_whitelist_init() function initializes the module after the configuration has been read. ngx_http_uri_whitelist_create_loc_conf() and ngx_http_uri_whitelist_merge_loc_conf() are for creating and merging the configuration structure.

/* Module Context */
static ngx_http_module_t  ngx_http_uri_whitelist_module_ctx = {
    NULL,                                  /* preconfiguration */
    ngx_http_uri_whitelist_init,              /* postconfiguration */

    NULL,                                  /* create main configuration */
    NULL,                                  /* init main configuration */

    NULL,                                  /* create server configuration */
    NULL,                                  /* merge server configuration */

    ngx_http_uri_whitelist_create_loc_conf,/* create location configuration */
    ngx_http_uri_whitelist_merge_loc_conf  /* merge location configuration */
};


/* Module Definition */
ngx_module_t  ngx_http_uri_whitelist_module = {
    NGX_MODULE_V1,
    &ngx_http_uri_whitelist_module_ctx,       /* module context */
    ngx_http_uri_whitelist_commands,          /* module directives */
    NGX_HTTP_MODULE,                       /* module type */
    NULL,                                  /* init master */
    NULL,                                  /* init module */
    NULL,                                  /* init process */
    NULL,                                  /* init thread */
    NULL,                                  /* exit thread */
    NULL,                                  /* exit process */
    NULL,                                  /* exit master */
    NGX_MODULE_V1_PADDING
};    

The following is the code snippet for ngx_http_uri_whitelist_init(). It registers the module handler, ngx_http_uri_whitelist_handler(), in the Nginx HTTP access phase. In this phase, a handler can accept or reject an HTTP request: returning NGX_DECLINED lets processing continue, while returning an HTTP status code such as NGX_HTTP_NOT_FOUND rejects the request with that status.

/* Module initialization */
static ngx_int_t
ngx_http_uri_whitelist_init(ngx_conf_t *cf)
{
    ngx_http_handler_pt        *h;
    ngx_http_core_main_conf_t  *cmcf;

    cmcf = ngx_http_conf_get_module_main_conf(cf, ngx_http_core_module);

    /* Add our module handler to the HTTP ACCESS phase */
    h = ngx_array_push(&cmcf->phases[NGX_HTTP_ACCESS_PHASE].handlers);
    if (h == NULL) {
        return NGX_ERROR;
    }

    *h = ngx_http_uri_whitelist_handler;
    
    return NGX_OK;
}

The following is the code snippet for the module handler, ngx_http_uri_whitelist_handler(). The handler first checks whether the whitelist module is enabled. If it is disabled, the handler passes control back to Nginx; otherwise it checks the URL against the bypass file extensions. If an extension matches, it passes control back to Nginx.

The handler then calls ngx_http_wh_check_path_exists() to see whether the URL is in the whitelist URI tree. It returns an HTTP 404 error if the URL is not whitelisted; otherwise control is passed back to Nginx.

/* Module Handler */
static ngx_int_t
ngx_http_uri_whitelist_handler(ngx_http_request_t *r)
{
    size_t                             i; 
    ngx_str_t                          *ext; 
    ngx_http_uri_whitelist_loc_conf_t  *slcf;
    
#if WHL_DEBUG    
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[URI_WHITELIST]: %V",&r->uri);
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[URI_WHITELIST] extension: %V",&r->exten);
#endif

    if (r->uri.len == 0) {
        return NGX_HTTP_BAD_REQUEST;
    }

    slcf = ngx_http_get_module_loc_conf(r, ngx_http_uri_whitelist_module);
    
    if (slcf == NULL) {
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    }
    
    if (slcf->enabled != 1) {
        ngx_log_error(NGX_LOG_WARN, r->connection->log, 0,
            "[URI_WHITELIST] : White list module disabled !"); 
        return NGX_DECLINED;
    }
    
 
    /* Check for extensions bypass */
    ext = slcf->bp_extens->elts;
    for (i=0; i < slcf->bp_extens->nelts; i++) {
        
        if (r->exten.len == ext[i].len 
            && ngx_strncmp(r->exten.data, ext[i].data, r->exten.len) == 0) 
        {
            return NGX_DECLINED; 
        }
        
    }
    
    
    if (!ngx_http_wh_check_path_exists(r->uri.data, 
        r->uri.len, slcf->uri_tree)) 
    {
        /* If uri is not present in whitelist */
        ngx_log_error(NGX_LOG_ALERT, r->connection->log, 0,
            "[URI_WHITELIST] : Access Denied for [ %V ] ", &r->uri);
        return NGX_HTTP_NOT_FOUND;
    }
    
                   
    return NGX_DECLINED;
}

The following code snippets are the functions for building the URI tree. ngx_http_wh_create_node() creates a new node. ngx_http_wh_add_child() adds a child node to a parent. If the path segment passed in is a single "/", ngx_http_wh_add_child() returns the parent; this skips repeated "/" in the URL. Otherwise it returns the child node, whether the child already existed or has just been added to the parent.

If the parent node runs out of space for child nodes, ngx_http_wh_add_child() calls ngx_http_wh_resize_children(), which doubles the capacity of the parent's children array each time. The number of child nodes is limited to NGX_WHL_MAX_CHILDREN (65536), defined earlier in the source.

ngx_http_wh_add_path() adds a URL to the URI tree. It loops through the URL string, breaking it into its constituent parts and adding each part to the tree.

/* Creates a path node based on a part of the uri */
static ngx_whl_pnode_t * 
ngx_http_wh_create_node(const u_char* path,  size_t plen, ngx_conf_t *cf)
{
    size_t           sz;
    ngx_str_t        *sgmt;
    ngx_whl_pnode_t  *node; 
    
    if (path == NULL)
        return NULL;
    
    if (plen == 0 || plen >= NGX_WHL_MAXPATHSZ) 
        return NULL;
    
    sgmt = ngx_pcalloc(cf->pool, sizeof(ngx_str_t));
    if (sgmt == NULL) { 
        return NULL;
    }
    
    sz = plen + 1;
    sgmt->data = ngx_pcalloc(cf->pool, sz * sizeof(u_char));
    if (sgmt->data == NULL) {
        return NULL;
    }
    
    /* copy plen bytes; ngx_pcalloc already zero-filled the terminator */
    ngx_memcpy(sgmt->data, path, plen);
    sgmt->len = plen;
        
    node = ngx_pcalloc(cf->pool, sizeof(ngx_whl_pnode_t));
    if (node == NULL) {
        return NULL;
    }
    
    node->children = ngx_pcalloc(cf->pool, 
        NGX_WHL_INIT_CHIDREN_SZ * sizeof(ngx_whl_pnode_t *));
        
    if (node->children == NULL) {
        return NULL;
    }
    
    node->segment = sgmt; 
    node->num_child = 0;
    node->maxchild = NGX_WHL_INIT_CHIDREN_SZ;
    node->end_slash_allowed = 0;
    
    return node; 
}

/* Adds a uri path to the uri tree */
static ngx_whl_pnode_t *
ngx_http_wh_add_child(const u_char *path, ngx_whl_pnode_t *parent, 
    ngx_conf_t *cf)
{
    size_t           plen, i;
    ngx_whl_pnode_t  *node;
    
    if (path == NULL || parent == NULL) {
        return NULL;
    }
  
    plen = ngx_strlen(path);
    if (plen == 0 || plen >= NGX_WHL_MAXPATHSZ) {
        return NULL; 
    }
    
    /* Ignore additional '/' */   
    if (plen == 1 && ngx_strncmp(path, "/", plen) == 0) {
        return parent;
    }
      
    for (i = 0; i < parent->num_child; i++) {
    /* check if segment path already exists */   
        node = parent->children[i];
        if(node->segment->len == plen && 
            ngx_strncmp(path, node->segment->data, plen) == 0) 
        {        
            return node;
        }
    }
    
    /* uri segment path does not exists allocate new child */
    node = ngx_http_wh_create_node(path, plen, cf);
    
    if (node == NULL) {
        return NULL;
    }
    
    if (i >= parent->maxchild) {
        if (!ngx_http_wh_resize_children(parent, cf)) {
            return NULL;
        }
    }
    
    parent->children[i] = node;
    parent->num_child ++;
    
    return node;    
}

/* Resizes a node children array if original space is insufficient */
static size_t
ngx_http_wh_resize_children(ngx_whl_pnode_t *parent, ngx_conf_t *cf)
{
    size_t           new_sz, i;
    ngx_whl_pnode_t  **old, **new; 
    
    if (parent == NULL) {
        return 0;
    }
    
    new_sz = parent->maxchild * 2;
    
    if (new_sz > NGX_WHL_MAX_CHILDREN) {
        return 0;
    }
    
    new = ngx_pcalloc(cf->pool, new_sz * sizeof(ngx_whl_pnode_t*));
    
    if (new == NULL) {
        return 0;
    }
    
    old = parent->children; 
    
    for (i=0; i<parent->num_child; i++) {
        new[i] = old[i];
    }
    
    parent->children = new;
    parent->maxchild = new_sz; 
    old = NULL; 
    
    return 1;
}


/* Adds a full URL to the uri tree */
static size_t
ngx_http_wh_add_path(u_char *path, ngx_whl_pnode_t *root, ngx_conf_t *cf)
{
    size_t           plen, last, index, nested;
    u_char           *p, c, tmp[NGX_WHL_MAXPATHSZ];
    ngx_whl_pnode_t  *node; 
    
    if (path == NULL || root == NULL) {
        return 0;
    }
    
    plen = ngx_strlen(path);
    if (plen == 0 || plen >= NGX_WHL_MAXPATHSZ) {
        return 0;
    }
    
    p = path; 
    index = last = nested = 0;
    node = root; 
  
    while ((c=*p++) != '\0') {
    
        switch(c) {            
        case '/':
            if (index + 1 >= NGX_WHL_MAXPATHSZ 
                || nested > NGX_WHL_MAX_NEST) {
                return 0;
            }
            
            tmp[index] = c;
            index++;
            
            tmp[index] = '\0';
            node = ngx_http_wh_add_child(tmp, node, cf);
            
            if (node == NULL) {
                return 0; 
            }
            
            nested++; 
            index = last = 0;     
            break;
            
        default:
            if (index >= NGX_WHL_MAXPATHSZ) {
                return 0; 
            }
            
            tmp[index] = c; 
            index++; 
            last = 1; 
        
        }
       
    }
    
    if (last) {
        if (index >= NGX_WHL_MAXPATHSZ
            || nested > NGX_WHL_MAX_NEST) {
            return 0; 
        }
        
        tmp[index] = '\0';
        node = ngx_http_wh_add_child(tmp, node, cf);
        if (node == NULL) {
            return 0;
        } 
        
    } else {
        /* node ends with '/' */
        node->end_slash_allowed = 1;
    }
    
    return 1;
    
}
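The doubling strategy of ngx_http_wh_resize_children() can be sketched outside Nginx as follows. This is a hypothetical illustration: `struct kids`, `kids_grow()` and `kids_push()` are invented names, and the sketch uses `realloc()`, whereas the module allocates a new array from the configuration pool, copies the pointers across and leaves the old array in the pool, which is acceptable because the tree is built only when the configuration is loaded.

```c
#include <stdlib.h>

#define INIT_CHILDREN 8       /* mirrors NGX_WHL_INIT_CHIDREN_SZ */
#define MAX_CHILDREN  65536   /* mirrors NGX_WHL_MAX_CHILDREN */

/* Hypothetical growable array of child pointers. */
struct kids {
    void  **items;
    size_t  num, max;
};

/*
 * Double the capacity, refusing to grow past MAX_CHILDREN.
 * Returns 1 on success, 0 on failure (the old array is left intact).
 */
static int
kids_grow(struct kids *k)
{
    size_t  new_max = k->max ? k->max * 2 : INIT_CHILDREN;
    void  **p;

    if (new_max > MAX_CHILDREN) {
        return 0;
    }
    p = realloc(k->items, new_max * sizeof(*p));
    if (p == NULL) {
        return 0;
    }
    k->items = p;
    k->max = new_max;
    return 1;
}

/* Append a child, growing the array first when it is full. */
static int
kids_push(struct kids *k, void *child)
{
    if (k->num == k->max && !kids_grow(k)) {
        return 0;
    }
    k->items[k->num++] = child;
    return 1;
}
```

Starting from 8 slots, doubling reaches the cap of 65536 after 13 resizes, so even a very wide node only triggers a handful of reallocations.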

The following is the code snippet for ngx_http_wh_list_cfg(). This function is called for each wh_list_uri directive containing a URL to be whitelisted. It creates the root node with ngx_http_wh_create_node() if it doesn't exist yet, then calls ngx_http_wh_add_path() to add the URL to the whitelist URI tree.

/* Process the white list uri configuration */
static char *
ngx_http_wh_list_cfg(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)
{
    size_t                             len;
    u_char                             *uri;
    ngx_str_t                          *value; 
    ngx_whl_pnode_t                    *root;
    ngx_http_uri_whitelist_loc_conf_t  *slcf;
    
    if (cf->args->nelts < 2) {
        return NGX_CONF_ERROR;
    }
    
    value = cf->args->elts;
    uri = value[1].data; 
    len = value[1].len; 
    
    if (uri[0] != '/') {
        ngx_log_error(NGX_LOG_EMERG, cf->log, 0, "[URI_WHITELIST]: "
        "Error uri must start with '/'");
        return NGX_CONF_ERROR;
    }
    
    /* the backslash is escaped so the message is not cut off at a NUL */
    if (uri[len] != '\0') {
        ngx_log_error(NGX_LOG_EMERG, cf->log, 0, "[URI_WHITELIST]: "
        "Error uri does not end with '\\0'");
        return NGX_CONF_ERROR;
    }
    
    slcf = conf; 
    if (slcf->uri_tree == NULL) {
        slcf->uri_tree = ngx_http_wh_create_node( (u_char *)"/", 1, cf);
        if (slcf->uri_tree == NULL) {
            return NGX_CONF_ERROR;
        }
    } 
  
    root = slcf->uri_tree;
    
    if (!ngx_http_wh_add_path(uri, root, cf)) {
        ngx_log_error(NGX_LOG_EMERG, cf->log, 0, "[URI_WHITELIST]: "
            "Error cannot add uri to whitelist");
        return NGX_CONF_ERROR;
    }
    
    return NGX_CONF_OK;
}

The ngx_http_wh_check_path_exists() function, shown in the following code snippet, checks whether a URL string is present in the URI tree. It breaks the URL string down into its parts. It first checks that the URL begins with a "/" (the root node must always be present). Then, for each subsequent part, it checks whether the current node has a matching child.

If the URL ends at the root node "/" or at a node that ends with a slash, like "scripts/", the node's end_slash_allowed flag is checked. When end_slash_allowed is 1, the URL is whitelisted; otherwise it is not. The flag is set only when there is an explicit whitelist directive (wh_list_uri) for a URL that ends with "/".

This is required because when a URL like "/mydirectory/myfile.php" is whitelisted, the nodes "/", "mydirectory/" and "myfile.php" are created in the URI tree. However, this doesn't mean that the URLs "/" or "/mydirectory/" should be accessible, since these two URLs are not whitelisted explicitly. To make "/" or "/mydirectory/" accessible, they must be specified explicitly with the whitelist directive.

Notice that the parsing code does not handle "./" or "../". This is not necessary here, because Nginx normalizes the request URI (r->uri) before passing it to the module.

ngx_http_wh_check_path_exists() calls ngx_http_wh_check_path_seg() to check whether a child node exists under a parent node. We will not go through ngx_http_wh_check_path_seg() here; refer to the GitHub link at the end of the article for the full module source code.

/* Checks if a uri path is present in the uri tree */
static size_t 
ngx_http_wh_check_path_exists(u_char* path, size_t len, ngx_whl_pnode_t *root)
{
    size_t           plen, index, last;
    u_char           c, *p, tmp[NGX_WHL_MAXPATHSZ]; 
    ngx_whl_pnode_t  *node;
    
    if (path == NULL || root == NULL) {
        return 0;
    }
    
    
    if (len == 0 || len >= NGX_WHL_MAXPATHSZ) {
        return 0;
    }
    
    p = path; 
   
    c = *p++;
    if( c != '/') {
        return 0;            
    }
    
    plen = len - 1; 
        
    node = root; 
    index = last = 0; 
    
    while (plen-- > 0) {
        
        c = *p++;

        switch(c) {            
        case '/':
            if (index + 1 >= NGX_WHL_MAXPATHSZ) {
                return 0;
            }
            
            tmp[index] = c;
            index++;
            tmp[index] = '\0';
            
            node = ngx_http_wh_check_path_seg(tmp, index, node); 
            if (node == NULL) {
                return 0;
            }
            
            index = last = 0;
            break;
        
        default:
            last = 1;
            
            if (index >= NGX_WHL_MAXPATHSZ) {
                return 0; 
            }
            tmp[index] = c;
            index++;
            
        }
        
    }
    
    
    if (last) {
        if (index >= NGX_WHL_MAXPATHSZ) {   
            return 0; 
        }
        
        tmp[index]='\0';
        node = ngx_http_wh_check_path_seg(tmp, index, node); 
        
        if (node == NULL) {
            return 0; 
        }
        
    } else {
        /* node ends with '/' */
        if (node->end_slash_allowed == 0) {
            return 0; 
        }
        
    }
    
    return 1; 
    
}

Installation and Testing

To install the module, obtain a copy of the module source code from GitHub.

git clone https://github.com/ngchianglin/ngx_http_uri_whitelist_module.git

To verify the integrity and signature of the module source code, refer to this link. Obtain a copy of my public key, then follow the instructions on that page to import it and verify the git commit.

Download the latest stable Nginx source code from https://nginx.org and verify its integrity using the PGP signature.

wget https://nginx.org/download/nginx-1.18.0.tar.gz

The downloaded gzipped file should have the following SHA256 checksum.

4c373e7ab5bf91d34a4f11a0c9496561061ba5eee6020db272a17a7228d35f99 nginx-1.18.0.tar.gz

Extract the nginx source and compile nginx with the URI whitelisting module. Install it into /usr/local/nginx.

tar -zxvf nginx-1.18.0.tar.gz
cd nginx-1.18.0/
./configure --with-cc-opt="-Wextra -Wformat -Wformat-security -Wformat-y2k -Werror=format-security -fPIE -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all" --with-ld-opt="-pie -Wl,-z,relro -Wl,-z,now -Wl,--strip-all" --without-http_rewrite_module --add-module=../ngx_http_uri_whitelist_module
make
sudo make install

We can now test the URI whitelist module. This assumes that an Apache website is already set up on the system, with Apache httpd listening on port 8080. Nginx is configured as a reverse proxy for the Apache website. The module can also be used on a website hosted directly by Nginx.

Edit /usr/local/nginx/conf/nginx.conf with the following content.

user  nginx nginx;
worker_processes  1;
error_log  /var/log/nginx/error.log warn;
pid        /var/log/nginx/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" "$gzip_ratio"';

    sendfile        on;
    keepalive_timeout  65;
    server_tokens off;
    
    proxy_cache_path /usr/local/nginx/cache levels=1:2 keys_zone=webcache:2m max_size=150m;
    proxy_cache_key "$scheme$request_method$host$request_uri$is_args$args";
    proxy_cache_valid 200 302 30m;
    proxy_cache_valid 404 1m;

    gzip  on;
    
    map $sent_http_content_type $cachemap {
        default    no-store;
        ~text/html  "private, max-age=900";
        text/plain  "private, max-age=900";
        text/css    "private, max-age=7776000";
        application/javascript "private, max-age=7776000";
        ~image/    "private, max-age=7776000";
    }

    server {
        listen 80;
        server_name localhost;
        root   /opt/nginx/www;
        
        charset utf-8;
        access_log  /var/log/nginx/access.log  main;
        
        location / {
            
            index  index.html index.htm;
            
            proxy_cache webcache;
            proxy_cache_bypass $http_cache_control;
            
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://127.0.0.1:8080;
            
            add_header Cache-Control $cachemap;
            wh_list off;
        }

        

        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

    }

}
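The map block above picks a Cache-Control value from the response Content-Type. nginx checks plain string entries as exact matches before trying regular expressions (prefixed with "~") in order of appearance, falling back to the default. Because the proxied text file comes back as "text/plain; charset=UTF-8", the exact entry "text/plain" appears never to match, which is consistent with the "Cache-Control: no-store" header seen for test.txt in the curl output further down. The following is a simplified Python model of that lookup, written only to illustrate the matching order; it ignores details such as mask entries and case handling and is not nginx's implementation.

```python
import re

# Entries copied from the map block above, in the same order.
# A leading "~" marks a regular expression; anything else is an exact string.
CACHEMAP = [
    ("~text/html", "private, max-age=900"),
    ("text/plain", "private, max-age=900"),
    ("text/css", "private, max-age=7776000"),
    ("application/javascript", "private, max-age=7776000"),
    ("~image/", "private, max-age=7776000"),
]
DEFAULT = "no-store"


def cache_control_for(content_type):
    """Simplified model of how the $cachemap value is chosen."""
    # 1. Exact string entries are checked first.
    for source, value in CACHEMAP:
        if not source.startswith("~") and source == content_type:
            return value
    # 2. Then regular expressions in order of appearance (unanchored search).
    for source, value in CACHEMAP:
        if source.startswith("~") and re.search(source[1:], content_type):
            return value
    # 3. Otherwise the default value.
    return DEFAULT


for ct in ("text/html; charset=UTF-8", "text/plain; charset=UTF-8", "image/png"):
    print(ct, "->", cache_control_for(ct))
```

If caching of text files is actually wanted, a regular expression entry such as ~text/plain would likely be needed instead of the exact string.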

Create a web root directory for nginx.

sudo mkdir -p /opt/nginx/www

Make sure that the nginx user and group exist; otherwise, create them.

sudo mkdir -p /opt/nginx/home
sudo chmod 750 /opt/nginx/home
sudo groupadd -g 9870 nginx
sudo useradd -d /opt/nginx/home -u 9870 -g 9870 -s /bin/false nginx

On the Apache website, make sure that there is an index.html with some test content. Create another test file, test.txt, and put some test content in it. The Nginx URI whitelist module is currently turned off in nginx.conf (wh_list off), so these URLs should be accessible through the Nginx proxy. Make sure Apache httpd is running and listening on port 8080, then start Nginx.

sudo /usr/local/nginx/sbin/nginx

Access http://localhost, http://localhost/index.html and http://localhost/test.txt. All three URLs should be accessible and return the right content.

devuser1@devmachine:~$ curl -i http://localhost
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 04:38:06 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 170
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:36:42 GMT
Vary: Accept-Encoding
Cache-Control: private, max-age=900
Accept-Ranges: bytes

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Testing html page</title>
</head>
<body>
<p>
This is a test for Nginx URI whitelisting !
</p>
</body>
</html>


devuser1@devmachine:~$ curl -i http://localhost/index.html
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 04:43:04 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 170
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:36:42 GMT
Vary: Accept-Encoding
Cache-Control: private, max-age=900
Accept-Ranges: bytes

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Testing html page</title>
</head>
<body>
<p>
This is a test for Nginx URI whitelisting !
</p>
</body>
</html>


devuser1@devmachine:~$ curl -i http://localhost/test.txt
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 04:43:50 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 56
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:37:23 GMT
Cache-Control: no-store
Accept-Ranges: bytes

This is a test text file
Testing Nginx URI whitelisting

Edit /usr/local/nginx/conf/nginx.conf and turn on the Nginx URI whitelist module by changing the wh_list directive in the location block to:

wh_list on;

Reload nginx with the new configuration.

sudo /usr/local/nginx/sbin/nginx -s reload

Access the three URLs again using curl. This time, access should be denied with an HTTP 404 error.

devuser1@devmachine:~$ curl -i http://localhost
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 04:49:49 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/index.html
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 05:07:19 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/test.txt
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 04:49:40 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
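The difference between the two runs comes down to the on/off switch and a default-deny rule: with wh_list off nothing is checked, and with wh_list on an empty whitelist rejects everything. The following is a tiny Python model of that decision, meant only as an illustration; the function name is hypothetical and this is not the module's C code.

```python
def is_allowed(uri, enabled, whitelist):
    """Simplified model of the wh_list switch plus default deny."""
    if not enabled:
        # wh_list off: no check at all (the module logs a warning instead).
        return True
    # wh_list on: anything that is not explicitly whitelisted is rejected.
    return uri in whitelist


urls = ["/", "/index.html", "/test.txt"]
print([is_allowed(u, False, set()) for u in urls])  # [True, True, True]
print([is_allowed(u, True, set()) for u in urls])   # [False, False, False]
```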

Let's whitelist some of the URLs. Edit nginx.conf and add the following.

wh_list_uri /index.html;
wh_list_uri /test.txt;

Reload nginx.

sudo /usr/local/nginx/sbin/nginx -s reload

These two URLs should now be accessible again because they are in the whitelist.

devuser1@devmachine:~$ curl -i http://localhost/index.html
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 05:13:50 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 170
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:36:42 GMT
Vary: Accept-Encoding
Cache-Control: private, max-age=900
Accept-Ranges: bytes

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Testing html page</title>
</head>
<body>
<p>
This is a test for Nginx URI whitelisting !
</p>
</body>
</html>

devuser1@devmachine:~$ curl -i http://localhost/test.txt
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 29 Oct 2019 05:16:35 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 56
Connection: keep-alive
Last-Modified: Tue, 29 Oct 2019 04:37:23 GMT
Cache-Control: no-store
Accept-Ranges: bytes

This is a test text file
Testing Nginx URI whitelisting

However, accessing http://localhost or http://localhost/ still returns an HTTP 404 error.

devuser1@devmachine:~$ curl -i http://localhost
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 05:17:43 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>


devuser1@devmachine:~$ curl -i http://localhost/
HTTP/1.1 404 Not Found
Server: nginx
Date: Tue, 29 Oct 2019 05:17:49 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 146
Connection: keep-alive

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

This is because the root directory "/" has not been whitelisted. To allow access, we need to add the following to nginx.conf.

wh_list_uri /;

Reload nginx and the root URL should be accessible again. The same applies to subdirectories: if the root of a subdirectory is to be accessible, it has to be whitelisted explicitly. For example,

wh_list_uri /mydirectory/subdirectory2/;
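The behaviour in the curl tests suggests that the whitelist works by exact comparison: whitelisting /index.html does not whitelist /, and a directory with and without its trailing slash are separate entries. The following Python sketch models that under stated assumptions; the function names are hypothetical, and dropping the query string before the lookup is an assumption of this model, not something confirmed for the real module.

```python
from urllib.parse import urlsplit


def request_path(target):
    # Assumption: only the path is checked; any query string is dropped.
    return urlsplit(target).path


def is_whitelisted(target, whitelist):
    # Exact comparison: no prefix matching and no slash normalisation,
    # so "/docs" and "/docs/" are different entries.
    return request_path(target) in whitelist


whitelist = {"/index.html", "/test.txt", "/mydirectory/subdirectory2/"}
print(is_whitelisted("/", whitelist))                            # False
print(is_whitelisted("/mydirectory/subdirectory2", whitelist))   # False
print(is_whitelisted("/mydirectory/subdirectory2/", whitelist))  # True

whitelist.add("/")  # equivalent of: wh_list_uri /;
print(is_whitelisted("/", whitelist))                            # True
```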

Experiment with the Nginx whitelist module. There is also an extension bypass directive that allows files with certain extensions, such as jpg or gif, to skip the whitelist check. This directive should be used carefully. For the best protection, every web resource that is supposed to be accessible, including static image files, should be whitelisted explicitly. The README.md in the module's GitHub repository describes the syntax of its directives.

To bypass the whitelist check for a file extension, for instance ".txt", add the following to nginx.conf.

wh_list_bypass txt;

Create a new text file, mytest.txt, on the Apache website and fill in some content. Reload nginx. This new text file is accessible without being explicitly whitelisted. In fact, every file ending with the ".txt" extension is now accessible, because the whitelist module skips the access check for that extension.

devuser1@devmachine:~$ sudo /usr/local/nginx/sbin/nginx -s reload
devuser1@devmachine:~$ curl -i http://localhost/mytest.txt
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 31 Oct 2019 02:24:54 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 70
Connection: keep-alive
Last-Modified: Thu, 31 Oct 2019 02:23:38 GMT
Vary: Accept-Encoding
Cache-Control: no-store
Accept-Ranges: bytes

This is another test file for trying on extensions
bypass directive.
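The bypass check happens before the whitelist lookup, which is why it opens up every file with that extension, including ones nobody meant to publish. The following Python sketch is a simplified model of that ordering; the function names are hypothetical, and it assumes the extension is the text after the last dot of the final path segment, compared case-sensitively. How the real module extracts and compares extensions should be checked in its README.

```python
import posixpath


def extension_of(path):
    # Extension of the last path segment without the dot, "" if there is none.
    return posixpath.splitext(posixpath.basename(path))[1][1:]


def is_allowed(path, whitelist, bypass_exts):
    # Bypassed extensions skip the whitelist lookup entirely.
    if extension_of(path) in bypass_exts:
        return True
    return path in whitelist


whitelist = {"/", "/index.html", "/test.txt"}
bypass = {"txt"}
for p in ["/mytest.txt", "/private/notes.txt", "/backup.txt.bak", "/admin"]:
    print(p, is_allowed(p, whitelist, bypass))
```

Note that /private/notes.txt is allowed in this model even though it was never listed, which illustrates why the bypass directive should be used carefully.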

The Nginx whitelist module writes warnings and alerts to the nginx error log. If access to a URL is denied, an alert is logged. If the module itself is turned off, a warning is logged. This is useful for security monitoring, where a security engineer or administrator may want to know about unauthorized access attempts or whether the module itself has been disabled.

Some example entries from the error log:

2019/10/29 12:43:50 [warn] 6523#0: *6 [URI_WHITELIST] : White list module disabled !, client: 127.0.0.1, server: localhost, request: "GET /test.txt HTTP/1.1", host: "localhost"
2019/10/29 12:49:40 [alert] 6653#0: *8 [URI_WHITELIST] : Access Denied for [ /test.txt ] , client: 127.0.0.1, server: localhost, request: "GET /test.txt HTTP/1.1", host: "localhost"
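For monitoring, these entries can be picked out of the error log with a couple of regular expressions. The following Python sketch is based only on the two sample lines above; the function names are hypothetical, and the patterns may need adjusting if the module's message format differs from these samples.

```python
import re

DENIED = re.compile(
    r"\[alert\].*\[URI_WHITELIST\] : Access Denied for \[ (?P<uri>.*?) \] ,"
    r" client: (?P<client>[^,]+),"
)
DISABLED = re.compile(r"\[warn\].*\[URI_WHITELIST\] : White list module disabled")


def denied_requests(lines):
    """Yield (client, uri) pairs for each whitelist denial."""
    for line in lines:
        m = DENIED.search(line)
        if m:
            yield m.group("client"), m.group("uri")


def module_disabled(lines):
    """True if any line reports that the whitelist module is turned off."""
    return any(DISABLED.search(line) for line in lines)


sample = [
    '2019/10/29 12:43:50 [warn] 6523#0: *6 [URI_WHITELIST] : White list module disabled !, client: 127.0.0.1, server: localhost, request: "GET /test.txt HTTP/1.1", host: "localhost"',
    '2019/10/29 12:49:40 [alert] 6653#0: *8 [URI_WHITELIST] : Access Denied for [ /test.txt ] , client: 127.0.0.1, server: localhost, request: "GET /test.txt HTTP/1.1", host: "localhost"',
]
print(list(denied_requests(sample)))  # [('127.0.0.1', '/test.txt')]
print(module_disabled(sample))        # True
```

In practice the same functions could be fed the lines of /var/log/nginx/error.log, the path set in the nginx.conf above.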

Generating a Whitelist Configuration using Python

To ease the whitelisting process, a Python script can generate the list of whitelisted URLs automatically. The following simple Python 3 script traverses a web document root directory and generates the whitelist directives.

#!/usr/bin/python3
#
#  Simple python script to traverse a web document root directory
#  Generate a whitelist configuration for the files
#  and directories
#
#  Ng Chiang Lin
#  Nov 2020
#  https://www.nighthour.sg/articles/2019/developing-nginx-url-whitelisting-module.html
# 

import os

rootdir = "HomePage" 
ignore_exts = ['jpg', 'png', 'svg', 'gif', 'webp']
directive = "wh_list_uri"


def checkFile(filename):

    #hidden file
    if filename.startswith('.'):
        return False
    
    parts = filename.split('.')
    length = len(parts)
    
    #filename doesn't have a dot extension
    if length < 2 :
        return True
    
    #check that file extension is not in ignore list (case-insensitive)
    extension = parts[length - 1].lower()
    if extension in ignore_exts:
        return False

    return True

    
def formatRelativePath(path):

    #path relative to the document root, using "/" as the URL separator
    #(works for any rootdir, not only a single path component)
    relativepath = os.path.relpath(path, rootdir)
    return relativepath.replace(os.sep, '/')


def listDir(directory):

    with os.scandir(directory) as it:
        for entry in it:
            #skip hidden files and directories such as .git or .htaccess
            if entry.name.startswith('.'):
                continue
            if entry.is_file() and checkFile(entry.name) :
                entryname = formatRelativePath(entry.path)
                print(directive, ' /', entryname, ' ;', sep='')
            elif entry.is_dir():
                entryname = formatRelativePath(entry.path)
                #whitelist the directory both without and with a trailing slash
                print(directive, ' /', entryname, ' ;', sep='')
                print(directive, ' /', entryname, '/ ;', sep='')
                listDir(entry.path)


if __name__ == "__main__":
    print(directive, ' / ;', sep='')
    listDir(rootdir)

The script prints the whitelist directives to standard output. The output can be redirected to a configuration file, which can then be included in the nginx configuration with the include directive.
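One thing worth checking before including the generated file is that every line is a well-formed directive. A file name containing whitespace, a semicolon, braces or quotes would be split or terminated differently by the nginx configuration parser, since the script writes paths unquoted. The following Python sketch flags such lines; the function name and the exact character set it rejects are assumptions of this sketch, chosen to be conservative.

```python
import re

# Characters that split or end an unquoted token in nginx.conf.
UNSAFE = re.compile(r"[\s;{}\"']")
LINE = re.compile(r"^wh_list_uri (?P<uri>/.*) ;$")


def check_directives(lines):
    """Return (line number, problem) pairs for suspicious generated lines."""
    problems = []
    for n, line in enumerate(lines, 1):
        line = line.rstrip("\n")
        m = LINE.match(line)
        if not m:
            problems.append((n, "unexpected format"))
        elif UNSAFE.search(m.group("uri")):
            problems.append((n, "unsafe character in URI"))
    return problems


generated = [
    "wh_list_uri / ;",
    "wh_list_uri /index.html ;",
    "wh_list_uri /my file.html ;",
]
print(check_directives(generated))  # [(3, 'unsafe character in URI')]
```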

It is assumed that the files and subdirectories in the web document root are not sensitive and should all be publicly accessible. The script skips hidden files as well as certain image extensions such as "jpg", "png" and "svg". The rootdir and ignore_exts variables define the document root directory and the extensions to skip.

The Python script generates an initial list of whitelisted URLs. A web or security administrator should then review the whitelist and remove any files or directories that should not be accessible.

For the image extensions skipped by the Python script, the extension bypass directive of the Nginx module can be used to grant access to these image types. If there are sensitive images that should not be accessible, these image types should not be bypassed. Instead, an explicit whitelist entry should be generated for each image file that is not sensitive and can be publicly accessible.
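Part of that review can be made easier by flagging entries whose names look risky. The following Python sketch does this with a short list of illustrative patterns (admin paths, backups, database dumps, environment files); the patterns and the function name are hypothetical examples, and the flagged list is only a starting point for a manual review, not a replacement for it.

```python
import re

# Illustrative patterns only; adapt them to the application being protected.
SUSPICIOUS = re.compile(
    r"(admin|backup|\.bak$|\.sql$|\.env$|\.old$|~$)", re.IGNORECASE
)


def flag_for_review(lines):
    """Return whitelisted URIs whose names match a suspicious pattern."""
    flagged = []
    for line in lines:
        parts = line.split()
        # Expected shape: wh_list_uri /some/path ;
        if len(parts) == 3 and parts[0] == "wh_list_uri":
            uri = parts[1]
            if SUSPICIOUS.search(uri):
                flagged.append(uri)
    return flagged


generated = [
    "wh_list_uri / ;",
    "wh_list_uri /index.html ;",
    "wh_list_uri /admin/ ;",
    "wh_list_uri /db.sql ;",
]
print(flag_for_review(generated))  # ['/admin/', '/db.sql']
```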
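Generating those explicit image entries can follow the same pattern as the script: list the image files and leave out the ones marked as sensitive. The following Python sketch works on URL paths relative to the site root; the function name and the idea of a hand-maintained set of sensitive paths are assumptions of this sketch, and IMAGE_EXTS simply mirrors the script's ignore_exts.

```python
import posixpath

# Mirrors ignore_exts in the generator script.
IMAGE_EXTS = {"jpg", "png", "svg", "gif", "webp"}


def image_directives(paths, sensitive):
    """Whitelist directives for non-sensitive image paths such as "/img/logo.png"."""
    out = []
    for p in sorted(paths):
        ext = posixpath.splitext(p)[1][1:].lower()
        if ext in IMAGE_EXTS and p not in sensitive:
            out.append(f"wh_list_uri {p} ;")
    return out


paths = ["/img/logo.png", "/img/payslip.jpg", "/index.html"]
print(image_directives(paths, sensitive={"/img/payslip.jpg"}))
# ['wh_list_uri /img/logo.png ;']
```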

Defense in Depth

There are many ways to protect administrative interfaces and consoles, API endpoints or other web resources that should not be publicly accessible. Security best practices often stress defense in depth: multiple layers of defenses and mitigations.

Unnecessary administrative components should be removed or disabled in an application. Proper firewall rules should be set up to protect internal interfaces listening on ports that should not be publicly accessible. Strong authentication and complex passwords should be used for administrative interfaces.

IP address filtering can be set up to control access to administrative consoles and similar interfaces. Mutual TLS authentication can also be used to secure private APIs: it ensures that only authorized clients holding the right certificates can access an application. Because both the client and server certificates have to be verified and trusted, mutual TLS also helps prevent man-in-the-middle attacks.

All these measures can be used together with URL whitelisting to secure an application and reduce its attack surface.
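IP address filtering is itself a whitelist, just keyed on the client address instead of the URL. In nginx this is usually done with the allow and deny directives; the following Python sketch only models the underlying check with the standard ipaddress module. The network ranges and the function name are hypothetical placeholders.

```python
import ipaddress

# Hypothetical networks allowed to reach an administrative console.
ADMIN_NETWORKS = [
    ipaddress.ip_network(n) for n in ("127.0.0.1/32", "10.10.0.0/16")
]


def admin_ip_allowed(client_ip):
    """True if client_ip falls inside one of the allowed networks."""
    addr = ipaddress.ip_address(client_ip)
    # Default deny: only addresses inside a listed network pass.
    return any(addr in net for net in ADMIN_NETWORKS)


for ip in ("127.0.0.1", "10.10.3.4", "192.0.2.1"):
    print(ip, admin_ip_allowed(ip))
```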

Conclusion and Afterthought

Whitelisting is a useful technique in information security. It can be used in web applications to guard against invalid user input, and in enterprises to prevent unauthorized applications from running on desktops and servers. Whitelisting is also used on network firewalls and rate limiters to stop malicious network traffic.

We can also apply whitelisting to web URLs to control access to web resources. An Nginx module can control access to web resources using a whitelist of URLs. This can be an additional layer of defense against web attacks, vulnerabilities in web application frameworks, misconfigurations and accidental uploads of sensitive files. Whitelisting of URLs, together with other security measures, can reduce the attack surface of a website or web application.

Useful References

The full source code for the Nginx URI Whitelist Module is available at the following GitHub link.
https://github.com/ngchianglin/ngx_http_uri_whitelist_module

The Python script that generates an initial whitelist.
https://github.com/ngchianglin/VPS_MISC/blob/master/whitelist.py

If you have any feedback, comments, corrections or suggestions to improve this article, you can reach me via the contact/feedback link at the bottom of the page.

Article last updated on Nov 2020.