Night Hour

Reading under a cool night sky ... 宁静沉思的夜晚 ...

Writing an Nginx Response Body Filter Module


By three methods we may learn wisdom: First, by reflection, which is noblest; Second, by imitation, which is easiest; and third by experience, which is bitterest. - Confucius (孔子)


15 Dec 2017


Introduction

Nginx is a popular open source web and proxy server that is known for its performance and is used by many websites. It supports third-party modules that provide additional functionality and customization. This article shows how to write a simple filter module that inserts a text string after the html <head> element in an HTTP response body.

This can be useful in some cases, for instance to insert a monitoring script without modifying the existing web pages or web application. Nginx can be used as a reverse proxy to speed up access to the website and, at the same time, insert the monitoring script into the web content.


Article last updated Nov 2019.

Table of Contents

  1. Design and Approach
    1. Html Tag Parser
    2. Nginx Buffer Chains
    3. A Big Picture View of the Filter Setup
    4. Logical Flow of the Filter Module
    5. Performance Considerations
    6. Avoiding HTTP Chunked Transfer Encoding
  2. Structure of an Nginx HTTP Filter Module
    1. Components of Nginx Module
    2. Nginx Module Filter Chain
    3. Module Config Shell File
  3. Implementation of the Nginx Response Body Filter
    1. Nginx Per Request/Response Context
    2. Saving and Retrieving Per Request/Response Context
    3. Structure for Storing Module Configuration
    4. Module Directives
    5. Nginx Module Context
    6. The module initialization function
    7. The module configuration creation and merge functions
    8. Nginx Module Definition
    9. The response headers filter function
    10. The response body filter function
    11. Explaining ctx->in, ctx->out, ctx->last_out
    12. The output function
    13. The output empty page function
    14. The output buffering function
    15. The html tag parser function
    16. The text insertion function
  4. Compiling the Nginx Body Filter Module
  5. Testing the Nginx Filter Module
  6. Conclusion and Afterthought
  7. Useful References

Design and Approach

We will use the C language to write the nginx module. Besides the boilerplate code for integrating with Nginx, we need a parser that can scan an input stream for html tags or elements. The following describes how a simple html tag parser can be implemented.

Html Tag Parser

We need to understand how an html tag or element is structured before we can decide how to parse for such tags. The following shows the syntax diagram of an html tag.

Fig 1. Syntax Diagram of an HTML Tag

A tag starts with an opening angle bracket < and ends with the corresponding closing > bracket. It has a tagname, an optional "/" and optional attributes. SP represents whitespace. There must be a space between the tagname and an attribute. Additional whitespace may be present between each of these tokens.

A simplified BNF (Backus–Naur form) for HTML tagname and its attributes may look like this.

Tagname :: alphabetic letters
Attribute :: AttributeName <opt space> = <opt space> <opt quotes> AttributeValue <opt quotes>
AttributeName :: alpha-numeric letters
AttributeValue :: alpha-numeric letters | EscapeSequences | empty
alphabetic letters :: a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
alpha-numeric letters :: 0|1|2|3|4|5|6|7|8|9 | alphabetic letters
EscapeSequences :: '\"' | '\'' | '\n' | '\r' | '\\' | '\t' | '\v' | '\f' | '\b' | '\a' | '\xhhhh' | '\uhhhh'
<opt space> :: Optional white spaces
<opt quotes> :: Optional quotes
Optional quotes :: " | ' | empty
Optional white spaces :: '  ' | '\r' | '\n' | '\t' | '\v' | '\f' | empty
empty :: ''

It does look complex. Parsing html into a syntax tree, as a web browser does, is hard. Fortunately, our case is not as difficult as it first appears, and we can forget about the BNF listing above.

The parser just needs to focus on four key tokens: a starting angle bracket, a closing angle bracket, a single quote and a double quote.

< > ' "

A stack can be used to collect the html tag encountered in an input stream. When the parser encounters a start bracket, '<', it initializes an empty stack and pushes the start bracket into the stack. Other characters that come after the start bracket are pushed into the stack as well. If a single or double quote is seen, a toggling flag is set to indicate the start of string content. A corresponding closing quotation mark is required to end the string content.

When the parser finally sees an end bracket, '>', it pushes it into the stack and the complete html tag is now present on the stack. The complete html tag can be analyzed and processed.

The toggling flags determine whether a '<' or '>' represents a token or is part of a string. Any '<' or '>' encountered after the start bracket and a quotation mark is part of a string and is treated as a normal character to be pushed into the stack. When the corresponding closing quotation mark is seen, the relevant toggling flag is reset. Any '<' or '>' encountered afterwards is interpreted as the start or end token of an html tag.

The same toggling mechanism applies for the quotation marks too. A single quote that appears after a start bracket and double quote is part of a string. The same thing applies for the double quote. Any characters encountered before a start bracket, '<', are ignored. These are the content of the html document. A fresh stack is initialized each time the starting bracket is encountered.

These simple rules allow a complete html tag or element to be extracted from an input stream. We will look at the parser code later in the article.
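The rules above can be sketched as a small character-by-character state machine. This is only an illustrative sketch, not the module's actual parser: the names tag_parser_t, tag_parser_feed and first_tag are hypothetical, and the 512 character cap is assumed from the stack limit mentioned later in the article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAG_STACK_SZ 512   /* assumed cap on the length of one tag */

/* Hypothetical parser state; all names are illustrative only. */
typedef struct {
    char   data[TAG_STACK_SZ + 1]; /* the "stack" holding the tag text */
    size_t top;                    /* number of characters collected */
    int    in_tag;                 /* set once a '<' token has been seen */
    int    dquote;                 /* toggled by "  (inside a "..." string) */
    int    squote;                 /* toggled by '  (inside a '...' string) */
} tag_parser_t;

void tag_parser_init(tag_parser_t *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
}

/* Feed one character. Returns 1 when a complete tag is on the stack
   (p->data is then NUL terminated), 0 otherwise. */
int tag_parser_feed(tag_parser_t *p, char c)
{
    int in_string = p->dquote || p->squote;

    if (!p->in_tag || (c == '<' && !in_string)) {
        if (c != '<') {
            return 0;              /* document content, ignored */
        }
        tag_parser_init(p);        /* fresh stack for every start token */
        p->in_tag = 1;
        p->data[p->top++] = c;
        return 0;
    }

    if (c == '"' && !p->squote) {
        p->dquote = !p->dquote;    /* a " inside '...' is plain text */
    } else if (c == '\'' && !p->dquote) {
        p->squote = !p->squote;    /* a ' inside "..." is plain text */
    }

    if (p->top >= TAG_STACK_SZ) {  /* tag too long: discard it */
        tag_parser_init(p);
        return 0;
    }
    p->data[p->top++] = c;

    if (c == '>' && !p->dquote && !p->squote) {
        p->data[p->top] = '\0';    /* complete tag is now on the stack */
        p->in_tag = 0;
        return 1;
    }
    return 0;
}

/* Scan a NUL terminated string; returns 1 if a complete tag was found. */
int first_tag(const char *s, tag_parser_t *p)
{
    tag_parser_init(p);
    for (; *s != '\0'; s++) {
        if (tag_parser_feed(p, *s)) {
            return 1;
        }
    }
    return 0;
}
```

Feeding one character at a time keeps the state in the struct, so the same parser could continue across several content blocks without restarting.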

Nginx Buffer Chains

Nginx stores the content of the response body into a linked list of buffer chains (ngx_chain_t). Each chain contains a buffer structure (ngx_buf_t) that holds a part of the content. The final buffer containing data in the HTTP response body has a special flag, last_buf, configured. This marks it as the last buffer in the output. See the illustration below.

Fig 2. Nginx chain of buffers

More than one linked list of buffer chains may be required to store the entire content of an HTTP response body. Nginx will pass each linked list of chains to the filter module as and when data is available. The last_buf flag marks the last buffer in the output.

The parser has to process each buffer in the linked list of chains. Each buffer (ngx_buf_t) has a pointer to a block of memory holding a part of the content. The parser treats this content block as an input stream starting from the pointer. It goes through the content blocks buffer by buffer, extracting each html tag and examining it to see if it is a <head> tag.

Once the <head> tag is found, its end position must be in the content block, held by the current ngx_buf_t buffer. To insert our own text string, this buffer will be split up and relinked with our text inserted in the middle. The following illustrates how an original buffer is split up into 3 new buffers with our text in-between.

Fig 3. Insertion of Text 1

If the original buffer doesn't contain any data after the <head> tag, it can be split and linked as 2 parts.

Fig 4. Insertion of Text 2

This new set of buffers with the inserted text are linked up in the correct order with the other buffers in the nginx output chain. The output is then sent to other filters in nginx and finally to the client.
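The splitting and relinking can be sketched with simplified stand-in types. The demo_buf_t and demo_chain_t structures below are hypothetical and far smaller than the real ngx_buf_t and ngx_chain_t, and malloc is used where a real module would allocate from the request pool. The sketch only shows the pointer arithmetic: no content is copied, the original block is simply referenced by two buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins, NOT the real ngx_buf_t / ngx_chain_t. */
typedef struct {
    const char *pos;    /* start of the data in the content block */
    const char *last;   /* one past the end of the data */
} demo_buf_t;

typedef struct demo_chain_s {
    demo_buf_t           buf;
    struct demo_chain_s *next;
} demo_chain_t;

static demo_chain_t *demo_link(const char *pos, const char *last,
                               demo_chain_t *next)
{
    demo_chain_t *cl = malloc(sizeof(*cl));
    if (cl == NULL) {
        return NULL;
    }
    cl->buf.pos = pos;
    cl->buf.last = last;
    cl->next = next;
    return cl;
}

/* Split the buffer in `cl` right after offset `index` (the position of
   the '>' that closes the <head> tag) and link `text` in between.
   Yields 3 buffers, or 2 when nothing follows the tag.
   Returns 0 on success, -1 on allocation failure. */
int demo_insert_after(demo_chain_t *cl, size_t index, const char *text)
{
    const char   *split = cl->buf.pos + index + 1;
    demo_chain_t *tail = cl->next;
    demo_chain_t *ins;

    assert(split <= cl->buf.last);

    if (split < cl->buf.last) {          /* data remains after the tag */
        tail = demo_link(split, cl->buf.last, cl->next);
        if (tail == NULL) {
            return -1;
        }
    }

    ins = demo_link(text, text + strlen(text), tail);
    if (ins == NULL) {
        if (tail != cl->next) {
            free(tail);
        }
        return -1;
    }

    cl->buf.last = split;                /* first part now ends at '>' */
    cl->next = ins;
    return 0;
}

/* Copy the chain content into `out`; returns the number of links copied. */
size_t demo_flatten(const demo_chain_t *cl, char *out, size_t cap)
{
    size_t links = 0, used = 0;

    for (; cl != NULL; cl = cl->next, links++) {
        size_t len = (size_t) (cl->buf.last - cl->buf.pos);
        if (used + len >= cap) {
            break;
        }
        memcpy(out + used, cl->buf.pos, len);
        used += len;
    }
    out[used] = '\0';
    return links;
}
```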

So far, there are 3 diagrams showing the structure of Nginx buffer chains, but these are high level abstract views meant to describe the concept of text insertion. The actual data structures of ngx_chain_t, ngx_buf_t and the memory blocks holding content are not shown. The following is a more detailed view of how these structures are actually organized.

Fig 5. A more detailed view of an Nginx Buffers Chain

The diagram shows a single linked list of ngx_chain_t containing ngx_buf_t (buffers) that point to blocks of memory holding the content of the HTTP response body. The last ngx_buf_t buffer structure has the last_buf flag set to true. This indicates the end of output, i.e. this particular linked list of ngx_chain_t holds the last of the output in the response body.

There can be other linked lists of ngx_chain_t prior to this one that contain the earlier parts of the response body. But in these earlier linked lists, the last_buf flag will not be set in their ngx_buf_t buffers.

Refer to the official Nginx Development Guide for detailed description of ngx_chain_t and ngx_buf_t structures.

A Big Picture View of the Filter Setup

The earlier description of the parser and text insertion is basically the heart or engine of the filter module that will be implemented. The diagram below illustrates the big picture view of how the filter module together with Nginx can be deployed.

Fig 6. Nginx Reverse Proxy Setup Architecture

Nginx and the web server are located on the same machine. The web server listens only on localhost (127.0.0.1) and accepts traffic from Nginx. Nginx is set up as a reverse proxy with the filter module installed. It forwards incoming client requests to the web server and modifies the web server response with the inserted text (a monitoring script).

Nginx will implement TLS (Transport Layer Security, the protocol underlying HTTPS) and serve as the TLS termination point for the web server. Caching will be enabled on Nginx to speed up performance. There are a few other things the filter module has to handle. For example, if the original content from the web server is compressed (gzip or deflate), the filter will let the result pass through unmodified. The web server should therefore disable compression and let Nginx itself handle content compression.

The order of module loading in Nginx is important. The filter module needs to run before Nginx's gzip module; otherwise it cannot process content that has already been compressed by gzip. By default, the filter module will run before gzip. The filter module will only handle the html content type. Other content types like images, javascript, stylesheets or binary will be passed through unmodified.

The filter module will check the HTTP status code as well. If the status is not HTTP 200 OK, the content will pass through unmodified. This means error pages will not have the text inserted. Another check that the module does is on the content length of the html output. If the content length exceeds 10MiB (10 * 1024 * 1024), the filter module will skip the content.

This is a sanity check to avoid processing extremely large content. It is highly unlikely that html content will exceed 10MiB. Having this limit also guards against misconfigurations, where large binary files are sent with the text/html content type instead of the relevant binary mime types.
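The checks described so far can be summarized in one predicate. This is a sketch under stated assumptions: demo_response_t is a hypothetical struct rather than Nginx's request structure, the content type is matched by a simple case-sensitive prefix comparison, and the content length is assumed to be known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CONTENT_LENGTH (10LL * 1024 * 1024)   /* 10MiB */

/* Hypothetical summary of the response properties the filter looks at. */
typedef struct {
    int         status;            /* HTTP status code */
    const char *content_type;      /* e.g. "text/html; charset=utf-8" */
    const char *content_encoding;  /* NULL or "" when not compressed */
    long long   content_length;    /* assumed to be known */
} demo_response_t;

/* Returns 1 if the response should be parsed for <head>, 0 if it should
   pass through unmodified. */
int demo_should_filter(const demo_response_t *r)
{
    assert(r != NULL);

    if (r->status != 200) {
        return 0;                  /* error pages are left alone */
    }
    if (r->content_type == NULL
        || strncmp(r->content_type, "text/html", 9) != 0)
    {
        return 0;                  /* images, scripts, binaries, ... */
    }
    if (r->content_encoding != NULL && r->content_encoding[0] != '\0') {
        return 0;                  /* already compressed upstream */
    }
    if (r->content_length > DEMO_MAX_CONTENT_LENGTH) {
        return 0;                  /* sanity limit */
    }
    return 1;
}
```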

Finally, the filter needs to be able to handle malformed html, such as documents without a <head> tag or with multiple <head> tags. The text will only be inserted once, after the first <head> tag that is encountered. If the <head> tag is not found within the first 128 characters of a response body, the module can be configured to "block" the page.

When the "block" configuration directive is enabled, the filter will display a blank page if the response body doesn't have a valid <head> tag in the first 128 characters. A single html tag, including its attributes, cannot be more than 512 characters, because the maximum stack size for the parser is set to 512.

Logical Flow of the Filter Module

We can work out the logical flow of the filter module on paper or using a diagram so that there will be more clarity when writing the code. The simple block diagram below shows the flow of the filter module logic.

Fig 7. Logic Flow of Nginx Filter Module

A linked list of buffers is processed and there are two possible outcomes: the <head> tag is found or it is not found. If it is found, the modified output buffers chain can be sent. The not found case is slightly more complicated.

If the blocking mode is disabled, the unmodified output buffers chain can be sent. When blocking mode is enabled, a check is done against the limit of 128 characters, since the <head> tag must be in the first 128 characters of a response body. If the limit is reached, an empty page can be sent; otherwise the output buffers chain must be kept on hold.

Holding or buffering the output chain is necessary, because at this stage we do not know whether the <head> tag will eventually be found in subsequent linked list of buffers chain. Therefore, the filter module has to buffer the output and let nginx continue with other processing.

When a subsequent buffers chain is available, the flow starts from the top again. If a <head> tag is found within the characters limit, the held up output together with the new modified output are sent in the correct order.

An empty page will be sent if the characters limit is hit and no <head> tag is found under the blocking mode. Having a logical flow diagram makes it far easier to code the nginx filter module later on.
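The flow described above boils down to a small decision function. The enum and function names below are hypothetical and only mirror the four outcomes in the text; the real module interleaves this decision with buffer handling.

```c
/* Possible outcomes after processing one linked list of buffers. */
typedef enum {
    DEMO_SEND_MODIFIED,     /* <head> found: send output with inserted text */
    DEMO_SEND_UNMODIFIED,   /* not found, non blocking mode */
    DEMO_SEND_EMPTY_PAGE,   /* not found, blocking mode, limit reached */
    DEMO_HOLD_OUTPUT        /* not found yet, blocking mode, keep buffering */
} demo_action_t;

/* found:         the <head> tag has been seen
   block:         blocking mode is enabled
   limit_reached: the 128 character limit has been hit */
demo_action_t demo_next_action(int found, int block, int limit_reached)
{
    if (found) {
        return DEMO_SEND_MODIFIED;
    }
    if (!block) {
        return DEMO_SEND_UNMODIFIED;
    }
    if (limit_reached) {
        return DEMO_SEND_EMPTY_PAGE;
    }
    return DEMO_HOLD_OUTPUT;
}
```

Output is held only in the last case, which is why buffering never grows beyond the first 128 characters.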

Performance Considerations

The filter module needs to be fast. An nginx setup may include many other modules; our module needs to do its work quickly and pass the output to other modules and nginx for processing.

The text can be inserted in a single pass through the chain of buffers, as long as the html <head> tag is found. The parser will only process a maximum of the first 128 characters in the response body. Anything that comes after will not be parsed. This avoids parsing all of the response body when non blocking mode is enabled.

The parser will stop parsing the response body after the <head> tag is found, even before the limit of 128 characters is reached. This should occur most of the time for html content that is properly formatted.

Output buffering (where content is held back) is only done in blocking mode and only for the first 128 characters of data. After 128 characters, the filter module will either send an empty page if <head> is not found, or send the modified content with the inserted text.

These rules, together with other checks and sane limits, keep the nginx filter module fast.

Avoiding HTTP Chunked Transfer Encoding

A particular problem with modifying a response body only if certain html tags are present is setting the new HTTP content length header. Since we are unable to tell whether a tag is present until we have processed the content, we can't determine in advance the value of the content length header that is to be sent.

The usual solution is to use Chunked Transfer Encoding that indicates unknown response body size. One of the uses of chunked transfer encoding is streaming data where file sizes are large and the processing can be done in chunks instead of all at once in memory. This is usually not required for reasonable html content.

For our filter module, I would like to avoid chunked encoding and send a fixed content length to any downstream filter modules, and eventually to the client browser if downstream modules do not change this. Take note that when gzip compression is enabled on Nginx, it will send the response as chunked encoding to the clients.

Some tricks are used in the filter module to avoid chunked encoding. Under blocking mode, the following applies.

Blocking Mode
  • The existing content length header value is incremented with the length of text to be inserted and sent to the client.

  • If the <head> tag is not found, an empty page is sent. The empty page is made up of blank content that matches the size of the modified content length header.

  • If the <head> tag is found, the modified content can be sent as per normal; since the inserted text will be added, matching the modified content length header.

Under non blocking mode, the following applies.

Non Blocking Mode
  • The existing content length header value is incremented with the length of text to be inserted and sent to the client.

  • If the <head> tag is not found, all the unmodified content can be sent. However, blank padding the size of the inserted text will be appended to the end of the unmodified content. This will match the modified content length header.

  • If the <head> tag is found, the modified content can be sent as per normal; since the inserted text will be added, matching the modified content length header.

The use of an empty page and padding of the appropriate size enables the filter module to avoid chunked transfer encoding.
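The arithmetic behind these rules is simple: in every case the bytes sent add up to the original length plus the length of the inserted text. The following is a sketch with hypothetical names (demo_plan_t, demo_plan), not the module's code.

```c
#include <stddef.h>

/* Hypothetical breakdown of what is sent for one response. */
typedef struct {
    size_t content_length;  /* value announced in the header */
    size_t body_bytes;      /* original or modified body bytes sent */
    size_t padding_bytes;   /* blank filler bytes sent */
} demo_plan_t;

/* orig_len:   content length received from the web server
   insert_len: length of the text to be inserted
   found:      whether the <head> tag was found
   block:      whether blocking mode is enabled */
demo_plan_t demo_plan(size_t orig_len, size_t insert_len, int found, int block)
{
    demo_plan_t p;

    /* The header is always incremented up front. */
    p.content_length = orig_len + insert_len;

    if (found) {
        /* Modified content already includes the inserted text. */
        p.body_bytes = orig_len + insert_len;
        p.padding_bytes = 0;
    } else if (block) {
        /* Empty page: blank content matching the announced length. */
        p.body_bytes = 0;
        p.padding_bytes = p.content_length;
    } else {
        /* Unmodified content plus padding the size of the insert. */
        p.body_bytes = orig_len;
        p.padding_bytes = insert_len;
    }
    return p;
}
```

For example, a 1000 byte page with a 40 byte insert is always announced as 1040 bytes, whichever branch is taken.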

Structure of an Nginx HTTP Filter Module

To develop an Nginx module, you need to know the data types and function calls available in Nginx. The official Nginx Development Guide is a key document to read. It explains which header files to include, which return codes are supported, and which Nginx data types are available, such as ngx_str_t (string), arrays and lists. There is also information on memory management, the Nginx cycle, Nginx events, and how connections and requests are defined.

The official guide is the main reference for learning how to develop Nginx modules. However, it is rather long and multiple readings are probably required to understand the content. An easier introduction is available in Emiller's Guide To Nginx Module Development. This guide is a useful tutorial for beginners learning to write Nginx modules.

Components of Nginx Module

In this section, we will briefly run through the key components of an Nginx filter module without going into too much detail. In the implementation section, we will run through the source code of the filter module. There are 3 important Nginx data structures that modules rely on.

  1. Module Definition
  2. Module Context
  3. Module Directive Structure

The following table describes each item in more detail. The source definition column provides the link to the actual nginx source code where the structure is defined.

Data Structure Description Source Definition
ngx_module_t
(Module Definition)

This structure is the module definition. It is a typedef of ngx_module_s and there is one global variable of this type for each module. At the top of the structure are version information fields that can be filled by using the macro NGX_MODULE_V1. There are also several unused fields for future extensions at the bottom of the struct that can be filled with NGX_MODULE_V1_PADDING.

For the remaining fields, we are interested in only 3 of them. The rest are handlers that can be called at various points in the Nginx cycle. These are set to NULL. The 3 fields that concern us are as follows.

  • void *ctx;
    This takes the module context (ngx_http_module_t) which contains the function handlers for creating module configuration struct and merging module configuration. ngx_http_module_t is covered later in this table.
  • ngx_command_t *commands;
    This takes a pointer to an array of ngx_command_t. Each ngx_command_t defines a directive that the module takes. ngx_command_t is covered later in this table.
  • ngx_uint_t type;
    This defines the type of module (let Nginx know what is stored in ctx), such as NGX_CORE_MODULE, NGX_HTTP_MODULE etc...
Source Def
ngx_http_module_t
(Module Context)

The module context is a static data structure that defines the handlers for the creation and initialization of a module's configuration struct. It includes handlers that can run pre and post configuration.

A module can have its own configuration struct that contains the parameters it requires. The function handlers defined here are for the creation and merging of the module configuration struct. There are separate pairs of function handlers for the module configuration that appears in Nginx's main config block, server config block and location block. There are also two handlers that can run pre and post configuration.

For those handlers that are not needed, NULL can be specified. For example, if a module only has directives in Nginx's location block and it doesn't require merging values from higher levels, the function handler for creating a location configuration can be specified, while all others are set to NULL.

Source Def
ngx_command_t
(Module Directive Structure)

This is a typedef of ngx_command_s, for defining a module directive. A static array of ngx_command_t, containing the directives of a module, is passed to Nginx. The array is terminated by an ngx_null_command. ngx_command_t has the following fields.

  • Directive Name
    An ngx string for the name of the directive.
  • Bitmask
    Indicates where the directive can be configured (e.g. the HTTP, server or location block in the Nginx config file). The bitmask also indicates how many and what kind of arguments the directive takes.
  • Set Function pointer
    A set handler function for saving the directive arguments. Nginx has several pre-defined set functions for saving various directive arguments like boolean, string etc... A custom handler can also be specified.
  • Configuration Structure
    This specifies the configuration structure passed to the directive handler. If a module directive is configured in the server context/block of the Nginx config file, then the server context offset (NGX_HTTP_SRV_CONF_OFFSET) should be specified here. The handler function uses this information for locating the right module configuration.
  • Parameter offset
    This is where the parameter for the module configuration is located. The set handler function will save the directive argument here.
  • Post
    A secondary function pointer can be specified that will be called after the earlier set function handler has saved the directive argument. This field can also hold a default value that can be used by some of the Nginx pre-defined set functions.
Source Def

Nginx Module Filter Chain

Besides these 3 data structures, we need to know a bit about how Nginx handles the http filter modules. Nginx treats http filter modules like a chain. The first filter calls the second, the second calls the third, and so on until the last. There are two separate chains, one for handling HTTP response headers and another for the HTTP response body.

A filter module can register a handler for HTTP response headers, as well as a handler for HTTP response body.

Registration can be done in an initialization function defined as a post configuration function in the module context. The module context (ngx_http_module_t) is described in the table earlier.

The filter handlers take the arguments and return the values required by Nginx. For example, an HTTP response headers handler function takes a pointer to ngx_http_request_t as its argument and returns ngx_int_t. This handler function will call the next response headers handler in the chain when it is done.

Function prototype of a filter handler for HTTP headers.

static ngx_int_t ngx_http_html_head_header_filter(ngx_http_request_t *r );

The HTTP response body filter handler takes two arguments, a pointer to ngx_http_request_t and a pointer to ngx_chain_t. It returns an ngx_int_t. The second argument, ngx_chain_t* is a linked list for the output buffers. Each buffer stores part of the HTTP response body.

Function prototype of a filter handler for HTTP response body.

static ngx_int_t ngx_http_html_head_body_filter(ngx_http_request_t *r, ngx_chain_t *in);

Our filter module will be parsing the content blocks in the ngx_chain_t* linked list; inserting our text after the <head> tag. Once it is done, it will call the next response body handler in the chain.

Note that the response body filter handler function can be called many times in a single request. This is due to the nature of asynchronous data access, non blocking I/O that enables nginx to be high performance. The filter handler is called once data is available. It will call the next filter handler in the chain when it has processed the current linked list of data buffers. The request though, may not have ended and there can be more data buffers coming.

ngx_http_top_header_filter is a global pointer for storing the first HTTP response headers filter handler, to be called by Nginx.

ngx_http_top_body_filter is a global pointer that stores the first HTTP response body filter handler, to be called by Nginx. These are used when registering our own handlers. We will see their usage in the source code later.
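The registration pattern (save the current top handler as "next", then install your own handler as the new top) can be simulated in plain C. The sketch below uses hypothetical demo_ names and a string instead of ngx_http_request_t and ngx_chain_t, so it runs without Nginx; only the pointer swapping mirrors what a real module does with ngx_http_top_body_filter.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for ngx_http_output_body_filter_pt. The real handler takes
   (ngx_http_request_t *r, ngx_chain_t *in); a plain string is used here. */
typedef int (*demo_body_filter_pt)(char *body, size_t cap);

/* Append one character; returns 0 on success, -1 if it does not fit. */
static int demo_append(char *body, size_t cap, char c)
{
    size_t len = strlen(body);
    if (len + 2 > cap) {
        return -1;
    }
    body[len] = c;
    body[len + 1] = '\0';
    return 0;
}

/* Plays the role of the final filter that ends the chain. */
static int demo_write_filter(char *body, size_t cap)
{
    (void) body;
    (void) cap;
    return 0;
}

/* Plays the role of the global ngx_http_top_body_filter pointer. */
demo_body_filter_pt demo_top_body_filter = demo_write_filter;

/* Each module keeps its own static "next" pointer. */
static demo_body_filter_pt demo_next_a;
static demo_body_filter_pt demo_next_b;

static int demo_filter_a(char *body, size_t cap)
{
    if (demo_append(body, cap, 'A') != 0) {
        return -1;
    }
    return demo_next_a(body, cap);   /* hand over to the next filter */
}

static int demo_filter_b(char *body, size_t cap)
{
    if (demo_append(body, cap, 'B') != 0) {
        return -1;
    }
    return demo_next_b(body, cap);
}

/* Registration: remember the current top, then become the new top. */
void demo_register_a(void)
{
    demo_next_a = demo_top_body_filter;
    demo_top_body_filter = demo_filter_a;
}

void demo_register_b(void)
{
    demo_next_b = demo_top_body_filter;
    demo_top_body_filter = demo_filter_b;
}
```

Because each registration pushes onto the front of the chain, the filter registered last runs first. Registering A and then B and calling demo_top_body_filter on "x" runs B, then A, then the write filter.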

Module Config Shell File

To tell Nginx about the filter module, a config file is required. This is just a regular shell file. It tells Nginx the module name, the module type and the module source code location. For more details on the config file and Nginx modules, refer to the Nginx Development Guide. The Nginx Wiki provides information on the config file as well.

Implementation of the Nginx Response Body Filter

Let's run through the key functions and data structures in the source code for the Html Head filter module. The full source is available at the Github link at the bottom of the article.

The following is the listing for the config file of Html Head filter module. Note, the filename of the config file is "config". It specifies the type of the module, a name for the module and a single c source file that contains the module code.

ngx_module_type=HTTP_AUX_FILTER
ngx_module_name=ngx_http_html_head_filter_module
ngx_module_srcs="$ngx_addon_dir/ngx_http_html_head_filter_module.c"

. auto/module

ngx_addon_name=$ngx_module_name

ngx_http_html_head_filter_module.c is the filter source file. The 3 Nginx header files required for HTTP module development are included at the top of the source file. Four macros are defined at the top as well.

The following code listing shows these macros and include files.

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

#define HF_MAX_STACK_SZ 512
#define HF_MAX_CONTENT_SZ 128
#define HF_BUF_SIZE 4096
#define HF_MAX_CONTENT_LENGTH (10 * 1024 * 1024)

The following is a brief explanation of each of the four macros. These are used as important parameters and limits in our filter module.

  • HF_MAX_STACK_SZ defines the size of the parsing stack, currently set to 512.

  • HF_MAX_CONTENT_SZ defines the maximum number of characters at the start of a response body that the parser will search for the <head> tag. Currently set to 128 characters.

  • HF_BUF_SIZE defines the default buffer size used by the filter. Currently set to 4096.

  • HF_MAX_CONTENT_LENGTH defines the maximum content length that the filter will accept. Currently set as 10MiB.

Nginx Per Request/Response Context

Nginx allows a module to keep state information per HTTP request/response through a data structure defined by the module. We define a structure ngx_http_html_head_filter_ctx_t that stores the state of processing a response. It includes a stack, headfilter_stack_t, used by the parser.

There are also a number of other members like count, which tracks the number of characters processed by the parser so far. The filter module expects to find the <head> tag in the first 128 characters of the response body.

The following shows the code for the per request/response context structure and the parser stack.

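/*
stack for parsing html
*/
typedef struct
{
    u_char     data[HF_MAX_STACK_SZ];
    ngx_int_t  top;
}
headfilter_stack_t;


/*
module data struct for maintaining
state per request/response
*/
typedef struct
{
    ngx_uint_t           last;         /* last buffer of the output processed */
    ngx_uint_t           last_search;  /* characters limit of 128 was hit */
    ngx_uint_t           count;        /* characters processed by the parser */
    ngx_uint_t           found;        /* <head> tag was found */
    ngx_uint_t           index;        /* parse position in current buffer */
    ngx_uint_t           starttag;     /* parser state */
    ngx_uint_t           tagquote;     /* parser state */
    ngx_uint_t           tagsquote;    /* parser state */
    ngx_uint_t           buffered;     /* output is being held back */
    headfilter_stack_t   stack;        /* parser stack */
    ngx_chain_t         *free;         /* for buffer reuse */
    ngx_chain_t         *busy;         /* for buffer reuse */
    ngx_chain_t         *out;
    ngx_chain_t         *in;
    ngx_chain_t        **last_out;
}
ngx_http_html_head_filter_ctx_t;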

The index variable in the ngx_http_html_head_filter_ctx_t structure stores the current position parsed so far in the content block of a buffer (ngx_buf_t). If a <head> tag is found, index will point to the position of the closing bracket ">" in the content block of the current buffer. This information will be used for splitting up the buffers and inserting our text.

Structure members like found, last_search and last are flags to indicate certain conditions. found is set to true when the <head> tag is found. last_search is set when the characters limit of 128 is hit. last is set when the last buffer of the output is processed.

starttag, tagquote and tagsquote are used by the parser when parsing the content block. The buffered flag indicates whether the filter module is buffering output (holding back output).

The ngx_chain_t pointers, free, busy, out and in, are used together with the pointer to ngx_chain_t pointer, last_out, for handling the incoming and outgoing buffers chains. free and busy are required for buffer reuse. Refer to the Nginx Development Guide for more details on buffer reuse.

Saving and Retrieving Per Request/Response Context

Nginx offers two macros, ngx_http_set_ctx(r, ctx, module) and ngx_http_get_module_ctx(r, module), for saving and retrieving the module's per request/response context.

In our filter module implementation, ngx_http_set_ctx() is called by the response headers filter handler when creating and initializing the per request/response context structure. The response body handler calls ngx_http_get_module_ctx() to retrieve the per request/response context structure.

If this structure is NULL, the response body handler will skip processing and call the next response body filter in the filter chain. The response headers filter handler will not create this context if certain checks fail, for example if the content type is not "text/html". You will see this later in the source code.
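The save and retrieve pattern can be simulated without Nginx. In Nginx, each request keeps an array of context pointers indexed by the module's ctx_index; the sketch below imitates that with hypothetical demo_ types and macros, and lets the caller provide the context storage where a real module would allocate it from the request pool.

```c
#include <stddef.h>

#define DEMO_MAX_MODULES 8

/* Stand-ins for the request and module structures. */
typedef struct { void *ctx[DEMO_MAX_MODULES]; } demo_request_t;
typedef struct { size_t ctx_index; } demo_module_t;

/* Imitations of ngx_http_set_ctx / ngx_http_get_module_ctx. */
#define demo_set_ctx(r, c, module)      ((r)->ctx[(module).ctx_index] = (c))
#define demo_get_module_ctx(r, module)  ((r)->ctx[(module).ctx_index])

/* Cut-down per request/response state. */
typedef struct {
    unsigned found;
    unsigned count;
} demo_filter_ctx_t;

/* The index value is arbitrary for this sketch. */
static demo_module_t demo_head_filter_module = { 3 };

/* Header filter: create the context only when the response qualifies.
   Returns 1 if a context was attached to the request, 0 otherwise. */
int demo_header_filter(demo_request_t *r, demo_filter_ctx_t *storage,
                       int is_html)
{
    if (!is_html) {
        return 0;               /* no context: body filter will skip */
    }
    storage->found = 0;
    storage->count = 0;
    demo_set_ctx(r, storage, demo_head_filter_module);
    return 1;
}

/* Body filter: may be called many times per request.
   Returns 1 if the data was processed, 0 if it was skipped. */
int demo_body_filter(demo_request_t *r, unsigned nbytes)
{
    demo_filter_ctx_t *ctx = demo_get_module_ctx(r, demo_head_filter_module);

    if (ctx == NULL) {
        return 0;               /* would call the next body filter here */
    }
    ctx->count += nbytes;       /* state survives between calls */
    return 1;
}
```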

Structure for Storing Module Configuration

The following is the data structure for storing the arguments of the configuration directives.

/* 
Configuration struct for module
*/
typedef struct
{
    ngx_str_t   insert_text;
    ngx_flag_t  block;
}
ngx_http_html_head_filter_loc_conf_t;

static ngx_http_output_header_filter_pt  ngx_http_next_header_filter;
static ngx_http_output_body_filter_pt    ngx_http_next_body_filter;

ngx_http_html_head_filter_loc_conf_t has a string field, insert_text, that holds the text to be inserted after the <head> tag. It has a block flag that indicates whether a blank html should be displayed if a <head> tag is not found in the first 128 characters of a response.

The two static variables, ngx_http_next_header_filter and ngx_http_next_body_filter, are pointers for storing the next header filter and body filter in the Nginx chain of filters. These are set during initialization of our filter module and are called when our module has done its work.

Module Directives

The following listing shows the directives that our filter module will take. These directives are declared as a static array of ngx_command_t structures.

/*
Module directives
*/
static ngx_command_t ngx_http_html_head_filter_commands[] =
{
   {
     ngx_string("html_head_filter"), //Module Directive name
     NGX_HTTP_LOC_CONF | NGX_CONF_1MORE, //Directive location and argument 
     ngx_conf_set_str_slot, //Handler function 
     NGX_HTTP_LOC_CONF_OFFSET, //Save to loc config 
     offsetof(ngx_http_html_head_filter_loc_conf_t, insert_text),//loc para
     NULL
   },
   
   {
     ngx_string("html_head_filter_block"), //Module Directive name
     NGX_HTTP_LOC_CONF | NGX_CONF_FLAG, //Directive location and argument
     ngx_conf_set_flag_slot, //Handler function 
     NGX_HTTP_LOC_CONF_OFFSET, //Save to loc config 
     offsetof(ngx_http_html_head_filter_loc_conf_t, block),//loc para
     NULL
   },
   
   ngx_null_command
};

ngx_http_html_head_filter_commands[ ] is an array of ngx_command_t. It holds the 2 directives for our filter module and is terminated by ngx_null_command.

The first directive structure (ngx_command_t) that is defined is "html_head_filter".

  {
     ngx_string("html_head_filter"), //Module Directive name
     NGX_HTTP_LOC_CONF | NGX_CONF_1MORE, //Directive location and argument 
     ngx_conf_set_str_slot, //Handler function 
     NGX_HTTP_LOC_CONF_OFFSET, //Save to loc config 
     offsetof(ngx_http_html_head_filter_loc_conf_t, insert_text),//loc para
     NULL
   }

The following describes its individual fields.

  • Its first field is simply the directive name, an ngx_str_t, "html_head_filter".

  • The second field is a bitmask that defines where this directive can occur in the nginx configuration file (NGX_HTTP_LOC_CONF) and the number of arguments (NGX_CONF_1MORE) that it takes. In our case, we specify that this directive can occur in the location context of the nginx configuration file and takes 1 or more arguments. The first argument is a string, the text to be inserted after the <head> tag. Note that ngx_conf_set_str_slot( ) stores only the first argument, so any additional arguments are accepted but ignored.

  • The third field is the handler function that is called to read in our directive and set its argument. In this case, we use one of the set functions provided by Nginx. ngx_conf_set_str_slot( ) reads a string argument and saves it in our module configuration structure.

  • The fourth field, NGX_HTTP_LOC_CONF_OFFSET, tells the handler function that our module configuration structure is a location configuration.

  • The fifth field specifies the offset for saving the argument. In this case, the argument is saved in the insert_text field of our ngx_http_html_head_filter_loc_conf_t module configuration structure.

  • The sixth field allows the specification of a post handler that can be used for further processing of the directive argument. In our case, we do not use it and set it to NULL.
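
To make the role of the handler and the offset more concrete, the following is a minimal standalone sketch of the idea. It does not use the Nginx headers: demo_str_t, demo_loc_conf_t, demo_command_t and demo_set_str_slot( ) are hypothetical stand-ins for ngx_str_t, our configuration structure, ngx_command_t and ngx_conf_set_str_slot( ), and the real Nginx types carry more fields than shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ngx_str_t: a length plus a data pointer. */
typedef struct {
    size_t      len;
    const char *data;
} demo_str_t;

/* Hypothetical stand-in for the module's location configuration. */
typedef struct {
    demo_str_t insert_text;
    int        block;
} demo_loc_conf_t;

/* Hypothetical, reduced directive entry: name, handler, field offset. */
typedef struct demo_command_s demo_command_t;

struct demo_command_s {
    const char *name;
    int       (*set)(const demo_command_t *cmd, void *conf, const char *arg);
    size_t      offset;
};

/* Generic setter: it finds the target field purely from the byte offset
   stored in the directive entry, so one handler serves many directives. */
static int
demo_set_str_slot(const demo_command_t *cmd, void *conf, const char *arg)
{
    demo_str_t *field;

    assert(cmd != NULL && conf != NULL && arg != NULL);

    field = (demo_str_t *) ((char *) conf + cmd->offset);

    if (field->data != NULL) {
        return -1;                  /* directive was already set */
    }

    field->len = strlen(arg);
    field->data = arg;

    return 0;
}

/* The directive entry ties the name, the handler and the offset together. */
static const demo_command_t demo_html_head_filter_cmd = {
    "html_head_filter",
    demo_set_str_slot,
    offsetof(demo_loc_conf_t, insert_text)
};
```

Calling demo_html_head_filter_cmd.set( ) with a zeroed demo_loc_conf_t fills in insert_text without the handler knowing anything about the structure layout; a second call is rejected as a duplicate.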

Note that the "html_head_filter" directive is required in order to enable the filter module. If this directive is not set in the nginx configuration, our filter module will skip processing.

The second directive is "html_head_filter_block". We will not go through its fields in detail, as it is structured similarly to the first directive. The directive takes an on/off flag that determines whether blocking mode is enabled. In blocking mode, an empty page will be sent if the <head> tag is not found within the first 128 characters of the response body.

This is an optional directive. By default blocking is not enabled.
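
As an illustration, a location block using both directives might look like the following configuration fragment. The backend address and the script path are assumptions made up for this example; only the two directive names come from our module.

```nginx
location / {
    # Hypothetical backend address, for illustration only
    proxy_pass http://127.0.0.1:8080;

    # Text to insert after <head>; setting it enables the filter here
    html_head_filter '<script src="/monitor.js"></script>';

    # Optional: send an empty page when <head> is not found
    html_head_filter_block on;
}
```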

Nginx Module Context

The module context, ngx_http_html_head_filter_ctx, sets three function handlers. The following shows the code listing.

/*
Module context 
*/
static ngx_http_module_t  ngx_http_html_head_filter_ctx =
{
    NULL, //Pre config
    ngx_http_html_head_init, //Post config
    NULL, //Create main config
    NULL, //Init main config
    NULL, //Create server config
    NULL, //Merge server config
    ngx_http_html_head_create_conf, //Create loc config
    ngx_http_html_head_merge_loc_conf //Merge loc config
};

ngx_http_html_head_init( ) is used for initializing the module after configuration is done and ngx_http_html_head_create_conf( ) is for creating the module configuration structure. ngx_http_html_head_merge_loc_conf( ) function is used for merging configuration directives from parent location contexts in the nginx configuration file.

More details of these 3 functions are provided below.

The module initialization function

The ngx_http_html_head_init( ) function initializes the module and registers our handlers in the filter chain. This function is set in the post configuration field of the module context earlier. Nginx will call it after the configuration has been read.

The module's header filter and body filter handler functions are assigned to the global ngx_http_top_header_filter and ngx_http_top_body_filter pointers respectively. Nginx will call these and hence invoke our filter handlers.

The original function handlers in these 2 global pointers are saved in ngx_http_next_header_filter and ngx_http_next_body_filter respectively. When our module completes its work, it will in turn call these saved function handlers. This establishes the Nginx filter chain, enabling one filter to call the next until the last in the filter chain.

The following shows the source code for the ngx_http_html_head_init( ) function.

/* Function to initialize the module */
static ngx_int_t
ngx_http_html_head_init(ngx_conf_t * cfg)
{

    ngx_http_next_header_filter = ngx_http_top_header_filter;
    ngx_http_top_header_filter = ngx_http_html_head_header_filter;

    ngx_http_next_body_filter = ngx_http_top_body_filter;
    ngx_http_top_body_filter = ngx_http_html_head_body_filter;
 
    return NGX_OK;

}
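
The pointer swapping can be easier to follow in isolation. The following is a minimal standalone sketch, without the Nginx headers, of three filters registering themselves in the same way. The filter names, the simplified signature and the trace string are assumptions for illustration; real Nginx filters take a request and a buffer chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified filter signature: each filter appends its
   name to a trace string instead of processing buffers. */
typedef int (*demo_body_filter_pt)(char *trace, size_t cap);

/* Stand-in for the global ngx_http_top_body_filter pointer. */
static demo_body_filter_pt demo_top_body_filter;

/* Each module keeps a private pointer to the filter that follows it. */
static demo_body_filter_pt demo_other_next;
static demo_body_filter_pt demo_head_next;

static void
demo_trace(char *trace, size_t cap, const char *name)
{
    assert(strlen(trace) + strlen(name) + 1 <= cap);
    strcat(trace, name);
}

/* Last filter in the chain: it has no next filter to call. */
static int
demo_write_filter(char *trace, size_t cap)
{
    demo_trace(trace, cap, "write");
    return 0;
}

static int
demo_other_filter(char *trace, size_t cap)
{
    demo_trace(trace, cap, "other>");
    return demo_other_next(trace, cap);
}

static int
demo_head_filter(char *trace, size_t cap)
{
    demo_trace(trace, cap, "html_head>");
    return demo_head_next(trace, cap);
}

/* Init functions: save the current top, then become the new top. */
static void
demo_write_init(void)
{
    demo_top_body_filter = demo_write_filter;
}

static void
demo_other_init(void)
{
    demo_other_next = demo_top_body_filter;
    demo_top_body_filter = demo_other_filter;
}

static void
demo_head_init(void)
{
    demo_head_next = demo_top_body_filter;
    demo_top_body_filter = demo_head_filter;
}

/* Registers the filters in order and pushes one "response" through. */
static int
demo_run(char *trace, size_t cap)
{
    trace[0] = '\0';

    demo_write_init();
    demo_other_init();
    demo_head_init();

    return demo_top_body_filter(trace, cap);
}
```

After demo_run( ), the trace reads "html_head>other>write": the filter that registered last is called first, and each filter hands over to the one that was on top when it registered.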

The module configuration creation and merge functions

The following shows the code snippets for the ngx_http_html_head_create_conf( ) and ngx_http_html_head_merge_loc_conf( ) functions.

/* Creates the module location config struct */
static void* 
ngx_http_html_head_create_conf(ngx_conf_t *cf)
{

    ngx_http_html_head_filter_loc_conf_t *conf;
    conf = ngx_pcalloc(cf->pool, sizeof(ngx_http_html_head_filter_loc_conf_t));
    if(conf == NULL)
    {
        ngx_conf_log_error(NGX_LOG_EMERG, cf, 0,
            "[Html_head filter]: ngx_http_html_head_create_conf: "
            " cannot allocate memory for config");
        /* Nginx checks the create-conf callback result against NULL */
        return NULL;
    }

    conf->block = NGX_CONF_UNSET;
    return conf;

}

/* Merges the module location config struct */
static char* 
ngx_http_html_head_merge_loc_conf(ngx_conf_t *cf,                 
    void *parent, void *child) 
{

    ngx_http_html_head_filter_loc_conf_t *prev = parent;
    ngx_http_html_head_filter_loc_conf_t *conf = child;

    ngx_conf_merge_value(conf->block, prev->block, 0);
    /* The default must be a string literal: the macro applies sizeof() to it */
    ngx_conf_merge_str_value(conf->insert_text, prev->insert_text, "");

   return NGX_CONF_OK;

}

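The ngx_http_html_head_create_conf( ) function creates our module configuration structure for saving our directives. The ngx_http_html_head_merge_loc_conf( ) function merges directives that appear in parent locations with those that appear in child locations. A value set in the child location takes precedence; otherwise the parent's value is inherited, and if neither is set, the default applies (blocking off, empty insert text).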
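
The merge rules can be sketched without the Nginx macros. The following standalone example is an approximation of what the create and merge functions achieve, under the assumption that an unset flag is marked with a sentinel value and an unset string has a NULL data pointer. DEMO_UNSET and the demo_ names are hypothetical and only mirror the roles of NGX_CONF_UNSET and our configuration structure.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_UNSET  (-1)    /* plays the role of NGX_CONF_UNSET */

typedef struct {
    size_t      len;
    const char *data;
} demo_str_t;

typedef struct {
    demo_str_t insert_text;
    int        block;
} demo_loc_conf_t;

/* Create: mark everything as "not set in this location". */
static void
demo_create_conf(demo_loc_conf_t *conf)
{
    assert(conf != NULL);

    conf->insert_text.len = 0;
    conf->insert_text.data = NULL;
    conf->block = DEMO_UNSET;
}

/* Merge: the child keeps its own value, else inherits the parent's value,
   else falls back to the default (block off, empty insert text). */
static void
demo_merge_conf(const demo_loc_conf_t *parent, demo_loc_conf_t *child)
{
    assert(parent != NULL && child != NULL);

    if (child->block == DEMO_UNSET) {
        child->block = (parent->block == DEMO_UNSET) ? 0 : parent->block;
    }

    if (child->insert_text.data == NULL) {
        if (parent->insert_text.data != NULL) {
            child->insert_text = parent->insert_text;
        } else {
            child->insert_text.len = 0;
            child->insert_text.data = "";
        }
    }
}
```

With this scheme, a location that does not mention the directives inherits them from the enclosing context, while a location that sets html_head_filter_block off keeps that value even when the parent has it on.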

Nginx Module Definition

The array of module directives, the module context and module type are specified in the ngx_module_t structure. This is the module definition discussed in the earlier section. The following shows the code.

/*
Module definition
*/
ngx_module_t  ngx_http_html_head_filter_module = 
{
    NGX_MODULE_V1,
    &ngx_http_html_head_filter_ctx,     /* module context */
    ngx_http_html_head_filter_commands, /* module directives */
    NGX_HTTP_MODULE,                    /* module type */
    NULL,                                  
    NULL,                                  
    NULL,                                  
    NULL,                                  
    NULL,                                  
    NULL,                                 
    NULL,                                  
    NGX_MODULE_V1_PADDING
};

The response headers filter function

The following shows the code listing for the ngx_http_html_head_header_filter() function. This is the handler that was registered earlier by the module initialization function. It processes the incoming HTTP response headers, performs some checks and initializes the module's per request/response context for managing state.

If any of the checks fails, the context is not created and the response headers are passed unmodified to the next header filter handler. For example, a check fails if the "html_head_filter" directive is not set or if the HTTP response is compressed.

/*Module function handler to filter http response headers */
static ngx_int_t
ngx_http_html_head_header_filter(ngx_http_request_t *r )
{

    ngx_http_html_head_filter_loc_conf_t *slcf;
    ngx_http_html_head_filter_ctx_t *ctx;
    ngx_uint_t content_length=0; 

    slcf = ngx_http_get_module_loc_conf(r, ngx_http_html_head_filter_module);
    
    
    if(slcf == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "null configuration");
        #endif
       
        return ngx_http_next_header_filter(r);
    }
    

    if(slcf->insert_text.len == 0)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                " empty configuration insert text");
        #endif
        
        return ngx_http_next_header_filter(r);
    }
    

    if(r->headers_out.content_type.len == 0 || 
        r->headers_out.content_length_n == 0 ||
        r->header_only || 
        r->headers_out.content_length_n > HF_MAX_CONTENT_LENGTH )
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "empty content type, header only, invalid content length");
        #endif
        
        return ngx_http_next_header_filter(r);
    }
    
     
    if(ngx_test_content_type(r) == 0) 
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "content type not html");
        #endif            
        
        return ngx_http_next_header_filter(r);
    }

    
    if(ngx_test_content_compression(r) != 0)
    {/* Compression enabled, don't filter  */ 

        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "compression enabled");
        #endif    
                     
        return ngx_http_next_header_filter(r);
    }
 
    if(r->headers_out.status != NGX_HTTP_OK)
    {/* Response is not HTTP 200   */

        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "http response is not 200");
        #endif   
                     
        return ngx_http_next_header_filter(r);
    }

    r->filter_need_in_memory = 1;

    if (r == r->main) 
    {//Main request
        content_length = r->headers_out.content_length_n + 
            slcf->insert_text.len;
            
        #if HT_HEADF_DEBUG
            /* content_length_n is an off_t, so it is logged with %O */
            ngx_log_debug2(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "content length prev : %O, new : %ui", 
                r->headers_out.content_length_n,
                content_length);
        #endif 
        
        r->headers_out.content_length_n = content_length;      
    }
    

    ctx = ngx_http_get_module_ctx(r, ngx_http_html_head_filter_module);
    if(ctx == NULL)
    {
        ctx = ngx_pcalloc(r->pool, 
                sizeof(ngx_http_html_head_filter_ctx_t)); 
        
        if(ctx == NULL)
        {
            /* The content length may already have been adjusted above, so
               the response cannot be passed through unmodified; fail */
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_header_filter: "
                "cannot allocate ctx memory");

            return NGX_ERROR;
        }
        
        ngx_http_set_ctx(r, ctx, ngx_http_html_head_filter_module);
    }
    
    /* Initializes the last output chain */
    ctx->last_out = &ctx->out;
    
    return ngx_http_next_header_filter(r);
    
}
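
The sequence of checks in the header filter can be condensed into a single predicate. The following standalone sketch models the decision with a plain structure instead of ngx_http_request_t. The demo_ names, the field layout and the value chosen for DEMO_MAX_CONTENT_LENGTH are assumptions for illustration and are not taken from the module source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit standing in for HF_MAX_CONTENT_LENGTH. */
#define DEMO_MAX_CONTENT_LENGTH  (10L * 1024 * 1024)

/* Hypothetical summary of what the header filter looks at. */
typedef struct {
    size_t insert_text_len;   /* 0 when html_head_filter is not set */
    bool   has_content_type;
    bool   is_html;
    bool   is_compressed;
    bool   header_only;
    bool   is_main;           /* main request rather than a subrequest */
    int    status;
    long   content_length;
} demo_response_t;

/* Returns true when the body filter should process this response. */
static bool
demo_should_filter(const demo_response_t *res)
{
    assert(res != NULL);

    if (res->insert_text_len == 0) {
        return false;
    }

    if (!res->has_content_type || res->header_only) {
        return false;
    }

    if (res->content_length == 0
        || res->content_length > DEMO_MAX_CONTENT_LENGTH)
    {
        return false;
    }

    if (!res->is_html || res->is_compressed) {
        return false;
    }

    return res->status == 200;
}

/* The advertised length grows by the inserted text, main request only. */
static long
demo_new_content_length(const demo_response_t *res)
{
    if (!demo_should_filter(res) || !res->is_main) {
        return res->content_length;
    }

    return res->content_length + (long) res->insert_text_len;
}
```

Keeping the checks in one place like this makes it clear that the content length is only touched once every check has passed.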

The response body filter function

The following is the code listing for the ngx_http_html_head_body_filter() function. Like the header filter handler, this function is registered by the module initialization function.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
/*
Module function handler to filter the html response body
and insert the text string
*/
static ngx_int_t
ngx_http_html_head_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{

    ngx_buf_t                               *b;
    ngx_int_t                               rc;
    ngx_http_html_head_filter_ctx_t         *ctx;
    ngx_http_html_head_filter_loc_conf_t    *slcf;
   
  
    slcf = ngx_http_get_module_loc_conf(r, ngx_http_html_head_filter_module);
    ctx = ngx_http_get_module_ctx(r, ngx_http_html_head_filter_module);

    
    if(slcf == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_body_filter: "
                "null configuration");
        #endif
       
        return ngx_http_next_body_filter(r, in);
    }


    if(ctx == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_body_filter: "
                "unable to get module ctx");
        #endif           
            
        return ngx_http_next_body_filter(r, in);
    }


    if(in == NULL)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_body_filter: "
                "input chain is null");
        #endif     
        
       return ngx_http_next_body_filter(r, in);
    }
	
   
    /* Copy the incoming chain to ctx-in */
    if (ngx_chain_add_copy(r->pool, &ctx->in, in) != NGX_OK) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_http_html_head_body_filter: "
            "unable to copy input chain - in");
                     
        return NGX_ERROR;
    }
    
    
    /* Loop through and process all the incoming buffers */
    while(ctx->in)
    {	
        ctx->index = 0; 
                
        if(ctx->found == 0 && ctx->last_search == 0)
        {		 
    
            rc = ngx_parse_buf_html(ctx, r);
            if(rc == NGX_OK)
            { /* <head> is found */
                ctx->found = 1; 
                rc=ngx_html_insert_output(ctx, r, slcf);
			   
                if(rc == NGX_ERROR)
                {
                    return rc; 
                }
            }
            else if(rc == NGX_ERROR)
            {
                ctx->last_search = 1;
            }	
        }	

        b = ctx->in->buf;

        if(b->last_buf || b->last_in_chain)
        {/* Last buffer  */
           ctx->last = 1; 
        }		
    
        *ctx->last_out=ctx->in;
        ctx->last_out=&ctx->in->next;
        ctx->in = ctx->in->next;
    }

    
    if(!ctx->found)
    {/* html head tag not found */

        if(slcf->block)
        {/* blocking mode */
            
            if(ctx->count < HF_MAX_CONTENT_SZ && ctx->last == 0)
            {/* Hasn't hit maximum characters yet and not last buffer*/
        
                /* Buffer output */   
                return ngx_http_html_head_buffer_output(r, ctx);
            }
            
            if(ctx->count >= HF_MAX_CONTENT_SZ || ctx->last)
            {/* Hit maximum characters limit or last buffer send empty page */
                return ngx_http_html_head_output_empty(r, ctx);
            }
            
        }
        else
        {/* non blocking mode */
    
            /*Log alert if last buffer*/
            if(ctx->last)
            {
                ngx_log_error(NGX_LOG_ALERT, r->connection->log, 0,
                    "[Html_head filter]: cannot find <head> "
                    "non blocking");
            }
            
            return ngx_http_html_head_output(r, ctx);
        }
        
    }
    
    
    /* html head tag found, send modified output */
    return  ngx_http_html_head_output(r, ctx);
    
}

Notice that the code follows the logical flow diagram closely. The logical flow diagram serves as a blueprint and it clarifies the logic of our code here.

The while loop on line 67 iterates through the incoming chain of buffers and calls the ngx_parse_buf_html( ) function to parse each buffer for the <head> tag. The <head> tag can be split across two or more consecutive buffers; the parser, through the use of its stack, can track and handle this easily.

If the <head> tag is found, the found flag in the module per request/response context is set and ngx_html_insert_output( ) function is called. ngx_html_insert_output( ) will insert our text after the <head> tag. The process for doing this is described in the earlier Design and Approach section. The text insertion is done in a single pass of the incoming buffers chain.

If the <head> tag is not found within the first 128 characters, the last_search flag is set in the per request/response context. This stops ngx_parse_buf_html( ) from being called on subsequent buffers, which avoids needless parsing.

The found flag also prevents ngx_parse_buf_html( ) from being called on subsequent buffers once the <head> tag is found. It also ensures that the text is inserted only once, after the first occurrence of the <head> tag, even if there are multiple <head> tags in a response body. The while loop builds the output chain that will be passed to the next nginx filter.

After processing the buffers, the logic flow goes to the <head> tag found or not found state. It is just like the steps shown in the logical flow diagram. If the <head> tag is found, the output chain is sent to the next filter using the ngx_http_html_head_output( ) function.

In the not found state, the mode is checked to see if it is blocking or non blocking. In non blocking mode, the output chain is sent using ngx_http_html_head_output( ). In blocking mode, if the 128 character limit is hit or the last buffer has been reached, the ngx_http_html_head_output_empty( ) function is called to send an empty page.

Otherwise, the function ngx_http_html_head_buffer_output( ) is called to hold back the output chain, buffering the output.
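
The decision made after the while loop can be summarised as a small pure function. The following standalone sketch assumes that HF_MAX_CONTENT_SZ is the 128 character limit mentioned above; demo_ctx_t, demo_action_t and demo_decide( ) are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the 128 character search limit described in the text. */
#define DEMO_MAX_CONTENT_SZ  128

/* The three ways the body filter can finish a call. */
typedef enum {
    DEMO_SEND,          /* pass the output chain to the next filter */
    DEMO_BUFFER,        /* hold the output back and wait for more data */
    DEMO_SEND_EMPTY     /* replace the response with an empty page */
} demo_action_t;

/* Hypothetical subset of the per request/response context. */
typedef struct {
    bool   found;       /* <head> tag has been found */
    bool   last;        /* last buffer has been seen */
    size_t count;       /* characters examined so far */
} demo_ctx_t;

static demo_action_t
demo_decide(const demo_ctx_t *ctx, bool block)
{
    assert(ctx != NULL);

    if (ctx->found || !block) {
        /* found, or non blocking mode: always send */
        return DEMO_SEND;
    }

    if (ctx->count < DEMO_MAX_CONTENT_SZ && !ctx->last) {
        /* blocking mode, still undecided: keep buffering */
        return DEMO_BUFFER;
    }

    /* blocking mode, limit hit or last buffer without <head> */
    return DEMO_SEND_EMPTY;
}
```

Written this way, it is easy to see that blocking mode only buffers while the outcome is still open, and that every other combination ends in either a normal send or an empty page.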

Explaining ctx->in, ctx->out, ctx->last_out

Before we go through the code snippet for the 3 functions ngx_http_html_head_output( ), ngx_http_html_head_output_empty( ) and ngx_http_html_head_buffer_output( ), let's run through how the filter module actually handles the incoming buffers chain of the response body.

ctx->in and ctx->out are both pointers to ngx_chain_t. ctx->last_out is a pointer to a pointer to ngx_chain_t. When our response body handler, ngx_http_html_head_body_filter( ), is called, it is passed an incoming linked list of ngx_chain_t containing the buffers that store the response content. This linked list is copied to ctx->in. From that point on, our filter module works on its own linked list, ctx->in.

The copying is done because our filter module may replace the buffers in the linked list of ngx_chain_t. The input chain of buffers in ctx->in is then processed and placed in ctx->out. ctx->out points to the head of the linked list of ngx_chain_t containing the buffers to be sent out.

To facilitate the placement of processed buffers into ctx->out, the pointer to pointer, ctx->last_out, is used. ctx->last_out is initialized to the address of ctx->out, the head of the output list, in the ngx_http_html_head_header_filter( ) function. As buffer chains are added to ctx->out, ctx->last_out is updated to the address of the next field of the chain that was just added.

ctx->last_out therefore always holds the address of the pointer where the next output chain will be attached. When the output chain is sent out in the two output functions, ngx_http_html_head_output( ) and ngx_http_html_head_output_empty( ), ctx->last_out is reinitialized to the address of ctx->out. When new buffer chains are available for our filter to process, ctx->last_out will be ready to add these to ctx->out.

For the case where output is buffered in the ngx_http_html_head_buffer_output( ) function, ctx->last_out is set to the address of the last chain in our buffered chain. When new buffer chains are available, these will be appended to our buffered output chain until the output can finally be sent out through either ngx_http_html_head_output( ) or ngx_http_html_head_output_empty( ) functions.
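
The pointer to pointer technique is a common C idiom for appending to a singly linked list without special casing the empty list. The following standalone sketch shows it with a simplified chain type; demo_chain_t and demo_ctx_t are hypothetical stand-ins for ngx_chain_t and our context, and an integer id takes the place of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ngx_chain_t. */
typedef struct demo_chain_s demo_chain_t;

struct demo_chain_s {
    int           id;       /* stands in for the buffer */
    demo_chain_t *next;
};

/* Hypothetical subset of the filter context. */
typedef struct {
    demo_chain_t  *out;         /* head of the output list */
    demo_chain_t **last_out;    /* address of the pointer to fill next */
} demo_ctx_t;

/* Empty list: the pointer to fill next is the head itself. */
static void
demo_ctx_init(demo_ctx_t *ctx)
{
    assert(ctx != NULL);

    ctx->out = NULL;
    ctx->last_out = &ctx->out;
}

/* Append in constant time, with no check for an empty list. */
static void
demo_append(demo_ctx_t *ctx, demo_chain_t *cl)
{
    assert(ctx != NULL && cl != NULL);

    cl->next = NULL;
    *ctx->last_out = cl;            /* link in at the tail (or the head) */
    ctx->last_out = &cl->next;      /* the next append goes after cl */
}

/* Models sending: count the links handed on, then reset the list. */
static size_t
demo_flush(demo_ctx_t *ctx)
{
    size_t         n = 0;
    demo_chain_t  *cl;

    for (cl = ctx->out; cl != NULL; cl = cl->next) {
        n++;
    }

    ctx->out = NULL;
    ctx->last_out = &ctx->out;

    return n;
}
```

The same two assignments in demo_append( ) work whether the list is empty or not, which is why the filter resets ctx->last_out to the address of ctx->out after each send.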

The output function

The following shows the code snippet for the ngx_http_html_head_output( ) function. It sends the output chain to the next response filter in the nginx filter chain. Notice that it pads the output with blank data if the <head> tag is not found and the last buffer has been processed.

/* 
Function to send output to the next body filter
*/
static ngx_int_t
ngx_http_html_head_output(ngx_http_request_t *r, 
    ngx_http_html_head_filter_ctx_t *ctx)
{
    u_char                                  *padding;
    ngx_buf_t                               *b;
    ngx_int_t                               rc;
    ngx_chain_t                             *cl;
    ngx_http_html_head_filter_loc_conf_t    *slcf;
   
  
    slcf = ngx_http_get_module_loc_conf(r, ngx_http_html_head_filter_module);
    
    if(slcf == NULL)
    {
        
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_http_html_head_output: "
            "null configuration");
       
        return NGX_ERROR;
    }
    

    if(ctx->last && ctx->found == 0)
    {/* Append additional buffer to make up for content length */
     
        cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
        if (cl == NULL) 
        {
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                "[Html_head filter]:  ngx_http_html_head_output: "
                "unable to allocate output chain memory");
                
            return NGX_ERROR;
        }
        
        padding = ngx_pcalloc(r->pool, sizeof(u_char) * slcf->insert_text.len);
        if (padding == NULL)
        {
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                "[Html_head filter]:  ngx_http_html_head_output: "
                "unable to allocate output buffer data memory");
                
            return NGX_ERROR;
        }
    
        b = cl->buf;
        ngx_memzero(b, sizeof(ngx_buf_t));
        
        b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
        b->memory = 1;
        b->pos = padding;
        b->last = padding + (sizeof(u_char) * slcf->insert_text.len);
        b->start = b->pos;
        b->end = b->last; 
        b->recycled = 1; 
        b->last_buf = (r == r->main) ? 1: 0;
        b->last_in_chain = 1;
    
        *ctx->last_out = cl;
        ctx->last_out = &cl->next;
    }
    
    rc = ngx_http_next_body_filter(r, ctx->out);
    ngx_chain_update_chains(r->pool, &ctx->free, &ctx->busy, &ctx->out,
                            (ngx_buf_tag_t)&ngx_http_html_head_filter_module);
    
    ctx->last_out = &ctx->out;    
    ctx->in = NULL; 
    
    if(ctx->buffered && ctx->last)
    {
         r->connection->buffered &= ~NGX_HTTP_SUB_BUFFERED;
    }
    
                                                    
    return rc;
}

The padding ensures that the output matches the modified content length header despite our text not being inserted into the output.

The output empty page function

The following shows the code listing for the ngx_http_html_head_output_empty( ) function. This function outputs an empty page.

/* Function to send empty page */
static ngx_int_t
ngx_http_html_head_output_empty(ngx_http_request_t *r, 
    ngx_http_html_head_filter_ctx_t *ctx)
{
    size_t        i, quotient, remainder;
    u_char        *empty_content; 
    ngx_buf_t     *b;
    ngx_int_t     rc;
    ngx_uint_t    content_length = 0;
    ngx_chain_t   *cl, **ll;
    
    
    ngx_log_error(NGX_LOG_ALERT, r->connection->log, 0,
        "[Html_head filter]: ngx_http_html_head_output_empty: "
        "cannot find <head> blocking");
        
    content_length = r->headers_out.content_length_n; 
    
    /* Ensure that content length is a sane value */    
    if (r->headers_out.content_length_n == -1 
        || content_length > HF_MAX_CONTENT_LENGTH)
    {
        #if HT_HEADF_DEBUG
            ngx_log_debug0(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                "[Html_head filter]: ngx_http_html_head_output_empty: "
                "Unsafe content length detected, setting to sane value" 
                );
        #endif 

        content_length = HF_BUF_SIZE; 
        /* Fall back to keepalive = 0 */
        r->keepalive = 0;
    }        
    
  
    quotient = content_length / HF_BUF_SIZE;
    remainder = content_length % HF_BUF_SIZE; 
    
    if (remainder > 0)
    {
        quotient = quotient + 1; 
    }
    
    empty_content = ngx_pcalloc(r->pool, sizeof(u_char) * HF_BUF_SIZE);
    if (empty_content == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_http_html_head_output_empty: "
            "unable to allocate empty content memory");
        return NGX_ERROR;
    }
    
    #if HT_HEADF_DEBUG
        /* quotient and remainder are size_t, so they are logged with %uz */
        ngx_log_debug3(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
            "[Html_head filter]: ngx_http_html_head_output_empty: "
            "content length: %ui quotient: %uz remainder: %uz", 
            content_length, quotient, remainder);
    #endif         
    
    
    ll = &ctx->out; 
    
    for (i = 0; i < quotient; i++)
    {
        cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
        
        if (cl == NULL) 
        {
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                "[Html_head filter]: ngx_http_html_head_output_empty: "
                "unable to allocate output chain memory");
            return NGX_ERROR;
        }
    
        b = cl->buf ;
        ngx_memzero(b, sizeof(ngx_buf_t));
        
        b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
        b->memory = 1;
        b->pos = empty_content;
        b->last = empty_content + (sizeof(u_char) * HF_BUF_SIZE);
        b->start = b->pos;
        b->end = b->last; 
        b->recycled = 1; 
        b->last_buf = 0;
        b->last_in_chain = 0; 
        
        if (i == (quotient - 1))
        {/* last iteration */
         /* Trim the last buffer to the remainder. When the length is an
            exact multiple of HF_BUF_SIZE, the last buffer stays full. */
           if (remainder > 0)
           {
               b->last = empty_content + remainder;
           }
           
           b->last_buf = (r == r->main) ? 1: 0;
           b->last_in_chain = 1;
        }
        
        *ll = cl;
        ll = &cl->next; 
        
    }
    
    
    ctx->last = 1;
    
    rc = ngx_http_next_body_filter(r, ctx->out);
    ngx_chain_update_chains(r->pool, &ctx->free, &ctx->busy, &ctx->out,
        (ngx_buf_tag_t)&ngx_http_html_head_filter_module);
        
    
    ctx->last_out = &ctx->out;    
    ctx->in = NULL; 
        
    
    if(ctx->buffered)
    {
         r->connection->buffered &= ~NGX_HTTP_SUB_BUFFERED;
    }
    
    return rc; 
}

Notice that the empty page is sized using the content length. A check is also done to make sure that the content length is a sane value. For example, if the backend server sends chunked transfer encoding to Nginx, the content length will not be valid (-1); it can also be an enormous value.

Since the empty page is constructed from blank memory buffers allocated based on the content length, resource exhaustion can happen. The check prevents such a scenario. Notice that the request keep alive is set to 0 if an invalid or overly large content length is detected. A small blank content is then sent, and disabling keep alive helps browsers know that there is no more content.
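
The sizing arithmetic for the empty page can be checked in isolation. The following standalone sketch computes how many buffers are needed and how large each one is, including the case where the content length is an exact multiple of the buffer size. DEMO_BUF_SIZE is an assumed value standing in for HF_BUF_SIZE, and the demo_ function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value standing in for HF_BUF_SIZE. */
#define DEMO_BUF_SIZE  4096

/* Number of buffers needed to cover content_length bytes (rounds up). */
static size_t
demo_buffer_count(size_t content_length)
{
    return (content_length + DEMO_BUF_SIZE - 1) / DEMO_BUF_SIZE;
}

/* Size of buffer i (0-based). Only the last buffer can be shorter, and
   only when the length is not an exact multiple of the buffer size. */
static size_t
demo_buffer_len(size_t content_length, size_t i)
{
    size_t count = demo_buffer_count(content_length);
    size_t remainder = content_length % DEMO_BUF_SIZE;

    assert(i < count);

    if (i == count - 1 && remainder > 0) {
        return remainder;
    }

    return DEMO_BUF_SIZE;
}

/* Sum of all buffer sizes; should always equal content_length. */
static size_t
demo_total_len(size_t content_length)
{
    size_t i, total = 0;
    size_t count = demo_buffer_count(content_length);

    for (i = 0; i < count; i++) {
        total += demo_buffer_len(content_length, i);
    }

    return total;
}
```

The useful property to verify is that demo_total_len( ) equals the content length for every input, so the body that is sent always matches the Content-Length header.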

The output buffering function

The following lists the code snippet for the ngx_http_html_head_buffer_output( ) function. This function buffers the output instead of sending it to the next filter.

/* Function to buffer output */
static ngx_int_t
ngx_http_html_head_buffer_output(ngx_http_request_t *r, 
    ngx_http_html_head_filter_ctx_t *ctx)
{
    size_t        sz, alloc_sz; 
    ngx_chain_t   *cl, *tmp, **ll;
    
    
    if(ctx->buffered == 0)
    {
        ctx->buffered = 1;
        r->connection->buffered |= NGX_HTTP_SUB_BUFFERED;
    }
    
    
    ll = &ctx->out; 
    tmp = ctx->out; 
    
    
    /* Replace all the output chain buffers with our own*/
    while(tmp)
    {
        
        if(tmp->buf->tag == (ngx_buf_tag_t) &ngx_http_html_head_filter_module)
        {/* our own allocated buffer: keep it linked and skip to next */
             ll = &tmp->next;
             tmp = tmp->next; 
             continue; 
        }
        
        cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
        
        if (cl == NULL) 
        {
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                "[Html_head filter]: ngx_http_html_head_buffer_output: "
                "unable to allocate output chain memory");
                
            return NGX_ERROR;
        }
        
        ngx_memzero(cl->buf, sizeof(ngx_buf_t));
        
        /* Size the memory for buf data*/
        sz = ngx_buf_size(tmp->buf);
        alloc_sz = HF_BUF_SIZE;
        while (sz > alloc_sz)
        {
            alloc_sz = alloc_sz * 2; 
        }
        
        cl->buf->start =  ngx_palloc(r->pool, alloc_sz);
        
        if(cl->buf->start == NULL)
        {
            ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                "[Html_head filter]: ngx_http_html_head_buffer_output: "
                "unable to allocate output chain buffer data memory");
                
            return NGX_ERROR;
            
        }
        
        cl->buf->pos  = cl->buf->start; 
        cl->buf->last = cl->buf->start;
        cl->buf->end = cl->buf->start + alloc_sz; 
        cl->buf->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
        cl->buf->recycled = 1; 
        cl->buf->temporary = 1; 
        
        cl->buf->last = ngx_copy(cl->buf->last, tmp->buf->pos, sz);
        
        /* Attach our own buffer chain to our context output chain */
        *ll = cl;
        ll = &cl->next;
        
        /* Consume the output chain buffer */
        if (tmp->buf->recycled) 
        {
            tmp->buf->pos = tmp->buf->last;
        }
        
        tmp = tmp->next; 
    }
    
    /* Update the last output chain address to our own chain*/
    ctx->last_out = ll; 
    
    return NGX_OK; 
    
}

Output buffering is necessary so that the filter module can effectively block pages or html content that don't have a <head> tag. After processing a chain of buffers, the filter module may not yet know whether a subsequent chain of buffers will contain the <head> tag.

In blocking mode, the filter cannot send the processed buffers to the next filter, to avoid leaking data that it should be blocking. The output therefore has to be buffered until the filter module is able to determine whether the <head> tag is found or not.

Buffering is done by creating our own chain, our own buffers and our own content storage. The existing output is copied over to our buffer chain. The existing output buffers are then "consumed" so that they can be recycled, and an NGX_OK status is returned to nginx.
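
The sizing and copy step can be tried outside nginx with a small stand-alone sketch. This is not the module's code: demo_buf_t is a cut-down stand-in for ngx_buf_t, malloc() replaces the nginx pool allocator, and DEMO_BUF_SIZE is an assumed value, since the real value of HF_BUF_SIZE is not shown in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed base size for illustration; not the module's real HF_BUF_SIZE. */
#define DEMO_BUF_SIZE 1024

/* Cut-down stand-in for ngx_buf_t: only the four pointers used here. */
typedef struct {
    unsigned char *start, *pos, *last, *end;
} demo_buf_t;

/* Smallest DEMO_BUF_SIZE * 2^k that can hold sz bytes. */
static size_t
demo_alloc_size(size_t sz)
{
    size_t alloc_sz = DEMO_BUF_SIZE;

    while (sz > alloc_sz) {
        alloc_sz *= 2;
    }

    return alloc_sz;
}

/*
 * Copy the unread bytes of src into newly allocated storage owned by dst,
 * then mark src as consumed. Returns 0 on success, -1 if allocation fails.
 */
static int
demo_buffer_copy(demo_buf_t *dst, demo_buf_t *src)
{
    size_t sz, alloc_sz;

    assert(dst != NULL && src != NULL && src->pos <= src->last);

    sz = (size_t) (src->last - src->pos);
    alloc_sz = demo_alloc_size(sz);

    dst->start = malloc(alloc_sz);
    if (dst->start == NULL) {
        return -1;
    }

    dst->pos = dst->start;
    dst->end = dst->start + alloc_sz;
    memcpy(dst->start, src->pos, sz);
    dst->last = dst->start + sz;

    /* Consume the source buffer so that its owner can reuse it. */
    src->pos = src->last;

    return 0;
}
```

The real function only consumes the source buffer when its recycled flag is set, and it takes chain links from ctx->free rather than allocating everything afresh.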

The NGX_HTTP_SUB_BUFFERED bit of the r->connection->buffered bitmask is set to indicate output buffering. The other two output functions, ngx_http_html_head_output_empty() and ngx_http_html_head_output(), reset this bit when the output can finally be sent out.
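
The set and reset pattern on the bitmask can be sketched as follows. The types and flag values are illustrative stand-ins rather than nginx's own definitions, and the reset side is an assumption about how the two output functions clear the bit, since their code is not repeated here.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; nginx defines its own NGX_HTTP_*_BUFFERED bits. */
#define DEMO_SUB_BUFFERED   0x02u
#define DEMO_GZIP_BUFFERED  0x20u

/* Stand-ins for the connection and for the filter's per-request context. */
typedef struct { unsigned buffered; } demo_connection_t;
typedef struct { unsigned buffered; } demo_ctx_t;

/* Mark the connection as holding buffered output, once per request. */
static void
demo_start_buffering(demo_ctx_t *ctx, demo_connection_t *c)
{
    assert(ctx != NULL && c != NULL);

    if (ctx->buffered == 0) {
        ctx->buffered = 1;
        c->buffered |= DEMO_SUB_BUFFERED;
    }
}

/* Clear only our own bit; bits owned by other filters are left untouched. */
static void
demo_stop_buffering(demo_ctx_t *ctx, demo_connection_t *c)
{
    assert(ctx != NULL && c != NULL);

    if (ctx->buffered) {
        ctx->buffered = 0;
        c->buffered &= ~DEMO_SUB_BUFFERED;
    }
}
```

Using |= and &= ~ rather than plain assignment matters because the bitmask is shared, so another filter's bit must survive both operations.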

The html tag parser function

The following lists the code for the ngx_parse_buf_html() function.

/*
Parses the buffer to look for the <head> tag
Returns NGX_OK if found, 
        NGX_AGAIN if not found in this buffer,
        NGX_ERROR if an error occurs.
*/
static ngx_int_t 
ngx_parse_buf_html(ngx_http_html_head_filter_ctx_t *ctx, 
                   ngx_http_request_t *r)
{
    u_char *p, c;
    ngx_int_t rc;
    ngx_buf_t* buf;
	
    if(ctx->in == NULL)
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_parse_buf_html: "
            "ctx->in is NULL");  
            
        return NGX_ERROR;
    }
		
    buf = ctx->in->buf; 

    for(p=buf->pos; p < buf->last; p++)
    {

        c = *p;
        if(ctx->count == HF_MAX_CONTENT_SZ)
        {
            ngx_log_error(NGX_LOG_WARN, 
               r->connection->log, 0, 
               "[Html_head filter]: ngx_parse_buf_html: "
               "unable to find <head> tag within 128 characters");
               
            return NGX_ERROR;
        } 
        
        switch(c)
        {
            case '<':

                ctx->starttag=1;
                if(!ctx->tagquote && ! ctx->tagsquote)
                {
                   ngx_init_stack(&ctx->stack);
                }

                if(push(c, &ctx->stack) == -1)
                {
                      ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                        "[Html_head filter]: ngx_parse_buf_html: "
                        "parse stack is full");  
                         
                      return NGX_ERROR;
                }
                
                break;

            case '>':

                if(ctx->starttag)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");  
                            
                        return NGX_ERROR;
                    }

                    if(!ctx->tagquote && !ctx->tagsquote)
                    {    
                        ctx->starttag = 0; 
                        //Process the tag
                        rc = ngx_process_tag(ctx,r);

                        if(rc == NGX_OK)
                        {
                            return NGX_OK;
                        }
                        else if(rc == NGX_ERROR)
                        {
                            return NGX_ERROR; 
                        }
                
                    }
                }

                break;

            case '\"':

                if(ctx->starttag && ctx->tagsquote==0 && ctx->tagquote==0 )
                {
                    ctx->tagquote=1;
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");  
                            
                        return NGX_ERROR;
                    }
                }
                else if(ctx->starttag && ctx->tagsquote==0 && ctx->tagquote)
                {
                    ctx->tagquote=0; 
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
            
                }
                else if(ctx->starttag && ctx->tagsquote)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }
          
                break;

            case '\'':

                if(ctx->starttag && ctx->tagquote == 0 && ctx->tagsquote == 0)
                {
                    ctx->tagsquote = 1;
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }  
                }   
                else if(ctx->starttag && ctx->tagquote==0 && ctx->tagsquote)
                {
                    ctx->tagsquote = 0;
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                } 
                else if(ctx->starttag && ctx->tagquote)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }

                break;

            default:
         
                if(ctx->starttag)
                {
                    if(push(c, &ctx->stack) == -1)
                    {
                         ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
                            "[Html_head filter]: ngx_parse_buf_html: "
                            "parse stack is full");
                            
                        return NGX_ERROR;
                    }
                }

        }

        ctx->count++;
        ctx->index++;
    }

    return NGX_AGAIN;
}

The function goes through the character stream in a buffer and looks for the four tokens <, ", ' and >. The < token marks the start of an html tag: the stack is initialized and the token is pushed onto it. Subsequent characters inside the tag are pushed onto the stack as well. When a double quote or single quote is encountered inside a tag, the flag for that type of quote is set. While a quote flag is set, a > is not interpreted as the end of the html tag and a < is not interpreted as the start of a new tag.

The quote flag is reset when the matching closing quote is encountered. A subsequent > is then treated as the end of the tag, and the parser calls ngx_process_tag() to check whether the html tag in the stack is a <head>. Leading and trailing spaces in the tag are ignored and the check is case insensitive. However, the <head> tag cannot contain attributes.

Some examples will make this clearer. <   HeAD> is considered valid, while <Head id=1> is invalid. The parser function returns NGX_OK if a valid <head> tag is found, NGX_AGAIN to indicate that processing can continue with subsequent buffers, and NGX_ERROR if an error occurs.
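
The parsing rules can be tried outside nginx with a simplified stand-alone sketch. It is not the module's code: the demo_ names are hypothetical, it works on one in-memory string, it remembers the offset of the tag start instead of pushing characters onto a stack, and it leaves out the 128 character limit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if tag (including its '<' and '>') is a bare <head> tag:
 * optional spaces around the name, any letter case, no attributes.
 */
static int
demo_is_head_tag(const char *tag, size_t len)
{
    static const char name[] = "head";
    size_t i = 1, j;

    assert(tag != NULL);

    if (len < 6 || tag[0] != '<' || tag[len - 1] != '>') {
        return 0;
    }

    while (i < len - 1 && isspace((unsigned char) tag[i])) {
        i++;                                  /* skip leading spaces */
    }

    for (j = 0; j < 4; j++, i++) {
        if (i >= len - 1 || tolower((unsigned char) tag[i]) != name[j]) {
            return 0;
        }
    }

    while (i < len - 1 && isspace((unsigned char) tag[i])) {
        i++;                                  /* skip trailing spaces */
    }

    return i == len - 1;                      /* anything left is an attribute */
}

/*
 * Scan s for the first bare <head> tag, honouring quotes inside tags.
 * Returns the offset of the closing '>' of that tag, or -1 if none is found.
 */
static long
demo_find_head(const char *s, size_t len)
{
    int in_tag = 0, dquote = 0, squote = 0;
    size_t i, tag_start = 0;

    assert(s != NULL);

    for (i = 0; i < len; i++) {
        switch (s[i]) {
        case '<':
            if (!dquote && !squote) {         /* '<' inside quotes is plain text */
                in_tag = 1;
                tag_start = i;
            }
            break;
        case '>':
            if (in_tag && !dquote && !squote) {
                in_tag = 0;
                if (demo_is_head_tag(s + tag_start, i - tag_start + 1)) {
                    return (long) i;
                }
            }
            break;
        case '"':
            if (in_tag && !squote) {
                dquote = !dquote;
            }
            break;
        case '\'':
            if (in_tag && !dquote) {
                squote = !squote;
            }
            break;
        default:
            break;
        }
    }

    return -1;
}
```

With this sketch, <   HeAD> is found, <Head id=1> is rejected, and a <head> that appears inside a quoted attribute value of another tag is ignored.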

The text insertion function

We will list one more function, ngx_html_insert_output(), which inserts our text into the buffer chain. The following is the code for ngx_html_insert_output().

/*
Insert the text into body response buffer
*/
static ngx_int_t 
ngx_html_insert_output(ngx_http_html_head_filter_ctx_t *ctx, 
                       ngx_http_request_t *r, 
                       ngx_http_html_head_filter_loc_conf_t *slcf)
{

    ngx_chain_t  *cl, *ctx_in_new, **ll;
    ngx_buf_t  *b;

    if(ctx->in == NULL)
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
             "[Html_head filter]: ngx_html_insert_output: "
             "ctx->in is NULL");
             
        return NGX_ERROR;
    }

				   
    ll = &ctx_in_new;				   
    b=ctx->in->buf;
   
    if(b->pos + ctx->index + 1 > b->last)
    {//Check that the head tag position does not exceed buffer
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output: "
            "invalid input buffer at text insertion");
            
        return NGX_ERROR;          
    }

    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output: "
            "unable to allocate output chain memory");
            
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));
   
    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos = ctx->in->buf->pos;
    b->last = b->pos + ctx->index + 1;
    b->start = ctx->in->buf->start;
    b->end = ctx->in->buf->end;
    b->recycled = 1;
    b->flush = ctx->in->buf->flush; 
       
    *ll = cl;  
    ll = &cl->next;
	

    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
             "[Html_head filter]: ngx_html_insert_output: "
             "unable to allocate output chain memory");
             
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));
   
    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos=slcf->insert_text.data;
    b->last=b->pos + slcf->insert_text.len;
    b->start = b->pos;
    b->end = b->last; 
    b->recycled = 1;
	 
    *ll = cl;
    ll = &cl->next;
	 

    if(ctx->in->buf->pos + ctx->index + 1 == ctx->in->buf->last )
    {//head tag is in last position of the buffer
   
        b->last_buf = ctx->in->buf->last_buf;
        b->last_in_chain = ctx->in->buf->last_in_chain;
		 
        *ll = ctx->in->next;
        
        if(ctx->in->buf->recycled)
        {//consume existing buffer
            ctx->in->buf->pos = ctx->in->buf->last;  
        }
		
	    ctx->in = ctx_in_new;
	    return NGX_OK;
    }
     
    
    //tag is within buffer last position, 
    //i.e. ctx->in->buf->pos + ctx->index + 1 < ctx->in->buf->last
    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) 
    {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, 
            "[Html_head filter]: ngx_html_insert_output: "
            "unable to allocate output chain memory");
            
        return NGX_ERROR;
    }

    b=cl->buf;
    ngx_memzero(b, sizeof(ngx_buf_t));

    b->tag = (ngx_buf_tag_t) &ngx_http_html_head_filter_module;
    b->memory=1;
    b->pos = ctx->in->buf->pos + ctx->index + 1;
    b->last = ctx->in->buf->last;
    b->start = ctx->in->buf->start;
    b->end = ctx->in->buf->end;
    b->recycled = 1;
    b->last_buf = ctx->in->buf->last_buf;
    b->last_in_chain = ctx->in->buf->last_in_chain;

    *ll = cl;
    ll = &cl->next;
    *ll = ctx->in->next;
    
    if(ctx->in->buf->recycled)
    {//consume existing buffer
        ctx->in->buf->pos = ctx->in->buf->last; 
    }
	  
    ctx->in = ctx_in_new; 
	   
    return NGX_OK;

}

The insert text function splits the input buffer where the <head> tag is found into either 2 or 3 buffers, with our text inserted. The process is illustrated earlier in the Design and Approach section. If the <head> tag is the last content in the current input buffer, our text can be inserted directly as a new buffer after it. In this case, the result is 2 buffers.

Alternatively, if the current input buffer has content after the <head> tag, the input buffer will be split into 3 buffers. The first is the content up to and including the <head> tag, the second is our inserted text and the third is the content after the <head> tag.

The new set of buffers is then incorporated into the output chain by the while loop in the handler function, ngx_http_html_head_body_filter(). If the original buffer is marked with the recycled flag, it will be consumed. This is done by setting the buffer's pos pointer equal to its last pointer. The recycled flag indicates that the buffer has to be consumed as soon as possible, so that it can potentially be reused.
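
The 2 or 3 way split can be sketched on its own with plain pointers. This is a minimal illustration under assumptions, not the module's code: demo_slice_t is a hypothetical stand-in for an ngx_buf_t that points into memory owned elsewhere, and tag_end plays the role of ctx->index, the offset of the > that closes the <head> tag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A view onto bytes owned elsewhere, like an ngx_buf_t with memory = 1. */
typedef struct {
    const char *pos;
    const char *last;
} demo_slice_t;

/*
 * Split [pos, last) right after offset tag_end and place the NUL-terminated
 * string insert in between. No bytes are copied; the slices only point into
 * the existing data. Writes 2 or 3 slices to out and returns the count.
 */
static int
demo_split_insert(const char *pos, const char *last, size_t tag_end,
                  const char *insert, demo_slice_t out[3])
{
    const char *cut;
    int n = 0;

    assert(pos != NULL && last != NULL && insert != NULL);
    assert(pos + tag_end + 1 <= last);     /* the tag must lie inside the buffer */

    cut = pos + tag_end + 1;

    /* 1: everything up to and including the <head> tag */
    out[n].pos = pos;
    out[n].last = cut;
    n++;

    /* 2: the inserted text */
    out[n].pos = insert;
    out[n].last = insert + strlen(insert);
    n++;

    /* 3: whatever follows the tag, if anything */
    if (cut < last) {
        out[n].pos = cut;
        out[n].last = last;
        n++;
    }

    return n;
}
```

Because the slices only reference existing memory, the split itself is cheap. The real function additionally carries over flags such as last_buf and last_in_chain to the final new buffer and relinks the rest of the input chain behind it.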

A few other functions and code snippets are not covered in this implementation section, such as the functions for handling the parser stack and the ngx_process_tag() function. Refer to the github link below for the full source code.

Compiling the Nginx Body Filter Module

Let's proceed to compile and test the html head filter module. Create a working directory "Build-Module" to hold the source files that are required. The filter module source code can be obtained from the github repository. On an Ubuntu Linux system with git installed, the following commands can be used.

mkdir Build-Module
cd Build-Module
git clone https://github.com/ngchianglin/NginxHtmlHeadFilter.git

To verify the signature of the git download, refer to these instructions. Let's do a quick static analysis of the module's source code to make sure that there are no major vulnerabilities, such as buffer overflows. On Ubuntu, we can install cppcheck.

sudo apt-get install cppcheck
cd NginxHtmlHeadFilter
cppcheck --enable=warning ngx_http_html_head_filter_module.c

Good, our module code doesn't have any glaring issues that the cppcheck analyzer can find. We can proceed to download the other packages that are required. Change our directory back to Build-Module.

cd ..

The filter module works with Nginx 1.16.1, the latest stable release at the time of writing. Download the nginx source code from the official Nginx download page. We are going to download OpenSSL 1.1.1d, zlib 1.2.11 and PCRE 8.43 as well.

Verify the integrity of the downloads with either the SHA-256 checksum or the gpg signature provided on each package's website. The following lists the sha256 checksums of the packages.

nginx-1.16.1.tar.gz
f11c2a6dd1d3515736f0324857957db2de98be862461b5a542a3ac6188dbe32b

openssl-1.1.1d.tar.gz
1e3a91bc1f9dfce01af26026f856e064eab4c8ee0a8f457b5ae30b40b8b711f2

zlib-1.2.11.tar.gz
c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1

pcre-8.43.tar.gz
0b8e7465dc5e98c757cc3650a20a7843ee4c3edf50aaf60bb33fd879690d2c73

Extract these tarballs in the Build-Module directory. Issue the following commands to configure Nginx. The options include hardening flags to ensure a hardened binary.

cd nginx-1.16.1
./configure \
  --with-cc-opt="-Wextra -Wformat -Wformat-security -Wformat-y2k -Werror=format-security -fPIE -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all" \
  --with-ld-opt="-pie -Wl,-z,relro -Wl,-z,now -Wl,--strip-all" \
  --with-http_v2_module \
  --with-http_ssl_module \
  --without-http_uwsgi_module \
  --without-http_fastcgi_module \
  --without-http_scgi_module \
  --without-http_empty_gif_module \
  --with-openssl=../openssl-1.1.1d \
  --with-openssl-opt="no-ssl2 no-ssl3 no-comp no-weak-ssl-ciphers -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" \
  --with-zlib=../zlib-1.2.11 \
  --with-zlib-opt="-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" \
  --with-pcre=../pcre-8.43 \
  --with-pcre-opt="-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -fPIC" \
  --with-pcre-jit \
  --add-module=../NginxHtmlHeadFilter

The configure command above will create a Makefile in the objs directory. Proceed to build the binary and install it into /usr/local/nginx.

make
sudo make install

We can tar and gzip the compiled nginx package and move it to our server machine for testing. As a security measure and best practice, the server doesn't have gcc or other compiler tools installed. We compile the code on a separate workstation that has the same architecture and OS as the server and then copy the compiled package to the server using sftp or scp.

cd /usr/local
tar -czvf nginx-binary-package.tgz nginx
sftp -i /home/devuser1/keyloc/private_rsa user@myserver
put nginx-binary-package.tgz

Testing the Nginx Filter Module

On the server, extract the nginx binary package to /usr/local/nginx. Ensure that the ownership and permission on this extracted nginx binary location are secure. The Apache web server shall serve the main website on this machine. It listens locally (127.0.0.1) on port 80 and will not accept any external network traffic.

Nginx will be configured as a reverse proxy in front of the Apache web server. Nginx accepts external network traffic and forwards the traffic to the Apache web server. Refer to the earlier section, Design and Approach, for a big picture view of the deployment architecture.

Nginx is run using the nginx user and group. The following commands create the user and group, as well as the directories used by Nginx.

sudo mkdir /opt/nginx
sudo chmod 755 /opt/nginx
sudo groupadd -g 8800 nginx
sudo useradd -d /opt/nginx/home -m -u 8800 -g 8800 -s /bin/false nginx
sudo mkdir /var/log/nginx
sudo chown nginx: /var/log/nginx
sudo chmod 700 /var/log/nginx
sudo mkdir /opt/nginx/www
sudo chmod 755 /opt/nginx/www
sudo mkdir /opt/nginx/cache
sudo chown nginx: /opt/nginx/cache
sudo chmod 700 /opt/nginx/cache

Let's do some additional hardening of the /usr/local/nginx location.

sudo chown -R root:nginx /usr/local/nginx
sudo chmod 750 /usr/local/nginx
sudo chown -R root:root /usr/local/nginx/sbin
sudo chmod 700 /usr/local/nginx/sbin/nginx
sudo chown -R root:root /usr/local/nginx/conf
sudo chmod -R 600 /usr/local/nginx/conf/
sudo chmod 700 /usr/local/nginx/conf

Open the nginx configuration file located at /usr/local/nginx/conf/nginx.conf and fill in the following settings. Note that these configuration settings are for nighthour.sg. Edit and replace the IP address, the server name, the ssl certificates and so on with settings that are relevant for your test environment. Testing should be done on a non production system.

user  nginx nginx;
worker_processes  4;
error_log  /var/log/nginx/error.log warn;
pid        /var/log/nginx/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" "$gzip_ratio"';

    sendfile    on;
    keepalive_timeout  65;
    server_tokens off;
    
    proxy_cache_path /opt/nginx/cache levels=1:2 keys_zone=webcache:2m max_size=150m;
    proxy_cache_key "$scheme$request_method$host$request_uri$is_args$args";
    proxy_cache_valid 200 90d;
    proxy_cache_valid 404 1m;

    gzip  on;
    
    map $sent_http_content_type $cachemap {
        default    no-store;
        ~text/html  "private, max-age=900";
        text/plain  "private, max-age=900";
        text/css    "private, max-age=7776000";
        application/javascript "private, max-age=7776000";
        ~image/    "private, max-age=7776000";
    }

    server {
        listen       128.199.64.100:80;
        server_name  www.nighthour.sg nighthour.sg;
        root   /var/www/html;
        charset utf-8;
        access_log  /var/log/nginx/access.log  main;
        
        expires 900;
        add_header Cache-Control public;
        if ( $host ~* "nighthour.sg$" )
        {
           return 301 https://$host$request_uri;
        }

        return 400;

        location / {
            index  index.html index.htm;
        }

        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

    }

    # HTTPS server
    server {
        listen       128.199.64.100:443 ssl http2;
        server_name  www.nighthour.sg nighthour.sg;
        root /var/www/html;
        charset utf-8;

        ssl_certificate      /etc/letsencrypt/live/www.nighthour.sg/fullchain.pem;
        ssl_certificate_key  /etc/letsencrypt/live/www.nighthour.sg/privkey.pem;
 
        ssl_session_timeout 15m;
        ssl_session_cache shared:SSL:50m;
        ssl_session_tickets off;
        
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
        ssl_prefer_server_ciphers  on;
        
        ssl_stapling on;
        ssl_stapling_verify on;
        ssl_trusted_certificate /etc/letsencrypt/live/www.nighthour.sg/fullchain.pem;
        resolver 8.8.8.8 8.8.4.4 valid=300s;
        resolver_timeout 5s;
        
        add_header Strict-Transport-Security "max-age=31536000;includeSubDomains";
        access_log  /var/log/nginx/ssl_access.log  main;

        location / {
            
            index  index.html index.htm;
            
            html_head_filter "<script src=\"/scripts/mymonitor.js\" async></script>";
            html_head_filter_block on;
            
            proxy_cache webcache;
            proxy_cache_bypass $http_cache_control;
            
            proxy_set_header HOST $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_pass http://127.0.0.1;
            
            add_header Cache-Control $cachemap;
            add_header Strict-Transport-Security "max-age=31536000;includeSubDomains";
        }
   
        # redirect server error pages to the static page /50x.html
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
        
    }

}

The configuration above sets up Nginx to listen on the public ip address at ports 80 and 443. The server block at port 80 redirects HTTP requests to HTTPS at port 443. In the server block for port 443 (HTTPS), proxy_pass to http://127.0.0.1 is configured. http://127.0.0.1 is where the Apache web server is listening for traffic.

We also turn on the Html Head filter module by setting the html_head_filter directive, with its argument string, in the location block. This argument string is the text to be inserted after the <head> tag in the HTTP response body from the Apache web server. Here it is a script tag that loads a monitoring javascript, mymonitor.js.

html_head_filter "<script src=\"/scripts/mymonitor.js\" async></script>";
html_head_filter_block on;

The html_head_filter_block directive is set to on. This tells the Html Head filter module to display a blank html page for HTTP responses that do not contain a <head> tag within the first 128 characters.

Start up Nginx with the following command.

sudo /usr/local/nginx/sbin/nginx

Access a page on the website using your favourite web browser and view the page source. The monitoring script should be inserted.

Fig 8. Nginx Html Head filter module -- Script insertion

Create a test html page on the website that doesn't contain any <head> tag and is at least 128 characters long.

echo "<html> Hello world" > testwithouthead.html
perl -e 'print "A" x 128' >> testwithouthead.html
echo "</html>" >> testwithouthead.html

Move the test html file into the document root of the Apache web server and try accessing it. A blank page should be displayed.

Fig 9. Nginx Html head filter module -- Blank page

Edit the Nginx configuration /usr/local/nginx/conf/nginx.conf and set html_head_filter_block to off.

html_head_filter_block off;

Send a HUP signal to Nginx to re-read the configuration.

sudo kill -HUP `ps -ef | grep "nginx: master process" | grep -v grep | awk '{print $2;}'`

Clear the browser cache and restart the browser. Access the page again. The page will be displayed without being "blocked".

Fig 10. Nginx Html head filter module -- Block off

Some other tests can include html pages with multiple <head> tags (the monitoring script should be inserted only once), head tags with leading/trailing spaces and a mix of upper/lower case, a PHP script dynamically generating html content, or a 404 not found error page (the monitoring script should not be inserted). The Html Head filter module should handle all these cases properly.

When all the testing is done and the results meet expectations, the filter module can be deployed to production. The filter module is actually deployed on nighthour.sg, inserting the monitoring script into the web pages here.

Conclusion and Afterthought

This article runs through the design and implementation of a simple nginx filter module that inserts a text string into the http response body, after the html <head> tag. The code implementation doesn't exactly follow the nginx coding convention though; it follows the author's own style.

Nginx has its own recommended coding convention. For those attempting to write nginx modules, it is good to follow the nginx coding convention. The coding convention is documented in the Nginx development guide. I may reformat this code again in the future to follow the nginx convention.

Nginx is a high performance web server and reverse proxy that is highly extensible. It can serve as a Web Application Firewall (WAF) through modules such as ModSecurity or NAXSI, or even act as an application server through projects such as OpenResty. Learning to write an Nginx module will allow an IT professional to know more about the internals of this flexible web infrastructure that is gaining wide usage.

The knowledge gained can benefit developers, infrastructure engineers, security engineers/professionals and even system administrators who code.

Useful References

The full source code for the Nginx Html Head Filter is available at the following Github link.
https://github.com/ngchianglin/NginxHtmlHeadFilter

If you have any feedback, comments, corrections or suggestions to improve this article, you can reach me via the contact/feedback link at the bottom of the page.

Article last updated on Nov 2019.