Neural networks have attracted researchers in mathematics, engineering, economics and many other fields because of their excellent performance in function approximation, pattern recognition, associative memory, forecasting and the generation of new meaningful patterns. Applying neural networks to wastewater treatment systems requires a thorough understanding of all of these aspects. Therefore, to support a deeper comprehension of this methodology, this section provides a preliminary introduction to neural networks, starting with their development and structure, and places particular emphasis on their classification and their technical coupling with other machine learning methods.
Evolution: from artificial neuron to deep learning
Since the 1940s, the development of neural networks has gone through ups and downs. The modern era of neural networks began with the pioneering work of McCulloch and Pitts (1943), who proposed a model based on neurophysiology and mathematics that appeared capable of computing any computable function. The next major advance came six years later, when an explicit physiological learning rule for synaptic modification was presented, laying the groundwork for the computational models that followed. After that, a golden age began. Practical neurocomputers were developed one after another, the most classical of which are the perceptron and the adaptive linear element (ADALINE). With the boom in neural network research, it seemed that intelligent learning machines had already been created. However, an analysis of the perceptron's inability to solve problems that are not linearly separable put an end to this overestimation. Enthusiasm faded slowly as research funding declined at the same time; nevertheless, the prolonged silence built the theoretical foundation for the renaissance that continues today. It was not until the introduction of the back-propagation of error learning procedure that neural network research got back on track. More recently, although originally viewed with scepticism, neural networks have undergone a renaissance in the form of deep learning (e.g. Deep Feedforward Networks, Convolutional Neural Networks, Generative Adversarial Networks and AutoEncoders), as a result of novel training rules, an increase in the number of layers, access to large-scale datasets and better hardware implementations. Although research based on the principles of the neuron doctrine is far from finished, neural networks in computer science, benefiting from their remarkable self-adaptability, self-organization and self-learning, have spread into every walk of life.
Hierarchy: from node to network
Biologically, an activated neuron acts like a relay station, transmitting chemical substances (neurotransmitters) to the neurons connected to it. These substances change the electric potential of the receiving neurons, and a neuron is said to be activated when its electric potential exceeds a specific threshold. Similarly, a neuron in computer science is simply an information-processing unit derived from the biological neuron (Haykin, 1998). It consists of three basic elements:
i. A set of weighted connections, where w_ij denotes the weight of the connection linking neurons i and j;
ii. An accumulator that calculates the weighted sum of the input signals x_ij;
iii. An activation function f that, depending on the difference between the weighted sum and a threshold value, determines whether the neuron sends its activation value on to the neurons connected to it.
The most common activation functions are summarized in Table 2.
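For readers who prefer code, the following Python sketch implements four activation functions that are widely used in practice: the step, sigmoid, hyperbolic tangent and rectified linear unit (ReLU) functions. This selection is an assumption made for illustration and does not necessarily match the contents of Table 2; the function names are hypothetical.

```python
import numpy as np

# Illustrative selection of widely used activation functions
# (assumed for this sketch; not necessarily the list in Table 2).

def step(v):
    """Heaviside step: 1 if the input is non-negative, otherwise 0."""
    return np.where(v >= 0.0, 1.0, 0.0)

def sigmoid(v):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def tanh(v):
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return np.tanh(v)

def relu(v):
    """Rectified linear unit: passes positive inputs, zeroes negative ones."""
    return np.maximum(v, 0.0)

if __name__ == "__main__":
    v = np.array([-2.0, 0.0, 2.0])
    for name, f in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
        print(name, f(v))
```

The step function reproduces the all-or-nothing firing of the biological analogy, whereas the smooth functions are differentiable, which gradient-based training requires.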
In mathematical terms, with the threshold value represented as a bias, the output of neuron i can be described by the following equation:
$$y_i = f\left(\sum_{j=1}^{m} w_{ij}\,x_{ij} + b_i\right)$$
where m is the number of inputs of neuron i and b_i is its bias.
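The sketch below evaluates this equation for a single threshold unit with a step activation, showing how a firing threshold θ enters the equation as a bias of -θ. The helper names and parameter values (w = [1, 1] and θ = 1.5, which make the neuron behave as a logical AND gate) are hypothetical choices for illustration.

```python
import numpy as np

def neuron_output(x, w, bias, f):
    """Compute y = f(sum_j w_j * x_j + bias) for a single neuron."""
    v = np.dot(w, x) + bias  # weighted sum from the accumulator, plus bias
    return f(v)

def step(v):
    """Fire (1) when the net input is non-negative, otherwise stay silent (0)."""
    return 1.0 if v >= 0.0 else 0.0

# Hypothetical parameters: with threshold 1.5 the neuron fires only
# when both binary inputs are 1, i.e. it computes logical AND.
w = np.array([1.0, 1.0])
threshold = 1.5
bias = -threshold  # a threshold theta enters the equation as a bias of -theta

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, neuron_output(np.array(x, dtype=float), w, bias, step))
```

With the same weights, raising the threshold to 2.5 would keep the neuron silent for every binary input, while lowering it to 0.5 would turn it into an OR gate.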
Neurons are the elementary units of neural networks, which connect them in a specific network topology. Generally, data are presented to the input layer, processed further in the following layers, and combined in the output layer into an overall response to the initial inputs (Haykin, 1998; Kriesel, 2007). Neural networks consisting of three or more fully or partly connected layers (an input layer, one or more hidden layers and an output layer) of linearly or nonlinearly activated nodes are called multilayer perceptrons (MLPs), the most basic and most frequently used form of neural network. The information learned by a neural network is stored in its connection weights and biases, which are its elementary parameters. The number of layers and the number of nodes per layer, by contrast, are meta-parameters. The learning process that adjusts the elementary parameters and the validation process that optimizes the meta-parameters will be introduced in Section 3.
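A minimal NumPy sketch of the forward pass through an MLP helps separate the elementary parameters (weights and biases) from the meta-parameters (the list of layer sizes). The function names `init_mlp` and `forward`, the tanh hidden activation, the linear output node and the scaled normal initialization are all assumptions for illustration. The weights are untrained, so the output carries no meaning until the network is trained as described in Section 3.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create the elementary parameters (weights, biases) of an MLP.

    layer_sizes is a meta-parameter, e.g. [4, 8, 1] means 4 inputs,
    one hidden layer with 8 nodes and 1 output node.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Propagate one input vector from the input layer to the output layer."""
    a = x
    for k, (W, b) in enumerate(params):
        v = W @ a + b                        # weighted sums of all nodes in the layer
        is_output = k == len(params) - 1
        a = v if is_output else np.tanh(v)   # nonlinear hidden nodes, linear output
    return a

params = init_mlp([4, 8, 1])
x = np.array([0.2, -1.0, 0.5, 3.0])
print(forward(params, x))
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 4*8 + 8 + 8*1 + 1 = 49
```

Changing the argument `[4, 8, 1]` alters the meta-parameters, and with them the number of elementary parameters that training must adjust.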
Classification: supervised or unsupervised
Owing to their respective topologies, different networks show distinct advantages in solving different problems. Traditionally, neural networks are classified according to (i) the topological structure of the information flow, (ii) the degree of learning supervision and (iii) the learning algorithm. In this section, several popular networks are briefly described in order of learning supervision, with some typical topological structures introduced as sub-classes. In addition, considering the complicated issues and special requirements of wastewater treatment, hybrid frameworks, which are not a particular type of network, are included to illustrate the strong technical coupling of neural networks with other machine learning methods.
Supervised Learning
Supervised learning means inferring a model from labeled training data (Mohri et al., 2012). Within supervised learning, feed-forward neural networks (FFNNs) and recurrent neural networks (RNNs) represent two distinct topological structures. FFNNs consist of neurons organized in layers, with information flowing forward from the input layer, through the hidden layer(s), to the output layer. In a standard fully connected FFNN, each neuron in a layer is linked to every neuron in the neighboring layer (Fig. 3). Back-propagation-based FFNNs, owing to their efficiency, conciseness and flexibility, are the most commonly used type (Basheer and Hajmeer, 2000). The term back-propagation refers to the error being propagated backward, opposite to the direction of the data flow. The back-propagation algorithm was first proposed by Paul Werbos and later reformulated by Rumelhart et al. (Rumelhart, 1986a; Werbos, 1974). Because of its importance, back-propagation is introduced in detail in Section 3 in the context of model training.
Radial basis function (RBF) networks are a special case of three-layered back-propagation-based networks, but they are usually listed separately for comparison with back-propagation-based networks. RBF networks use RBFs (such as the Gaussian kernel, multiquadric and inverse quadratic functions) as the activation functions of their single hidden layer to cluster the network inputs, and compute a linear combination of these RBFs in the output layer (Park and Sandberg, 1993, 1991). RBF networks train faster than back-propagation-based networks but are not as versatile (Basheer and Hajmeer, 2000).
Convolutional Neural Networks (CNNs) are a specialized kind of deep feed-forward network for processing data that has a known, grid-like topology (Goodfellow et al., 2016). The shared-weight architecture and the translation invariance achieved by convolutional and pooling layers have made them tremendously successful in practical computer vision applications.
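To illustrate the RBF network structure and why it can be trained quickly, the sketch below fits a Gaussian RBF network to a toy one-dimensional regression problem (y = sin x). Placing the centers on an evenly spaced grid and using one fixed width are simplifying assumptions; in practice the centers are often obtained by clustering the inputs. Only the linear output layer is fitted, by a single least-squares solve rather than an iterative back-propagation loop. The function names are hypothetical.

```python
import numpy as np

def gaussian_rbf(X, centers, width):
    """Hidden-layer activations: one Gaussian kernel per center."""
    # squared Euclidean distance between every sample and every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def design_matrix(X, centers, width):
    """RBF activations plus a constant column for the output bias."""
    H = gaussian_rbf(X, centers, width)
    return np.hstack([H, np.ones((H.shape[0], 1))])

def fit_rbf(X, y, centers, width):
    """Fit the linear output layer by least squares (no back-propagation)."""
    weights, *_ = np.linalg.lstsq(design_matrix(X, centers, width), y, rcond=None)
    return weights

def predict_rbf(X, centers, width, weights):
    """Output layer: linear combination of the hidden RBF activations."""
    return design_matrix(X, centers, width) @ weights

# Toy problem: learn y = sin(x) on [0, 2*pi] from 50 samples.
X = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
y = np.sin(X[:, 0])
centers = np.linspace(0.0, 2.0 * np.pi, 10)[:, None]  # assumed: evenly spaced
weights = fit_rbf(X, y, centers, width=0.7)
mse = np.mean((predict_rbf(X, centers, 0.7, weights) - y) ** 2)
print(f"training MSE: {mse:.2e}")
```

The price of this speed is that the centers and width must be fixed beforehand, which is one reason RBF networks are considered less versatile than back-propagation-based networks.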
Generative Adversarial Networks (GANs) consist of two sub-networks that work together: one generates content and the other judges it. The discriminating network receives either training data or content produced by the generating network, and its judgment is fed back to the generating network. This creates a form of competition: the discriminator becomes better at distinguishing real data from generated data, while the generator learns to produce content that the discriminator finds harder to tell apart from real data.
Unlike FFNNs, RNNs feed output signals back to neurons in the same or previous layers. These feedback connections allow the current state of an RNN to depend not only on the current inputs but also on the network state at previous time steps. This dynamic memory therefore makes RNNs well suited to problems involving time-varying behavior. The most common are Elman networks, in which the outputs of the hidden-layer neurons are fed back and combined with the input-layer signals of the next time step to form the inputs of the hidden layer (Williams and Zipser, 1989). Hopfield neural networks, another typical type of RNN, are two-layered networks with an energy function that guarantees convergence to a local minimum, which makes them efficient in solving optimization problems (Sathasivam, 2008). Long short-term memory (LSTM) networks are RNNs equipped with memory cells, each of which has three gates: input, output and forget. The input gate and the output gate control the strength of the information flow into the cell from the previous layer and out of the cell to the following layer, respectively, while the forget gate determines how much of the information stored in the cell is forgotten. With this adjustable memory, LSTMs have been shown to learn complex interrelationships in sequence problems.
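The gate mechanism of an LSTM memory cell can be sketched as a single time step in NumPy, following the standard LSTM formulation. The stacking order of the gate weights (input, forget, output, candidate), the function names and the random, untrained weights are assumptions for illustration only.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell.

    W: (4n, d) input weights, U: (4n, n) recurrent weights, b: (4n,) biases,
    stacked (by assumption) as input gate, forget gate, output gate, candidate.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new information enters the cell
    f = sigmoid(z[n:2 * n])      # forget gate: fraction of the old cell state retained
    o = sigmoid(z[2 * n:3 * n])  # output gate: how much of the cell state is passed on
    g = np.tanh(z[3 * n:4 * n])  # candidate values to write into the cell
    c = f * c_prev + i * g       # updated memory cell
    h = o * np.tanh(c)           # hidden state sent to the next step / layer
    return h, c

# Run a randomly initialized (untrained) cell over a short sequence.
rng = np.random.default_rng(0)
d, n = 3, 5                                  # input size, number of memory cells
W = rng.normal(0.0, 0.1, size=(4 * n, d))
U = rng.normal(0.0, 0.1, size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(10, d)):           # sequence of 10 time steps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```

Because the cell state c is updated additively and scaled by the forget gate, information can persist over many time steps, which is the property that distinguishes LSTMs from simpler RNNs such as Elman networks.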
Unsupervised Learning
Unsupervised learning means inferring a model that describes the hidden structure of unlabeled training data (Mohri et al., 2012). Many networks based on unsupervised learning rely on competition between all output-layer neurons, and adaptively update the set of weights associated with the winning output neuron (Basheer and Hajmeer, 2000). The most common are Self-Organizing Maps (SOMs), also known as Kohonen networks. In SOMs, the output-layer neurons are not isolated but are interconnected with their neighbors in a two- or three-dimensional grid (Kalteh et al., 2008; Kohonen, 1982). This arrangement maps input data onto a low-dimensional space while preserving the internal topological structure of the high-dimensional input, which gives SOMs significant advantages in clustering and data compression (Kohonen and Honkela, 2007). Adaptive resonance theory (ART) networks, another frequently used type, do not overwrite the learned information stored in their weight vectors when presented with a new pattern; instead, they expand their memory capacity as the number of patterns grows (Carpenter and Grossberg, 2003; Grossberg, 2013). The AutoEncoder is a typical unsupervised deep learning model. It is trained to copy its input to its output by learning to compress the data from the input layer into a short code and then to decompress that code into something that matches the original data. This forces the AutoEncoder to perform dimensionality reduction and to learn to ignore noise.
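The competitive learning of a SOM can be sketched with the classic Kohonen update rule: the output neuron whose weight vector is closest to an input wins, and the winner and its grid neighbors are pulled toward that input. The 5 x 5 grid, the Gaussian neighborhood, the linearly decaying learning rate and neighborhood width, the function names and the two-cluster toy data are all assumptions for illustration.

```python
import numpy as np

def train_som(data, grid_shape=(5, 5), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a Self-Organizing Map with the classic Kohonen update rule."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # grid coordinates of each output neuron, used to define neighborhoods
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.uniform(data.min(0), data.max(0), size=(rows * cols, data.shape[1]))
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # learning rate decays over time
        sigma = sigma0 * (1.0 - frac) + 0.5    # neighborhood shrinks over time
        x = data[rng.integers(len(data))]
        # competition: the winner is the neuron whose weights are closest to x
        winner = np.argmin(((weights - x) ** 2).sum(axis=1))
        # cooperation: grid neighbors of the winner are updated as well
        grid_d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def best_matching_unit(weights, x):
    """Index of the output neuron that wins the competition for input x."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))

# Hypothetical toy data: two well-separated clusters in 2-D.
rng = np.random.default_rng(1)
cluster_a = rng.normal([0.0, 0.0], 0.1, size=(100, 2))
cluster_b = rng.normal([3.0, 3.0], 0.1, size=(100, 2))
data = np.vstack([cluster_a, cluster_b])
weights = train_som(data)
print(best_matching_unit(weights, np.array([0.0, 0.0])),
      best_matching_unit(weights, np.array([3.0, 3.0])))
```

After training, inputs from the two clusters activate different regions of the grid, which is what makes SOMs useful for clustering and visualizing high-dimensional data on a low-dimensional map.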
Hybrid frameworks
Hybrid frameworks are not a particular kind of neural network, but combinations of traditional neural networks with other machine learning methods (e.g. fuzzy systems, reinforcement learning or genetic algorithms (GAs)) that exploit their complementary strengths. Most developments in hybrid GA and neural network frameworks focus on improving the design of neural networks, especially the determination of the network structure. Starting from a set of arbitrary parameters, GAs evolve network topologies through a global, population-based search for the optimal topology, although reaching the global optimum is not strictly guaranteed. Likewise, particle swarm optimization (PSO), another global, population-based algorithm, can improve the design of neural networks in the same way as GAs. Reinforcement learning does not rely on a pre-collected labeled dataset; instead, it seeks the maximum reward for its actions by learning from the interaction between an agent and its environment. It is therefore well suited to system control, especially when training data are insufficient for supervised learning methods. Fuzzy neural networks, owing to the fuzzy knowledge base provided by the fuzzy system, are good at expressing the fuzzy rules required in system control. Hybrid frameworks can deal with more complex issues, but they should be avoided if a single network performs well enough, because complicated, coupled model structures impose an extra computational burden.
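As an illustration of a hybrid GA and neural network framework, the sketch below uses a small GA (tournament selection, averaging crossover, integer mutation and elitism) to choose the number of hidden nodes of a one-hidden-layer network. To keep the example fast, each candidate network is not trained by back-propagation: its hidden weights are random and only its output weights are fitted by least squares, a swapped-in random-feature shortcut. The toy dataset, the complexity penalty, all function names and the GA settings are hypothetical, and, as with any GA, the result is a good rather than a provably optimal structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dataset: noisy samples of a nonlinear function.
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def fitness(n_hidden, seed=0):
    """Validation error (lower is better) of a network with n_hidden tanh nodes.

    Shortcut instead of back-propagation: the hidden weights are random and
    only the output weights are fitted by linear least squares.
    """
    r = np.random.default_rng(seed)
    W = r.normal(size=(1, n_hidden))
    c = r.normal(size=n_hidden)
    beta, *_ = np.linalg.lstsq(np.tanh(X_train @ W + c), y_train, rcond=None)
    mse = np.mean((np.tanh(X_val @ W + c) @ beta - y_val) ** 2)
    return mse + 1e-4 * n_hidden  # small penalty favors compact networks

def tournament(pop, scores):
    """Selection: return the better of two randomly chosen individuals."""
    a, b = rng.integers(len(pop), size=2)
    return pop[a] if scores[a] < scores[b] else pop[b]

def evolve_hidden_size(pop_size=10, n_gen=15, low=1, high=40):
    pop = rng.integers(low, high + 1, size=pop_size)  # arbitrary initial structures
    for _ in range(n_gen):
        scores = np.array([fitness(n) for n in pop])
        children = [pop[np.argmin(scores)]]           # elitism: keep the best one
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            child = (p1 + p2) // 2                    # crossover: blend the parents
            child += rng.integers(-3, 4)              # mutation: perturb node count
            children.append(int(np.clip(child, low, high)))
        pop = np.array(children)
    scores = np.array([fitness(n) for n in pop])
    return int(pop[np.argmin(scores)]), float(scores.min())

best_n, best_score = evolve_hidden_size()
print(f"best hidden-layer size: {best_n}, fitness: {best_score:.4f}")
```

In a real application, the fitness function would train each candidate network fully and evaluate it on validation data, which is where most of the extra computational burden of hybrid frameworks comes from.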
Indeed, there is no “one-size-fits-all” neural network, but for a given problem a relatively better choice does exist, just as a particular key fits a particular lock. SOMs work better in classification and data visualization; optimization problems call for Hopfield networks; back-propagation and RBF based networks may be appropriate for forecasting and control; and RNNs are well suited to time series problems. Therefore, the selection of the neural network architecture largely determines how effectively the problem at hand can be resolved.