Augmented Lagrange Multiplier Method to Solve Quadratic Programming Problems in Standard Form: A Neural Network Approach
Dr. W. Abdul Hameed, Dr. P. Rajendran*
School of Advanced Sciences, VIT University, Vellore-632 014, Tamil Nadu, India
*Corresponding author's email: hameedvellore@yahoo.co.in, prajendran@vit.ac.in
ABSTRACT:
In this paper, we present a neural network for solving quadratic programming problems in standard form in real time by means of the augmented Lagrange multiplier method. It is shown that the proposed neural network is stable in the sense of Lyapunov and converges to an exact optimal solution of the original problem. The validity and transient behavior of the proposed neural network are demonstrated by simulation results obtained using MATLAB software.
KEYWORDS: Quadratic Programming Problems, augmented Lagrange multiplier method, Neural Network, MATLAB.
INTRODUCTION:
The quadratic programming (QP) problem is a special case of the nonlinear optimization problem in which a quadratic objective function is optimized subject to linear constraints. However, owing to its importance and frequent occurrence in practice, it is regarded as a separate problem. In this paper, we consider a neural network approach to solving QP problems in standard form, defined by linear equality constraints and non-negative variables.
These problems arise in a wide variety of scientific and engineering fields, including regression analysis, function approximation, signal processing, image restoration, parameter estimation, filter design and robot control, and hence a real-time solution is often desired. However, traditional numerical methods may not be efficient, since the computing time required for a solution depends greatly on the dimension and structure of the problem and on the complexity of the algorithm used.
One promising approach to handling optimization problems with high dimension and dense structure is to employ an artificial neural network based on a circuit implementation. The neural network approach can solve optimization problems in running times orders of magnitude faster than conventional optimization algorithms executed on general-purpose digital computers.
The structure of this paper is as follows. In Section 1, the quadratic form and the augmented Lagrange multiplier method are explained. In Section 2, the basic concepts, approach, algorithms and architecture of the neural networks are presented. In Section 3, simulation results using MATLAB software for QP problems in standard form are given. Finally, concluding remarks are drawn in Section 4.
QUADRATIC FORM AND AUGMENTED LAGRANGE MULTIPLIER METHOD
Quadratic form
A quadratic form is the part of the objective function that is optimized in all QP problems. A function defined as
f(x) = Σ (i=1 to n) Σ (j=1 to n) qij xi xj
is called a quadratic form of a vector x Є Rnx1. It can be rewritten in the more compact form
f(x) = xT Q x, where Q Є Rnxn.    (1)
Recognizing that for every x and Q, the product xT Q x is a scalar quantity, we have
xT Q x = (xT Q x)T = xT QT x, and therefore,
xT Q x = (xT Q x + xT QT x) / 2 = xT [(Q + QT) / 2] x = xT Θ x, where Θ = (Q + QT) / 2.
That is, matrix Θ is a symmetric matrix with elements given by (qij + qji) / 2, where qij and qji are elements of Q. This shows that for every matrix Q, an equivalent quadratic form can be formulated with a symmetric coefficient matrix Θ. For that reason, without any loss of generality, we can assume that Q is a real symmetric matrix. A quadratic form is called positive definite if xT Q x > 0 for every nonzero x Є Rnx1. A quadratic form is called positive semi-definite if xT Q x ≥ 0 for all x Є Rnx1 and there exists at least one vector x ≠ 0 for which xT Q x = 0.
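The symmetrization step above is easy to check numerically. The following NumPy sketch (illustrative matrix and vector values only) verifies that Q and Θ = (Q + QT) / 2 produce the same quadratic form:

```python
import numpy as np

# A non-symmetric coefficient matrix (hypothetical example values).
Q = np.array([[1.0, 3.0],
              [1.0, 2.0]])

# Equivalent symmetric matrix: Theta = (Q + Q^T) / 2.
Theta = (Q + Q.T) / 2.0

x = np.array([2.0, -1.0])

# Both matrices yield the same scalar value x^T Q x.
f_Q = x @ Q @ x
f_Theta = x @ Theta @ x
```

Since the two values agree for every x, replacing Q by Θ loses nothing, which is why symmetry of Q can be assumed without loss of generality.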
A fundamental mathematical concept related to nonlinear programming is that of a convex function. A function f(x) defined over a convex set D in Rnx1 is said to be convex if for any two points x1 and x2 in D and for any 0 ≤ λ ≤ 1, we have f[λx1 + (1-λ)x2] ≤ λf(x1) + (1-λ)f(x2). If the function is convex, its value at any point of the segment joining x1 and x2 lies on or below the chord connecting the points (x1, f(x1)) and (x2, f(x2)). The following theorem presents an important property of convex functions.
Theorem (1)
If f(x) is convex on a convex set D, then f(x) has at most one local minimum. If there is such a minimum, it is a global minimum and is attained on the convex set D. In essence, the theorem states that if the energy function is convex, the problem of local minima does not exist and gradient-based minimization is guaranteed to terminate at a point of the global minimum. The quadratic form in (1) is a common part of many energy functions used in quadratic programming.
Theorem (2)
Let x Є Rnx1 and let f(x) = xT Q x be a quadratic form defined on a convex set D ⊂ Rnx1. If matrix Q is positive semi-definite, the quadratic form f(x) is convex on the set D. This theorem shows that a positive semi-definite quadratic form is a convex function. Therefore, according to Theorem (1), it has a unique minimum that can easily be found by using a gradient-based iterative technique. Hence we assume Q to be a symmetric positive semi-definite matrix when solving a quadratic programming problem using neural network techniques. However, if the matrix Q is indefinite, the quadratic form may have multiple local minima. From a practical standpoint, this is a rarely encountered limitation, since the vast majority of quadratic programming problems can be formulated with Q positive semi-definite.
Augmented Lagrange Multiplier Method
The augmented Lagrange multiplier method is one of the most effective general approaches to solving quadratic programming problems. It was derived independently by Hestenes [5] and Powell [8]. Mathematically, the constrained optimization problem can be formulated as follows:
Minimize
f(x) = f (x1,x2, …., xn)
subject to Ax = b and x1,x2, …., xn ≥ 0,
where x Є Rnx1, A Є Rmxn, b Є Rmx1. The ordinary Lagrange multiplier method transforms the above problem into an unconstrained optimization problem. It is formulated by appending the constraints to the objective function with Lagrange multipliers used as searching factors. The new objective function is called the Lagrangian and is of the form
L(x, λ) = f(x) + λT (Ax-b)
where x Є Rnx1 and λ = [λ1, λ2, ….., λm]T Є Rmx1. The augmented Lagrangian is formed by adding an extra penalty term. The most popular form of the augmented Lagrangian, given by Gill, Murray and Wright [3], is
LA(x, λ, K) = f(x) + λT(Ax - b) + (1/2)(Ax - b)T diag(K) (Ax - b)
where λ = [λ1, λ2, ….., λm]T Є Rmx1 denotes the vector of Lagrange multipliers, K = [k1, k2, ……, km]T Є Rmx1 is the vector of positive penalty parameters, and diag(K) is the diagonal matrix with the ki on its diagonal. After this, the optimal solution is found as the unconstrained minimum of LA. This can be done by using any unconstrained optimization technique. In the simplest approach, we can use steepest descent, where the minimization of the Lagrangian is converted to a system of difference equations of the form
x(k+1) = x(k) - μx [∂LA(x, λ, K)/∂x] (2)
λ(k+1) = λ(k) + μλ [∂LA(x, λ, K)/∂λ] (3)
where μx, μλ > 0 are the learning rate parameters. There are several important properties of the augmented Lagrange multiplier method that need to be taken into consideration.
(i) Local-minimum property: The augmented Lagrange multiplier method guarantees convergence to the local minimum of the augmented Lagrangian. The local minimum of the augmented Lagrangian converges to the constrained minimum of the objective function only in the limiting case when the penalty parameter in K is sufficiently large.
(ii) Choice of the penalty parameter: In general, the penalty parameters need to be chosen so that the Hessian matrix of the augmented Lagrangian, defined as H = ∇x² LA(x, λ, K) = ∂²LA/∂x², is positive definite. If the values of the penalty parameters are too small, the algorithm may fail to converge, or it may converge to a value that is a local minimum of the augmented Lagrangian but does not minimize the objective function itself. On the other hand, if the parameters are chosen too large, the algorithm may exhibit oscillatory behavior in the vicinity of the solution.
(iii) Convergence of the Lagrange Multipliers: For the algorithm in (2) and (3), to find the optimal solution it is necessary that both x and λ converge to their optimal values x* and λ*. In some cases the augmented Lagrangian can be very sensitive to the values for the multipliers, and it may take a considerable number of iterations before convergence is achieved.
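For a quadratic objective f(x) = cTx + (1/2)xTQx with equality constraints Ax = b and a single scalar penalty K, the Hessian in property (ii) reduces to H = Q + K ATA. The following NumPy sketch (with hypothetical data chosen so that Q is indefinite) illustrates the point: H is indefinite for K = 0 but becomes positive definite once K is large enough.

```python
import numpy as np

# Hypothetical data: a negative definite Q that the penalty term must fix.
Q = np.array([[-2.0, 0.0],
              [ 0.0, -2.0]])
A = np.array([[4.0, 3.0],
              [3.0, 5.0]])

def hessian(Q, A, K):
    # Hessian of the augmented Lagrangian of a quadratic objective
    # with equality constraints Ax = b and scalar penalty K.
    return Q + K * A.T @ A

# With K = 0 the smallest eigenvalue is negative; with K = 2 all
# eigenvalues are positive, so H is positive definite.
eig_unpenalized = np.linalg.eigvalsh(hessian(Q, A, 0.0))
eig_penalized = np.linalg.eigvalsh(hessian(Q, A, 2.0))
```

Checking the eigenvalues of H in this way is a simple practical test for whether a chosen penalty parameter is large enough.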
BASIC CONCEPTS AND NEURAL NETWORK APPROACH
Basic Concepts
Neural networks represent a computational (e.g., Marks [6]) approach to intelligence, as contrasted with the traditional, more symbolic approach. The idea of such systems is due to the work of the psychologist D. Hebb (e.g., Hebb [4]), after whom a class of learning techniques is referred to as Hebbian. Despite the pioneering early work of McCulloch and Pitts (e.g., McCulloch and Pitts [7]) and Rosenblatt (e.g., Rosenblatt [9]), the field was largely ignored through most of the 1960s and 1970s, with researchers in artificial intelligence (AI) mostly concentrating on symbolic techniques. Reasons for this include the lack of appropriate computational hardware and the work of Minsky and Papert, which showed the limitations of a class of neural networks (single-layer perceptrons) popular at the time. The development of very large-scale integration (VLSI) and parallel computing revived interest in neural networks in the mid-1980s as an alternative mechanism to investigate, understand, and duplicate intelligence. In the past decade, there has been phenomenal growth in this field. Some researchers view neural networks as a mechanism to study intelligence, but most literature in the area sees neural networks as a tool to solve problems in science and engineering. Most of these problems involve optimizing the solutions.
Neural Network Approach
The basic approach for solving problems using neural networks involves four phases. The first phase consists of constructing an appropriate error cost function for the particular type of problem to be solved. The error cost function is based on a defined error variable (or variables), which is typically formulated from a functional network for the particular problem. Thus, the problem, in general, is represented by a structured multilayer neural network (e.g., Wang and Mendel [11]).
The second phase is an optimization step which involves deriving the appropriate learning rule for the structured neural network using the defined error cost function from step (1) above. This typically involves deriving the learning rule in its batch (or vector-matrix) form. Once the vector-matrix form for the learning rule is derived, the scalar form can be formulated in a relatively straightforward manner.
The third phase involves the training of the neural network using the learning rule developed in step (2) above to match some set of desired patterns, that is, input/output signal pairs. Therefore, the network is essentially optimized to minimize the associated error cost function. That is, the training phase involves adjusting the network’s synaptic weights according to the derived learning rule in order to minimize the associated error cost function.
The fourth and final phase is the application phase, in which the appropriate output signals are collected from the structured neural network for a particular set of inputs to solve a specific problem.
Neural Network for the QP Problems in Standard Form
The standard form of the QP problem is as follows:
Minimize f(x) = cTx + (1/2) xTQx
subject to Ax = b and x1, x2, …., xn ≥ 0,
where x Є Rnx1, A Є Rmxn, b Є Rmx1, Q Є Rnxn, m < n, and Q is assumed to be a symmetric positive definite matrix. In order for a neural network approach to be used, a convenient energy function needs to be defined. Using the augmented Lagrange multiplier method, we can define an energy function as (e.g., Cichocki and Unbehauen [1])
E (x, λ) = cTx + (1/2) xTQx + λT(Ax-b) + (K/2) (Ax-b)T (Ax-b) (4)
where λ = [ λ1 , λ2 , ….., λm]T Є R mx1 and K ≥ 0 is the penalty parameter. Applying a gradient method, we can form the network update equations as
x(k+1) = x(k) - μ [∇xE(x, λ)]
λ(k+1) = λ(k) + η [∇λE(x, λ)]
where μ, η > 0 are the learning rate parameters. Taking the partial derivatives of the energy function (4) with respect to x and λ yields
∇xE(x, λ) = (∂/∂x) [E(x, λ)]
= (∂/∂x) [cTx + (1/2)xTQx + λT(Ax - b) + (K/2)(Ax - b)T(Ax - b)]
= (∂/∂x) [cTx + (1/2)xTQx + λTAx - λTb + (K/2)(xTATAx - xTATb - bTAx + bTb)]
= (∂/∂x) [cTx + (1/2)xTQx + (ATλ)Tx - λTb + (K/2)(xTATAx - xTATb - (ATb)Tx + bTb)]
= c + (1/2)(2Qx) + ATλ + (K/2)(2ATAx - ATb - ATb)
= c + Qx + ATλ + K(ATAx - ATb)
= c + Qx + ATλ + K AT(Ax - b)
∇λE(x, λ) = (∂/∂λ) [cTx + (1/2)xTQx + λT(Ax - b) + (K/2)(Ax - b)T(Ax - b)]
= Ax - b
After we determine the gradient, the resulting network update equations are
x(k+1) = x(k) - μ [c + Q x(k) + ATλ(k) + K AT{Ax(k) - b}]
λ(k+1) = λ(k) + η [Ax(k) - b]
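The update equations above translate directly into a short iteration. The following Python/NumPy function is a minimal sketch (the function name, default parameters, and iteration count are our own choices; the paper's simulations use MATLAB):

```python
import numpy as np

def qp_neural_network(Q, c, A, b, mu=0.01, eta=0.01, K=0.0, iters=5000):
    """Discrete-time network updates for the standard-form QP:
    minimize c^T x + (1/2) x^T Q x subject to Ax = b."""
    x = np.zeros(Q.shape[0])       # zero initial conditions for x
    lam = np.zeros(A.shape[0])     # and for the Lagrange multipliers
    for _ in range(iters):
        # x(k+1) = x(k) - mu [c + Q x + A^T lam + K A^T (Ax - b)]
        x = x - mu * (c + Q @ x + A.T @ lam + K * A.T @ (A @ x - b))
        # lam(k+1) = lam(k) + eta (Ax - b)
        lam = lam + eta * (A @ x - b)
    return x, lam
```

Each iteration uses only additions and matrix-vector multiplications, which is exactly the property the concluding section cites as an advantage of the neural algorithm.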
SIMULATION
In order to verify the feasibility and efficiency of the above discrete-time neural networks, two examples are carried out using MATLAB software (e.g., Rudra Pratap [10] ) to solve the QP problems in standard form.
Example-1
Consider the following QP problem in standard form taken from Gass [2]
Maximize f(x) = 4x1 + 6x2 - 2x1² - 2x1x2 - 2x2²
subject to x1 + 2x2 = 2 and x1, x2 ≥ 0.
Converting the maximization into the minimization of -f(x), we have A = [1, 2]; b = [2]; c = [-4 ; -6]; Q = [4 2 ; 2 4]. Zero initial conditions are assumed for x and for the Lagrange multiplier λ. The learning rate parameters are chosen as μ = 0.01 and η = 0.01 and the penalty parameter as K = 0. Figure (1) shows the trajectories of the independent variables. As can be seen, the neural network converges after about 400 iterations. The optimal solution of the QP problem is x = [0.3333, 0.8333] with f(x) = 4.1666, which is the same as the solution obtained by the classical Lagrangian method.
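The simulation in the paper was carried out in MATLAB; an equivalent NumPy sketch of Example 1 (with a generous iteration count in place of the reported 400) is:

```python
import numpy as np

# Data for Example 1. The maximization is converted to a
# minimization, hence c = [-4, -6]; K = 0 as in the text.
Q = np.array([[4.0, 2.0], [2.0, 4.0]])
c = np.array([-4.0, -6.0])
A = np.array([[1.0, 2.0]])
b = np.array([2.0])

mu = eta = 0.01          # learning rates
x = np.zeros(2)          # zero initial conditions
lam = np.zeros(1)

for _ in range(5000):
    x = x - mu * (c + Q @ x + A.T @ lam)   # K = 0: no penalty term
    lam = lam + eta * (A @ x - b)
```

The iterate settles at x ≈ [1/3, 5/6], matching the optimal solution quoted above.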
Example-2
Consider the following QP problem in standard form taken from Gass [2]
Minimize f(x) = 4x1 + 9x2 - x1² - x2²
subject to 4x1 + 3x2 = 15; 3x1 + 5x2 = 14; and x1, x2 ≥ 0.
Here A = [4 3 ; 3 5]; b = [15 ; 14]; c = [4 ; 9]; Q = [-2 0 ; 0 -2]. Zero initial conditions are assumed for x and for the Lagrange multiplier λ. The learning rate parameters are chosen as μ = 0.01 and η = 0.01 and the penalty parameter as K = 2. Figure (2) shows the trajectories of the independent variables. As can be seen, the neural network converges after about 400 iterations. The optimal solution of the QP problem is x = [3, 1] with f(x) = 11, which is the same as the solution obtained by the classical Lagrangian method.
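Example 2 can be reproduced the same way; here the penalty term with K = 2 is essential, because Q itself is negative definite. A NumPy sketch (MATLAB was used in the paper; the iteration count is our own choice):

```python
import numpy as np

# Data for Example 2. Q is negative definite, so the penalty
# term with K = 2 is what makes the iteration stable.
Q = np.array([[-2.0, 0.0], [0.0, -2.0]])
c = np.array([4.0, 9.0])
A = np.array([[4.0, 3.0], [3.0, 5.0]])
b = np.array([15.0, 14.0])

mu = eta = 0.01
K = 2.0
x = np.zeros(2)
lam = np.zeros(2)

for _ in range(5000):
    x = x - mu * (c + Q @ x + A.T @ lam + K * A.T @ (A @ x - b))
    lam = lam + eta * (A @ x - b)
```

The iterate settles at x ≈ [3, 1], the unique point satisfying both equality constraints.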
Figure (1): Trajectories of the independent variables for the QP problem in Example (1).
Figure (2): Trajectories of the independent variables for the QP problem in Example (2).
CONCLUSION:
In this paper, we have shown how the neural network technique is applied to solve quadratic programming problems in standard form. We have presented graphs of the trajectories of the independent variables to illustrate the updating process of the neural network architecture.
The neural algorithm proposed in this paper has several advantages over conventional parallel algorithms. First, the neural algorithms are much simpler: the only algebraic operations required are addition and multiplication; matrix inversion and other complicated operations are not needed. Second, the neural algorithms offer a higher degree of parallelism than conventional parallel algorithms. Third, the neural algorithms are simple to implement.
REFERENCES:
[1] Cichocki A. and Unbehauen R., Neural Networks for Optimization and Signal Processing, Chichester, England: Wiley (1993).
[2] Gass S.I., Linear Programming, Methods and Applications, 5th ed., New York: McGraw-Hill (1984).
[3] Gill P.E., Murray W., and Wright M.H., Practical Optimization, London: Academic (1981).
[4] Hebb D.O., The Organization of Behavior: A Neuropsychological Theory, New York: Wiley (1949).
[5] Hestenes M.R., Multiplier and Gradient Methods, Journal of Optimization Theory and Applications, (1969), Vol. 4, pp. 303-320.
[6] Marks R.J., Intelligence: Computational versus artificial, IEEE Trans. Neural Networks, (1993), vol. 4.
[7] McCulloch W.S., and Pitts W., A logical calculus of ideas immanent in nervous activity, Bull. Math. Biophys., (1943) vol. 5, pp. 115-133.
[8] Powell M.J.D., A Method for Nonlinear Constraints in Minimization Problems, In Optimization, ed. Fletcher R., London: Academic, (1969), pp. 283-298.
[9] Rosenblatt F., Principles of Neuro-dynamics. New York: Spartan (1962).
[10] Rudra Pratap, Getting Started with MATLAB-A Quick Introduction for Scientists and Engineers, Version-6, Oxford University Press, New York, (2003).
[11] Wang L.X., and Mendel J.M., Structured Trainable Networks for Matrix Algebra, Proceedings of the International Joint Conference on Neural Networks, San Diego, CA, (1990) vol. 2, pp. 125-132.
Received on 11.07.2016 Modified on 21.07.2016
Accepted on 30.08.2016 © RJPT All rights reserved
Research J. Pharm. and Tech 2016; 9(10):1727-1731.
DOI: 10.5958/0974-360X.2016.00347.4