Review of algorithms for modeling metal distribution equilibria in liquid-liquid extraction processes

This work focuses on general guidelines to be considered for application of least-squares routines and artificial neural networks (ANN) in the estimation of metal distribution equilibria in liquid-liquid extraction process. The goal of the procedure in the statistical method is to find the values of the equilibrium constants (K¡) for the reactions involved in the metal extraction which minimizes the differences between experimental distribution coefficient (Dgxp) and theoretical distribution coefficients according to the mechanism proposed (Dt̂ ĝor)Iri the first part of the article, results obtained with the most frequently routine reported in the bibliography are compared with those obtained using the algorithms previously discussed. In the second part, the main features of a single back-propagation neural network for the same purpose are discussed, and the results obtained are compared with those obtained with the classical methods.


INTRODUCTION
The use of minimization routines to obtain the best value of equilibrium distribution constants for liquid-liquid systems that fits experimental data, is achieved by least-squares optimization using several computer tools.The program LETAGROPt^ï is the most frequently used, but more accurate results can be obtained using the last algorithms developed for that purpose.
The main disadvantage of this "classical" method to fit experimental values to a liquid-liquid extraction model is that we need an "a priori" knowledge of the reactions involved in the process.Since mass transfer phenomena in hydrometallurgical systems may be very complex (dimerization, aggregation of organic species, hydration processes, uncertainty in activity coefficients..) the mechanism proposed is always a "simplified" vision of the real problem.
Artificial neural networks (ANN) are being applied in many fields of chemical engineering due to their ability to approximate virtually any function in a stable and efficient way^ ' .They are trainable dynamical systems and, unlike statistical estimators, they approximate a function without a mathematical model of how outputs depend on inputs^^l These "model-free" estimators are widely recognized as an adequate technique for modeling complex or under-defined systems that are difficult to model otherwise, ANN consists of numerous, simple processing units or "neurons" that we can globally program for computation.Despite the wide range of applications, ANN are still designed through a trial and error approach^ .This design involves the following steps: a) Structural design of the network, i.e. determination of number of layers and number of neurons in each layer and neuron transfer function.b) Selection of the training methods and corresponding parameters (i.e.synaptic weights and neuron's biases).

METHODS FOR LEAST-SQUARES OPTIMIZATION
The basis of the method lies on the minimization of the function 17 = ^^{D,,,,,, -D,,,)' where 17 is the error square sum, D^j^^^y. is the theoretical distribution ratio predicted with the model and D^^p is the experimental distribution ratio.
The program "Isqcurvefit.m"included in the MATLAB Optimization Toolbox solve nonlinear curve-fitting (data-fitting) problems in the leastsquares sense^^^.That is, given input data xdata, and the observed output Dexf>, find coefficients K that "best-fit" the equation: where, xdata and Dexf) are vectors of length equal to the number of experimental data points and F(K, xdata) is a vector valued function that computes Dtheor according to the extraction mechanism proposed.
The algorithm developed to solve the problem is based on a trust-region method for non linear minimization.Consider the unconstrained minimization problem, min fix), where the function takes vector arguments and returns scalars.Suppose you are at a point x in n-space and you want to improve, i.e., move to a point with a lower function value.The basic idea is to approximate / with a simpler function q which reasonably reflects the behavior of function / in a neighborhood around the point x.This neighborhood is the trust region.A trial step s is computed by minimizing (or approximately minimizing) over N.This is the trust-region subproblem.

min j{}(s) SE.N\ (2)
The current point is updated to be x+s if f(x+s)>f(x); otherwise, the current point remains unchanged and N, the region of trust, is shrunk and the trial step computation is repeated.
The key questions in defining a specific trustregion approach to minimizing f(x) are: a) How to choose and compute the approximation (defined at the current point x) b) How to choose and modify the trust region N c) How accurately to solve the trust-region subproblem.
In the standard trust-region method^^^, the quadratic approximation q is defined by the first two terms of the Taylor approximation to /at x; the neighborhood N is usually spherical or ellipsoidal in shape.Mathematically the trust-region subproblem is typically stated: where, g is the gradient of / at the current point x, H is the Hessian matrix (the symmetric matrix of second derivatives), D is a diagonal scaling matrix, D is a positive scalar, and | | .| | is the 2-norm.Good algorithms exist for solving eq. 3 ^ s such algorithms typically involve the computation of a full eigensystem and a Newton process applied to the secular equation: Such algorithms provide an accurate solution to equation 3.However, they require time proportional to several factorizations of H. Therefore, for largescale problems a different approach is needed.Several approximation and heuristic strategies, based on eq. 3, have been proposed in the literature^ ^ ^"^K The approximation approach followed in the Matlab Optimization Toolbox is to restrict the trust-region subproblem to a two-dimensional subspace S^^^' T K Once the subspace S has been computed, the work to solve eq, 3 is trivial even if full eigenvalue/eigenvector information is needed (since in the subspace, the problem is only twodimensional).The dominant work has now shifted to the determination of the subspace.
The two-dimensional subspace S is determined with the aid of a preconditioned conjugate gradient process^ \ The toolbox assigns S=<s¡yS2>y where S] is in the direction of the gradient g, and $2 is either an approximate Newton direction, i.e., a solution to: or a direction of negative curvature, •H-S2<0 The philosophy behind this choice of S is to force global convergence (via the steepest descent direction or negative curvature direction) and achieve fast local convergence (via the Newton step, when it exists).
The framework for the Matlab Optimization Toolbox approach to unconstrained minimization using trust-region ideas is: These four steps are repeated until convergence.The trust-region dimension D is adjusted according to standard rules.In particular, it is decreased if the trial step is not accepted, i.e., f(x+s) > f(x) ^^^ ^ ^^l

Cationic (acidic) extraction reagents
The general equation proposed is the next: The code for the program caleql .m and the file difcuadl .m for the function that defines D^heor ^^^ showed in the appendix, and the data are stored in columns in an external Excel file {nnurOOl .xls),

Anionic extraction reagents
For the general case of a tertiary amine in the form of amine salt, the next expression is proposed: Amine salt is formed according to the reaction: In a similar form that described in the previous section, we can write for this kind of systems: We can use equation 16 and the distribution coefficient to obtain: Combining both expressions we get: For this system we can write an analogous expression to that defined for previous system, thus:

[Y"^] » •UM^X,)'^'""]-[L]
(20) Distribution coefficient is defined as: Combining both expressions we get: Equation 22 lets determine theoretical distribution coefficient (Dij^^gy) if the next values are known: a) Equilibrium constant for the system 378 b) Concentration of neutral ligand (it can be approximated to initial concentration), c) Concentration of the cation that forms the neutral complex, Y, Experimental distribution coefficient can be calculated following the same scheme that defined for previous systems-The code for the program caleqi.m is presented in the appendix with the file that defines D^-^eor {difcuad3.m).The data are also stored in columns in an external Excel file {nnur003.xls).
For the case of the synergistic extraction of a metal using a mixture of an anionic reagent (e.g.amines) and a neutral ligand (e.g.phosphine oxides), similar equilibrium equations can be used according to the mechanism proposed.Fe^^ + 3 HR ^ FeR3 + 3 H^ log K2= -L9943 ± 0.0674 (LETAGROP)

ARTIFICIAL NEURAL NET DESIGN
Although many different architectures have been reported, the use of back propagation (BP) neural networks are particularly widespread in chemical engineering research, owing to their simplicity, compact design and flexibility^ I As mentioned in the introduction, this "modehfree" estimator does not need to know an "hypothetic" mechanism for the liquid-liquid extraction process.We need only to know the variables that influence in the value of the distribution coefficient obtained for each experiment, because this system adaptively estimates continuous functions from data without specifying mathematically how outputs depend on inputs.In BP neural nets, units (neurons) are divided into distinct layers, as shown in figure 3   units from each layer are linked to the processing units in sucessive layers by weighted connections.Each neuron processes its weighted inputs using an activation function, which results in an output for each neuron that passes to the next layer.
The methodology used to obtain a neural model is the next: That means D= / (pH, Meq, HmX, MO).The common sigmoidal function will be used for the hidden layer neurons as first option.Table 1 shows the two data sets used for the design of the neural net.Since MO (Initial concentration of metal) is constant for all the data available, its influence will be included in the bias term, so the initial structure of the neural net is showed in figure 4-

RESULTS FORTHE NEURAL NET MODEL
The training algorithm developed in MATLAB reaches a minimum value after 100 iterations and is not dependant on the initial random values of the weights and biases, as can be seen in figure 5.
The final error of the neural net is below 0.07.Although small errors can be achieved using the backpropagation algorithm to adjust weights and Figura 5. Evolución del error en la fase de aprendizaje de la red neuronal.
biases in the neural net, it is necessary to check if the net emits appropriate response when faced with new data.The output of the net for the data set 2 showed in table I are summarized in table II and compared with results obtained with the Kmodel developed in previous section.
As can be seen, data predicted with this simple neural net fit better the experimental distributions coefficients.
Superior de Investigaciones Científicas Licencia Creative Commons 3.0 España (by-nc) http://revistademetalurgia.revistas.csic.es ]^^^ and [Ml^^ are the total concentration of metal in the aqueous and organic phases in equilibrium respectivelycalculate the theoretical distribution coefficient (D^J^^Q^) if the next variables are known: a) Equilibrium constant for the system (K) b) Concentration of acid extraction reagent in equilibrium (approximately equals to the initial concentration).c) Equilibrium pH.If several simultaneous equilibria are proposed, the theoretical values of the distribution coefficient are calculated using: D,H.or = I^,=l K|H,xf H" (11) where i denotes the number of equilibria proposed.Rev. Metal.Madrid 41 (2005) 374-383 (c) Consejo Superior de Investigaciones Científicas Licencia Creative Commons 3.0 España (by-nc) http://revistademetalurgia.revistas.csic.esExperimental distribution coefficient (D^^p) can be calculated following two ways: 1) Analytical determination of the metal concentration in both phases: determination of the metal concentration in the aqueous phase and calculation of the metal concentration in the organic phase in equilibrium using mass balance.The expression is: M"=M^ii^..K_[ML)", v.. where, R denotes the ratio of aqueous to organic phase and [M]o the initial concentration of the metal Finally of the metal (positive) = Number of atoms of the metal in the anionic complex = Anionic ligand = Number of atoms of the anion in the complex (positive) m = Charge of the anion in the complex (negative) R^N = Anionic extraction reagent (tertiary amine) Y = Anion in the amine salt b = Valence of the acid (positive) 18) Equation 18 allows to determine the theoretical distribution coefficient (D^^^oj.)if the next values are known: a) Equilibrium constant for the system (K) b) Concentration of anionic extraction reagent in equilibrium (approximately equals to the initial concentration).c) Concentration of the anion that forms amine salt, Y (it can be related to the pH of aqueous phase).For multiple equilibria, a similar expression of equation 12 can be used.Experimental distribution coefficient is calculated from analytical determinations.The code for the program that resolves the minimization problem is similar to that presented for acid reagents (program caleqi.m)and is showed in the appendix.The function that define Dj-^eor Is included in the file difcuadl.mRev. Metal Madrid 41 (2005) 374-383 377 (c) Consejo Superior de Investigaciones Científicas Licencia Creative Commons 3.0 España (by-nc) http://revistademetalurgia.revistas.csic.es3.3.Solvation (neutral) extraction reagents The next expression is proposed: state of the metal (positive) a = Number of atoms of the metal in the complex X = Anionic ligand c = Number of atoms of the anion in the complex (positive) m = Charge of the anion in the complex (negative) L = Neutral ligand d = Stoichometric coefficient for the neutral ligand Y = Accompanying cation b = Charge of the cation (positive)

Figure 1 Figure 1 .
Figure1shows the results of the program for the data obtained for the system Cu^^ / NO3-/ ACORGA M5640 / IBERFLUID, considering that the extraction mechanism corresponds to:
a) Depending on the activation function used in the neurons, input and output data vector must be scaled in an adequate range: [0 1] for sigmoidal function, [-l +1] for hyperbolic tangent function, etc... b) The first layer of neurons simply distributes the input data to the next layer, so the number of neurons of this layer is the same that the number of different input variables of our problem.c) A bias or reference is added to each layer except at the input layer.d) Neurons in the hidden and output layers receive weighted inputs from the previous layers, which are summed.The resulting weighted summation plus bias term are passed through the activation function choosen, so the output of each neuron is always in the range previously selected.e) The learning process consists of identifying the weights that produce the best fit of the produced outputs over the entire training data set.The initial values for the weights are generally set to random values.During the training the weights are adjusted continuously based on the error generated by the discrepancy between the output of the network and the output of the training examples.The initial structure chosen for modeling the liquid-liquid equilibrium data is shown in figure 4. 379 (c) Consejo Superior de Investigaciones Científicas Licencia Creative Commons 3.0 España (by-nc) http://revistademetalurgia.revistas.csic.esThe number of input variables for the liquid^Uquid extraction mechanisms discussed in previous section is four.Following the same notation, for a cat ionic extraction mechanism, we can assign: l)xi-^pH 2) X2 -^ Meq (Metal concentration in aqueous phase in equilibrium) 3) X3 -> HmLX (Concentration of acidic (cationic) extraction reagent) 4) X4 -^ MO (Initial concentration of metal in aqueous phase) 5) Yi -^ Distribution coefficient

Tabla I . Figure 4 .
Figure 4. Structure of the BP neural net chosen for modeling liquid-liquid equilibrium data.

Figure 5 .
Figure 5. Evolution of the error in the training of the neural net.
Tabla IL Comparison of K-model and neural net model for predicting equilibrium data for a cationic system (D values are normalized in the range [0 +1 ]) TableII.Comparación de los modelos basados en la constante de equilibrio y la red neuronal en la predicción de los datos de equilibrio para un sistema catiónico (Los valores de D se han normalizado en el rango[O, 1]) Superior de Investigaciones Científicas Licencia Creative Commons 3.0 España (by-nc) http://revistademetalurgia.revistas.csic.es