A REGULARIZED LEVENBERG–MARQUARDT TYPE METHOD APPLIED TO THE STRUCTURAL INVERSE GRAVITY PROBLEM IN A MULTILAYER MEDIUM AND ITS PARALLEL REALIZATION

The structural inverse gravity problem in a multilayer medium is one of the most important geophysics problem. Until recently, the problem was reduced to the separation of gravitational fields and the restoration of unknown layers independently. Now the methods are in demand that allow find unknown layers simultaneously. For solving Urysohn integral equation of the first kind describing the problem regularized algorithms Levenberg–Marquardt type with weight factors are investigated. A new Levenberg–Marquardt type method based on Levenberg–Marquardt scheme is proposed. A regularized Levenberg–Marquardt type method compared with classic Levenberg–Marquardt method. For classic Levenberg–Marquardt method some computational optimizations are offered. The numerical experiments using model gravitational data allow to compare convergence rates, relative errors and program execution times of classic Levenberg–Marquardt algorithm and Levenberg–Marquardt method. The parallel programs implementing the algorithms are developed using CUDA and OpenMP technologies. 10.14529/cmse170301.


Introduction
This paper is concerned with iterative solutions solving the inverse structural gravity problem in a multilayer medium and is a continuation of the series of works [1,2].
Hence we consider an operator equation where A(u) is nonlinear Frechet differentiable integral Urysohn type operator between Hilbert spaces U, F , u = (u 0 , .., u L ) are unknown functons describing L desired interfaces, f is the total gravitational field. The solution of (1) does not depend continuously on the data and thus using of noise-contaminated data would lead to a meaningful deviation from solution. Hence a stable \ast The paper is recommended for publication by the Program Committee of the International Scientific Conference "Parallel Computational Technologies (PCT) 2017".
solution of (1) requires regularization techniques, for example the method of Tikhonov. We obtain where A \prime (u) \ast is a conjugated operator for derivative operator A \prime (u), \alpha > 0 is a regularization parameter, \| ff \delta \| \leq \delta , u 0 is an initial approximation. So we will assume solving equation (2). To solve (2) the regularized Levenberg-Marquardt algorithm can be used [3]: where \gamma is the damping factor. This method used in iterative solution nonlinear inverse problems of filtration, borehole and exploration geophysics ( [3][4][5]) etc. In the article [6] the method (3) strong convergence to the solution is set up for the Tikhonov-regularized equation on the assumption that the condition of the sourcewise representability of the solution z of the equation (1) and the Lipschitz conditions for the derivative of the operator A are fulfilled and the initial approximation is taken from a rather small neighborhood of the regularized solution.
This method is complex to implement. It takes a lot of time for matrix to matrix multiplication, matrix inversion. It is possible to use iterative methods for matrix inversion, so the iterative process is two-step: at each step we reduce the problem to SLAE, which we solve by some iterative method. We can see that LM algorithm tends to have larger computational overheads with an increase in the size of input data. The previous work [2] is concerned with a regularized Levenberg-Marquardt method (CLM) which within the weight factors approach proposed in [7] lets find simultaneously several structural boundaries described by unknown functions u 0 , .., u L in equation (1) using the total gravitational field f . Weight factors w i will be chosen as follows: .., f M \times L , ..., f L\times M \times N ) \rightar (w 1 , w 2 , ..., w L\times M \times N ), where F l (l = 1, 2, ..., L) are anomalous fields generated by the gravitating mass located below the corresponding depths H l for the sought surfaces of interface S l (l = 1, 2, ..., L). Weight factors depend on field F l which separated from field original F using preliminary processing of gravity observations [8].
Linearized gradient type methods based on linearized steepest descent method with weight factors (5) for solving the gravity problem are considered in works [9,10] where S(u k ) = A \prime (u k ) \ast (A(u k )f ), \psi is damping factor. This method is suited to deal with multilayer problem but there is a matrix to matrix multiplication operation.
Also in [11] a Levenberg-Marquardt gradient method based on Landweber-type scheme is proposed As seen, this method is fast but is suitable only for finding interfaces in two layer model. The present paper is focused on comparison of relative errors, numbers of iterations and computation times between classic regularized Levenberg-Marquardt method (LM) and CLM.
Here there are used gravitational field models with uniform 15% noise. In a view of big memory consumption and high computational complexity of LM some algorithmic optimizations are proposed. On a basis of algorithms the parallel programs are implemented using OpenMP and CUDA technologies. The perfomance estimations of parallel programs are obtained.
The rest of the paper is organized as follows. The section 1 is dedicated to inverse multilayer gravity problem definition. The section 2 devoted to LM and CLM description. The next section 3 describes a techniques and principles used for program development. The section 4 presents the numerical results using quasi-model gravitational data and the results of parallel implementations. The final section lists the conclusions.

Multilayer structural gravity problem statement
The three-dimensional structural inverse gravity problem on finding interfaces between medium layers on the basis of data on the gravitational field measured in a certain area of the earth surface, and the density jumps.
It is assumed that the lower half-space consists of several layers with a constant density \Delta \sigma l (l = 1, .., L), divided by desired interfaces S l , where L is the number of interfaces ( fig. 1). The gravitational effect of such a half-space is equal to the sum of the gravitational effects of all the interfaces. Functions u l = u l (x, y) describing the desired interfaces satisfy operator equation (2), operator A takes the form where f is the gravitational constant, \Delta \sigma l (l = 1, .., L) is the density jump, \Delta g(x \prime , y \prime ) = \sum L l=1 g l is the sum of an anomalous gravitational fields. Preliminary processing of the gravity data with the aim to select the anomalous field from the measured gravity data is performed using the methodology [8]. The problem is undetermined because of attemption to find several unknown functions u l = u l (x, y) from the given function \Delta g(x \prime , y \prime ). So it's necessary to use the weight factors which can be found from formula [7].

Numerical methods for solving the problem
To solve (6) the regularized Levenberg-Marquardt algorithm with weight factors can be used: where \Lambda is operator with a corresponding diagonal matrix with the weight factors on the main diagonal.
Remark. In nonlinear inverse gravimetry problems in a discrete representation the matrix A \prime (u k ) is ill-conditioned which entails significant increasing the condition number of The second method is a Levenberg-Marquardt regularized Levenberg-Marquardt algorithm [2]. Here iterative process approximates each of the solution components u l , l = 1, .., L: where where K \prime u (x \prime , y \prime , x, y, u k l (x, y)) is transposed kernel function of K \prime u (x, y, x \prime , y \prime , u k l (x, y)), A \prime (u k l ) \ast is a transposed derivative operator in u k l . The value \varphi l depends on u k l . The process (8) is implemented in discrete form Here we don't need computation of the inverse of matrix A \prime (u k ) T A \prime (u k ) + \alpha I. It makes this method more economical for numerical solution then (7) which computational complexity is O(n 3 ) because of multiplication A \prime (u k ) T A \prime (u k ) and matrix A \prime (u k ) T A \prime (u k ) + \alpha I inversion. The computational complexity of (8) is O(n 2 ) because the most time-consuming operation here is A \prime (u k ) T matrix elements calculation and matrix-vector multiplication.
Discretizing equation (6) on the n = M \times N grid with the given right-hand side \Delta g(x \prime , y \prime ) and approximating integral operator A(u) using the quadrature formula, we obtain the right-hand side F (x \prime , y \prime ) of M \times N dimension, the solution vector u(x, y) = [u 1 (x, y), .., u L (x, y)] of L\times M \times N dimension, the derivative matrix of operator A \prime (u k ) of (M \times N ) \times (L \times M \times N ) dimension, and the system of nonlinear equations The \| A n [u k ] -F n \| /\| F n \| < \varepsi relative error condition for comparing the exact and numerical solutions with a sufficiently small \varepsi is taken as the termination criterion.

Optimization, parallelization and implementation
A big size matrices in LM algorithm require large amounts of memory. For example, when L = 3, M = N = 1000 the matrix A \prime (u k ) \ast A \prime (u k ) type of double allocates \approx 67 Tb. Also full matrix-matrix multiplication is very computationally expensive problem. So to reduce memory allocation the decision was made to make all matrix-matrix and matrix-vector computations flying: a matrix element is calculated at the time of access to this element. Let it show.
Previously the system of non-linear equations (10) reduces to the SLAE: and [A \prime (u k ) T A \prime (u k )]u k on the fly. Within the "associative law"[A \prime (u k ) T A \prime (u k )]u k equals to , so "on the fly"technique makes it possible to avoid matrix to matrix multiplication replacing it matrix-vector twice operation. Further the system (11) can be solved by iterative gradient-type methods, minimal residual method e.g [12,13]. A method chosen in this work is a minimal residual method. Parallel algorithms for solving (6) are implemented numerically on the multicore Intel Xeon processor and NVIDIA Tesla M2050 graphics processors unit incorporated in the parallel computing system Uran at the Institute of Mathematics and Mechanics of the Ural Branch of RAS. The parallel algorithms are implemented on the multicore Intel Xeon processor using the OpenMP technology and Intel MKL library and on NVIDIA Tesla GPUs using the CUDA technology and CUBLAS library.
For the multicore Intel Xeon processor, the optimization of the vector-matrix operations using the Intel Xeon compiler options and the loop vectorization using the directive #pragma simd are implemented.

Results of numerical experiments
The structural inverse gravimetry problem of finding model interfaces S 1 , S 2 , S 3 for the four-layer medium with the density jumps was solved using the quasi-model original gravitational data and with uniform noise with an amplitude of 15% noise for the grids 100 \times 100 km 2 and 1000 \times 1000 km 2 . The gravitational field ( fig. 2) is a real but density jumps are taken from model, the model surfaces are based on the quasi-real surfaces constructed in work [9].
The part a) of the fig. 3 shows model interfaces S 1 , S 2 , S 3 . The part b) shows reconstructed interfaces by LM and the part c) shows CLM results. The parts d) and c) shows reconstructed interfaces by LM and CLM from noised gravitational field.
The table presents the computation times for solving the gravity problem in the three-layer medium for model interfaces with/without noise by the using LM, CLM methods for the grids of 100 \times 100 and 1000 \times 1000 dimensions. The weight factors were obtained from preliminary selected fields by formula from [7] with parameters \alpha = 1, \beta = 1, 1. The regularization parameter \alpha = 10 - 3 and the dumping factor \gamma = 1 were taken for both methods. The termination criterion \varepsi was set to 0.25. In the second column of the table number of iterations for gravitational data without noise is written, in the third column number of iterations for gravitational data with 15% noise is shown. The relative errors \delta i = \| u au e \| /\| u e \| for comparing the exact u e and numerical solution u a for each i layer are shown (for original gravitational data). In the last columns the solution times are shown: T 1 is the solution time on one core of Intel Xeon, T 2 is the solution time on eight cores of Intel Xeon, T 3 is the solution time on NVIDIA Tesla M2050 GPU. Data in the top substrings corresponds to 100 \times 100 grid and data in the bottom substrings corresponds to 1000 \times 1000 grid.

Conclusion
On a base of Levenberg-Marquardt and componentwise Newton type algorithms a Levenberg-Marquardt method is proposed. This method joins advantages of Levenberg-Marquardt scheme in solving gravity multilayer problem and simlicity in Levenberg-Marquardt apporoach in the Gauss-Newton method. At the same time a regularized Levenberg-Marquardt type method avoids some of the complexities associated with using classic Levenberg-Marquardt method. At first, the inversing an ill-conditioned matrices using internal iterative process. In the second place, matrix-to matrix multiplying entails high computational complexity and big memory consumption. This problem may be solved by "on the fly"technique. The results of numerical experiments show that CLM method has better convergence then classic LM. The both methods are resistant to uniform noise. For large-scale grids, when the data cannot be stored in the memory, "on the fly"technique is the fastest. The computations' acceleration and efficiency on multi-core and graphic accelerators are sufficient. At small grid sizes, the acceleration S n < n, where n is the number of processors, but when the grid size increases it is equalized S n \approx n and an efficiency E n \approx 1. This means a high resource parallelism of algorithms.

A.F. Skurydina
In the future, the question of theoretical interest of the Levenberg-Marquardt method concerns investigating its convergence properties, the conditions on the kernel of the integral operator in equation (1). The obtained conclusions will be useful for another applications.