THE USE OF THE LINE-BY-LINE RECURRENT METHOD FOR SOLVING SYSTEMS OF DIFFERENCE ELLIPTIC EQUATIONS WITH NINE-DIAGONAL MATRICES

The applying of the line-by-line recurrent method for solving systems of difference elliptic equations with nine-diagonal matrices is the subject of the article. Such matrices take place in the case of difference approximation of 2D differential problems of a higher order of accuracy on a regular grid covering the area under consideration. The technology of the so-called compensatory transform which allows replacing the initial nine-diagonal matrix of the system with the five-diagonal one is offered in the article, due to the fact that originally the line-by-line recurrent method was designed for solving systems of difference equations with a five-diagonal matrix. The efficiency of this technology is analyzed by comparing the solutions of the test boundary value problem in a unit square. The solutions are found both with the help of different implementations of the compensatory transform technology and by other modern highly efficient iterative methods for solving the systems of difference equations. The problem is solved on the sequence of grids from coarse ( 501 \times  501 ) to fine ( 4001 \times  4001 ) nodes. The accuracy of the solution convergence is determined by the relative norm of the residual, which is equal to 10  -  12 in the present work. It is shown that the line-by-line recurrent method retains its high efficiency over the entire range of the grids under consideration despite the use of the intermediate technology of the compensatory transform.


Introduction
As is well known, many "standard" finite-difference approximation technologies for two-dimensional differential equations of problems of fluid dynamics and heat transfer on regular grids are based on five-point stencil. The Patankar scheme [1] can be mentioned here as an appropriate example of this kind of technologies. This technology yields an algorithm which strictly provides monotonicity and central-point dominance of the finite-difference five-point scheme. And as a result, a system of linear algebraic equations (SLAE) with a five-diagonal matrix of positive type arises [2]. The Patankar scheme has shown itself well in numerous studies. However, the need for higher-order accuracy schemes arises in a number of cases. For example, the five-point central-difference scheme provides a smaller error than the Patankar scheme, despite the fact that these schemes are formally of the same order of approximation. Unfortunately, the SLAE matrix obtained on the basis of the central-difference scheme can lose its positive type in certain conditions.
One can use the nine-point stencil (five points along each direction) to avoid this difficulty and increase the approximation order at the same time. In this connection a variety of higher-order numerically stable schemes was developed using the nine-point stencil [3][4][5][6][7][8]. As a rule, the convective flux discretization is the focus of a higher-order difference approximation, as it was done in the previously cited works. While, the diffusion part of the transport equations is usually approximated by the standard five-point scheme of the second order accuracy. The works that use a higher-order difference approximation of diffusion flux are found much less often [9]. Difference approximation of differential equations with derivatives of higher order (up to the fourth one) is another origin of difference linear equations on five points along each independent coordinate [10]. In this case, it is evident that for 2D problems the difference stencil will consist of nine nodes.
It is remarkable that the absence or cursory mention of an iterative method for solving arising nine-diagonal SLAE is a common feature of the above works. Probably the researchers believe that the expansion of existing methods for solving systems with five-diagonal matrices to methods for solving systems with nine-diagonal ones is not an insuperable barrier. Indeed, the transition of such well-known methods as block successive over-relaxation (BSOR) or line-by-line method [1] from five-point to nine-point versions do not cause much difficulty. But on the other side, there are effective computational technologies for the SLAE solution, which don't have an easy expansion up to nine-point stencils (see, for example, [11,12]).
A deferred correction procedure was developed to overcome this problem. Its idea is that a nine-diagonal matrix of the system of equations is replaced by a five-diagonal one which is obtained with the use of a difference approximation of lower order in a special way. After that one can use any available method for solution of SLAE with a five-diagonal matrix. Interestingly, the procedure occurs in two variants in the literature. The first variant is used when there is an independent difference approximation of the convective and diffusive terms of the transport equation [6]. So, it operates with convective terms of the equation on the stage of their difference approximation. The second variant of the procedure is applied to an already formed algebraic difference equation, regardless of how it was obtained and, therefore, it is more universal [13]. The article will consider the second version of the deferred correction procedure. It should be noted that the procedure has a weak point, namely: if there is no difference approximation of some terms of the equation of lower order on a reduced number of nodes, then it is not applicable. For example, such situation takes place when there are fourth-order derivatives in a differential equation that cannot be approximated using less than five nodes. Therefore, in this case one have to design other procedure to replace a nine-diagonal matrix of SLAE with the five-diagonal one, which does not have this shortcoming.
Recently, a highly efficient line-by-line recurrent method was developed and successfully used for solving systems of difference elliptic equations in some problems of computational fluid dynamics and heat transfer [14]. This method is applicable to systems of equations with five-diagonal matrices due to its design features in the case of two-dimensional problems [15]. However, attempts to modify this method for the case of systems of equations with nine-diagonal matrices faced great difficulties. Therefore, the way out is to use the line-by-line recurrent method in cooperation with an intermediate technology of matrices transformation like the deferred correction procedure, for example. In the light of the foregoing, the objective of the work is to develop a universal technology of matrix transformation and to investigate the effectiveness of the line-by-line recurrent method in solving systems of difference elliptic equations with nine-diagonal matrices.
The paper is organized as follows. The mathematical statement of a problem, namely: definition area, differential transport equation, boundary conditions, closing formulas -are described in Section 1. In the following Section 2, the details of the numerical technique including high order numerical discretization, deferred correction procedure, compensatory transform technology are given. The comparisons of solutions of the test problem obtained by different methods are presented in Section 3. The conclusions are drawn in the final Section.

Statement of problem
Let \Omega = \{ (x, y) : 0 \leq x \leq 1, 0 \leq y \leq 1\} be a unit square in Cartesian coordinates ( Fig. 1) as the definition area of unknown \Phi (x, y) which is governed by a differential transport equation. In this case the formulation of a test 2D boundary value problem in \Omega can be used as an origin for obtaining a system of difference elliptic equations with the sparse matrix with the help of difference approximation of the initial differential problem. The generalized steady-state convection-diffusion transport equation written for \Phi (x, y) can be stated as [1] U \partial\Phi \partialx where U (x, y), V (x, y) -flow velocity components, \Gamma (x, y) -transfer coefficient, S(x, y)source. Dirichlet conditions take place on the area boundaries. Let velocity components and transfer coefficient be as follows: U (x, y) = - 3y 2 arctan x, V (x, y) = y 3 1 + x 2 ; \Gamma (x, y) = exp( - l 2 ), l 2 = x 2 + y 2 .
So, the test 2D boundary value problem is defined and one can begin to solve it numerically.

High-resolution numerical discretization
The problem area is covered by a uniform orthogonal mesh which nodes one can separate into three groups: internal, near-boundary, and boundary nodes. The nine-point stencil with an internal central node is presented in Fig. 2a. The so-called SMART scheme [7] is used for higher-order approximation of (1) in internal mesh nodes. Briefly, the technology of the scheme obtaining is as follows [16]. Integrating the equation (1) over the control volume ( Fig. 2a) and using the divergence theorem for a Cartesian coordinate system allows getting the following discrete equation: where J e , J w , J n , J s represent the total fluxes of unknown \Phi across faces e, w, n, s of the control volume, and Q is the volume integral of the source term S. Each of the surface fluxes J contains convective and diffusive contributions. It is expressed, for example, for the face e, as follows: where The tilde above unknown \Phi denotes a so-called normalized variable which is defined as Here, subscripts U, D mean upflow and downflow nodes relative the central point, correspondingly. For example, for the face e the central node will be point P in The fluxes through the w, n and s faces can be found in a similar manner. The eight-point stencil with a near-boundary central node is presented in Fig. 2b. For example, the right boundary of the problem area \Omega is chosen. For this case the general approximation scheme described above can be applied if U e \geq 0. In contrary, if U e < 0 (as in the figure) it is necessary to follow special practices. In this context, it should be kept in mind The Use of Line-by-Line Recurrent Method for Solving Systems of Difference Elliptic... that a lower order approximation takes place as a result of the "windward rule" in any case. In literature the first order upwind scheme for near-boundary node is used as a rule [6]. But in the present work the more complex technology of Patankar scheme is adapted with a view to obtain a second order upwind difference scheme. For this case it is not difficult to get the approximation formula for \Phi e based on this methodology applying the profile of the fifth degree for unknown \Phi , namely: where, in the context of the grid uniformity \ \Psi (P e ) = \left\{ - P e , P e < - 10, Indeed, as is well known, the solution of the problem "convection and diffusion" for a domain 0 \leq x \leq L with boundary conditions: \Phi = \Phi 0 at x = 0, and \Phi = \Phi L at x = L is on the assumption with \Gamma and \rho U are constants [1]. Here P is a Peclet number defined by P \equiv \rho U L/\Gamma . The value of \Phi e in (4) is calculated according to the solution profile (7), i. e. it is assumed that \Phi 0 = \Phi P , \Phi L = \Phi E , L = \delta x e , x = \delta x e /2, P = P e , and \Phi = \Phi e in the formula (7). So, where \Psi (z) = z/(exp(z) -1). Because an exponential function is very expensive to compute, \Psi (z) is approximated by Patankar's power-law scheme (see formulas (5.27) and (5.33) in [1]) which is represented by the complex formula (6) in the present work. In other words, \Psi \approx \\Psi . As a result, it is easy to see that in this case the formulas (8) and (4) are almost identical taking into account the formula (5). What was required to show. Finally, the trivial "approximation" takes place for the third group of the mesh (i. e. for the boundary lines of \Omega ) because of Dirichlet boundary conditions in the problem.

Deferred correction procedure
The deferred correction (DC) method is a simple and proven procedure that enables the use of high order approximation schemes in codes initially written for low order schemes. Let, in general case, there be a difference scheme as a result of approximation of the original differential equation (1) on the nine-point stencil (Fig. 2a) of the following kind In turn, let the difference scheme of lower order approximation for the same equation (1) be as follows It is easy to see, adding to both sides of equation (9) the combination of a L P \Phi P -\sum nb a L nb \Phi nb , composed of the terms of equation (10), the DC procedure results in a five-point equation where k is number of iteration, nb = \{ E, W, N, S\} , nnb = \{ EE, W W, N N, SS\} . It is clear that the solution of equation (11) tends to the solution of equation (9) with the convergence of the iterative process (i. e., with \Phi k+1 ---\rightar k\rightar\infty \Phi k ). At the same time, one can use any previously created methods for solving SLAE with five-diagonal matrices to solve modified system on the base of equation (11). Further in the article the DC procedure will be denoted as DSPt5, since the Patankar scheme with the profile of unknown \Phi of the fifth degree is used to the lower order approximation.

Compensatory transform technology
As was mentioned above, the DC method is usable when a lower order approximation on a truncated five-point stencil takes place. Otherwise, one must apply other more general technology to transform a nine-diagonal matrix of SLAE to a five-diagonal one. Precisely that kind of a procedure, the so-called compensatory transform technology, is offered in the work. The major idea of the compensatory transform technology is to express the iterative increment of the sought-for solution in the "extreme" nodes of the stencil (in the Fig. 3 they are marked in cyan) through the increment in the "internal" nodes (white and black nodes of the stencil). So, the "extreme" nodes of the stencil are excluded and the matrix of the system of equations is transformed from nine-diagonal to five-diagonal one. The transformation formula has the first or second order of accuracy depending on the number of the "internal" nodes of the difference stencil used in the expression. For example, in the case of uniform grid the formula of the first order accuracy for node EE is as follows and the formula of the second order of accuracy for the same node is as follows Here \Delta \Phi k+1 = \Phi k+1 -\Phi k -is increment of the sought-for solution, \theta is a parameter of compensation, which should be in the range 0 \leq \theta \leq 1 [2]. It is easy to verify that the application of the above formulas will lead out to the following expressions for the transformed coefficient in for the first and second order of accuracy respectively. Transformed coefficients for other nearby points W, N, S are written in a like manner. As a result, the transformed five-point difference equation is arrived as follows where for the first order of accuracy \= a P = a P + \theta (a EE + a W W + a N N + a SS ) , and for the second order of accuracy, respectively \= a P = a P + 3\theta (a EE + a W W + a N N + a SS ) , In the further, the compensatory transform technique of the first order of accuracy will be denoted as C1 and of the second order -as C2.  The maximum effective value of the iteration parameter was selected for each method in each calculation, since all methods use the iteration parameters. In other words, an upper estimate of the effectiveness was made for each method. This approach made it possible to correctly identify the advantages of one methods in relation to others because all methods were placed in the same conditions.

Results: coarse and fine meshes
The most interesting for the analysis is the behaviour of the convergence curves which are the dependencies of the \bigm\| \bigm\| R k \bigm\| \bigm\| 2 / \bigm\| \bigm\| R 0 \bigm\| \bigm\| 2 value on the iteration number or the CPU time of the problem. Here \bigm\| \bigm\| R k \bigm\| \bigm\| 2 is Euclidean norm of the residual error at the kth iteration. Such convergence curves as functions of the iteration number are plotted in Fig. 4 for the coarse and fine meshes. It is not difficult to see that accuracy of the solution convergence is 10 - 12 . The same value of accuracy is applied in all other results of the work.
Analysis of the curves in Fig. 4a allows the following conclusions. First, the classical block SOR method (curve 8) is not usable due to a huge number of iteration to converge the methodmore than one thousand. Naturally, there is not enough place for such curve on the graph. Second, the combination DCPt5 + LR2sK (curve 1) is most effective both in reducing the initial residual error on the first iteration and the total number of iterations for the method convergence. And finally, third, in whole, the versions with line-by-line algorithm are more powerful with respect to the variants of the bi-conjugate gradient method. As to calculations with the fine mesh (see Fig. 4b), here the results coincide qualitatively with the ones on the coarse grid, but, as a rule, the number of iterations for solution convergence is several times greater. The absence of the convergence curve of the BSOR9 method is explained by the lack of convergence of the method -the relative residual error was more than 5 \times 10 - 8 after 3000 iterations.
It is obvious that different methods require different amounts of mathematical operations and, accordingly, different amounts of a CPU time to perform calculations of one iteration. Again, a researcher is ultimately interested in the time spent by a computer for working out a solution. Therefore, a comparison of computation times is also of research interest. Yet it is clear, that only the relative CPU times have a sense here. In other words, only the times of calculations performed on the same computer can be compared. Precisely such results in the form of convergence curves are shown in Fig. 5 for coarse and fine meshes. From now on, CPU time is presented in seconds.
It is easy to see that the combination of DCPt5 + LR2sK methods (curve 1) has not even got into the "top three winners". The reason is quite clear: recalculation of the right part of the system of linear equations by DC procedure at each iteration is a time-consuming activity. Owing to similar arguments the CPU time of the DCPt5 + LR2sK combination is almost equal to CPU time of the C2 + LR2sK one for the 4001 \times 4001 mesh (see Fig. 5b). In all other respects, the relative behaviours of the curves in the graphs of Fig. 4 and Fig. 5 coincide qualitatively.
As is known, the average rate of convergence is one of the most indicative characteristics of the efficiency of an iterative method which is formulated as follows  Fig. 6. It goes without saying that the higher the curve the more efficient the method is. The low-lying curve 8 once again confirms the relative inefficiency of the classical method BSOR9. The productivities of the other methods are comparable with each other. And as expected from the previous graphs, the highest curve 1 corresponds to the most effective method -the combination DCPt5 + LR2sK. The almost direct behavior of the curves in the logarithmic system of coordinates indicates a power dependence of the average rate of convergence Q k on the iteration number k. Some quantitative results of solving the problem on the coarse and fine grids with the use of the methods under consideration are presented in Tab. 2. The data in brackets for BSOR9 method for grid 4001 \times 4001 emphasize the lack of the solution convergence with the required accuracy. Here \bigm\| \bigm\| Z k \bigm\| \bigm\| \infty value is an infinity norm of the error at the moment of solution convergence. One Table 2 The results of solving the problem by various methods with various meshes can see the norm is reduced by two orders of magnitude with a decrease in the value of the grid step by only an order of magnitude. It should also be noted that the number of iterations in this case increases by less than an order of magnitude, while the CPU time is increased by as much as two orders of magnitude on average. Special attention should be paid to the fact that 8 iterations were required for the method convergence on the 501 \times 501 grid, and only 7 iterations -on the 4001 \times 4001 grid, while using the combination DCPt5 + LR2sK. The explanation for this fact will be presented a little later.

Influence of the mesh resolution
The influence of the grid resolution on the number of iterations and the CPU time of the solution convergence is presented in Fig. 7. It is known that an increase in the dimension of SLAE (a decrease in the magnitude of the grid step), other things being equal, leads to an increase in the number of iterations [20]. Exactly such regularity takes place in Fig. 7a, except for curve 1 (the combination DCPt5 + LR2sK), which demonstrates the decrease in the number of iterations with increasing the grid dimensionality. The reason is that the original line-by-line recurrent method has a fundamental individuality: as the grid step decreases, the number of iterations required for convergence decreases [15]. This feature of the method has been manifested only in the combination DCPt5 + LR2sK. Yet other combinations with line-by-line recurrent method (curves 2-5) do not demonstrate the features because the presence of the approximate compensatory transform technology in the combinations suppresses this fundamental individuality. In the general case, line-by-line recurrent method of the second order LR2 is more efficient than the one of the first order LR1, regardless of whether this method was accelerated in Krylov subspaces or not, and the relationship of the curves 2 and 3 confirms this thesis. However, the locations of the curves 4 and 5 demonstrate the opposite. The reason is that additional approximate compensatory transformation technology decreases the method stability. It is necessary to lower the value of compensation parameter \theta in relation to its optimum value to maintain stability. As an effect, the more parameter \theta differs from its optimum, the slower the method carries out. And one has to make \theta lower for LR2 than for LR1 due to lower stability of LR2, which in turn leads to a greater slowdown LR2 in relation to LR1.
And finally, it is not difficult to see that power dependences of the number of iterations K I and CPU time T I against a mesh resolution take place because graphs are almost direct in the logarithmic coordinates.

Conclusions
The technology of expanding the use of the line-by-line recurrent method on the case of SLAE with nine-diagonal matrices arising from the difference approximation of 2D boundary value problems of higher order was considered in the article. Approximation of the government differential equation have been carried out using the SMART scheme of the third order of accuracy. Also, approximation of the second order of accuracy in the near-boundary nodes using the Patankar scheme instead of classical upwind one of the first order of accuracy was proposed in the paper. Both the known deferred correction method and the original compensatory transform technology were used in the work to replace an initial nine-diagonal matrix of SLAE with a five-diagonal one. The comparative analysis of several modern methods and their combinations with algorithms for replacement of nine-diagonal matrices with five-diagonal ones to solve SLAE has been performed to reveal their efficiency in relation to each other.