
Issues with executing PARDISO, dcsrilu0_example0 MKL routines on Xeon Phi co-processor



Dear MIC people,


I am having trouble running the Intel MKL routines PARDISO and dcsrilu0 on the Xeon Phi coprocessor.
Please go through the report below. I would be very grateful if you could point out anything I am doing wrong.


The configuration of the Xeon Phi machine is as follows:


==============================================================
Compiler:
$ which icc
/opt/intel/composer_xe_2013_sp1.2.144/bin/intel64/icc
$ icc --version
icc (ICC) 14.0.2 20140120
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.


MKL toolkit
/opt/intel/composer_xe_2013_sp1.2.144/mkl


Host CPU Info
...
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
stepping        : 4
cpu MHz         : 1200.000
...


Xeon Phi coprocessor
...
processor       : 239
vendor_id       : GenuineIntel
cpu family      : 11
model           : 1
model name      : 0b/01
stepping        : 3
cpu MHz         : 1052.630
cache size      : 512 KB
...


$ cat /proc/meminfo
MemTotal:        7882352 kB
MemFree:         6109436 kB
...


==============================================================


I am facing problems running the following examples supplied in the MKL toolkit.


examples_core/solverc/pardiso_unsym_c.c
examples_core/solverc/dcsrilu0_exampl1.c

I am building them with the following make options:
make sointel64 interface=ilp64
make sointel64 interface=lp64


I am creating large sparse matrices in CSR format in Python using the scipy package.
The characteristics of some of the matrices are as follows:


CSR matrix is 10240x10240. Number of nonzeros 1058707.
CSR matrix is 15554x15554. Number of nonzeros 2434660.
CSR matrix is 16384x16384. Number of nonzeros 2700567.
...
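
For reference, the conversion step that my test driver reports as "Converting COO matrix to CSR matrix" can be sketched roughly as below. This is only a minimal pure-Python illustration, not the code I actually run: the function name `coo_to_csr_one_based` is made up for this sketch, it assumes 0-based COO triplets with no duplicate entries, and it assumes the solver wants 1-based indices with column indices in ascending order within each row.

```python
def coo_to_csr_one_based(n_rows, rows, cols, vals):
    """Convert 0-based COO triplets into CSR arrays with 1-based indices.

    Hypothetical helper for illustration only. Duplicate (row, col)
    entries are kept as-is, not summed.
    """
    # Count how many entries fall in each row.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1

    # Prefix-sum the counts to get 0-based row pointers.
    row_ptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        row_ptr[i + 1] = row_ptr[i] + counts[i]

    # Scatter each entry into the next free slot of its row.
    next_slot = row_ptr[:-1]  # slicing makes a copy we can advance
    col_idx = [0] * len(vals)
    data = [0.0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        data[k] = v
        next_slot[r] += 1

    # Sort the entries of every row by column index (assumed requirement).
    for i in range(n_rows):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        pairs = sorted(zip(col_idx[lo:hi], data[lo:hi]))
        col_idx[lo:hi] = [c for c, _ in pairs]
        data[lo:hi] = [v for _, v in pairs]

    # Shift row pointers and column indices to 1-based numbering.
    ia = [p + 1 for p in row_ptr]
    ja = [c + 1 for c in col_idx]
    return ia, ja, data


# Tiny 3x3 example with the triplets given in arbitrary order.
ia, ja, a = coo_to_csr_one_based(
    3,
    rows=[2, 0, 1, 0, 2],
    cols=[2, 2, 1, 0, 0],
    vals=[5.0, 2.0, 3.0, 1.0, 4.0],
)
print(ia, ja, a)
```

The last element of `ia` minus one equals the number of nonzeros, which gives a quick consistency check before the arrays are handed to a solver.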


I am able to solve systems with these matrices successfully in Python using numpy.linalg.solve and scipy.


I can also run the examples successfully on the host processor for all of these matrices, varying the number of MKL threads from 1 to 8. On the Xeon Phi coprocessor, however, the examples fail consistently with segmentation faults, whatever the number of MKL threads (I tried from 1 to 240).


I tried automatic offload, compiler-assisted offload, and native execution (manually copying the executable to the Xeon Phi and running it there). The failures occur in all three scenarios.


Output from one of the executions is shown below:


===================================================================
MAX MKL threads 240.
CHANGING: Reading COO matrix from MM file CSRMM.mtx.
Time to read MM file 13.443625 seconds.
Converting COO matrix to CSR matrix.
Time to convert coo to csr format 14.641642 seconds.
CSR matrix is 16384x16384. Number of nonzeros 2700567.
PARDISO...


=== PARDISO: solving a real nonsymmetric system ===
The local (internal) PARDISO version is                          : 103911000
1-based array indexing is turned ON
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Scaling is turned ON
Matching is turned ON



Summary: ( reordering phase )
================


Times:
======
Time spent in calculations of symmetric matrix portrait (fulladj): 1.078529 s
Time spent in reordering of the initial matrix (reorder)         : 11.579866 s
Time spent in symbolic factorization (symbfct)                   : 1.920166 s
Time spent in data preparations for factorization (parlist)      : 0.076818 s
Time spent in allocation of internal data structures (malloc)    : 20.075173 s
Time spent in additional calculations                            : 5.905185 s
Total time spent                                                 : 40.635737 s


Statistics:
===========
< Parallel Direct Factorization with number of processors: > 240
< Numerical Factorization with BLAS3 and O(n) synchronization >


< Linear system Ax = b >
             number of equations:           16384
             number of non-zeros in A:      2700567
             number of non-zeros in A (%): 1.006040


             number of right-hand sides:    1


< Factors L and U >
             number of columns for each panel: 72
             number of independent subgraphs:  0
< Preprocessing with state of the art partitioning metis>
             number of supernodes:                    1120
             size of largest supernode:               15394
             number of non-zeros in L:                122143939
             number of non-zeros in U:                121034313
             number of non-zeros in L+U:              243178252
Time for PARDISO 40.637273 seconds.


Reordering completed ...
Number of nonzeros in factors = 243178252
Number of factorization MFLOPS = 2518180=== PARDISO is running in In-Core mode, because iparam(60)=0 ===
Percentage of computed non-zeros for LL^T factorization
Segmentation fault
===================================================================

The segmentation faults always occur during numeric factorization.
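
As a rough sanity check on the numbers in the output above, the arithmetic below compares the storage needed for the factor values with the free memory reported on the coprocessor. It is only a back-of-the-envelope sketch: it assumes 8 bytes per factor entry (double precision), it assumes the /proc/meminfo figures in this report were taken on the coprocessor, and it ignores index arrays, workspace, and any other internal PARDISO structures, so the result is a lower bound and does not by itself explain the fault.

```python
# Figures copied from the PARDISO output and /proc/meminfo in this report.
n = 16384                 # number of equations
nnz_a = 2700567           # non-zeros in A
nnz_l = 122143939         # non-zeros in L
nnz_u = 121034313         # non-zeros in U
mem_free_kb = 6109436     # MemFree, in kB

BYTES_PER_VALUE = 8       # assumption: one double per factor entry
GIB = 1024 ** 3

# Density of A in percent, to compare with "number of non-zeros in A (%)".
density_pct = 100.0 * nnz_a / (n * n)

# Total factor entries, to compare with "number of non-zeros in L+U".
nnz_factors = nnz_l + nnz_u

# Lower bound on factor storage: values only, no indices or workspace.
factor_bytes = nnz_factors * BYTES_PER_VALUE
mem_free_bytes = mem_free_kb * 1024

print(f"density of A      : {density_pct:.6f} %")
print(f"non-zeros in L+U  : {nnz_factors}")
print(f"factor values     : {factor_bytes / GIB:.2f} GiB")
print(f"MemFree           : {mem_free_bytes / GIB:.2f} GiB")
print(f"values fit in free memory: {factor_bytes < mem_free_bytes}")
```

Under these assumptions the factor values alone fit in the reported free memory, so this check cannot tell whether the crash is related to memory.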

I have attached the files (pardiso and dcsrilu0) for your perusal.


Could you please let me know how I can debug these failures in MKL routines?


Thank you for all your help.


Best Regards
Manredd
 

