HPCG crash when nx=440 ny=440 nz=424 #66

Open
NguyenDacLiem opened this issue Jul 19, 2021 · 4 comments

NguyenDacLiem commented Jul 19, 2021

I ran the HPCG benchmark on a machine with 2048 GB of memory and 128 CPUs using the command "mpirun -np 16 -x OMP_NUM_THREADS=8 -x OMP_PROC_BIND=TRUE -x OMP_PLACES=cores --allow-run-as-root ./xhpcg"
and the program crashed with this error message:
../src/GenerateProblem_ref.cpp:205: void GenerateProblem_ref(SparseMatrix&, Vector*, Vector*, Vector*): Assertion `totalNumberOfNonzeros>0' failed
From what I have investigated, this is related to integer overflow. How can I fix it? Thanks.

@luszczek
Collaborator

Please take a look at Geometry.hpp line 37. Make sure that HPCG_NO_LONG_LONG is not defined so that sizeof(global_int_t) is at least 8. It seems that your global indices use 32-bit integers. More information about your build would help indicate the specific flag that caused this.

@NguyenDacLiem NguyenDacLiem changed the title from "HPCG crash when nx=440 ny=424 nz=424" to "HPCG crash when nx=440 ny=440 nz=424" on Jul 20, 2021
@NguyenDacLiem
Author

Thanks for your reply. I made a small mistake in the title and have updated it. I am sure that HPCG_NO_LONG_LONG is not defined and that sizeof(global_int_t) is 8.
When I set nx=440, ny=424, nz=424, the program runs fine. But when I increase ny to 440, it crashes. It also crashes when I set nx=ny=nz=512. I added a debug print; the code and its output are below:
// If this assert fails, it most likely means that the global_int_t is set to int and should be set to long long
// This assert is usually the first to fail as problem size increases beyond the 32-bit integer range.
printf("LIEM %zu %lld\n", sizeof(totalNumberOfNonzeros), (long long)totalNumberOfNonzeros); // %zu is the correct format for sizeof
assert(totalNumberOfNonzeros>0); // Throw an exception if the number of nonzeros is less than zero (can happen if int overflow)

Output:
nx = 440 ny = 424 nz = 424
LIEM 8 34105190392
LIEM 8 34105190392
LIEM 8 34105190392
LIEM 8 34105190392
nx = 440 ny = 440 nz = 424
LIEM 8 -33326285448
LIEM 8 -33326285448
LIEM 8 -33326285448
LIEM 8 -33326285448
LIEM 8 -33326285448
LIEM 8 -33326285448

@NguyenDacLiem
Author

#@Header
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s -f
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - HPCG Directory Structure / HPCG library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir = .
SRCdir = $(TOPdir)/src
INCdir = $(TOPdir)/src
BINdir = $(TOPdir)/bin
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir = /root/opt/openmpi
MPinc = -I$(MPdir)/include
MPlib = $(MPdir)/lib/libmpi.so
#
# ----------------------------------------------------------------------
# - HPCG includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc)
HPCG_LIBS =
#
# - Compile time options -----------------------------------------------
#
# -DHPCG_NO_MPI             Define to disable MPI
# -DHPCG_NO_OPENMP          Define to disable OPENMP
# -DHPCG_CONTIGUOUS_ARRAYS  Define to have sparse matrix arrays long and contiguous
# -DHPCG_DEBUG              Define to enable debugging output
# -DHPCG_DETAILED_DEBUG     Define to enable very detailed debugging output
#
# By default HPCG will:
#    *) Build with MPI enabled.
#    *) Build with OpenMP enabled.
#    *) Not generate debugging output.
#
HPCG_OPTS = -DHPCG_ENABLE_DETAILED_DEBUG
#
# ----------------------------------------------------------------------
#
HPCG_DEFS = $(HPCG_OPTS) $(HPCG_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CXX = mpicxx
CXXFLAGS = $(HPCG_DEFS) -v -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -fopenmp
#
LINKER = $(CXX)
LINKFLAGS = $(CXXFLAGS)
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
#
# ----------------------------------------------------------------------

@NguyenDacLiem
Author

The issue is that local_int_t is defined as int, not long long. How can I run the test in 64/64 mode, with both global and local indices 64-bit, without changing the source code?
