- 5 questions, each worth 10 points.
- Three types of questions: conceptual questions; code correction/completion/interpretation/output; writing code to solve a problem.
- Data parallelism: SPMD; data (variables) are local to different threads. Each thread changes and keeps its own copy of the data.
- Concurrency: the simultaneous execution of multiple tasks. It can be logical or physical. Logical: multiple tasks take turns on the same processor. Physical: tasks run on different processors.
- Parallelism: breaking one task into several smaller tasks, executing them at the same time, and combining the results.
- A process is a program in execution.
- A process is executed in a cycle: the process at the head of a queue runs, is suspended, and is appended to the end of the queue.
- Threads execute independently, but share a memory space.
- How fast a process runs depends largely on how much of its useful data is cached.
- Cache hit: the data is found in the cache. Cache miss: the data must be retrieved from RAM.
- Principle of referential locality (locality of reference): the same values or nearby storage locations tend to be accessed repeatedly, depending on the memory access pattern. Two basic types: temporal locality (reuse of the same data within a short time) and spatial locality (use of data elements at nearby storage locations).
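Example (a minimal sketch; the 1024x1024 matrix size is an arbitrary choice): C stores 2D arrays row-major, so the row-by-row loop below walks consecutive addresses (good spatial locality, mostly cache hits), while the column-by-column loop jumps a full row per access (many cache misses).
#include <stdio.h>
#define N 1024
static double a[N][N];
int main(void) {
    double sum = 0.0;
    /* Row-major traversal: consecutive elements are adjacent in memory
       -> good spatial locality. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    /* Column-major traversal: each access jumps N*sizeof(double) bytes
       -> far more cache misses on large arrays. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    printf("sum = %f\n", sum);
    return 0;
}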
- Compile OpenMP:
gcc -fopenmp code.c
- Set the number of threads in code: omp_set_num_threads(4);
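Example (a minimal sketch of the two lines above; the thread count 4 is just a placeholder):
#include <stdio.h>
#include <omp.h>
int main(void) {
    omp_set_num_threads(4);                 /* request 4 threads */
    #pragma omp parallel                    /* fork a team of threads */
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                       /* implicit barrier here */
    return 0;
}
Compile and run: gcc -fopenmp code.c -o code && ./code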
- Hyperthreading: a physical core is presented as two logical cores to maximize utilization. If one thread is idle, the core switches to executing another thread. This can reduce the performance of heavily multi-threaded programs, since the two logical cores share one physical core's resources.
- Barrier: there is always an implicit barrier at the end of
#pragma omp parallel {}
- Improvement: reduce communication, balance the load, reduce cache misses.
- #pragma omp for (nowait): by default there is an implicit barrier at the end of the worksharing for loop; the nowait clause removes it.
- #pragma omp parallel for schedule(static/dynamic/guided): the schedule clause controls how loop iterations are distributed among threads.
- #pragma omp critical: only one thread at a time can enter the critical region.
- #pragma omp atomic: like critical, but applies only to the update of a single memory location.
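Example (a sketch tying the pragmas above together; the loop bound, chunk size 8, and variable names are illustrative): a parallel for with a schedule clause, an atomic update for a single counter, and a critical section protecting a larger update.
#include <stdio.h>
#include <omp.h>
int main(void) {
    long sum = 0;
    double max_val = 0.0;
    #pragma omp parallel for schedule(dynamic, 8)
    for (int i = 0; i < 1000; i++) {
        double x = i * 0.5;
        #pragma omp atomic              /* protects one memory-location update */
        sum += i;
        #pragma omp critical            /* only one thread at a time in this block */
        {
            if (x > max_val)
                max_val = x;
        }
    }                                   /* implicit barrier at the end of the loop */
    printf("sum=%ld max=%f\n", sum, max_val);
    return 0;
}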
- SPMD: single program, multiple data
- #pragma omp barrier: all threads wait at the barrier point.
- #pragma omp master: the block is executed only by the master thread (no implied barrier).
- #pragma omp single: the block is executed by only one thread, with an implicit barrier at the end.
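Example (a sketch of barrier, master, and single in one parallel region; the printed messages are placeholders):
#include <stdio.h>
#include <omp.h>
int main(void) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        #pragma omp single      /* executed by exactly one thread; implicit barrier after */
        printf("single: set-up done by thread %d\n", id);

        /* ... every thread does its own share of work here (SPMD style) ... */

        #pragma omp barrier     /* all threads wait here */

        #pragma omp master      /* executed only by thread 0; no implied barrier */
        printf("master: reporting results\n");
    }
    return 0;
}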
- Data sharing: shared / private / firstprivate / lastprivate.
- #pragma omp for private(variable): each thread gets its own copy, whose value is uninitialized.
- firstprivate(variable): each thread's copy is initialized with the corresponding original value.
- lastprivate(variable): after the loop, the variable holds the value assigned in the last sequential iteration.
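Example (a sketch; the initial value 10 and the tiny loop are arbitrary): p is uninitialized inside the loop, fp starts at 10 in every thread, and lp is copied back from the last sequential iteration (i == 3), so it ends up as 300; p and fp keep their original value 10 after the loop.
#include <stdio.h>
#include <omp.h>
int main(void) {
    int p = 10, fp = 10, lp = 10;
    #pragma omp parallel for private(p) firstprivate(fp) lastprivate(lp)
    for (int i = 0; i < 4; i++) {
        p = i;          /* p was uninitialized before this assignment */
        fp += i;        /* each thread's fp started at 10 */
        lp = i * 100;   /* the value from i == 3 is copied back after the loop */
    }
    printf("p=%d fp=%d lp=%d\n", p, fp, lp);   /* prints p=10 fp=10 lp=300 */
    return 0;
}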
- The message-passing model: MPI is for communicating among processes, which have separate address spaces.
- Data parallel: SIMD, same instruction multiple data.
- Task parallel: MIMD, multiple instruction multiple data.
- SPMD: single program, multiple data. It is equivalent to MIMD, since multiple instruction streams can be stored in a single program.
- Message passing and MPI are for MIMD/SPMD parallelism.
- Program structure:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                  /* or MPI_Init(NULL, NULL) */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("I am %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
- Processes can be collected into groups. Each message is sent in a context and must be received in the same context. A group and a context together form a communicator. A process is identified by its rank in the group associated with a communicator. MPI_COMM_WORLD is the default communicator that contains all processes.
- A tag is used to identify a message between sender and receiver; the receiver may match any tag with MPI_ANY_TAG, but the sender must supply a concrete tag.
- MPI_SEND(start_address, count, datatype, dest_rank_in_communicator, tag, communicator)
example:
int rc;
rc = MPI_Send(array, array_length, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* tag must be concrete, not MPI_ANY_TAG */
if (rc != MPI_SUCCESS) { ... }
- MPI_RECV(start, count, datatype, source, tag, comm, status)
MPI_Status status;
rc = MPI_Recv(local_array, array_length, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
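Example (a sketch matching the signatures above; tag 0 and the array length 4 are arbitrary, and at least two processes are assumed): rank 0 sends an integer array to rank 1, which receives it and reads the actual source and tag from the status.
#include <stdio.h>
#include <mpi.h>
#define LEN 4
int main(int argc, char **argv) {
    int rank, data[LEN] = {1, 2, 3, 4};
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(data, LEN, MPI_INT, 1, 0, MPI_COMM_WORLD);       /* concrete tag 0 */
    } else if (rank == 1) {
        MPI_Recv(data, LEN, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        printf("rank 1 received from rank %d with tag %d\n",
               status.MPI_SOURCE, status.MPI_TAG);
    }
    MPI_Finalize();
    return 0;
}
Run: mpicc code.c -o code && mpirun -np 2 ./code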
- Basic API
- MPI_Init(NULL, NULL); (or pass &argc and &argv)
- MPI_Finalize();
- MPI_Comm_size(MPI_COMM_WORLD, &worldsize);
- MPI_Comm_rank(MPI_COMM_WORLD, &rank);
- MPI_Bcast -- broadcast data from one process to others
- MPI_Reduce -- combines data from all processes into one process
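Example (a sketch; root rank 0 and the value 100 are arbitrary): the root broadcasts a parameter to every process, each process computes a partial result, and MPI_Reduce sums the partials back onto the root.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv) {
    int rank, n = 0, partial, total = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) n = 100;                              /* only the root knows n initially */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);        /* now every rank has n */
    partial = rank * n;                                  /* each rank's local contribution */
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("total = %d\n", total);
    MPI_Finalize();
    return 0;
}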
- In concurrent computing, a deadlock is a state in which each member of a group of processes is waiting for some other member to release a resource (e.g. a lock, or to post a matching receive). As a result, every process waits on the others and no progress is made.
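Example (a classic MPI deadlock sketch; the buffer size is arbitrary and exactly two processes are assumed): if both ranks call the blocking MPI_Send first and the message is too large to be buffered, each blocks waiting for a receive that never starts. Ordering the calls so that one side receives first breaks the cycle.
#include <stdio.h>
#include <mpi.h>
#define LEN 100000
int main(int argc, char **argv) {
    int rank, partner;
    static int sendbuf[LEN], recvbuf[LEN];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    partner = 1 - rank;                       /* assumes exactly 2 processes */
    /* Deadlock-prone pattern: BOTH ranks do Send first, then Recv.
       Safe pattern used below: rank 0 sends first, rank 1 receives first. */
    if (rank == 0) {
        MPI_Send(sendbuf, LEN, MPI_INT, partner, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, LEN, MPI_INT, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(recvbuf, LEN, MPI_INT, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(sendbuf, LEN, MPI_INT, partner, 0, MPI_COMM_WORLD);
    }
    if (rank == 0) printf("exchange completed without deadlock\n");
    MPI_Finalize();
    return 0;
}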
- Compile MPI: mpicc code.c (add -fopenmp if the program also uses OpenMP)
- When to use MPI:
- Portability and Performance
- Irregular Data Structures
- Building tools for others (libraries, cross-language)
- Need to manage memory per processor
- When not to use MPI:
- The computation is regular and matches HPF (High Performance Fortran)
- Solution already exists
- Require fault tolerance
- Distributed Computing
- A communicator consists of an ordered group of processes; each process has a rank number.
- Groups allow collective operations to work on a subset of processes.
- Information can be attached to communicators and used by the processes in those communicators.
- Communicator access operations are local and thus require no interprocess communication. Communicator constructors are collective and may require interprocess communication. All the routines in this section are for intracommunicators; intercommunicators are covered separately.
- Intracommunicator: a communicator within a single group. Intercommunicator: a communicator between two groups.
- Intracommunicator creation: MPI_COMM_CREATE. This is a collective routine, meaning it must be called by all processes in the group associated with comm. It creates a new communicator that is associated with group. MPI_COMM_NULL is returned to processes not in group. All group arguments must be the same on all calling processes, and group must be a subset of the group associated with comm.
- Destructors: the number of groups and communicators is limited, so free them when no longer needed with MPI_Group_free() / MPI_Comm_free().
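Example (a sketch; putting the first half of the ranks in the new communicator is an arbitrary choice, and at least two processes are assumed): build a group from MPI_COMM_WORLD with MPI_Comm_group / MPI_Group_incl, create the communicator collectively with MPI_Comm_create, and free both handles afterwards.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv) {
    int rank, size;
    MPI_Group world_group, half_group;
    MPI_Comm half_comm;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int half = size / 2;                    /* ranks 0 .. half-1 form the new group */
    int ranks[half > 0 ? half : 1];
    for (int i = 0; i < half; i++) ranks[i] = i;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);       /* group behind the communicator */
    MPI_Group_incl(world_group, half, ranks, &half_group);
    /* Collective over MPI_COMM_WORLD; processes outside the group get MPI_COMM_NULL */
    MPI_Comm_create(MPI_COMM_WORLD, half_group, &half_comm);
    if (half_comm != MPI_COMM_NULL) {
        int newrank;
        MPI_Comm_rank(half_comm, &newrank);
        printf("world rank %d is rank %d in the new communicator\n", rank, newrank);
        MPI_Comm_free(&half_comm);          /* handles are a limited resource */
    }
    MPI_Group_free(&half_group);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}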
- Intercommunicators are associated with two groups of disjoint processes: a local group and a remote group. The target process (destination for a send, source for a receive) is identified by its rank in the remote group. A communicator is either intra or inter, never both.
- The original MPI standard did not allow collective communication across intercommunicators; MPI-2 introduced this capability. It is useful in pipelined algorithms where data needs to be moved from one group of processes to another.
- Rooted: one group (the root group) contains the root process while the other group (the leaf group) has no root. Data moves from the root to all processes in the leaf group (one-to-all) or vice versa (all-to-one). The root process passes MPI_ROOT as its root argument, all other processes in the root group pass MPI_PROC_NULL, and all processes in the leaf group pass the rank of the root relative to the root group.
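Example (a hedged sketch of a rooted broadcast across an intercommunicator; the even/odd split, the tag 99, and the value 42 are arbitrary, and at least two processes are assumed): MPI_COMM_WORLD is split into an even-rank group (the root group) and an odd-rank group (the leaf group), the two are joined with MPI_Intercomm_create, and the root broadcasts to every process in the leaf group.
#include <stdio.h>
#include <mpi.h>
int main(int argc, char **argv) {
    int wrank, color, local_rank, value = 0;
    MPI_Comm local_comm, inter_comm;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    color = wrank % 2;                      /* 0 = root group, 1 = leaf group */
    MPI_Comm_split(MPI_COMM_WORLD, color, wrank, &local_comm);
    MPI_Comm_rank(local_comm, &local_rank);
    /* Local leader is local rank 0; the remote leader's rank in MPI_COMM_WORLD
       is 1 for the even group and 0 for the odd group. */
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD,
                         (color == 0) ? 1 : 0, 99, &inter_comm);
    if (color == 0) {
        /* Root group: the real root passes MPI_ROOT, everyone else MPI_PROC_NULL. */
        if (local_rank == 0) value = 42;
        MPI_Bcast(&value, 1, MPI_INT,
                  (local_rank == 0) ? MPI_ROOT : MPI_PROC_NULL, inter_comm);
    } else {
        /* Leaf group: pass the root's rank relative to the root group (0 here). */
        MPI_Bcast(&value, 1, MPI_INT, 0, inter_comm);
        printf("leaf process (world rank %d) received %d\n", wrank, value);
    }
    MPI_Comm_free(&inter_comm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}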