C and C plus plus Best Practices

asoplata edited this page Jul 29, 2013 · 2 revisions

(For general information on C and C++, including how to choose between them, see this page)

Choosing a development environment

  • On Windows, Microsoft's Visual Studio Express (whatever the latest edition is) is an IDE that is free for everyone, NOT just students. Students can get the full Visual Studio for free. There is also the popular Free and Open Source Software (FOSS) IDE Bloodshed Dev-C++, which is commonly used by professionals, despite its name.
  • On Mac, Apple's free Xcode IDE is analogous to MS's Visual Studio as the operating system's "main IDE" for developing C, C++, Objective-C, etc. code for desktop/server/mobile/iPhone/iPad applications or whatever applications you want.
  • On Linux, there are many free IDEs like Code::Blocks, Eclipse CDT, NetBeans, the list goes on...
  • Alternatively, Unix, which Mac's OS X is built on and which Linux is modeled on and behaves almost identically to, was originally made as a C programming environment! So you don't need an IDE, and can instead use tools that Unix/Linux systems usually ALREADY have installed (or can install with a single command), like the GCC compiler, [Make](https://en.wikipedia.org/wiki/Make_(software)) to organize compilation, GDB for debugging, Valgrind for finding memory errors and leaks, etc. C/C++ work with the Unix-y command line so well that learning C/C++ is a good excuse for learning just how powerful the command line is!
    • For the Unix/Linux command line, Windows people can download Cygwin, which emulates a Unix-y terminal. Using Cygwin alone on Windows, one can install all the command-line-utilities mentioned above and compile using GCC.
    • One of the many advantages of working this way is that you can then develop and deploy your code on a remote server easily, without ever having to use a windowing system/graphical user interface (GUI).
  • No matter what, enable syntax highlighting in whatever text editor you use (IDEs can do this easily in Preferences, if it's not already enabled by default).

Choosing a compiler

  • C++:
    • In general, use the G++ part of the GNU Compiler Collection (GCC), as this is one of the most popular and ubiquitous compilers around. Its error messages can be esoteric, and it doesn't integrate well with IDE plugins that try to note warnings and errors in your code, but it is not wacky and code written for it will compile under most compilers. If you are unsure, use this compiler. On Windows, use this via Cygwin.
    • If you are addicted to Visual Studio (VS) in Windows and don't want to use Bloodshed Dev-C++, which uses MinGW's GCC implementation, note that Visual C++ uses Microsoft's own compiler, which is made for the VS IDE. Some parts of code written for it (even how it defines some basic integer types) will have to be modified before compiling with a different compiler. If you use VS' compiler, it will be annoyingly difficult for anyone who doesn't use it (say, people using Mac or Linux) to compile your code.
    • Apple's Xcode uses Clang++/llvm-clang++, which is a fairly new one that is gaining traction. In the coming years it may be a serious contender with GCC, but for right now it is not very popular in the scientific programming community.
  • C:
    • Use the GCC part of GCC as mentioned above.
    • Do not use Microsoft's Visual Studio for C compilation, as its C compiler does not fully support the C standard from 1999 (C99), and possibly earlier ones.
    • Clang (without the ++ but part of the same project as Clang++) is a growing contender for a C compiler.

Best Practices

Both C and C++:

  • Of course, follow all the best practices of general programming. Especially important is to plan beforehand for at least twice as long as you plan to spend implementing/actually writing code.
  • In addition to naming your variables well, both languages support custom data structs, so use them to organize your variables! E.g., if you have different neuron classes with many different parameters, use a single struct to organize all that celltype's parameters, like PY.g_Na for struct PY. This way, in functions you can pass a pointer to a single struct so as not to expensively pass-by-value, in addition to only having to pass one thing to the function, independent of however many things you store in that struct!
    • If you want to add a new data member to a certain type, keep track (in your head, by searching multiple files with regular expressions, or with an IDE) of where to declare it, which declarations/assignments require the member name, which require it to be initialized, and where it needs to be destroyed if necessary. All this is usually handled by the language itself in a dynamically-typed language like Python or MATLAB, but learning it will help you to parse those languages' error messages better.
  • Take the time to learn pointers, and how to use them with arrays; this is especially important if coming from MATLAB or Python. Think of C/C++'s preference for allocating memory at compile-time (literally, the time at which you compile your program) in arrays as similar to pre-allocating large chunks of memory prior to using them for MATLAB, which is a very important Best Practice.
  • Always initialize any data that is declared before using it. Undefined behavior (that's programming-language speak for Very Bad Things Possibly) can happen when you read a variable that was never initialized.
  • If you expect to only change a small number of values over the course of many simulations/program runs, make your program accept them as command-line arguments by learning to use main(int argc, char *argv[]) as the definition of your "main" function. Do not be intimidated by this standard; argc is merely an integer that's the number of arguments you're supplying when you run the program (including the name of the program itself), aka the size of argv, and argv is an array of pointers to C-style strings, aka character arrays, each of which is one of the arguments (again including the name of the program itself, in argv[0]) supplied when you RUN the program.
  • Use #include <ctime> (C++) or #include <time.h> (C) to measure how fast portions of your code run: clock() returns the processor time used so far in clock ticks, and dividing the difference between two readings by CLOCKS_PER_SEC gives seconds. This is also useful in debugging.
  • Use std::cout (C++) or printf (C) copiously for easy debugging and tracking of values if you haven't already invested time in learning GDB or other debuggers.
    • This is especially useful when doing complicated pointer assignments, as a pointer that you print without dereferencing it will show the exact memory address it holds (in printf, use %p with a cast to void*; note that std::cout treats a char* as a string, so cast it to void* first to see the address).
  • If you are not going to use something later in the C/C++ program, but it is some result of a simulation/calculation you want to keep, write it to file as new/later values of that data are gradually created rather than storing it all in memory and then saving all the data at the end. This minimizes memory usage, and you will be writing it to file eventually anyways.
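  • Always double check that you free memory you have allocated on the heap: every new needs a matching delete and every new[] a matching delete[] (C++), and every malloc a matching free (C)!!!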
  • If you use control flow in determining how/which files you compile, learn to use Make, as this works with most compiled languages.
  • In your headers, organize your function prototypes separately from your data structure declarations.
  • Although this is rarely seen in scientific programming, use const to mark nearly all the data that should stay constant and never change. There will of course be a lot of parameters that will be modified and that this won't apply to, but the point is that the compiler then stops you from changing a parameter somewhere and forgetting about it. Real software engineering is much more demanding on this point.
  • Whatever compiler you use, tell the compiler you want it to tell you whenever there's a warning (e.g. -Wall -Wextra for GCC and Clang).
    • Treat warnings (which do not stop your program from compiling) as errors (which do), in that you should fix them immediately. Some compilers give you the option to treat warnings as errors automatically (e.g. -Werror for GCC and Clang), but enabling that is only for the hardest of core.
  • Despite C++'s support for exception handling, do your absolute best to avoid using exceptions. They should be treated similarly to MATLAB's evil eval: they are almost never the best solution, and if you really think using them is the best solution, then consider reorganizing the relevant code. Still, sometimes they are the best solution.
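
As a minimal sketch of the advice above on grouping parameters in a struct and marking unchanging data const: the CellParams type, its member names, and the ionic_current function are hypothetical, chosen only to mirror the PY.g_Na example, and are not taken from any real model.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical parameter set for one cell type; the member names are
// illustrative only.
struct CellParams {
    double g_Na;   // sodium conductance
    double g_K;    // potassium conductance
    double E_Na;   // sodium reversal potential
    double E_K;    // potassium reversal potential
};

// Takes a pointer to const: the struct is not copied, and the compiler
// rejects any attempt to modify the parameters inside this function.
double ionic_current(const CellParams *p, double v) {
    assert(p != nullptr);
    return p->g_Na * (v - p->E_Na) + p->g_K * (v - p->E_K);
}

// Example call: a single argument carries every parameter of the cell type.
void print_example_current() {
    const CellParams PY = {2.0, 1.0, 50.0, -90.0};
    std::printf("I = %g\n", ionic_current(&PY, -65.0));
}
```

Adding another member to CellParams later would not change the signature of ionic_current, which is the practical benefit of passing one struct instead of many separate arguments.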
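
The points above about main(int argc, char *argv[]) and timing with <ctime> could look roughly like the following. This is only a sketch: parse_steps, time_workload, and the default step count are hypothetical names introduced for illustration, and the loop is a stand-in for real simulation work.

```cpp
#include <cstdlib>
#include <ctime>

// Reads a step count from argv[1], falling back to default_steps when the
// argument is missing or is not a positive integer. argv[0] is the program
// name, so the first user-supplied argument is argv[1].
long parse_steps(int argc, char *argv[], long default_steps) {
    if (argc < 2) {
        return default_steps;
    }
    char *end = nullptr;
    long steps = std::strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0' || steps <= 0) {
        return default_steps;
    }
    return steps;
}

// Runs a stand-in workload and returns the processor time it used, in seconds.
double time_workload(long steps) {
    std::clock_t start = std::clock();
    volatile double sum = 0.0;   // volatile so the loop is not optimized away
    for (long i = 0; i < steps; i++) {
        sum = sum + 1.0 / (i + 1.0);
    }
    std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

A main function would simply forward its own arguments, for example long steps = parse_steps(argc, argv, 1000000); followed by a call to time_workload(steps), so that the step count can be changed per run without recompiling.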

C++ specifically:

  • If you don't know the size of a data vector/list until runtime (literally, the time at which you run the program, after it's been compiled), and

    • if you're not going to access it very often, use a std::vector type from the standard library. These are containers that are made to be dynamically (meaning during runtime) expanded/changed in size, and in a lot of cases are optimized to be almost as fast as arrays if they're not used that often in the code.

    • if you're going to be accessing it very often, and especially if you're going to be accessing it contiguously (i.e. going down the list in the order it exists in memory, as is common use with arrays), construct a dynamic array on the heap using (this is an example of a MULTIDIMENSIONAL dynamic array) something like

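            // Allocate an array of row pointers, then one heap array per row.
            double **matrix = new double* [dim_size_one];
            for (int i = 0; i < dim_size_one; i++) {
                matrix[i] = new double [dim_size_two];
                for (int j = 0; j < dim_size_two; j++) {
                    matrix[i][j] = sin(i * j);   // sin() needs #include <cmath>
                }
            }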
      
      • In general software engineering, this is NOT best practice, as it is trying to solve a problem (containers of variable size at runtime) that is already meant to be solved by vectors. However, if you are going for high speed like with simulations, and you are going down the elements of each row consecutively in memory, then this may offer better speed. Note that only the elements within a single row are guaranteed to be contiguous; separate rows are separate allocations and can sit anywhere on the heap. Measure before assuming it is faster, and remember that every new[] here needs a matching delete[].
  • If you don't plan on your program becoming large, use using namespace std outside of the main() call for ease of use of things like cout. This is considered bad practice in general software engineering, as it brings in possible namespace conflicts, but for scientific programming where most of your variables are scientific-paradigm-specific, it shouldn't be a problem.

  • In addition to the general programming Rule(s) of Three, there is an [additional one for C++ classes](https://en.wikipedia.org/wiki/Rule_of_three_(C%2B%2B_programming)), where if you must define any one of a destructor, a copy constructor, or a copy assignment operator for a class, you should probably define all three.

  • Resource Acquisition Is Initialization (RAII) is a C++ specific idiom which says, if you're using memory resources, you should obtain them when you initialize an object (in the OOP sense) and release them when the object's destructor is run. This is because an object's destructor runs automatically whenever the object goes out of scope, including when the scope is left because an exception was thrown, so the cleanup cannot be skipped by accident.

    • Relatedly, in the new C++11 standard, which most compilers support now, use smart pointers (std::unique_ptr and std::shared_ptr from <memory>), which are pointers that automagically manage (including destroy) the objects that they point to.
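
To make the RAII and smart pointer points above concrete, here is a minimal sketch assuming a C++11 compiler. The Trace class and the out_of_range_is_caught function are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical buffer class. The heap array is acquired in the constructor
// and owned by a std::unique_ptr, so it is released automatically when the
// object goes out of scope.
class Trace {
public:
    explicit Trace(std::size_t n) : size_(n), data_(new double[n]()) {}

    // Bounds-checked element access.
    double &at(std::size_t i) {
        if (i >= size_) {
            throw std::out_of_range("Trace index out of range");
        }
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    std::unique_ptr<double[]> data_;   // calls delete[] in its destructor
};

// Returns true if the out-of-range access was reported via an exception.
// The Trace inside the try block is destroyed during stack unwinding, so no
// explicit delete[] is needed on either path.
bool out_of_range_is_caught() {
    try {
        Trace t(4);
        t.at(10) = 1.0;   // throws std::out_of_range
    } catch (const std::out_of_range &) {
        return true;
    }
    return false;
}
```

Because the only resource is held by a std::unique_ptr, this class needs no hand-written destructor, copy constructor, or copy assignment operator; copying is simply disabled, which sidesteps the Rule of Three rather than implementing it.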
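
For the row-pointer matrix described earlier in this section, the matching cleanup is easy to forget. The sketch below pairs the allocation with its release and, for comparison only, shows one possible alternative that keeps all elements in a single contiguous std::vector. The function names make_matrix, free_matrix, and make_flat_matrix are hypothetical, and the flat layout is a commonly used option rather than the approach this page recommends.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Allocates a rows x cols matrix as an array of row pointers.
double **make_matrix(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    double **matrix = new double *[rows];
    for (int i = 0; i < rows; i++) {
        matrix[i] = new double[cols];
        for (int j = 0; j < cols; j++) {
            matrix[i][j] = std::sin(static_cast<double>(i * j));
        }
    }
    return matrix;
}

// Releases the matrix in the reverse order of allocation: each row first,
// then the array of row pointers.
void free_matrix(double **matrix, int rows) {
    for (int i = 0; i < rows; i++) {
        delete[] matrix[i];
    }
    delete[] matrix;
}

// Alternative layout: one std::vector holding every element in a single
// contiguous block, indexed as i * cols + j. It frees itself.
std::vector<double> make_flat_matrix(int rows, int cols) {
    std::vector<double> flat(static_cast<std::size_t>(rows) * cols);
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            flat[static_cast<std::size_t>(i) * cols + j] =
                std::sin(static_cast<double>(i * j));
        }
    }
    return flat;
}
```

Which layout is faster depends on the access pattern and the compiler, so timing both on the real workload is the only reliable way to choose.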