Learning by immersion

My time as a graduate student and then as a postdoc was dictated mostly by the academic train of events: publish, publish, publish. There was little time to think about code, much less to write good code. However, it was easy to spot when bad code was being written.

This did not matter much - we not only needed to publish, we also needed to meet the milestones for the Exascale Computing Project, which we really did not want to miss.

This publish- and milestone-driven development was quite taxing, yet quite fun. I was learning a lot, not only about high performance computing but also about theoretical chemistry in general. I was quite happy that, in theory, my code would be able to accelerate many people’s research at some point.

At that point in time, circa 2019, the best way to access GPUs was using CUDA - there wasn’t really a good second option. Before you say, well, there were OpenCL and OpenACC - I did say “good”. Keep in mind, we were working with a legacy Fortran application that had been in continuous development since the 1980s. So, baking in OpenACC or OpenCL would’ve been very difficult.

Additionally, OpenACC seemed to be on its way out at that point. However, around that same time the OpenMP board announced that they would support GPUs too, and that they would do it by extending the current capabilities of the OpenMP standard. GAMESS already had OpenMP support, so it seemed like a good idea at that point to bank on that as well.

It was thus decided that the development would fork - one path would keep the Fortran code, and the second would develop new capabilities in C++/CUDA. Now that I look back at it…the code we wrote was mostly C; there was only a very small extent of C++ in what we wrote.

Since I agreed to join the C++ effort and had already watched 10 hours of C++, I embarked on this adventure with two other graduate students. If you’re familiar with quantum chemistry, you’d know that the best place to start is the evaluation of the two-electron integrals. For those who are uninitiated, let me explain.

Quantum chemistry seeks to solve the Schrödinger equation for a many-body system, i.e. molecules, which contain many atoms and thus many electrons. Only the hydrogen atom can be solved exactly - we also say it has an analytical solution. Anything larger than that runs into the famous many-body problem: with many interacting particles, the equations describing their interactions cannot be solved exactly. This is a well-known problem that I am not going to explain here.

Because of this, solving the Schrödinger equation for chemically interesting systems (anything larger than a hydrogen atom) requires a series of approximations. How these approximations are used and exploited is the core of computational quantum chemistry. I will not go very deep into this, since there are many good resources online - at least, not yet.

The key approximation is called Hartree-Fock, and it relies on expanding the wavefunction in terms of a basis, which allows the equations to be cast in terms of matrices. This is what makes it computationally friendly.
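In symbols (standard Hartree-Fock notation, not spelled out in the post), the basis expansion and the matrix equations it produces look like this:

```latex
% Expand each molecular orbital \psi_i in a fixed set of N basis functions \phi_\mu:
\psi_i(\mathbf{r}) = \sum_{\mu=1}^{N} C_{\mu i}\,\phi_\mu(\mathbf{r})
% This turns the Hartree-Fock differential equations into a generalized
% matrix eigenvalue problem (the Roothaan-Hall equations):
\mathbf{F}\mathbf{C} = \mathbf{S}\mathbf{C}\,\boldsymbol{\varepsilon}
% F: Fock matrix, S: basis-overlap matrix,
% C: orbital coefficients, \varepsilon: orbital energies.
```

Matrices like F and S are exactly the kind of object computers are good at, which is why this reformulation is what makes the problem tractable.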

Then the physical interactions of the system are boiled down to the kinetic energy, the nuclear repulsion and attraction, and the electron-electron repulsion.

The nuclei are taken to be fixed, since they are so much heavier than the electrons (this is called the Born-Oppenheimer approximation) - and thus we arrive at the “electronic problem”, the equations that represent the behavior of the electrons in the system. The problematic bit is still here: electrons are negatively charged, so they repel each other. The position of one electron is directly influenced by the position of another electron, and that electron is influenced by another. This is once again a many-body problem.

Hartree-Fock solves this by taking the average contribution of all the other electrons and measuring how that affects a single electron. So, in English: for each electron, calculate the average Coulomb force of the remaining N-1 electrons and see how that “field” affects the electron we are looking at. Do this for all electrons and you’ve effectively approximated the electron-electron repulsion - this is called the “mean-field approximation”.
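To make the idea concrete, here is a purely classical toy version of that averaging - not real Hartree-Fock, just point charges at made-up positions, with each electron feeling the average 1/r repulsion of the others:

```python
# Toy illustration of the mean-field idea (classical, NOT real Hartree-Fock):
# for each electron, average the Coulomb repulsion 1/r from the other N-1
# electrons. Atomic units; the positions below are invented for the example.
import math

positions = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (1.0, 1.0, 1.0),
]

def mean_field_on(i, positions):
    """Average Coulomb repulsion felt by electron i from all the others."""
    total = 0.0
    for j, pos in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pos)  # distance between electrons i and j
        total += 1.0 / r                  # Coulomb repulsion in atomic units
    return total / (len(positions) - 1)

# One averaged "field" per electron - the quantum version replaces these
# point charges with charge densities built from the orbitals.
fields = [mean_field_on(i, positions) for i in range(len(positions))]
print(fields)
```

The real method does this with continuous charge distributions rather than point charges, which is where the integrals in the next paragraph come from.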

The integrals that calculate the Coulomb repulsion between the electrons in this mean-field approximation are called the “two-electron repulsion integrals”, also called “four-center two-electron integrals”, or ERIs for short.
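For the curious, this is the standard chemists'-notation definition (four basis functions, hence “four-center”; two electron coordinates, hence “two-electron”):

```latex
% Two-electron repulsion integral over basis functions \phi_\mu \phi_\nu \phi_\lambda \phi_\sigma:
(\mu\nu|\lambda\sigma) = \iint
  \phi_\mu(\mathbf{r}_1)\,\phi_\nu(\mathbf{r}_1)\,
  \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}\,
  \phi_\lambda(\mathbf{r}_2)\,\phi_\sigma(\mathbf{r}_2)\,
  d\mathbf{r}_1\, d\mathbf{r}_2
% r_1, r_2: coordinates of the two electrons;
% 1/|r_1 - r_2|: Coulomb repulsion between them.
```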

The problem with ERIs is that there are many of them. Without getting into much detail, for each measure of system size N there are N^4 integrals that need to be solved. If you are familiar with computer science, the big-O notation of this problem is O(N^4), which is terrible! And the saddest part is that this is the simplest approximation out there.
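A quick back-of-the-envelope sketch of that scaling (the factor-of-8 symmetry reduction is a standard property of these integrals, though the post doesn't mention it):

```python
# How fast the number of ERIs grows with basis-set size N.
# Naively there are N^4 integrals (mu,nu,lambda,sigma each run over N);
# permutational symmetry cuts that by roughly 8x, but the scaling stays O(N^4).

def naive_count(n):
    """All N^4 index combinations (mu nu | lambda sigma)."""
    return n ** 4

def unique_count(n):
    """Unique integrals: (mu nu|la si) is symmetric under mu<->nu,
    lambda<->sigma, and (mu nu)<->(lambda sigma), so count pairs of pairs."""
    pairs = n * (n + 1) // 2
    return pairs * (pairs + 1) // 2

for n in (10, 100, 1000):
    print(n, naive_count(n), unique_count(n))
```

Even with symmetry, going from 10 to 100 basis functions multiplies the work by roughly ten thousand - which is why integral evaluation dominates these codes.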

Now, to make it worse…the procedure to solve the Hartree-Fock equations is iterative! The method is called the Self-Consistent Field (SCF) procedure: we iterate until we’ve reached a stable solution. The integrals are needed at every step of the convergence procedure, and thus they either need to be stored or recomputed. Storing things on disk is expensive - we will go into why later; for now, just take my word. A while ago, very smart people figured out that it is far cheaper to recompute these ERIs at every step of the process than to read them back from disk. Thus, it is very important to calculate them as fast as possible.
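Structurally, a “direct” SCF loop looks something like the sketch below. Every function here is a hypothetical stand-in (a toy scalar update, not GAMESS or any real integral code) - the point is only the shape of the loop: integrals recomputed inside every iteration, then a self-consistency check.

```python
# Schematic of a "direct" SCF loop: the ERIs are recomputed on the fly each
# cycle instead of being stored on disk. All functions are toy stand-ins.

def build_fock(density):
    # Stand-in: a real code would evaluate the O(N^4) ERIs here, every
    # single iteration, and contract them with the density matrix.
    return 0.5 * density

def new_density(fock):
    # Stand-in for diagonalizing the Fock matrix and rebuilding the density.
    return 1.0 + fock

def scf(max_iter=100, tol=1e-8):
    density = 0.0                           # initial guess
    for it in range(max_iter):
        fock = build_fock(density)          # ERIs recomputed each cycle
        updated = new_density(fock)
        if abs(updated - density) < tol:    # self-consistency reached?
            return updated, it
        density = updated
    raise RuntimeError("SCF did not converge")

density, iterations = scf()
print(density, iterations)
```

Since the expensive `build_fock` step runs once per iteration, anything that speeds up the integrals speeds up every cycle - hence the obsession with fast ERIs.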

Because of this, the first step anyone should take when porting a quantum chemistry code to GPUs is getting fast and reusable integrals. Thus, this is what we did.