Computer facilities

CLUSTER Joan Francesc Fernandez (JFF) at CTTC

Background information
Computational Fluid Dynamics and Heat Transfer (CFD&HT) is characterized by a huge computational demand, which for certain applications is far beyond the capabilities of current computers. From the very beginning, the computational strategy at CTTC has been based on systems with the best cost/performance ratio, such as the VAX 750 and the HP 9000/735 workstations.
In 1996, the first parallel computer was installed: a shared-memory SMP HP 9000-k with 6 processors. The arrival of GNU/Linux completely changed the outlook for scientific computing. In 1997, the first Linux PC was installed. As it was clearly cheaper and faster than the HP workstations and SMP systems we had at the time, others soon followed.
The current enhancement of the cluster (second generation) has been financed through the project "New cluster of parallel calculation of high performance (HPC, CFD & HT)", code UNPC08-4E-002, of the Ministerio de Ciencia e Innovacion, Subdireccion General de Infraestructura Cientifica.
 
Beowulf HPC cluster JFF first generation
In 1999, the first High Performance Computing (HPC) "Beowulf cluster" at CTTC was configured. It was named Joan Francesc Fernandez (JFF) after the recently deceased professor of computer science and numerical analysis who brought the first computer to the faculty. Through his brilliant teaching, Professor Fernandez awakened an interest in numerical simulation research in many people.

The first JFF system started operating in 1999 with 16 nodes running Debian Linux, connected by a switched Fast Ethernet (100 Mbit/s) network. It used AMD K7 processors at 600 MHz and was one of the first Beowulf systems in Spain. The last upgrade of this Beowulf cluster was in 2007, bringing it to 125 CPUs, 100 GB of total RAM, and 7.25 TB of disk space.

Nowadays, the first-generation JFF cluster is based on 48 desktop-type nodes with 2 cores and 4 GB of RAM, linked by a Gigabit Ethernet network.

JFF cluster (2005)
JFF cluster (2009)
 
HPC cluster JFF second generation

At present, high performance computing is clearly focused on parallelization, because processor clock frequencies can no longer be increased as they were two decades ago. The most widespread solution is commodity hardware interconnected by a high-performance network. The idea is to have the maximum number of cores per processor, connected by a network that is efficient in terms of latency.

In our case, significantly increasing the number of processors with desktop-type nodes would make cabling very difficult, almost impossible, to manage, and would require an inefficient and very expensive air conditioning system to remove the heat produced by the equipment. Both issues are better solved with a rack-type configuration.

An Infiniband network was selected because it best addresses the key aspect of the interconnection between nodes: latency. Its latencies are around 2.25 microseconds, compared with 29 to 120 microseconds for Gigabit Ethernet. The Infiniband network used has a bandwidth of 20 Gbit/s. This solution allows the number of nodes to scale significantly. In our particular case, four shelves, each holding 32 nodes of 1U height, can house 128 nodes, for a total of 1024 cores.
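The figures in this paragraph can be checked with a short Python sketch. The node and core counts come from the text; the cores-per-node value is only derived from them, and the `transfer_time_us` helper is a hypothetical illustration that assumes a simple "latency plus size divided by nominal bandwidth" model, not a measurement of the JFF network.

```python
# Figures quoted in the text.
SHELVES = 4
NODES_PER_SHELF = 32
TOTAL_CORES = 1024

nodes = SHELVES * NODES_PER_SHELF        # 128 nodes
cores_per_node = TOTAL_CORES // nodes    # 8 (derived here, not stated in the text)


def transfer_time_us(message_bytes, latency_us, bandwidth_gbit_s):
    """Estimate the time to send one message, in microseconds.

    Assumed model: fixed latency + message size / nominal bandwidth.
    1 Gbit/s equals 1000 bits per microsecond.
    """
    bits = message_bytes * 8
    return latency_us + bits / (bandwidth_gbit_s * 1e3)


# A 1 KiB message on each network, using the latencies quoted in the text
# (best-case 29 microseconds for Gigabit Ethernet).
infiniband_us = transfer_time_us(1024, latency_us=2.25, bandwidth_gbit_s=20)
gigabit_us = transfer_time_us(1024, latency_us=29, bandwidth_gbit_s=1)

print(f"{nodes} nodes, {cores_per_node} cores per node")
print(f"Infiniband: {infiniband_us:.2f} us, Gigabit Ethernet: {gigabit_us:.2f} us")
```

Under this simplified model, small messages are dominated by latency rather than bandwidth, which is why latency is treated as the key selection criterion.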

The last important aspect of our second-generation HPC cluster is the parallel file system, Lustre. This file system allows unified capacities of several petabytes (PB), is highly scalable, and achieves optimal efficiency on high-performance hardware such as DDN S2A9900 units, which have a maximum useful capacity of 1 PB with 1200 disks of 1 terabyte (TB), writing at 5.6 GB/s. Our unit was acquired with 80 disks of 1 TB, giving a storage capacity of 64 TB and a write speed of 2.6 GB/s.
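As a rough check of the storage figures, the following Python sketch computes the raw capacity of the installed disks, the usable fraction implied by the quoted 64 TB, and how long it would take to fill that capacity at the quoted write rate. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and a sustained write rate, neither of which is confirmed by the text.

```python
# Decimal units are an assumption; the text does not say which convention is used.
TB = 10**12
GB = 10**9

# Figures quoted in the text for the installed configuration.
disks_installed = 80
disk_size_tb = 1
usable_tb = 64
write_rate_gb_s = 2.6

raw_tb = disks_installed * disk_size_tb   # 80 TB of raw disk
usable_fraction = usable_tb / raw_tb      # share of raw capacity available for data

# Time to fill the usable capacity, assuming the write rate is sustained.
fill_seconds = usable_tb * TB / (write_rate_gb_s * GB)
fill_hours = fill_seconds / 3600

print(f"raw: {raw_tb} TB, usable fraction: {usable_fraction:.0%}")
print(f"time to fill {usable_tb} TB: {fill_hours:.1f} hours")
```

The gap between raw and usable capacity is consistent with some disks being reserved for redundancy, although the text does not describe the actual layout.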

Front view

Rear view

JFF cluster Second generation (2011)
 
HPC cluster JFF third generation
The third generation has 40 cluster nodes. Each node has 2 AMD Opteron CPUs with 16 cores each and 64 GB of RAM. The nodes are interconnected by an Infiniband QDR 4X network with latencies of 1.07 microseconds and a bandwidth of 40 Gbit/s.
JFF cluster Third generation (2013)
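The aggregate size of the third generation follows from the per-node figures. The Python sketch below works it out and compares the quoted latency with the second-generation value; it reads the 64 GB of RAM as a per-node figure, which is an interpretation of the text rather than a confirmed specification.

```python
# Per-node figures quoted for the third generation.
NODES = 40
CPUS_PER_NODE = 2
CORES_PER_CPU = 16
RAM_PER_NODE_GB = 64   # assumed to be per node

cores_per_node = CPUS_PER_NODE * CORES_PER_CPU       # 32
total_cores = NODES * cores_per_node                 # 1280
total_ram_gb = NODES * RAM_PER_NODE_GB               # 2560
ram_per_core_gb = RAM_PER_NODE_GB / cores_per_node   # 2.0

# Network latencies quoted for the second and third generations.
SECOND_GEN_LATENCY_US = 2.25
THIRD_GEN_LATENCY_US = 1.07
latency_ratio = SECOND_GEN_LATENCY_US / THIRD_GEN_LATENCY_US

print(f"{total_cores} cores, {total_ram_gb} GB RAM, {ram_per_core_gb} GB per core")
print(f"latency reduced by a factor of about {latency_ratio:.1f}")
```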