Collisions in the LHC generate particles that often decay in complex ways into even more particles. Electronic circuits record the passage of each particle through a detector as a series of electronic signals, and send the data to the CERN Data Centre for digital reconstruction. The digitised summary is recorded as a 'collision event'. Up to about 1 billion particle collisions can take place every second inside the LHC experiments' detectors. It is not possible to read out all of these events, so a 'trigger' system is used to filter the data and select only those events that are potentially interesting for further analysis.
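Conceptually, the trigger acts as a predicate applied to a stream of events, keeping a small fraction and discarding the rest before anything is written out. The following is a toy sketch of that idea only; the real triggers run in custom hardware and large software farms, and every name and threshold below is invented for illustration:

```python
# A toy event filter (NOT the experiments' actual trigger logic; all
# names and thresholds here are invented for illustration).
from dataclasses import dataclass
import random

@dataclass
class Event:
    event_id: int
    total_energy_gev: float  # toy stand-in for a summed detector signal

def trigger(events, threshold_gev=100.0):
    """Yield only the events that look potentially interesting."""
    for ev in events:
        if ev.total_energy_gev > threshold_gev:
            yield ev

# Simulate a stream of a million collision events and filter it down.
stream = (Event(i, random.expovariate(1 / 30.0)) for i in range(1_000_000))
kept = sum(1 for _ in trigger(stream))
print(f"kept {kept:,} of 1,000,000 events")
```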
Even after this drastic data reduction by the experiments, the CERN Data Centre processed, on average, one petabyte (one million gigabytes) of data per day during LHC Run 2.
The LHC experiments plan to collect more data during LHC Run 3 than in the first two runs combined. The computing challenge during Long Shutdown 2 was therefore to prepare to store and analyse more than 600 petabytes (600 million gigabytes) of data, equivalent to over 20 000 years of 24/7 HD video recording.
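These figures are easy to sanity-check. A minimal sketch, in which only the HD bitrate (roughly 7.5 Mbit/s, a typical value) is our assumption; the other numbers come straight from the text:

```python
# Back-of-the-envelope check of the figures quoted above.
PB_IN_GB = 1_000_000                       # gigabytes per petabyte

# Run 2: one petabyte per day, expressed as a sustained rate.
gb_per_second = 1 * PB_IN_GB / (24 * 3600)
print(f"1 PB/day ≈ {gb_per_second:.1f} GB/s sustained")      # ≈ 11.6 GB/s

# Run 3: 600 PB expressed as continuous HD video at an assumed 7.5 Mbit/s.
hd_gb_per_hour = 7.5 / 8 * 3600 / 1000                       # ≈ 3.4 GB/hour
years = 600 * PB_IN_GB / hd_gb_per_hour / (24 * 365.25)
print(f"600 PB ≈ {years:,.0f} years of 24/7 HD video")       # ≈ 20,000 years
```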
Archiving these vast quantities of data is an essential function at CERN. Magnetic tape is used as the main long-term storage medium, and data in the archive is continuously migrated to newer-technology, higher-density tapes.
The CERN storage system, EOS, was created for the extreme computing requirements of the LHC. EOS instances at CERN hold more than seven billion files (as of June 2022), matching the exceptional performance of the LHC machine and experiments. EOS is now expanding beyond high-energy physics to other data-storage needs, with AARNET (the Australian Academic and Research Network) and the EU Joint Research Centre for Digital Earth and Reference Data adopting it for their big-data systems.
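EOS is accessed over the XRootD protocol, so standard XRootD tooling can talk to an instance remotely. A minimal sketch using the pyxrootd Python bindings; the endpoint and path below are assumptions chosen for illustration:

```python
# A minimal sketch of listing files on an EOS instance over XRootD,
# using the pyxrootd bindings (pip install xrootd). The endpoint and
# path are assumptions for illustration.
from XRootD import client
from XRootD.client.flags import DirListFlags

fs = client.FileSystem("root://eospublic.cern.ch")        # assumed public endpoint
status, listing = fs.dirlist("/eos/opendata", DirListFlags.STAT)
if status.ok:
    for entry in listing:
        print(entry.name, entry.statinfo.size)            # file name and size in bytes
else:
    print("listing failed:", status.message)
```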