Astronomical Data Volumes
Max Planck Institute for Radio Astronomy uses GRAU DATA’s ArchiveManager for big data from pulsar research
© 2012 CASE STUDY by Max Planck Institute
Whenever a research institute such as the Max Planck Institute scours the depths of the galaxy for new knowledge, huge amounts of data are generated, often collected over many years. The research group for fundamental physics in radio astronomy at the Max Planck Institute is investigating radio waves in cosmic rays and examining pulsars in order to study the magnetic forces of the Milky Way. The observations permit tests of the theory of general relativity and of alternative theories of gravitation. The data comes from the Effelsberg radio telescope, which generates over 100 gigabytes of data in just 30 minutes during a single measurement. Some 18 terabytes (TiB) of measurement data are stored for calculation and analysis each month. Analyzing the data takes considerably longer than recording it, so the researchers depend on the data being stored for many years and on being able to access it unhindered at any time. The Max Planck Institute has found a way to store these large quantities of data using GRAU DATA ArchiveManager, an active archive solution capable of efficiently managing several petabytes of data.
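To put the quoted figures in proportion, the short Python sketch below converts them into a sustained recording rate and a rough number of measurements per month. It assumes decimal gigabytes for the "100 gigabytes" figure and binary tebibytes for the "18 TiB" figure; the function names are illustrative only. Because the text says "over 100 gigabytes", the rate is a lower bound and the measurement count an upper bound.

```python
# Back-of-the-envelope check of the data volumes quoted in the text.
# Assumptions: 1 GB = 10**9 bytes, 1 TiB = 2**40 bytes.
GB = 10**9
TIB = 2**40


def recording_rate_mb_per_s(bytes_recorded, seconds):
    """Average recording rate in decimal megabytes per second."""
    return bytes_recorded / seconds / 10**6


def measurements_per_month(monthly_bytes, bytes_per_measurement):
    """How many measurements of a given size fit into the monthly volume."""
    return monthly_bytes / bytes_per_measurement


# 100 GB recorded in one 30-minute measurement
rate = recording_rate_mb_per_s(100 * GB, 30 * 60)

# 18 TiB stored per month, at 100 GB per measurement
count = measurements_per_month(18 * TIB, 100 * GB)

print(f"Recording rate: at least {rate:.1f} MB/s")
print(f"Measurements per month: at most about {count:.0f}")
```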
The Max Planck Institute is a leader in fundamental physics in radio astronomy, and the amounts of data measured and analyzed by its research groups are enormous. While the institute is required by law to retain its data for ten years, the research data needs to be held much longer: new algorithms are constantly being developed that also bring old data into the calculations. Storing all of the data collected with the radio telescope on hard drives, i.e. in online storage, would far exceed the institute’s budget. In addition, the data is not in constant use and often sits inactive on storage for long periods.
The solution was an active archiving concept based on GRAU DATA’s archiving software that made use of LTO magnetic tape as a long-term archiving medium.
ArchiveManager’s Testing, Modification, and Production System
In August 2011, the Max Planck Institute began the project together with GRAU DATA, using the active archiving solution ArchiveManager. The first step was to rapidly port the software to the Debian GNU/Linux operating system at the request of the Max Planck Institute. Testing was successfully completed in October, and the complete solution went into production in November.
The astronomical measurement data from the Effelsberg radio telescope was first buffered on 120 TB of online storage in an 8 Gbit/s FC SAN. Powerful Fujitsu Primergy RX 300 S6 servers were then used to store the data redundantly on the Spectra Logic LTO 5 tape libraries in Effelsberg and Bonn with the help of GRAU DATA’s ArchiveManager.
The archiving software now manages around 350 tapes per library, with each tape capable of holding around 1.5 terabytes (TB). The amount of data stored is growing rapidly: by May 2012 it had reached 525 terabytes in total, and the overall system can currently be expanded to 3.5 petabytes.
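For orientation, 350 tapes at roughly 1.5 terabytes each come to 525 terabytes, the same figure as the stored total quoted above. The Python sketch below works through that arithmetic and estimates how many tapes the 3.5-petabyte expansion limit would correspond to. It assumes decimal units (1 PB = 1,000 TB) and uncompressed capacity; the constant and function names are illustrative and not part of ArchiveManager.

```python
import math

# Figures quoted in the text; decimal units and no compression are assumed.
TAPE_CAPACITY_TB = 1.5
TAPES_PER_LIBRARY = 350
MAX_SYSTEM_CAPACITY_TB = 3500  # 3.5 PB, assuming 1 PB = 1000 TB


def library_capacity_tb(tapes, tb_per_tape):
    """Raw capacity of one library in terabytes."""
    return tapes * tb_per_tape


def tapes_needed(total_tb, tb_per_tape):
    """Smallest whole number of tapes that can hold total_tb."""
    return math.ceil(total_tb / tb_per_tape)


per_library = library_capacity_tb(TAPES_PER_LIBRARY, TAPE_CAPACITY_TB)
tapes_at_limit = tapes_needed(MAX_SYSTEM_CAPACITY_TB, TAPE_CAPACITY_TB)

print(f"Capacity per library: {per_library:.0f} TB")
print(f"Tapes for 3.5 PB: {tapes_at_limit}")
```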
"Unlike traditional corporate archiving systems, the tape technology in our division of the Max Planck Institute is often used as an expanded online storage medium that researchers access at regular intervals," said Jan Behrend, IT specialist at the Max Planck Institute, as he explained the structure of the storage solution. "The tape libraries, together with ArchiveManager, are fast enough on a 1 Gbit/s network to provide the research groups with their enormous amounts of data. At the same time, the storage system provides us with enormous cost advantages in comparison to traditional disk-based online storage."
The hardware-independent GRAU DATA ArchiveManager, combined with the Fujitsu servers, is capable of migrating large quantities of data very quickly to the tape libraries. The rate of data input into the active archiving system is around a gigabit per second. The read/write speed can reach up to 130 MiB per second per drive, which corresponds to around 457 GiB (roughly 490 GB) an hour.
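The per-drive figure can be checked with a short calculation. The Python sketch below converts a peak rate of 130 MiB per second into an hourly volume in both binary (GiB) and decimal (GB) units, and compares a 1 Gbit/s input rate with what a single drive can absorb. It assumes the drive sustains its peak rate for the full hour and that the network figure is decimal (10**9 bits per second); the function names are illustrative. Under these assumptions, the hourly volume approaches 500 only when counted in decimal gigabytes.

```python
# Unit definitions: binary prefixes for MiB/GiB, decimal for GB and Gbit.
MIB = 2**20
GIB = 2**30
GB = 10**9


def hourly_volume(mib_per_s):
    """Return (GiB per hour, GB per hour) for a sustained rate in MiB/s."""
    bytes_per_hour = mib_per_s * MIB * 3600
    return bytes_per_hour / GIB, bytes_per_hour / GB


def link_to_drive_ratio(link_gbit_per_s, drive_mib_per_s):
    """Network input rate as a fraction of one drive's peak write rate."""
    link_bytes_per_s = link_gbit_per_s * 10**9 / 8
    return link_bytes_per_s / (drive_mib_per_s * MIB)


gib_per_hour, gb_per_hour = hourly_volume(130)
ratio = link_to_drive_ratio(1, 130)

print(f"Per drive: {gib_per_hour:.0f} GiB/h, or {gb_per_hour:.0f} GB/h")
print(f"A 1 Gbit/s input is {ratio:.0%} of one drive's peak rate")
```

Under these assumptions, a single drive running at its peak rate could keep pace with a 1 Gbit/s input stream.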
Management of Large Amounts of Data Made Easy
ArchiveManager provides the research institute’s IT team with simple, intuitive administration. The software constantly monitors fill levels and transfer rates. Should manual intervention become necessary, the administrator receives a message immediately. Even the daily backup of the metadata to the remote location runs automatically.
Because the active archiving software ran so smoothly, the Max Planck Institute decided to make use of the solution’s multi-client capability and include two other research groups in the overall system. The multi-client capability allows separate partitions to be created, which keeps each group’s data and its use of the drives and tapes separate.
Long-term Open Source Strategy
In addition to comprehensive functionality, one of the most critical reasons behind the decision to use GRAU DATA archiving software was that it had been ported to the Debian Linux operating system and that an open source alternative with practically identical functionality was available. The Max Planck Institute, much like the majority of research institutes around the globe, makes use of Linux-based operating systems. With "OpenArchive", GRAU DATA offers the only professional, Linux-based, open source archiving software (see www.openarchive.net).
"ArchiveManager was the best product for us during the first phase in order to ensure stable, high-performance operation. In the long term, we may move to the open source alternative from GRAU DATA. This idea takes into consideration not only that we may save costs for licenses, but also that we as a research institute often design and develop our own applications. These applications can be integrated much more easily into a consistent open source environment thanks to the open source code of the archiving software. At the moment, ArchiveManager suits our needs superbly. The software runs absolutely reliably and there is nothing more that could be asked of an administration system," said Jan Behrend, commenting on the successful project.