Project


Machine Learning based Analytics for Big Data in Astronomy

English title: Machine Learning based Analytics for Big Data in Astronomy
Applicant: Voloshynovskiy Sviatoslav
Number: 167158
Funding scheme: NRP 75 Big Data
Research institution: Centre Universitaire d'Informatique, Université de Genève
Institution of higher education: University of Geneva - GE
Main discipline: Information Technology
Start/End: 01.07.2017 - 30.06.2021
Approved amount: 823'476.00

All Disciplines (2)

Discipline
Information Technology
Astronomy, Astrophysics and Space Sciences

Keywords (7)

astronomical data sets; classification; data management; image processing; solar flares; big data; machine learning

Lay Summary (German)

Lead
Astronomical observation missions collect data in such considerable quantities that they can no longer be analysed by hand, only automatically. This project applies machine learning methods to solar research in order to better understand and predict solar eruptions.
Lay summary

The IRIS mission has been collecting data from various layers of the solar atmosphere since 2013. The resulting data set makes it possible to deepen our understanding of the physics of the Sun. To do so, however, the large volumes of data must first be characterised and searched automatically. We are developing methods by which computers learn to recognise patterns in the IRIS archive and to characterise the temporal evolution of the observed solar eruptions. On this basis, we aim to substantially improve the understanding and forecasting of solar eruptions.

The eruptions that occur regularly on the Sun can cause disruptions on Earth, for example in radio and GPS positioning systems, and can also lead to power grid failures. So far, science neither understands the physical cause of solar eruptions nor can it predict them reliably. Because solar eruptions occur in many different complex spatial and temporal patterns, systematic analysis of these phenomena is considerably more difficult.

The aim of this project is to better understand the physics of the Sun and to develop methods for predicting solar eruptions. To this end, we use the large data archive being compiled by IRIS (Interface Region Imaging Spectrograph), NASA's newest solar satellite. We build machine learning algorithms that evaluate the data for spatial and temporal patterns.

Because solar eruptions can cause far-reaching disruptions on Earth, forecasting them is of great importance. Forecasts can be useful, for example, for flight planning and for operating satellites and power grids, and can thus reduce potential eruption damage. In addition, the algorithms and image processing methods we develop can be used to analyse other data sets, in science or in industry.

Last update: 26.07.2017

Lay Summary (French)

Lead
Astronomical observation missions collect considerable quantities of data that can only be analysed automatically. This project uses machine learning methods in solar research to better understand and predict solar eruptions. The IRIS mission has been collecting data on the different layers of the solar atmosphere since 2013. The data set created in this way deepens our understanding of the physics of the Sun.
Lay summary

But this first requires automating the characterisation and exploration of these large volumes of data. We are developing methods by which computers learn to recognise patterns in the IRIS archive and to characterise the temporal progression of the observed solar eruptions. On this basis, we wish to decisively improve the understanding and forecasting of solar eruptions.

The eruptions that occur regularly on the surface of the Sun can cause disruptions on Earth, for example in radio and GPS positioning systems, as well as failures in electricity networks. Until now, science has not been able to understand the causes of solar eruptions or to predict them reliably. Because eruptions occur in highly varied spatial and temporal configurations, their systematic analysis is considerably more complicated.

The objective of this project is to better understand the physics of the Sun and to develop methods for forecasting solar eruptions. To do this, we use the large database created by IRIS (Interface Region Imaging Spectrograph), NASA's new solar satellite. We are developing machine learning algorithms that evaluate the data according to spatial and temporal configurations.

Because solar eruptions can cause large-scale disruptions on Earth, forecasting them is of great importance. It can be useful, for example, for flight planning and for operating satellites and electricity networks, and can thus reduce possible damage from eruptions. The algorithms and image processing methods we develop can also be used to analyse other data sets, in science or industry.

Last update: 26.07.2017

Lay Summary (English)

Lead
Astronomical observation missions collect data in such large quantities that they can no longer be analysed manually, but only automatically. The present project uses methods of machine learning in solar research to better understand and predict solar eruptions.
Lay summary

The IRIS Mission has been collecting data from various layers of the sun’s atmosphere since 2013. The resulting data set will provide us with a deeper knowledge of the physics of the sun. But first a way must be found to automatically characterise and search the enormous quantities of data. We are developing methods that enable computers to learn to recognise patterns in the IRIS archive and to characterise the temporal progression of the solar eruptions observed. Building on this, we would like to significantly improve our understanding of and ability to predict solar eruptions.

The eruptions that regularly take place on the sun can cause disturbances on Earth – for instance in radio and GPS positioning systems – as well as power outages. To date, scientists have been unable to explain the physical cause of solar eruptions, nor can they reliably predict them. Because solar eruptions occur in diverse, complex spatial and temporal patterns, systematic analysis of these phenomena is considerably more difficult.

The objective of this project is to gain a better understanding of the physics of the sun and to develop methods for predicting solar eruptions. We will be using the huge data archive compiled by IRIS (Interface Region Imaging Spectrograph), NASA’s latest solar satellite. We will create machine learning algorithms to evaluate the data for spatial and temporal patterns.

Since solar eruptions can cause widespread interference on Earth, a great deal of importance is attached to being able to predict them. This would be helpful for flight planning and for the operation of satellites and power grids and, accordingly, reduce any damage from eruptions. Moreover, the algorithms and image processing methods we develop could be used for the analysis of other data sets in science or industry.


Last update: 26.07.2017

Publications

Publication
DCT-Tensor-Net for solar flares detection on IRIS data
Ullmann Denis (2018), DCT-Tensor-Net for solar flares detection on IRIS data, in 7th European Workshop on Visual Information Processing (EUVIP), Tampere, Finland, IEEE.
Identifying Typical Mg II Flare Spectra Using Machine Learning
Panos Brandon, Kleint Lucia, Huwyler Cedric, Krucker Sam, Melchior Martin, Ullmann Denis, Voloshynovskiy Slava (2018), Identifying Typical Mg II Flare Spectra Using Machine Learning, in Astrophysical Journal, 861(1), 62.

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
3rd SCOSTEP workshop, PMOD/WRC | Talk given at a conference | Machine Learning and Solar Flares | 06.03.2019 | Davos, Switzerland | Panos Brandon
ISSI Meeting on Observed Large-scale Variability of Coronal Loops as a Probe of Coronal Heating | Talk given at a conference | Tracking Coronal Rain with Machine Learning Methods | 21.01.2019 | Bern, Switzerland | Panos Brandon
EUVIP 2018, Tampere University of Technology | Talk given at a conference | DCT-Tensor-Net for Solar Flare Detection on IRIS Data | 26.11.2018 | Tampere, Finland | Ullmann Denis
Solar Physics Seminar, University of Pekin, China - invited seminar given by Dr. Lucia Kleint | Individual talk | Invited Seminar | 19.10.2018 | Pekin, China | Panos Brandon
IRIS-9, Max Planck Institute for Solar System Research | Talk given at a conference | Do all flares share the same chromospheric physics? | 29.06.2018 | Göttingen, Germany | Panos Brandon
IRIS-9, Max Planck Institute for Solar System Research | Poster | Flare detection from discrete Fourier transforms of SJI | 28.06.2018 | Göttingen, Germany | Ullmann Denis
IRIS-9, Max Planck Institute for Solar System Research | Poster | Irisreader - A Python Library for IRIS Data Processing | 26.06.2018 | Göttingen, Germany | Huwyler Cédric
17th RHESSI Workshop | Talk given at a conference | Do all flares share the same chromospheric physics? | 19.06.2018 | Dublin, Ireland | Panos Brandon
EWASS 2018 - invited talk given by Dr. Lucia Kleint | Talk given at a conference | Multi-wavelength observations of major solar flares (invited) | 03.04.2018 | Liverpool, Great Britain and Northern Ireland | Panos Brandon
International Space Science Institute | Talk given at a conference | Preliminary results on IRIS data processing; L. Kleint and B. Panos | 23.01.2018 | Bern, Switzerland | Panos Brandon


Communication with the public

Communication | Title | Media | Place | Year
Media relations: print media, online media | Lucia Kleint will die Sonne verstehen: «Ich hoffe, dass die Rakete beim Start nicht explodiert» | Rimmattaller Zeitung | German-speaking Switzerland | 2018
Media relations: print media, online media | Sonneneruptionen besser verstehen mit Machine Learning | | German-speaking Switzerland | 2018

Associated projects

Number | Title | Start | Funding scheme
173716 | Euclid: high-precision cosmology in the dark sector | 01.06.2017 | Sinergia
193716 | Robust Deep Density Models for High-Energy Particle Physics and Solar Flare Analysis (RODEM) | 01.12.2020 | Sinergia
189180 | Solar Orbiter STIX | 01.04.2020 | Project funding (Div. I-III)

Abstract

Astronomical observations produce a wealth of data in excess of several TB per day. Clearly, not even a fraction of such data can be analyzed manually. In this project, we will investigate the use of big data analytics tools such as machine learning techniques applied to astronomical data. Specifically, we will consider observations of solar flares: magnetic eruptions that influence the whole solar system and cause space weather phenomena on Earth such as blackouts and problems in aircraft communication and GPS positioning. So far, flares are neither understood, nor can they be reliably predicted. The problem is that the patterns found in flares and in their temporal evolution are diverse and highly complex, and the data volume is too large to analyze manually. A huge first step towards a better understanding of the underlying physics and the development of space weather forecasting is to systematically identify, collect, and characterize the different spatio-temporal patterns in solar physics data. Here, efficient big data analytics tools such as machine learning techniques are crucial. We propose an interdisciplinary approach to set up, customize, and optimize analytics capabilities for big data applications in astronomy. The project team consists of astronomers, experts in machine learning and statistical image processing, and specialists in data management systems for Big Data astronomy projects. This interdisciplinary approach will allow for a drastic improvement in the level of science questions that can be addressed and will, in turn, lead to a quantum leap in the understanding of the physics of solar flares and in the quality of space weather predictions. In a first step, various existing state-of-the-art machine learning techniques for clustering, classification, and outlier detection will be applied. In a second step, we will develop algorithms customized to the science question to be addressed and to the statistics of the input data.
For the processing and analysis of the big data, we will set up a big data analytics system suited to optimizing machine learning algorithms for use cases in astronomy and in other science domains. Although applied to solar data, our results could be of interest not only for any kind of astronomical data, but also for other applications that share a large amount of unlabeled and unstructured data produced daily by distributed sources. Several domains will benefit from our results: 1) Solar Physics, by developing models for flare analysis and prediction, drawing on data that currently cannot be fully exploited because of their Big Data nature, 2) Machine Learning and Image Processing, by contributing to both theory and practice, and 3) Applications with Big Data, by setting up a framework to analyze and classify large datasets, which is of high value not only to other domains in astronomy, but also, for example, to genomics and medical diagnostics.
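
The abstract names clustering and outlier detection as first-step techniques but does not say which algorithms or software the project uses. The following is only a minimal sketch under stated assumptions: plain k-means (Lloyd's algorithm) written with NumPy, applied to synthetic single-peaked and double-peaked toy profiles that stand in for spectra. The data, the function names, and the 5-sigma distance threshold are all hypothetical and are not taken from the project or from IRIS.

```python
import numpy as np


def make_synthetic_profiles(n_per_class=100, n_points=64, noise=0.05, seed=1):
    """Return noisy single- and double-peaked toy profiles (synthetic, NOT IRIS data)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_points)
    single = np.exp(-((x / 0.2) ** 2))
    double = np.exp(-(((x - 0.4) / 0.15) ** 2)) + np.exp(-(((x + 0.4) / 0.15) ** 2))
    # First n_per_class rows follow the single-peaked template, the rest the double-peaked one.
    templates = np.repeat(np.stack([single, double]), n_per_class, axis=0)
    profiles = templates + noise * rng.standard_normal(templates.shape)
    truth = np.repeat([0, 1], n_per_class)
    return profiles, truth, single


def distances(profiles, centroids):
    """Euclidean distance from every profile (row) to every centroid."""
    diff = profiles[:, None, :] - centroids[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def kmeans(profiles, k, n_iter=100, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct, randomly chosen profiles.
    centroids = profiles[rng.choice(len(profiles), size=k, replace=False)]
    for _ in range(n_iter):
        labels = distances(profiles, centroids).argmin(axis=1)
        # Keep the old centroid if a cluster happens to be empty.
        updated = np.array([
            profiles[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the centroids that are returned.
    labels = distances(profiles, centroids).argmin(axis=1)
    return labels, centroids


def outlier_threshold(profiles, centroids, n_sigma=5.0):
    """Hypothetical rule: mean + n_sigma * std of nearest-centroid distances."""
    nearest = distances(profiles, centroids).min(axis=1)
    return nearest.mean() + n_sigma * nearest.std()


def is_outlier(profiles, centroids, threshold):
    """Flag profiles that lie farther than `threshold` from every centroid."""
    return distances(profiles, centroids).min(axis=1) > threshold


profiles, truth, single = make_synthetic_profiles()
labels, centroids = kmeans(profiles, k=2)
threshold = outlier_threshold(profiles, centroids)

# Two unseen candidates: the noise-free single-peaked template and a flat profile
# that resembles neither cluster.
candidates = np.stack([single, np.full_like(single, 0.5)])
print(is_outlier(candidates, centroids, threshold))
```

The cluster centroids play the role of "typical" profiles, and distance to the nearest centroid gives a simple anomaly score. Real observations would need normalisation and a principled choice of the number of clusters, neither of which this sketch addresses.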