Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT's writing style in these papers' abstracts by means of a statistical analysis of word-frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. We find that ChatGPT is having an...
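As an illustrative sketch of the kind of word-frequency comparison such an analysis rests on (the corpora, tokenizer, and log-ratio statistic below are assumptions for illustration, not the paper's calibrated model):

# Toy word-frequency change analysis between two corpora,
# e.g. abstracts submitted before and after ChatGPT's release.
import math
from collections import Counter

def word_freqs(corpus):
    """Relative word frequencies over a list of abstracts."""
    counts = Counter(w for text in corpus for w in text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def log_ratio(pre, post, eps=1e-9):
    """Per-word log frequency ratio; large positive values flag words
    that became disproportionately common after the cutoff date."""
    f_pre, f_post = word_freqs(pre), word_freqs(post)
    return {w: math.log((f_post.get(w, 0) + eps) / (f_pre.get(w, 0) + eps))
            for w in set(f_pre) | set(f_post)}

pre = ["we study galaxy clusters with standard methods"]
post = ["we delve into galaxy clusters, showcasing a comprehensive analysis"]
print(sorted(log_ratio(pre, post).items(), key=lambda kv: -kv[1])[:5])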
Particulate Matter (PM) has different severe impacts on human health and climate depending on its size and composition (Yang et al., 2018; Daellenbach et al., 2020). Source apportionment (SA) is the process of identifying ambient air pollution sources and quantifying their contributions to pollution levels, and is usually conducted through receptor models (RMs). Their usual...
Simulation of physics processes and detector response is a vital part of high energy physics research, but it also represents a large fraction of the computing cost. Generative machine learning is successfully complementing full (standard, Geant4-based) simulation as part of fast simulation setups, improving performance compared to classical approaches.
Much attention has been given to...
In this talk I will first introduce the main motivations for using cosmological data to test the laws of gravity. I shall focus on the established methods and main results. Finally, I will describe how machine learning can help us understand the fundamental laws of nature.
The GeV gamma-ray sky, as observed by the Fermi Large Area Telescope (Fermi LAT), harbors a plethora of localized point-like sources. At high latitudes ($|b|>30^{\circ}$), most of these sources are of extragalactic origin. The source-count distribution as a function of their flux, $\mathrm{d}N/\mathrm{d}S$, is a well-established quantity to summarize this population. We employ sequential...
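For reference, a common single power-law parameterization of the source-count distribution (a standard textbook form, not necessarily the exact model used in this work):

\[
\frac{\mathrm{d}N}{\mathrm{d}S} = A \left(\frac{S}{S_0}\right)^{-\beta},
\qquad
N(>S) = \int_S^{\infty} \frac{\mathrm{d}N}{\mathrm{d}S'}\,\mathrm{d}S' ,
\]

with normalization $A$, reference flux $S_0$, and index $\beta$; fitting such a form to the high-latitude sample summarizes the population in a handful of parameters.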
In any large database, most of the features defining a data point are redundant, irrelevant, or affected by large noise, and have to be discarded. To do this, one needs to answer: What is the best dimensionality of a reduced feature space in order to retain maximum information? How can one correct for different units of measure? What is the optimal scaling of importance between features? We...
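As a simple baseline for the dimensionality question (an illustrative stand-in; the abstract's own criterion for information retention and feature scaling is not specified here), one can read off the smallest PCA dimension that keeps most of the variance:

# Pick the smallest number of principal components retaining 95% of the
# variance: a crude proxy for the "best dimensionality" question above.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))                   # 3 informative directions
W = rng.normal(size=(3, 20))
X = latent @ W + 0.05 * rng.normal(size=(500, 20))   # 20 noisy observed features

pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
d = int(np.searchsorted(cum, 0.95)) + 1
print(f"dimensions needed for 95% of the variance: {d}")  # ~3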
"Anomaly detection" covers a broad range of problems and settings. In some instances, it is seen as finding "rare objects", i.e. objects lying in a low-density region of the feature space. However, this task can quickly become difficult, particularly for higher dimensional, noisy or complex (non-rectangular) data where reliable density estimation is non-trivial.
Additionally, not all...
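A minimal density-based detector along the "rare objects" line (sketch only, with an assumed Gaussian kernel and threshold; the difficulty noted above is precisely that this estimate degrades in high dimensions):

# Score points by a kernel density estimate and flag the lowest-density ones.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
inliers = rng.normal(0.0, 1.0, size=(300, 2))
outliers = rng.uniform(-6.0, 6.0, size=(5, 2))
X = np.vstack([inliers, outliers])

kde = KernelDensity(bandwidth=0.5).fit(X)
log_density = kde.score_samples(X)
threshold = np.quantile(log_density, 0.02)   # flag the lowest-density 2%
print(f"{int((log_density < threshold).sum())} candidate anomalies")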
The talk will cover some key modelling issues that arise when considering the long-term development of societies. Of particular importance is the topic of societal collapse, as the archaeological record contains numerous instances of the phenomenon. I will discuss some of the general modelling philosophy, relevant literature, my own work on ancient societies (Easter Island, the Maya, Roman Empire...
We will present the HF-SCANNER project, detailing the datasets used, the project's goals, and the preliminary results. The HF-SCANNER project aims to develop a fast and accurate forecasting system for high-frequency sea level oscillations (HFOs) and meteotsunamis in the Mediterranean using deep learning and data from both simulations (ECMWF) and observations (sea level and air pressure)....
Recent years have seen a surge of interest in evolutionary reinforcement learning (evoRL), where evolutionary computation techniques are used to tackle reinforcement learning (RL) tasks. Naturally, many of the existing ideas from meta-RL can also be applied in this context. This is particularly important when handling dynamic (non-stationary) RL environments, where agents need to respond...
Accurate modeling of sea level and storm surge dynamics over temporal horizons of several days is essential for effective coastal flood response and the protection of coastal communities and economies. The classical approach to this challenge involves computationally intensive ocean models that typically calculate sea levels relative to the geoid, which must then be correlated with local tide...
Sea surface temperature (SST) is critical for weather forecasting and climate modeling; however, remotely sensed SST data often suffer from incomplete coverage due to cloud obstruction and limited satellite swath width. While deep learning approaches have shown promise in reconstructing missing data, existing methods struggle to accurately recover fine-grained details, which, however, are...
One of the most challenging tasks in Numerical Weather Prediction (NWP) is forecasting convective storms. Data Assimilation (DA) methods improve the initial condition and subsequent forecasts by combining observations with the previous model forecast (the background). Weather radar provides a dense source of observations for storm monitoring. Therefore, assimilating radar data should significantly...
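For context, the generic linear analysis update underlying most DA schemes (standard textbook form, not specific to the radar setup described here):

\[
\mathbf{x}^{a} = \mathbf{x}^{b} + \mathbf{K}\,\bigl(\mathbf{y} - \mathbf{H}\mathbf{x}^{b}\bigr),
\qquad
\mathbf{K} = \mathbf{B}\mathbf{H}^{\mathsf T}\bigl(\mathbf{H}\mathbf{B}\mathbf{H}^{\mathsf T} + \mathbf{R}\bigr)^{-1},
\]

where $\mathbf{x}^{b}$ is the background, $\mathbf{y}$ the observations (e.g. radar reflectivities or radial winds), $\mathbf{H}$ the observation operator, and $\mathbf{B}$, $\mathbf{R}$ the background- and observation-error covariances.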
The EPICURE project aims to enhance support for European supercomputer users, particularly within the EuroHPC network. It covers several key areas: code enablement, performance analysis, benchmarking, refactoring and optimization. Each aspect involves porting and refining applications to ensure they scale efficiently across larger node counts in high-performance computing environments. By...
Large transformers have been successfully applied to self-supervised data analysis across various data types, including protein sequences, images, and text. However, the understanding of their inner workings is still limited. We discuss how, by applying unsupervised learning techniques, we can describe several geometric properties of the representation landscape of these models and how they...
This talk addresses the challenge of interpreting the high-dimensional hidden representations in Transformer models, a critical issue given their widespread use in sequential data tasks. We propose using Topological Data Analysis (TDA), a powerful mathematical approach that allows us to understand the shape and structure of complex data. Using TDA, we develop a framework that follows the...
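A minimal sketch of a TDA computation on hidden states (the ripser package and the toy loop below are assumptions for illustration; the talk's actual pipeline is not reproduced):

# Persistent homology of a point cloud of hidden representations; a
# long-lived H1 feature indicates a loop in the representation space.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
hidden = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))

h1 = ripser(hidden, maxdim=1)['dgms'][1]     # birth/death pairs in dimension 1
persistence = h1[:, 1] - h1[:, 0]
print(f"most persistent 1-cycle lives for {persistence.max():.2f}")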
We discuss LHC searches for simplified models in which a singlet Majorana dark matter candidate couples to Standard Model leptons through interactions mediated by scalar lepton partners. We summarize the dark matter production mechanisms in these scenarios, highlighting the parameter space which can both satisfy the relic density and account for muon g-2. We focus on the case of intermediate...
Axion-like particles (ALPs) are promising candidates from theories beyond the Standard Model, possibly linked to dark matter. When subjected to external magnetic fields, ALPs can convert to photons and vice versa, rendering them observable. The ALP-photon mixing distorts gamma-ray blazar spectra with measurable, albeit tiny, effects. The description of blazar jets varies per target and...
Dark matter remains a crucial missing piece in our understanding of the Universe. Since the late 1970s, the astrophysics community has widely accepted that visible galaxies lie at the centre of large dark matter halos. Significant progress has been made in understanding the halo that hosts our own Milky Way galaxy, including its overall mass and density distribution. However, the halos...
The integration of machine learning (ML) with advanced biomarker discovery techniques offers new opportunities for pathology, particularly in personalized medicine. This research will focus on using nanobodies—small, stable, and highly specific single-domain antibodies derived from camelids—as versatile tools for advancing biomarker research. We plan to utilize ML to explore a diverse, naïve...
I introduce floZ, an improved method based on normalizing flows, for estimating the Bayesian evidence (and its numerical uncertainty) from samples drawn from the unnormalized posterior distribution. I validate it on distributions whose evidence is known analytically, in up to 15 parameter-space dimensions, and I demonstrate its accuracy for up to 200 dimensions with $10^5$ posterior samples. I...
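floZ itself is not reproduced here, but the idea of turning posterior samples into an evidence estimate can be illustrated with a learned surrogate density $q$ in a harmonic-mean-style identity, $\mathbb{E}_{\mathrm{post}}[q(\theta)/\tilde{p}(\theta)] = 1/Z$ (toy sketch; a Gaussian fit stands in for the normalizing flow):

# Evidence from posterior samples of the unnormalized target
# p̃(x) = Z·N(x; 0, 1) with Z = 7, so ln Z ≈ 1.946.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
Z_true = 7.0
samples = rng.normal(0.0, 1.0, size=(20000, 1))        # posterior samples

target = multivariate_normal(mean=[0.0], cov=[[1.0]])
log_ptilde = np.log(Z_true) + target.logpdf(samples)   # unnormalized log-target

# "Learned" surrogate: a Gaussian fit to the samples (flow stand-in)
q = multivariate_normal(mean=samples.mean(0), cov=np.atleast_2d(np.cov(samples.T)))

log_Z = -np.log(np.mean(np.exp(q.logpdf(samples) - log_ptilde)))
print(f"estimated ln Z = {log_Z:.3f}  (true {np.log(Z_true):.3f})")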
The detection of faint γ-ray sources is a historically challenging task for the very-high-energy astrophysics community. The standard approaches to identifying sources rely on likelihood analyses. However, our lack of knowledge of background uncertainties can introduce strong biases in the results and hinder a detection. The field of machine learning (ML) has advanced dramatically over the past...
Astronomical datasets include millions, sometimes billions, of records; to handle such volumes of data, astronomers have, over the last 20 years, actively used ML methods for various classification and characterization tasks. However, most of those applications utilize supervised ML, which requires large pre-existing training samples. Obtaining those training samples is a complicated task,...
Light-absorbing carbonaceous aerosols (LAC) contribute positive forcing to the Earth's radiative budget, which results in atmospheric warming. To determine the actual contribution of LAC aerosols, measurements from across the globe are incorporated into climate models. The most widely used measurement approach relies on filter photometers (FPs), which measure the attenuation of light through...
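For reference, the attenuation relation these instruments are built on (standard form; symbols are generic, and instrument-specific corrections are omitted):

\[
\mathrm{ATN} = -100\,\ln\!\left(\frac{I}{I_0}\right),
\qquad
b_{\mathrm{atn}} = \frac{A}{Q}\,\frac{\Delta\mathrm{ATN}/100}{\Delta t},
\]

where $I/I_0$ is the ratio of transmitted to reference light intensity, $A$ the filter spot area, and $Q$ the volumetric flow rate; the absorption coefficient follows after correcting $b_{\mathrm{atn}}$ for filter-loading and multiple-scattering artifacts.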
The Galactic Plane Survey (GPS), as proposed by CTAO, is one of the key science projects; it will cover an energy range from ~30 GeV to ~100 TeV with unprecedented sensitivity, leading to an increase in the known gamma-ray source population by a factor of five.
Here we tested our deep-learning-based automatic source detection techniques and compared them with traditional likelihood detection...
This work is dedicated to developing the most detailed and comprehensive numerical framework to date, combining magnetohydrodynamics (MHD) and Monte Carlo simulations to derive the multi-wavelength (MWL) and multi-messenger spectra from the magnetized environment of galaxy clusters. Special attention will be given to Perseus-like clusters hosting active galactic nuclei (AGNs). We will study...
This contribution presents an enhanced approach to secure communications that uses multiple inverse systems to design alpha-stable-noise-based random communication systems (RCSs). The method applies multiple inverse systems to transform the encoded alpha-stable noise signals on the transmitter side, with corresponding inverse systems on the receiver side decoding the retrieved signals...
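A toy sketch of the transmit/receive idea (purely illustrative: a single invertible linear "system" stands in for the multiple inverse systems, and the encoding step is omitted):

# Mix a message with an alpha-stable noise carrier through an invertible
# system at the transmitter; the receiver applies the inverse to decode.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(4)
message = rng.integers(0, 2, size=64).astype(float)    # bit stream
carrier = levy_stable.rvs(alpha=1.5, beta=0.0, size=64, random_state=rng)

A = np.array([[2.0, 1.0], [1.0, 1.0]])                 # invertible mixing system
tx = A @ np.vstack([message, carrier])                 # transmitted pair of signals
rx_message, rx_carrier = np.linalg.inv(A) @ tx         # receiver inverts the system
print(np.allclose(rx_message, message))                # True over a noise-free channel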