Imahn Shekhzadeh

Data Scientist | Consultant @d-fine GmbH | MSc Physics

About Me

I am a data scientist, currently working as a consultant at d-fine in Frankfurt. Previously, I held a research position in the Computer Science department of the University of Geneva, where my research interests lay at the intersection of ML/AI and the physical sciences (ML4Science). My research stay was funded by the Swiss National Science Foundation. I earned an MSc degree in Physics at the University of Hamburg. In my MSc thesis, I worked on and developed L2LFlows. During my BSc and MSc studies in Physics, I held a scholarship from the German Academic Scholarship Foundation (German: Studienstiftung des deutschen Volkes). I am also a member of the Hamburg Mathematical Society.

If you want to send me an encrypted e-mail, use my public PGP key. If you want to connect with me via Signal, please send me an e-mail or a message via LinkedIn, and I will be happy to share my username with you!

Blog

DistributedDataParallel in PyTorch

This script demonstrates what sampler.set_epoch(epoch) does in a distributed setup in PyTorch. To test its effect, comment out the call to sampler.set_epoch(epoch) and observe that, on a given rank, the batch at a given batch index contains the same data in every epoch. In this particular example, the test iterates over the dataloader with an infinite loop. The script reads RANK and WORLD_SIZE from the environment, as set by a launcher such as torchrun.
import argparse
import logging
import os
import random
from typing import Generator, List

import numpy as np
import torch
from torch import Tensor
from torch import distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def infiniteloop(
    dataloader: DataLoader,
) -> Generator[List[Tensor], None, None]:
    """Yield batches forever, restarting the dataloader after each pass."""
    while True:
        for data in dataloader:
            yield data


def setup(
    rank: int,
    world_size: int,
    master_addr: str = "localhost",
    master_port: str = "12355",
    backend: str = "nccl",
) -> None:
    """
    Initialize the distributed environment.

    Args:
        rank: Rank of the current process.
        world_size: Number of processes participating in the job.
        master_addr: IP address of the master node.
        master_port: Port number of the master node.
        backend: Backend to use.
    """

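    # Only override the environment if a value was passed: the CLI defaults
    # are `None`, and launchers such as torchrun already set these variables.
    if master_addr is not None:
        os.environ["MASTER_ADDR"] = master_addr
    if master_port is not None:
        os.environ["MASTER_PORT"] = master_port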

    # initialize the process group
    dist.init_process_group(
        backend=backend,
        rank=rank,
        world_size=world_size,
    )


def get_args() -> argparse.Namespace:
    """Get arguments passed via CLI."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--master_addr",
        type=str,
        default=None,
        help="IP address of the master node.",
    )
    parser.add_argument(
        "--master_port",
        type=str,
        default=None,
        help="Port of the master node.",
    )
    return parser.parse_args()


def seed_worker(worker_id: int) -> None:
    """
    Seed the worker for the dataloader. Function copy-pasted from [1].

    Args:
        worker_id: Worker ID.

    References:
        [1] https://pytorch.org/docs/stable/notes/randomness.html#dataloader
    """
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def run(
    rank: int | torch.device, world_size: int, args: argparse.Namespace
) -> None:
    """
    Run test.

    Args:
        rank: Rank of the current process. Can be `torch.device("cpu")` if no
            GPU is available.
        world_size: Number of processes participating in distributed training.
            If `world_size` is 1, no distributed training is used.
        args: Arguments passed via CLI (`master_addr` and `master_port`).
    """
    if world_size > 1:
        setup(
            rank=rank,
            world_size=world_size,
            master_addr=args.master_addr,
            master_port=args.master_port,
        )

    num_samples = 12
    batch_size = 4
    # When using a single GPU per process and per
    # DistributedDataParallel, we need to divide the batch size
    # ourselves by the number of processes (`world_size`), which on a
    # single node equals the number of GPUs. Keep at least one sample.
    batch_size = max(1, batch_size // world_size)
    num_epochs = 2

    tensor = torch.randn(
        num_samples, 2, generator=torch.Generator().manual_seed(2)
    )
    dataset = TensorDataset(tensor)
    sampler = DistributedSampler(dataset) if world_size > 1 else None
    dataloader = DataLoader(
        dataset=dataset,
        sampler=sampler,
        shuffle=sampler is None,  # `shuffle` and `sampler` are exclusive
        worker_init_fn=seed_worker,
        generator=torch.Generator().manual_seed(0),
        batch_size=batch_size,
        num_workers=4,
    )
    datalooper = infiniteloop(dataloader)
    num_batches__per_epoch = len(dataloader)
    logging.info(f"# Batches/epoch: {num_batches__per_epoch}")

    for epoch in range(num_epochs):
        if sampler is not None:
            # necessary so that each epoch uses a different shuffling order;
            # otherwise the same order is reused in every epoch
            # https://pytorch.org/docs/stable/data.html
            sampler.set_epoch(epoch)

        for batch_idx in range(num_batches__per_epoch):
            data = next(datalooper)
            logging.info(
                f"\n\nRank: {rank}, Epoch: {epoch}, Batch: {batch_idx}, Data:\n{data}"
            )


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.DEBUG, format="%(asctime)s - %(levelname)s - %(message)s"
    )

    args = get_args()
    world_size = int(os.getenv("WORLD_SIZE", 1))
    logging.info(
        f"{args}\nWorld_size: {world_size}\nPyTorch version: "
        f"{torch.__version__}"
    )

    run(rank=int(os.getenv("RANK", 0)), world_size=world_size, args=args)
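
The effect described above can also be reproduced without GPUs or a process group. The following is a minimal sketch in pure Python, under the assumption that the sampler shuffles with a random generator seeded by `seed + epoch` and that each rank then takes every `num_replicas`-th index. `ToyDistributedSampler` and `collect` are hypothetical names introduced only for this illustration; they are a simplified stand-in, not PyTorch's `DistributedSampler`, and the concrete permutations differ from what PyTorch produces.

```python
import random
from typing import Iterator, List


class ToyDistributedSampler:
    """Simplified stand-in for a distributed sampler (not the PyTorch class)."""

    def __init__(
        self, num_samples: int, num_replicas: int, rank: int, seed: int = 0
    ) -> None:
        self.num_samples = num_samples
        self.num_replicas = num_replicas
        self.rank = rank
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def __iter__(self) -> Iterator[int]:
        # The permutation depends only on seed + epoch: it is identical on
        # all ranks and changes only if the epoch changes.
        indices = list(range(self.num_samples))
        random.Random(self.seed + self.epoch).shuffle(indices)
        # Each rank takes every num_replicas-th index, starting at its rank.
        return iter(indices[self.rank :: self.num_replicas])


def infiniteloop(sampler: ToyDistributedSampler) -> Iterator[int]:
    # A new iterator is created lazily at the start of each pass, so a
    # set_epoch() call made between two passes still takes effect.
    while True:
        for idx in sampler:
            yield idx


def collect(num_epochs: int, call_set_epoch: bool) -> List[List[int]]:
    """Return the indices seen by rank 0 in each epoch."""
    sampler = ToyDistributedSampler(num_samples=12, num_replicas=2, rank=0)
    per_epoch = len(list(sampler))
    looper = infiniteloop(sampler)
    epochs = []
    for epoch in range(num_epochs):
        if call_set_epoch:
            sampler.set_epoch(epoch)
        epochs.append([next(looper) for _ in range(per_epoch)])
    return epochs


without = collect(num_epochs=2, call_set_epoch=False)
with_call = collect(num_epochs=2, call_set_epoch=True)
print("without set_epoch:", without)
print("with set_epoch:   ", with_call)
```

Without the call, both epochs print the same index list for rank 0; with the call, the second epoch should show a different order. The infinite loop does not get in the way, because the generator only asks the sampler for a new iterator once the previous pass is exhausted, which happens after `set_epoch` has been called for the next epoch.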

Book Summary: Die Entdeckung der Unendlichkeit by Aeneas Rooch (translation: The discovery of infinity)

Publications

Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability (NeurIPS, 2023)

Maciej Falkiewicz, Naoya Takeishi, Imahn Shekhzadeh, Antoine Wehenkel, Arnaud Delaunoy, Gilles Louppe, Alexandros Kalousis

Abstract: Bayesian inference allows expressing the uncertainty of posterior belief under a probabilistic model given prior information and the likelihood of the evidence. Predominantly, the likelihood function is only implicitly established by a simulator posing the need for simulation-based inference (SBI). However, the existing algorithms can yield overconfident posteriors (Hermans *et al.*, 2022) defeating the whole purpose of credibility if the uncertainty quantification is inaccurate. We propose to include a calibration term directly into the training objective of the neural model in selected amortized SBI techniques. By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation. The proposed method is not tied to any particular neural model and brings moderate computational overhead compared to the profits it introduces. It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference. We empirically show on six benchmark problems that the proposed method achieves competitive or better results in terms of coverage and expected posterior density than the previously existing approaches.

L2LFlows: generating high-fidelity 3D calorimeter images (Journal of Instrumentation, 2023)

Sascha Diefenbacher, Engin Eren, Frank Gaede, Gregor Kasieczka, Claudius Krause, Imahn Shekhzadeh, David Shih

Abstract: We explore the use of normalizing flows to emulate Monte Carlo detector simulations of photon showers in a high-granularity electromagnetic calorimeter prototype for the International Large Detector (ILD). Our proposed method — which we refer to as "Layer-to-Layer Flows" (L2LFlows) — is an evolution of the CaloFlow architecture adapted to a higher-dimensional setting (30 layers of 10 × 10 voxels each). The main innovation of L2LFlows consists of introducing 30 separate normalizing flows, one for each layer of the calorimeter, where each flow is conditioned on the previous five layers in order to learn the layer-to-layer correlations. We compare our results to the BIB-AE, a state-of-the-art generative network trained on the same dataset and find our model has a significantly improved fidelity.

Code: https://gitlab.com/Imahn/l2lflows

Advancing Generative Modelling of Calorimeter Showers on Three Frontiers (NeurIPS ML4PhysicalSciences Workshop, 2023)

Erik Buhmann, Sascha Diefenbacher, Engin Eren, Frank Gaede, Gregor Kasieczka, William Korcari, Anatolii Korol, Claudius Krause, Katja Krüger, Peter McKeown, Imahn Shekhzadeh, David Shih

Abstract: Generative machine learning can be used to augment and speed-up traditional physics simulations, i.e. the simulation of elementary particles in the detector of collider experiments. Like many physics data, these calorimeter showers can either be represented as images or as permutation-invariant lists of measurements, i.e. as point clouds. We advance the generative models for calorimeter showers on three frontiers: (1) increasing the number of conditional features for precise energy- and angle-wise generation with the bounded bottleneck auto-encoder (BIB-AE), (2) improving generation fidelity using a normalizing flow model, dubbed "Layer-to-Layer-Flows" (L2LFlows), (3) developing a diffusion model for geometry-independent calorimeter point cloud scalable to \(\mathcal{O}(1000)\) points, called CaloClouds, and distilling it into a consistency model for fast single-shot sampling.