This page contains the abstracts of all accepted submissions for the STEP-UP RSLondon Conference 2025. You can browse the abstracts here or link to a specific abstract by clicking a talk title in the conference schedule.

Where available, links to conference presentations are also provided.

Poster abstracts are provided towards the bottom of the page.


Title: Bringing the Virtual Human to Life
Authors: Andrea Townsend-Nicholson, UCL
Type: Keynote
Time/Track: 10:10-10:55, General track
Abstract:
Imagine assembling a silicon twin for every person using their digital health and biological data, from the whole genome sequence to their skeletal architecture. Imagine possessing the ability to make predictions not only of disease progression and outcomes but also of the therapeutic effects of treatment options and interventions - and the means of testing the impact of different lifestyles and treatments to select the ones that give us the preferred outcome. This digital twin could be used to make informed decisions that range from the treatment of disease to improving quality of life and, ultimately, will greatly inform the design of clinical trials so that these could be faster and safer. Since 2016, I have been focused on supporting the creation and translation of patient-specific computational models into validated human digital twins to inform clinical decision making through my research in drug discovery, and my education and outreach activities. Here, I will describe “Bringing the Virtual Human to Life” – a community effort that requires bringing together a wide range of people, from experts in computational biomedicine and clinicians to policymakers and the general public. I will describe the state of the art in the building of human digital twin components, and I will share several of the challenges that I have encountered along the way that have been resolved with the support of the research technology professionals with whom I have been collaborating.



Title: Contextualising open science training programmes
Authors: Sara Villa, Yo Yehudi and Malvika Sharan
Type: Regular talk
Time/track: 11:40-12:00, General track
Slides: Presentation slides (via Zenodo)
Abstract:
Open science training programmes are on the rise in both academic and non-academic settings. Several organisations are providing upskilling sessions on topics essential to ensuring the reproducibility and accessibility of research. Among them, OLS, a capacity-building organisation, has led the way with their flagship Open Seeds programme, a training and mentoring programme in open science for researchers at all levels. This programme provides hands-on experience with open science principles during a 16-week training period, covering topics such as open data, open communities and EDI.

One of the disadvantages of these kinds of global programmes is the lack of context in the content provided. Participants can miss out on learning key concepts and tools related to their specific fields of work.

We have recently run a pilot version of Open Seeds in the School of Neuroscience at King’s College London, called Open Neuroseeds. In this session, we will highlight the lessons learned and the feedback obtained from contextualising this important type of training. We will provide tips and tricks for adapting your own training programmes to your audience, helping participants contextualise the lessons learned and, hence, embed them more efficiently in their research workflows.

This is an important discussion for all RSEs and trainers who usually work in multidisciplinary teams and with researchers not well versed in open science practices.



Title: Professionalising diverse data science roles
Authors: Emma Karoune and Malvika Sharan
Type: Regular talk
Time/track: 12:00-12:20, General track
Slides: Presentation slides (via Zenodo)
Abstract:
The interdisciplinary nature of the data science workforce extends beyond the traditional notion of a “data scientist.” A successful data science team requires a wide range of technical expertise, domain knowledge and leadership capabilities. To strengthen such a team-based approach, we recommend that institutions, funders and policymakers invest in developing and professionalising diverse roles, including a broad set of digital research technical professional roles, fostering a resilient data science ecosystem for the future.

We will delve into the different roles that make up data science teams at The Alan Turing Institute, including research software engineers, data wranglers and research computing engineers, highlighting overlapping as well as specific skills. We will also explore the importance of coordination and engagement roles such as project managers and research community managers, which are often overlooked in conversations about digital research technical professionals but which play an essential role in the team, driving forward collaboration and impact.

By recognising these diverse specialist roles that collaborate within interdisciplinary teams, organisations can leverage deep expertise across multiple skill sets, enhancing responsible decision-making and fostering innovation at all levels. Ultimately, we seek to shift the perception of data science professionals from the conventional view of individual data scientists to a competency-based model of specialist roles within a team, each essential to the success of data science initiatives.



Title: Why are we creating an Open Source Programme Office at UCL? How are we doing so?
Authors: David Pérez-Suárez, Mosè Giordano, Miguel Xochicale and Sam Cunliffe
Type: Lightning talk
Time/track: 12:20-12:25, General track
Abstract:
An Open Source Programme Office (OSPO) is a body within an organisation that looks after its open source strategy and operations. Much of our research, infrastructure and everyday lives depends on open source, but universities don’t know how much. An OSPO can provide insights on that and become a body that protects and nurtures open source communities from within universities. We are creating an OSPO at UCL! Join us and learn how we are doing it and how you could start one at your institution.



Title: DisCouRSE: Developing a Community of Leaders
Authors: Jonathan Cooper
Type: Lightning talk
Time/track: 12:25-12:30, General track
Slides: Presentation slides (via Zenodo)
Abstract:
I will introduce our newly funded UKRI Network+ grant, aiming to connect and train aspiring leaders across digital Research Technical Professional career tracks. The talk will cover what we’re aiming to accomplish in the next 3.5 years, and how we would like the whole community to get involved!



Title: CCP-AHC: A new collaborative computational community for arts, humanities, and culture research
Authors: Eamonn Bell, Karina Rodriguez and Jeyan Thiyagalingam
Type: Lightning talk
Time/track: 12:30-12:35, General track
Slides: Presentation slides (via Zenodo)
Abstract:
We present “Toward a new CCP for Arts, Humanities, and Culture research (CCP-AHC)”, a new research software community-building exercise, funded by UKRI and STFC for 24 months from January 2025. The goal of CCP-AHC is to support the sustainable and efficient development of software, pipelines, and workflows used by arts, humanities, and culture researchers who make use of UK-based digital research infrastructure (DRI), including high-performance computing (HPC) and advanced computing infrastructures supported by UKRI. It will do so by disseminating and implementing the Collaborative Computational Project (CCP) model that has successfully been used by many other scientific software communities over the past several decades, with the support of the national CoSeC programme at STFC. There is a special emphasis on ensuring that research software developed by this community makes the best use of large-scale compute infrastructures supported by public funding. This includes existing and future high-performance computing (HPC) and advanced computing infrastructures supported by UKRI, as well as those run by UK-based HEIs and other research organisations eligible for UKRI funding. During the first year of the project, we will bring together key stakeholders in the area, including RSEs and other dRTPs supporting computationally intensive research in arts, humanities, and culture. In the second year, we will use RSE and computational scientist resource within the project to produce a portfolio of codes, pipelines, and workflows for adoption by the project. Our main deliverable is a roadmap for the development of the community, proposing a five-year plan for the community within the broader national and international DRI landscape.



Title: Diversity and Inclusion in Practice: What you need to know when you plan to hire International RSE and dRTPs
Authors: Yo Yehudi, Sara Villa, Aman Goel, Toby Hodges and Malvika Sharan
Type: Regular talk
Time/track: 12:35-12:55, General track
Abstract:
The world, and the UK specifically, is currently very anti-diversity and anti-inclusion. One way to push back against this - indeed, to step up as a UK-based ally - is to recruit international RSEs and PhD students into your teams. They can bring a breadth of experience that includes different cultural, religious, and racial backgrounds, as well as different language and domain expertise.

It’s not always straightforward, however, in a hostile environment. This talk will draw from real-world experiences of immigrants, and provide practical tips for how to work around the system to create more inclusive spaces. This is reflected directly in the talk authorship: Aman is an Indian national, currently in the UK on a time- and contract-bound work visa [1] (obtained before the rules were made more restrictive [2]); Yo is a naturalised UK citizen of Israeli/Kiwi background (who was deported twice when they first came to the UK); Sara is a Spanish national with settled status (luckily obtained before Brexit, so it didn’t impact her work options).

We will also solicit experiences and advice from the community present at the event, and more widely. We anticipate that eventually a learning module on this topic will be submitted to The Carpentries Incubator [3] for a richer “what you need to know when leading a group of RSEs” lesson.

[1] https://www.gov.uk/skilled-worker-visa
[2] https://www.gov.uk/government/news/new-laws-to-cut-migration-and-put-british-workers-first-in-force
[3] https://carpentries-incubator.org/



Title: SoFAIR - Making Software FAIR: A machine-assisted workflow for the research software lifecycle
Authors: David Pride, Petr Knoth, Matteo Cancellieri and Laurent Romary
Type: Regular talk
Time/track: 14:25-14:45, Software/RSE track
Abstract:
A key issue hindering discoverability, attribution and reusability of open research software is that its existence often remains hidden within the manuscript of research papers. For these resources to become first-class bibliographic records, they first need to be identified and subsequently registered with persistent identifiers (PIDs) to be made FAIR (Findable, Accessible, Interoperable and Reusable). To this day, much open research software fails to meet FAIR principles and software resources are mostly not explicitly linked from the manuscripts that introduced them or used them.

SoFAIR is a 2-year international project (2024-2025) which proposes a solution to the above problem realised over the content available through the global network of open repositories. SoFAIR will extend the capabilities of widely used open scholarly infrastructures (CORE, Software Heritage, HAL) and tools (GROBID) operated by the consortium partners, delivering and deploying an effective solution for the management of the research software lifecycle.

The ambition of SoFAIR is to:
1. Develop a machine-learning-assisted workflow for the software asset lifecycle, covering all the steps from 1) identification of software mentions in research manuscripts, through 2) their validation by authors, to 3) their registration with PIDs and archival if needed.
2. Embed this workflow into established scholarly infrastructures, making the solution available to the global network of open repositories, covering tens of millions of open access research papers originating from across more than 12.5k repository systems.



Title: AI OnDemand: Segmenting Images at Scale with Ease
Authors: Cameron Shand, Marie-Charlotte Domart, Jon Smith and Amy Strange
Type: Regular talk
Time/track: 14:45-15:05, Software/RSE track
Slides: Presentation slides (via Zenodo)
Abstract:
As we continue to generate increasingly detailed and numerous images of biological structures, the capabilities and sophistication of the AI models used to analyse such data have grown in step. Applying such models presents its own set of hurdles, however, often requiring computational skills and knowledge uncommon among those generating the data. This problem is further compounded when trying to use local HPC or cloud compute, which is increasingly required due to the size of the data and hardware (GPU) needs. To address this, we have developed AI OnDemand, which combines an easily navigable interface (in Napari) with a Nextflow pipeline to seamlessly run a range of segmentation models (such as Cellpose, Mitonet, and SAM2) at scale, distributing a model over parts of an image to maximise parallelisation. Through features like automatic UI construction, an extendable model registry with a simple schema, and the ability to use private/local models alongside public ones, we have developed a platform that easily incorporates new developments and inherently encourages community contribution. Through future features such as built-in model training and finetuning, we hope to further democratise the use of AI in biological image analysis and provide a tool as useful for ML experts as it is for wet-lab scientists.



Title: Python Profiling and Optimisation & the RPC SIG
Authors: Jost Migenda and Robert Chisholm
Type: Lightning talk
Time/track: 15:05-15:10, Software/RSE track
Abstract:
Most researchers writing software are not classically trained programmers. Instead, they learn to code organically, often picking up “bad habits” that limit their software's performance.

In this talk, we present a new course on Python Profiling and Optimisation. We give an overview of the course contents, discuss our plans for developing the course further and share how you can run the course at your own institution. Finally, we introduce the Society of RSE’s Reasonable Performance Computing SIG and its plans to develop additional resources.



Title: Two more tiny Python packages for scientific computing: mpi-pytest and petsctools
Authors: Connor Ward
Type: Lightning talk
Time/track: 15:45-15:50, Software/RSE track
Abstract:
In this talk I will present two Python packages that I have written that I believe will be of interest/use to the wider community. The first package is called mpi-pytest; it is a pytest plugin for easily running tests in parallel using MPI. The second package is called petsctools and it provides a number of ‘Pythonic’ extensions to petsc4py (the Python interface to the PETSc library used for solving massive linear algebra problems).

Both packages originated as custom code inside the finite element simulation framework Firedrake and have been extracted as separate packages to allow for community reuse (and avoid copy-pasting the code into other libraries). Naturally, both tools are already used within Firedrake.
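As a purely illustrative sketch (not taken from the package itself) of the kind of test mpi-pytest is designed to run, consider a pytest test whose assertion only makes sense when executed across several MPI ranks; the parallel marker name and its nprocs argument below are assumptions based on the package's documented usage, so consult the mpi-pytest README before copying:

    # Hypothetical example: a test that must run under MPI to be meaningful.
    import pytest
    from mpi4py import MPI

    @pytest.mark.parallel(nprocs=3)  # assumed mpi-pytest marker: launch 3 MPI ranks
    def test_allreduce_sums_ranks():
        comm = MPI.COMM_WORLD
        total = comm.allreduce(comm.rank)  # default op is SUM: 0 + 1 + ... + (size - 1)
        assert total == comm.size * (comm.size - 1) // 2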



Title: Mini-guide to reproducible Python code
Authors: Diego Alonso Álvarez
Type: Regular talk
Time/track: 15:50-16:10, Software/RSE track
Slides: Presentation slides (via Zenodo)
Abstract:
A lot of modern research requires custom software to be written, either to do some calculations, analyse experimental data or something else. Creating good quality, sustainable software is always desirable, but ticking all the boxes that are often described as necessary to accomplish this can be a daunting task for people - researchers - who often have other priorities in mind.

Reproducibility is, however, not an optional feature of a piece of research - software included - and it is something that researchers are fully responsible for addressing. Luckily, out of the many requirements of good quality and sustainable software, only a handful are necessary to support - or at least go a long way towards supporting - the reproducibility of the results.

Here we describe the absolutely essential steps that researchers should take in order to support the reproducibility of their software. The recommendations are for software developed using Python; they might not apply to all cases, and they are not foolproof, as reproducibility is a really complex business, but they are a good start, applicable - in spirit, at least - to other programming languages, and will reduce the chances of things going wrong when other people try to use the software.
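As one illustration of the kind of low-cost step such a guide might recommend (not necessarily drawn from the author's list), fixing random seeds and recording the exact package versions behind a result already removes two common sources of irreproducibility:

    # Minimal sketch: make pseudo-random steps repeatable and snapshot the
    # installed package versions so the environment can be recreated later.
    import importlib.metadata
    import json
    import random

    random.seed(42)  # any analysis using random numbers now gives the same output each run

    versions = {dist.metadata["Name"]: dist.version
                for dist in importlib.metadata.distributions()}
    with open("environment_snapshot.json", "w") as fh:
        json.dump(versions, fh, indent=2, sort_keys=True)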



Title: Towards a declarative, reproducible, homogeneous, cross-platform command-line environment across remote HPC machines
Authors: Krishnakumar Gopalakrishnan
Type: Regular talk
Time/track: 16:10-16:30, Software/RSE track
Abstract:
Whilst almost all HPC systems facilitate secure shell access for their users, they do not typically provide administrative permissions to them owing to security considerations on such shared resources. Traditionally, system administrators evaluate and install user-requested software natively and make it available through the modules system, or provide pre-built container images executable with a non-root container runtime installed system-wide. In recent years, the combinatorial explosion in possible build provenances and runtime dependencies has led to the development of package managers like Spack and EasyBuild that allow end-users to build and run scientific software from their home directories without requiring elevated privileges.

However, there exists a gap in the user experience (UX) aspect of using remote systems. While synchronising the user’s shell and other tool configurations (i.e. ‘dotfiles’) with a central repository can help towards homogenising remote shell environments, it does not solve UX issues such as the remote system’s shell, runtime system libraries, core utilities and other user-facing parts not being recent enough to support a particular feature flag, for instance. Furthermore, recent years have seen the introduction of several cross-platform utilities written in high-performance compiled languages like Rust and Go for speeding up tasks like searching files/strings, improving shell history/theming, and other quality-of-life tooling.

This talk presents the author’s iterative journey with userspace meta package managers towards deploying such cross-platform static binaries on remote machines. It discusses the relative strengths and drawbacks of an exhaustive selection of tools considered, and introduces the author’s open-source automation module to deploy, manage and update the current state-of-the-art tooling best suited for each platform/architecture, identified through trials. By sharing his journey, the author hopes to glean valuable feedback as well as provide the community with a framework that equips users with a declarative, reproducible, cross-platform, homogeneous command-line environment on all their remote machines.
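To make the per-platform selection problem concrete, a minimal and purely hypothetical sketch might map the current operating system and CPU architecture to the appropriate prebuilt static binary; the tool, version and asset names below are illustrative rather than taken from the author's framework:

    # Hypothetical: choose the right release asset for the host machine.
    import platform

    ASSETS = {
        ("Linux", "x86_64"): "sometool-1.2.3-x86_64-unknown-linux-musl.tar.gz",
        ("Linux", "aarch64"): "sometool-1.2.3-aarch64-unknown-linux-gnu.tar.gz",
        ("Darwin", "arm64"): "sometool-1.2.3-aarch64-apple-darwin.tar.gz",
    }

    def asset_for_host() -> str:
        """Return the archive name matching this platform/architecture."""
        return ASSETS[(platform.system(), platform.machine())]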

The self-contained workflow, code and configs presented in this talk are available here: https://github.com/krishnakumarg1984/setup_new_hosts

Talk slides (PDF)



Title: Introducing Helix: Imperial College London’s New FAIR Data Repository
Authors: Christopher Cave-Ayland and Wayne Peters
Type: Regular talk
Time/track: 14:25-14:45, Research data track
Abstract:
Recently released in beta, Helix is designed to provide a modern, sustainable, and well-defined data repository service that supports Imperial College London’s research strategy. The design and implementation of Helix have been the result of extensive collaboration among various digital research technical professionals (dRTPs) and academic roles. Data managers, Research Software Engineers (RSEs), and Research Infrastructure Engineers have all contributed to delivering a vision shaped by university functions such as the library, research office, historical archive, and academic departments.

To align with Imperial’s strategic goals, Helix has been developed in-house by extending and customising InvenioRDM (the application that powers Zenodo) in partnership with the consultancy Cottage Labs. The development of the beta release has focused on building a robust, minimal core application deployed on stable infrastructure, while embedding in-house expertise. Planned future developments for Helix include support for depositing large (multi-terabyte) datasets, sensitive data, and domain-specific metadata formats.

This presentation will explore Imperial’s wider strategic objectives for supporting FAIR data principles, the service delivery model for Helix, key technical implementation aspects, and the supporting policies designed to ensure the long-term sustainability and accessibility of research data.



Title: Developing a sustainable data infrastructure for physical sciences
Authors: Nicola Knight, Samantha Pearman-Kanza, Louise Saul and Cerys Willoughby
Type: Regular talk
Time/track: 14:45-15:05, Research data track
Abstract:
As in many academic disciplines, challenges exist for researchers and research enablers in the physical sciences in ensuring that data and associated attributes are FAIR and published for re-use. Whilst the field is diverse and encompasses many practices, the identified challenges include: data interoperability, loss of data when converting between different scientific data formats, heavy reliance on data generation through facilities, heterogeneity in naming, lack of uptake of tools to support best practice such as Electronic Lab Notebooks, and lack of training to support the required change in research culture.

The Physical Sciences Data Infrastructure (PSDI) initiative is creating a data infrastructure that connects existing experimental and computational facilities within the physical sciences and beyond to enhance research efficiency. PSDI looks to improve data handling across the research data lifecycle through the creation of tools and services that can be deployed within the researcher’s workflow, developed from community requirements and incorporating strategies for the skills and training, best practice and standardisation activities needed alongside.

Following an initial pilot and development phase which focused on connecting existing infrastructures, data stewardship practices, and the best use of people and technology, PSDI has recently launched an initial set of resources for the physical sciences community. These include services, software tools, data sources and guidance. PSDI enables researchers to use reference-quality data from commercial and open sources; share resources within the community; make use of technologies such as AI to explore data; and learn how to make their research open and FAIR.

In this session, we will talk about how PSDI developed workflows to promote community engagement and the activities undertaken in the initiative. We will also describe how PSDI intends to promote the uptake of data stewardship practices as well as increasing disciplinary awareness of why these processes are essential.



Title: CaSDaR (The Careers and Skills for Data-Driven Research) Network+: Empowering Data Stewards for Research Excellence
Authors: Samantha Pearman-Kanza, Simon Coles, James Baker, Simon Hettrick and Isobel Stark
Type: Regular talk
Time/track: 15:45-16:05, Research data track
Abstract:
The amount of data generated by research is growing at an exponential rate. And yet, so much of this data is unusable due to the lack of expertise, tools, and resources for effective data management. Data Stewards are the key to bridging the gap between data generation and reuse, as they have a fundamental role in ensuring the quality, accuracy, accessibility and longevity of data across the entire data lifecycle. We place great value in data, but the current investment in the time and resources needed to drive forward data excellence is sorely lacking, and best practices like FAIR cannot be implemented without investing in data stewards. So, this is where CaSDaR comes in! We are a brand-new UKRI-funded Network+ that started in April 2025, and our goal is to establish a diverse, inclusive, self-sustaining community of Data Stewards and to create a model for data steward support systems within research-intensive institutions, thereby clarifying their role and integration within the research data lifecycle. This talk will discuss the important role that data stewards play across the entire data lifecycle, introduce CaSDaR and our plans for the next four years, and explain how you can get involved!



Title: Shoehorning Interoperability in Astronomical Science Data Metadata Model Mapping
Authors: Michael Johnson and Erin Brassfield Bourke
Type: Regular talk
Time/track: 16:05-16:25, Research data track
Abstract:
In a world of bespoke astrophysical data archives supplying unique data products described by roll-your-own metadata models, we as data stewards are tasked with evolving the data archive culture towards a FAIR (Findable, Accessible, Interoperable, Reusable) data representation.

Looking toward a future of distributed data services and multi-wavelength astronomy, we carried out a case study harvesting metadata from e-MERLIN interferometric radio telescope science data products into a relational database archive designed with the Canadian Astronomy Data Centre’s CAOM (Common Archive Observation Model) schema and an IVOA TAP (Table Access Protocol) service, to test the feasibility of using an interoperable data model to accurately represent varied data products.

We present lessons learned in mapping these hierarchical multi-target data products into a rich observational astronomy data discovery metadata model originally designed around single-telescope optical astronomy observations and weigh the benefits and detriments of a common model versus a roll-your-own approach.
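As a small, hedged illustration of what such an interoperable model buys you (not code from the authors' archive), any CAOM-modelled archive exposed over IVOA TAP can be queried with the same ADQL regardless of which telescope produced the data; the endpoint URL below is a placeholder and the caom2.Observation table name follows the CAOM convention:

    # Hypothetical query against a CAOM-modelled TAP service using pyvo.
    from pyvo.dal import TAPService

    service = TAPService("https://archive.example.ac.uk/tap")  # placeholder endpoint
    results = service.search("SELECT TOP 5 * FROM caom2.Observation")
    print(results.to_table())  # astropy Table of observation metadata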



Title: Building a Production-Ready Barts Health Secure Data Environment: Tooling, Access Control, and Cost Governance
Authors: Idowu Samuel Bioku, Evan Hann, Tony Wildish, Steven Newhouse, Benjamin Eaton, Ruzena Uddin and Francene Clarke-Walden
Type: Regular talk
Time/track: 14:15-14:35, Computing infrastructure/HPC track
Slides: Presentation slides (via Slideshare)
Abstract:
The Barts Health Data Platform (BHDP) is a Secure Data Environment (SDE) based on the Azure Trusted Research Environment (TRE). It provides researchers with secure access to health data and scalable data analysis environments hosted on Microsoft Azure. However, additional technical requirements must be addressed before production deployment. These include customised VM images for complex health data research, and a custom-built cost management tool to provide cost granularity for billing purposes.

The audience will learn about the enhancement of the core AzureTRE product with production-level tooling. These improvements provide robust data integration, cost transparency, and the ability to handle complex analytical workflows.

Key enhancements included the development of bespoke virtual machines tailored for machine learning, medical image analysis, and data workloads, all with automated configuration. Secure project-specific storage space is automatically provisioned for internal data pipelines to securely transfer data into the platform.

To improve cost oversight, we implemented a custom-built cost management tool capable of granular tracking and long-term cost attribution by research project. Transparent billing has also been achieved through the integrated cost management solution. In parallel, we are introducing identity management and a dynamic service catalogue integrated with RBAC, allowing SDE and workspace administrators to control the set of tools available to researchers, making upgrades and deprecation of tools easier to manage.

The enhanced platform is in production as the default analysis environment for all new approved projects at Barts Health and researchers have benefited from the integration of customised virtual machines.

In summary, the BHDP demonstrates production-ready tooling that includes secure data connectivity, customised compute resources, and robust cost governance to meet NHS and research user requirements, offering a practical blueprint for similar platforms across the UK health data research landscape.



Title: Harnessing the power of AIRR supercomputers for trusted research
Authors: Jim Madge, Matt Craddock and Martin O’Reilly
Type: Regular talk
Time/track: 14:35-14:55, Computing infrastructure/HPC track
Slides: Presentation slides (via Zenodo)
Abstract:
How can we use the country’s most powerful supercomputers for research on sensitive data while keeping that data secure?

Complex AI models trained on large datasets are having an enormous impact in many research domains. However, training and applying such models requires high-performance hardware and specialist accelerators such as GPUs. The new AI Research Resource (AIRR) has greatly expanded the availability of GPU-enabled compute to support large-scale AI research in the UK.

Working with sensitive data requires high levels of security to ensure that the data is only accessible by approved researchers, and only for approved research. Trusted Research Environments (TREs) provide secure analysis environments for working safely with sensitive data. However, TREs do not provide the computational power and scaling that is required for the development and application of large models. Conversely, high performance computing (HPC) platforms do not generally support TRE capabilities, and therefore cannot provide sufficient security for working with sensitive data.

In FRIDGE we are building a SATRE- and NHS-standards-compliant, cross-platform TRE on AIRR, unlocking the power of these systems for AI-driven research using sensitive data.

In our talk we will show our progress in enabling trusted research on AIRR and discuss:

  • The unique challenges of creating a secure enclave on a shared resource
  • How FRIDGE can be used to add new capabilities to existing TREs
  • How we solve governance when responsibility is shared between the HPC site and the TRE operator



Title: No Secrets, Just Trust: Securely Deploying Infrastructure Without Persistent Credentials
Authors: Brian Maher
Type: Regular talk
Time/track: 14:55-15:15, Computing infrastructure/HPC track
Abstract:
As we transition towards a cloud-native world, Infrastructure as Code and other Continuous Integration/Continuous Delivery tools continue to become more ubiquitous. Whilst clearly an incredible benefit, enabling more agile infrastructures, these bring new challenges with regard to secrets management. Infrastructure deployments that may have previously been run by a lone sysadmin using a Kerberos ticket on their laptop once everyone had gone home for the night are now often run automatically using multi-user cloud-based tools. Tools which often hold the keys to multiple kingdoms.

These long-lived credentials are a multi-faceted evil. They present an opportunity for stolen credentials to be hoarded and used at a more opportune moment, cause a headache when staff members join or leave, and often expire at the worst possible time. Despite our best intentions, credential rotation remains an often-overlooked chore – especially given the sprawling nature of modern infrastructure.

This talk will explain how we can improve the situation by creating trust relationships between various tools in a deployment pipeline, replacing long-lived keys with just-in-time dynamic credentials. It will demonstrate how secrets management tools such as Hashicorp Vault can be used to build these trust relationships, give an overview of how JSON Web Token (JWT) authentication works and demonstrate some of the tooling that makes this possible.

Finally, whilst the primary point of this talk is theoretical, it will demonstrate a modern take on a classic sysadmin workflow: deploying an application to a virtual machine via SSH from a cloud-based tool with no secrets in sight.
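A minimal sketch of the trust model behind such just-in-time credentials (assumed for illustration, not the speaker's code): the CI platform signs a short-lived JWT with its private key, and the verifier - Vault or any other relying service - only needs the matching public key, so no long-lived shared secret ever leaves the pipeline. The issuer, subject and audience values below are hypothetical.

    # Sketch of JWT-based workload identity using PyJWT and cryptography.
    import datetime
    import jwt  # PyJWT
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import rsa

    # Stand-in for the identity provider's signing key (e.g. a CI platform's OIDC issuer).
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_pem = private_key.public_key().public_bytes(
        serialization.Encoding.PEM, serialization.PublicFormat.SubjectPublicKeyInfo
    )

    claims = {
        "iss": "https://ci.example.org",          # hypothetical issuer
        "sub": "repo:example/infra:branch:main",  # identity of the pipeline run
        "aud": "vault",
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=5),
    }
    token = jwt.encode(claims, private_key, algorithm="RS256")

    # The verifier trusts the issuer's public key, checks audience and expiry,
    # and can then issue its own short-lived, narrowly scoped credentials.
    verified = jwt.decode(token, public_pem, algorithms=["RS256"], audience="vault")
    print(verified["sub"])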



Title: Developing the next generation of dRTPs
Authors: Stephanie E.M. Thompson
Type: Lightning talk
Time/track: 15:45-15:50, Computing infrastructure/HPC track
Abstract:
The Advanced Research Computing team at the University of Birmingham has been running the annual supercomputing ‘BEAR Challenge’ for several years. This increasingly popular 3-day event, exclusively for taught students, features a range of challenges such as agent-based modelling and designing a compute cluster. The students work in teams, often interdisciplinary, developing their social skills, as well as knowledge of supercomputing in relation to real-world problems. A key draw for students from computer science is the access that they are given to a Tier 2 (national) supercomputer. Popular features of the event are talks on careers in the HPC-field from industry and the University, and tours of our innovative data centre – now mostly virtual due to numbers but with the top teams getting a full tour. The challenge has expanded from 5 teams to 15 but demand still cannot be met, with 28 teams registering interest for this year’s challenge so far, including from economics and engineering. In this talk I will give some tips on how to reach students outside of computer science and describe how combining teaching, research and careers not only benefits the students, but can raise your profile with the most senior levels of your institution.



Title: Addressing the HPC Skills Shortage Through Learning Pathways and Visible Infrastructure
Authors: Jeremy Cohen, Weronika Filinger, Eirini Zormpa and Michael Bearpark
Type: Regular talk
Time/track: 15:50-16:10, Computing infrastructure/HPC track
Abstract:
High Performance Computing (HPC) and large-scale research computing infrastructure are becoming ever more important parts of the research lifecycle. They enable researchers to process the huge volumes of data, run the high-resolution simulations and train the next-generation AI models that represent such an important aspect of much modern research. AI is, of course, critical since the AI revolution is a primary driver of the massive demand for specialist computing infrastructure in both research and industry. This is great for the HPC community; however, demand for skilled HPC professionals is going beyond the capabilities of existing approaches to inspire and train people to take up roles in this field.

What can we do about this and how can we change existing unsustainable methods to address the HPC skills shortage?

This talk will provide an overview of two key areas: 1) the development of training pathways that can help to effectively deliver technical skills to existing or new research computing professionals, and 2) ways to inspire the next generation of research computing professionals by making infrastructure more visible, helping us to build a much more diverse and inclusive community of technical professionals.

In the context of training pathways, we’ll look at work undertaken in the UNIVERSE-HPC project, and through related activities and groups, to develop an understanding of the different routes to grow HPC skills, and the specific skill sets that are required. In the context of inspiring the next generation of HPC practitioners, we’ll look at our developing “visible HPC” programme which aims to address the fact that people rarely have the opportunity to see large-scale computing infrastructure. They generally interact with it through a terminal on a laptop or desktop computer where there is little visible difference between connecting to a local server and the world’s largest supercomputers!



Title: EasyBob, the friendly software installation bot
Authors: Jörg Saßmannshausen
Type: Regular talk
Time/track: 16:10-16:30, Computing infrastructure/HPC track
Abstract:
Installation of software on a heterogeneous cluster consisting of several CPU and GPU architectures can be quite a challenge when done manually. The risks are:

  • the installation is not reproducible
  • the installation is only done for one CPU micro-architecture (common denominator)
  • required dependency versions vary from installation to installation

This is not in line with a modern approach to software installation, which requires installations to be reproducible.

At Imperial College London, we were facing this issue as our clusters consist of several Intel CPU micro-architectures, some AMD nodes and a healthy mixture of GPU nodes as well. To make the mix more interesting, the cluster internally has an IPv6-only network.

So we came up with an automatic installation program called EasyBob [1]. EasyBob not only allows us to make installations micro-architecture-specific, taking into account different CPU/GPU nodes; it could also serve as the ‘backend’ of a self-service portal where users can request software to be installed via our ticket system in the future. To enable this, we wrote a robot which uses EasyBuild and only requires the name of the EasyConfig file of the software to be installed. That name can conveniently be derived from a user-facing web interface. Furthermore, first-line support could kick off the bot without having any knowledge of EasyBuild or the bot itself.

This presentation gives an insight into the mechanics of the bot, how it can be configured and how it is used at Imperial.

[1] https://github.com/sassy-crick/easybob
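As a rough sketch of the core idea (not EasyBob itself; the easyconfig name and installation prefix below are hypothetical), the bot essentially has to turn a requested EasyConfig name into an EasyBuild invocation per micro-architecture:

    # Hypothetical wrapper: build one easyconfig into an architecture-specific prefix.
    import subprocess

    def install(easyconfig: str, prefix: str) -> None:
        """Run EasyBuild with dependency resolution (--robot) for the given easyconfig."""
        subprocess.run(
            ["eb", easyconfig, "--robot", f"--installpath={prefix}"],
            check=True,
        )

    # e.g. one call per CPU/GPU micro-architecture present in the cluster
    install("GROMACS-2024.1-foss-2023b.eb", "/apps/easybuild/zen3")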



Title: Shaping Research Culture Through Communities: Lessons from Open Science
Authors: Malvika Sharan, The Alan Turing Institute and OLS
Type: Closing keynote, General track
Slides: Presentation slides (via Zenodo)
Abstract:
Ever been in a research team or community where you felt truly welcome, empowered, and excited to engage? This experience is rarely accidental. The secret sauce lies in intentional facilitation, genuine spaces for collaboration, and inclusive community management. My talk will explore lessons (the ingredients) from both a community member’s and a community builder’s perspective. Drawing from my experience participating in and building Open Science communities, specifically The Turing Way and Open Life Science (OLS), I will highlight key aspects and actionable strategies for engaging and supporting research communities. Attendees, whether in ‘formal’ or informal roles, will leave with valuable insights and familiar reminders on fostering inclusive communities, improving research culture, and preserving the inherent joy of collaboration. Ultimately, it’s about investing in communities to achieve research goals that serve our society.



Poster abstracts

Title: Designing support strategies for enabling cultural change in data practices for the physical sciences
Authors: Nicola Knight, Samantha Pearman-Kanza, Louise Saul and Cerys Willoughby
Type: Poster
Abstract:
Support strategies, such as training and resource provision, are essential in the process of improving the uptake of practices aligned with research excellence. We will describe how the Physical Sciences Data Infrastructure (PSDI), an initiative developed in response to a disciplinary need, has developed resources and infrastructure to promote a cultural shift towards implementing sustainable data practices. We will describe how input was sought from key stakeholders to identify the areas where training and resources were needed, and how this led to the development of asynchronous training modules, a knowledge base collecting essential information, and seminars providing the opportunity for peer discussion and learning. Furthermore, we will detail the collection of feedback from the community and how this informed evaluation processes.


Title: Research Data Stewardship at UCL
Authors: Katarina Buntic, Mahmoud Abdelrazek, Martin Donnelly, Daniel Delargy, James A J Wilson, Michelle Harricharan, Nicholas Owen, Preeti Matharu, Sulyman Abdulkareem, Shipra Suman, Victor Olago, Victoria Yorke-Edwards, Angharad Green, Farzan Ramzan, Georg Otto, Murat Halit, Socrates Varakliotis and Jack Hindley
Type: Poster
Abstract:
This poster defines the data stewardship model adopted by Advanced Research Computing (ARC) at University College London (UCL), focusing on the formalisation and professionalisation of the research data steward role/job family. UCL’s Research Data group is actively engaged in supporting research services and collaborating within UCL and with other institutions on various research projects.

ARC is both a research centre and a professional services provider, collaborating on numerous projects across various UCL departments, taking a holistic approach and providing tailored solutions. We showcase examples of these projects and the services we offer, including the Research Data Storage Service, Research Data Repository, and Electronic Research Notebook. Some flagship collaborative projects include Harbour, where we assist in organising and streamlining datasets collected by UCL researchers, and E-Child, where our data stewards contribute to integrating health, education, and social care information for all children in England.


Title: Building a Scalable, Open Research Data Repository with Ceph and Customised InvenioRDM
Authors: Irufan Ahmed
Type: Poster
Abstract:
This poster presents our experience developing a scalable, institution-hosted research data repository by integrating a customised instance of InvenioRDM with Ceph object storage. The repository is designed to support HPC simulation and experimental datasets, emphasising open infrastructure and institutional control.

We outline key architectural decisions, including our evaluation of Ceph deployment strategies (bare-metal vs. containerised cephadm) and storage interfaces (CephFS vs. RGW S3). We also present the integration of Ceph with a customised InvenioRDM instance that enables partial file retrieval and institutional LDAP authentication. A key aspect of our work involved designing custom metadata schemas to support domain-specific workflows. One example draws on NASA’s PDS4 data modelling approach, providing a structured and extensible framework for describing large-scale simulation outputs. We share lessons learned in configuring Ceph for research data access, extending InvenioRDM’s functionality, and resolving integration challenges between complex open-source components.
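As a hedged illustration of the partial file retrieval mentioned above (not the repository's own code; the endpoint, bucket and key are placeholders), an S3-compatible interface such as Ceph RGW lets a client fetch only a byte range of a large dataset file:

    # Hypothetical partial download via an HTTP Range request against Ceph RGW's S3 API.
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://rgw.example.ac.uk")  # placeholder endpoint
    response = s3.get_object(
        Bucket="research-data",
        Key="simulations/run-042/output.h5",
        Range="bytes=0-1048575",  # first 1 MiB only
    )
    chunk = response["Body"].read()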

This work will interest research computing professionals seeking sustainable, extensible alternatives to commercial cloud platforms. It aims to support wider discussions around digital research infrastructure and promote community-driven development.