STEP-UP RSLondon 2025: Abstracts
This page contains the abstracts of all accepted submissions for the STEP-UP RSLondon Conference 2025. You can browse the abstracts here or link to a specific abstract by clicking a talk title in the conference schedule.
Where available, links to conference presentations are also provided.
Poster abstracts are provided towards the bottom of the page.
Title: Bringing the Virtual Human to Life
Authors: Andrea Townsend-Nicholson, UCL
Type: Keynote
Time/Track: 10:10-10:55, General track
Abstract:
Imagine assembling a silicon twin for every person using their digital health
and biological data, from the whole genome sequence to their skeletal
architecture. Imagine possessing the ability to make predictions not only of
disease progression and outcomes but also the therapeutic effects of
treatment options and interventions - and the means of testing the impact of
different lifestyles and treatments to select the ones that give us the
preferred outcome. This digital twin could be used to make informed decisions
that range from the treatment of disease to improving quality of life and,
ultimately, will greatly inform the design of clinical trials so that these
could be faster and safer. Since 2016, I have been focused on supporting the
creation and translation of patient-specific computational models into
validated human digital twins to inform clinical decision making through my
research in drug discovery, and my education and outreach activities. Here, I
will describe “Bringing the Virtual Human to Life” – a community effort that
requires bringing together a wide range of people, from experts in computational biomedicine and clinicians to policymakers and the general public. I will describe
the state of the art in the building of human digital twin components, and I
will share several of the challenges that I have encountered along the way that
have been resolved with the support of the research technology professionals
with whom I have been collaborating.
Title: Contextualising open science training programmes
Authors: Sara Villa, Yo Yehudi and Malvika Sharan
Type: Regular talk
Time/track: 11:40-12:00, General track
Slides: Presentation slides (via Zenodo)
Abstract:
Open science training programmes are on the rise in both academic and non-academic settings. Several organisations are providing upskilling sessions on different topics essential to ensuring the reproducibility and accessibility of research. Among them, OLS, a capacity-building organisation, has led the way with its flagship Open Seeds programme, a training and mentoring programme in Open Science for researchers at all levels. This programme provides hands-on experience with open science principles during a 16-week training period, covering topics such as open data, open communities and EDI.
One of the disadvantages of these kinds of global programmes is the lack of context in the content provided. Participants can miss out on learning key concepts and tools related to their specific fields of work.
We have recently run a pilot version of Open Seeds in the School of
Neuroscience at King’s College London, called Open Neuroseeds. In this session,
we will highlight the lessons learned and the feedback obtained from
contextualising this important type of training. We will provide tips and tricks for adapting your own training programmes to your audience, helping participants contextualise the lessons learned and, hence, embed them more effectively in their research workflows.
This is an important discussion for all RSEs and trainers who usually work in
multidisciplinary teams and with researchers not well versed in open science
practices.
Title: Professionalising diverse data science roles
Authors: Emma Karoune and Malvika Sharan
Type: Regular talk
Time/track: 12:00-12:20, General track
Slides: Presentation slides (via Zenodo)
Abstract:
The interdisciplinary nature of the data science workforce extends beyond the
traditional notion of a “data scientist.” A successful data science team
requires a wide range of technical expertise, domain knowledge and leadership
capabilities. To strengthen such a team-based approach, we recommend that
institutions, funders and policymakers invest in developing and
professionalising diverse roles, including a broad set of digital research
technical professional roles, fostering a resilient data science ecosystem for
the future.
We will delve into the different roles that make up data science teams at
Turing, including research software engineers, data wranglers and research
computing engineers, highlighting overlapping as well as specific skills. We
will also explore the importance of coordination and engagement roles such as
project managers and research community managers, which are often overlooked in
conversations about digital research technical professionals, but who play an
essential role in the team to drive forward collaboration and impact.
By recognising these diverse specialist roles that collaborate within
interdisciplinary teams, organisations can leverage deep expertise across
multiple skill sets, enhancing responsible decision-making and fostering
innovation at all levels. Ultimately, we seek to shift the perception of data
science professionals from the conventional view of individual data scientists
to a competency-based model of specialist roles within a team, each essential
to the success of data science initiatives.
Title: Why are we creating an Open Source Programme Office at UCL? How are we doing so?
Authors: David Pérez-Suárez, Mosè Giordano, Miguel Xochicale and Sam Cunliffe
Type: Lightning talk
Time/track: 12:20-12:25, General track
Abstract:
An Open Source Programme Office (OSPO) is a body within an organisation that looks after its open source strategy and operations. Much of our research, infrastructure and everyday lives depends on Open Source, but universities don’t know how much. An OSPO can provide insights on that and become a body that protects and nurtures Open Source communities from within universities. We
are creating an OSPO at UCL! Join us and learn how we are doing it and how you
could start one at your institution.
Title: DisCouRSE: Developing a Community of Leaders
Authors: Jonathan Cooper
Type: Lightning talk
Time/track: 12:25-12:30, General track
Slides: Presentation slides (via Zenodo)
Abstract:
I will introduce our newly funded UKRI Network+ grant, aiming to connect and
train aspiring leaders across digital Research Technical Professional career
tracks. The talk will cover what we’re aiming to accomplish in the next 3.5
years, and how we would like the whole community to get involved!
Title: CCP-AHC: A new collaborative computational community for arts, humanities, and culture research
Authors: Eamonn Bell, Karina Rodriguez and Jeyan Thiyagalingam
Type: Lightning talk
Time/track: 12:30-12:35, General track
Slides: Presentation slides (via Zenodo)
Abstract:
We present “Toward a new CCP for Arts, Humanities, and Culture research
(CCP-AHC)”, a new research software community-building exercise, funded by UKRI
and STFC for 24 months from January 2025. The goal of CCP-AHC is to support the
sustainable and efficient development of software, pipelines, and workflows
used by arts, humanities, and culture researchers who make use of UK-based
digital research infrastructure (DRI), including high-performance computing
(HPC) and advanced computing infrastructures supported by UKRI. It will do so
by disseminating and implementing the Collaborative Computational Project (CCP)
model that has successfully been used by many other scientific software
communities over the past several decades, with the support of the national
CoSeC programme at STFC. There is a special emphasis on ensuring that research
software developed by this community makes the best use of large-scale
compute infrastructures supported by public funding. This includes existing and
future high-performance computing (HPC) and advanced computing infrastructures
supported by UKRI, as well as those run by UK-based HEIs and other research
organisations eligible for UKRI funding. During the first year of the project,
we will bring together key stakeholders in the area, including RSEs and other
dRTPs supporting computationally intensive research in arts, humanities, and
culture. In the second year, we will use RSE and computational scientist
resource within the project to produce a portfolio of codes, pipelines, and
workflows for adoption by the project. Our main deliverable is a roadmap for
the development of the community, proposing a five-year plan for the community
within the broader national and international DRI landscape.
Title: Diversity and Inclusion in Practice: What you need to know when you plan to hire International RSE and dRTPs
Authors: Yo Yehudi, Sara Villa, Aman Goel, Toby Hodges and Malvika Sharan
Type: Regular talk
Time/track: 12:35-12:55, General track
Abstract:
The world, and the UK specifically, is currently very anti-diversity and anti-inclusion. One way to push back against this - indeed, to step up as a UK-based ally - is to recruit international RSEs and PhD students into your
teams. They can bring a breadth of experience that includes different cultural,
religious, and racial backgrounds, as well as different language and domain
expertise.
It’s not always straightforward, however, in a hostile environment. This talk
will draw from real-world experiences of immigrants, and provide practical tips
for how to work around the system to create more inclusive spaces. This is
reflected directly in the talk authorship: Aman is an Indian national, currently in the UK on a time- and contract-bound work visa [1] (obtained before the rules were made more restrictive [2]); Yo is a naturalised UK citizen of Israeli/Kiwi background (who was deported twice when they first came to the UK); and Sara is a Spanish national with settled status (luckily obtained before Brexit, so it didn’t impact her work options).
We will also solicit experiences and advice from the community present at the
event, and more widely. We anticipate that eventually a learning module on this
topic will be submitted to The Carpentries Incubator [3] for a richer “what you
need to know when leading a group of RSEs” lesson.
[1] https://www.gov.uk/skilled-worker-visa
[2] https://www.gov.uk/government/news/new-laws-to-cut-migration-and-put-british-workers-first-in-force
[3] https://carpentries-incubator.org/
Title: SoFAIR - Making Software FAIR: A machine-assisted workflow for the research software lifecycle
Authors: David Pride, Petr Knoth, Matteo Cancellieri and Laurent Romary
Type: Regular talk
Time/track: 14:25-14:45, Software/RSE track
Abstract:
A key issue hindering discoverability, attribution and reusability of open
research software is that its existence often remains hidden within the manuscripts
manuscript of research papers. For these resources to become first-class
bibliographic records, they first need to be identified and subsequently
registered with persistent identifiers (PIDs) to be made FAIR (Findable,
Accessible, Interoperable and Reusable). To this day, much open research
software fails to meet FAIR principles and software resources are mostly not
explicitly linked from the manuscripts that introduced them or used them.
SoFAIR is a 2-year international project (2024-2025) which proposes a solution
to the above problem realised over the content available through the global
network of open repositories. SoFAIR will extend the capabilities of widely
used open scholarly infrastructures (CORE, Software Heritage, HAL) and tools
(GROBID) operated by the consortium partners, delivering and deploying an
effective solution for the management of the research software lifecycle.
The ambition of SoFAIR is to:
1. Develop a machine-learning-assisted workflow for the software asset lifecycle, covering all the steps from 1) identification of software mentions in research manuscripts, through 2) their validation by authors, to 3) their registration with PIDs and archival where needed.
2. Embed this workflow into established scholarly infrastructures, making the
solution available to the global network of open repositories, covering tens
of millions of open access research papers originating from across more than
12.5k repository systems.
Title: AI OnDemand: Segmenting Images at Scale with Ease
Authors: Cameron Shand, Marie-Charlotte Domart, Jon Smith and Amy Strange
Type: Regular talk
Time/track: 14:45-15:05, Software/RSE track
Slides: Presentation slides (via Zenodo)
Abstract:
As we continue to generate increasingly detailed and numerous images of biological structures, the capabilities and sophistication of the AI models used to analyse such data have grown in step. Applying such models presents its own set of hurdles, however, often requiring computational skills and knowledge uncommon among those generating the data. This problem is compounded when trying to use local HPC or cloud compute, which is increasingly required due to the size of the data and the hardware (GPU) needs. To address this, we have developed AI OnDemand, which combines an easily navigable interface (in Napari) with a Nextflow pipeline to seamlessly run a range of segmentation models (such as Cellpose, Mitonet, and SAM2) at scale, distributing a model over parts of an image to maximise parallelisation. Through features like
automatic UI construction, an extendable model registry with a simple schema,
and the ability to use private/local models alongside public ones, we have
developed a platform that easily incorporates new developments and inherently
encourages community contribution. Through future features such as built-in
model training and finetuning, we hope to further democratise the use of AI in
biological image analysis and provide a tool as useful for ML experts as it is for wet-lab scientists.
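To illustrate the scaling idea mentioned in the abstract (distributing a model over parts of an image), here is a minimal, hypothetical tiling sketch in Python; the tile size, the stand-in segmentation function and the stitching approach are assumptions for illustration only and are not the AI OnDemand implementation.

# Illustrative only: split an image into tiles that could be segmented
# independently (e.g. on separate GPUs or nodes) and write the per-tile
# labels back into a full-size array.
import numpy as np

def tile_indices(shape, tile=512):
    # Yield (row slice, column slice) pairs covering a 2D image.
    for r in range(0, shape[0], tile):
        for c in range(0, shape[1], tile):
            yield slice(r, min(r + tile, shape[0])), slice(c, min(c + tile, shape[1]))

def segment_tiled(image, segment_fn, tile=512):
    # Apply a per-tile segmentation function and reassemble the label image.
    labels = np.zeros(image.shape[:2], dtype=np.int32)
    for rs, cs in tile_indices(image.shape[:2], tile):
        labels[rs, cs] = segment_fn(image[rs, cs])  # each tile is independent
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((1024, 1024))
    # A trivial stand-in "model": threshold each tile.
    print(segment_tiled(img, lambda t: (t > 0.5).astype(np.int32)).shape)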
Title: Python Profiling and Optimisation & the RPC SIG
Authors: Jost Migenda and Robert Chisholm
Type: Lightning talk
Time/track: 15:05-15:10, Software/RSE track
Abstract:
Most researchers writing software are not classically trained programmers.
Instead, they learn to code organically, often picking up “bad habits” that
limit their software’s performance.
In this talk, we present a new course on Python Profiling and Optimisation. We
give an overview over the course contents, discuss our plans for developing the
course further and share how you can run the course at your own institution.
Finally, we introduce the Society of RSE’s Reasonable Performance Computing SIG
and its plans to develop additional resources.
Title: Two more tiny Python packages for scientific computing: mpi-pytest and petsctools
Authors: Connor Ward
Type: Lightning talk
Time/track: 15:45-15:50, Software/RSE track
Abstract:
In this talk I will present two Python packages that I have written that I
believe will be of interest/use to the wider community. The first package is
called mpi-pytest; it is a pytest plugin for easily running tests in parallel
using MPI. The second package is called petsctools and it provides a number of
‘Pythonic’ extensions to petsc4py (the Python interface to the PETSc library
used for solving massive linear algebra problems).
Both packages originated as custom code inside the finite element simulation
framework Firedrake and have been extracted as separate packages to allow for
community reuse (and avoid copy-pasting the code into other libraries).
Naturally both tools are thus already used within Firedrake.
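For readers unfamiliar with MPI-aware testing, the sketch below shows the general shape of a pytest test that requests several MPI ranks. The @pytest.mark.parallel marker and its nprocs argument reflect typical usage of an MPI pytest plugin such as mpi-pytest, but the exact API shown here is an assumption; consult the package documentation before relying on it.

# A hedged sketch of an MPI-parallel pytest test; the marker name and argument
# are assumptions based on typical usage of an MPI pytest plugin.
import pytest
from mpi4py import MPI


@pytest.mark.parallel(nprocs=3)  # request that this test run on 3 MPI ranks
def test_allreduce_sum():
    comm = MPI.COMM_WORLD
    # Each rank contributes its rank number; with 3 ranks the sum is 0+1+2 = 3.
    total = comm.allreduce(comm.rank)
    assert total == sum(range(comm.size))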
Title: Mini-guide to reproducible Python code
Authors: Diego Alonso Álvarez
Type: Regular talk
Time/track: 15:50-16:10, Software/RSE track
Slides: Presentation slides (via Zenodo)
Abstract:
A lot of modern research requires custom software to be written, either to do
some calculations, analyse experimental data or something else. Creating good
quality, sustainable software is always desirable, but ticking all the boxes
that are often described as necessary to accomplish this can be a daunting task
for people - researchers - who often have other priorities in mind.
Reproducibility is, however, not an optional feature of a piece of research - whether it involves software or otherwise - and it is something that researchers are fully responsible for addressing. Luckily, out of the many requirements of good quality and sustainable software, only a handful are necessary to support, or go a long way towards supporting, the reproducibility of the results.
Here we describe these absolutely essential steps that researchers should take in order to support the reproducibility of their software. The recommendations are for software developed using Python; they might not apply to all cases, and they are not foolproof, as reproducibility is a really complex business. They are, however, a good start, applicable - in spirit, at least - to other programming languages, and will reduce the chances of things going wrong when other people try to use the software.
Title: Towards a declarative, reproducible, homogeneous, cross-platform
command-line environment across remote HPC machines
Authors: Krishnakumar Gopalakrishnan
Type: Regular talk
Time/track: 16:10-16:30, Software/RSE track
Abstract:
Whilst almost all HPC systems facilitate secure shell access for their users,
they do not typically provide administrative permissions to them owing to
security considerations on such shared resources. Traditionally, system
administrators evaluate and install user-requested software natively and make
it available through the modules system, or provide pre-built container images
executable with a non-root container runtime installed system-wide. In recent
years, the combinatorial explosion in the possible build provenances and runtime dependencies of scientific software has led to the development of package managers like Spack and EasyBuild that allow end-users to build and run software from their home directories without requiring elevated privileges.
However, there exists a gap in the user experience (UX) aspect of using remote
systems. While synchronising the user’s shell and other tool configurations
(i.e. ‘dotfiles’) with a central repository can help towards homogenising
remote shell environments, it does not solve UX issues like the remote system’s
shell, runtime system libraries, core utilities and other user-facing parts not being recent enough to support a particular feature flag, for instance.
Furthermore, recent years have seen the introduction of several cross-platform
utilities written in high-performance compiled languages like Rust and Go for
speeding up tasks such as searching files/strings, improving shell history and theming,
and other quality-of-life tooling.
This talk presents the author’s iterative journey with userspace meta package
managers towards deploying such cross-platform static binaries on remote
machines. It discusses the relative strengths and drawbacks of an exhaustive
selection of tools considered, and introduces the author’s open-source
automation module to deploy, manage and update the current state-of-the-art tooling best suited for each platform/architecture through trials. By sharing his journey, the author hopes to glean valuable feedback as well as provide the community with a framework that gives users a declarative, reproducible, cross-platform, homogeneous command-line environment on all their remote machines.
The self-contained workflow, code and configs presented in this talk are available here: https://github.com/krishnakumarg1984/setup_new_hosts
Talk slides (PDF)
Title: Introducing Helix: Imperial College London’s New FAIR Data Repository
Authors: Christopher Cave-Ayland and Wayne Peters
Type: Regular talk
Time/track: 14:25-14:45, Research data track
Abstract:
Recently released in beta, Helix is designed to provide a modern, sustainable,
and well-defined data repository service that supports Imperial College
London’s research strategy. The design and implementation of Helix have been
the result of extensive collaboration among various digital research technical
professionals (dRTPs) and academic roles. Data managers, Research Software
Engineers (RSEs), and Research Infrastructure Engineers have all contributed to
delivering a vision shaped by university functions such as the library,
research office, historical archive, and academic departments.
To align with Imperial’s strategic goals, Helix has been developed in-house by
extending and customizing InvenioRDM (the application that powers Zenodo) in
partnership with the consultancy Cottage Labs. The development of the beta
release has focused on building a robust, minimal core application deployed on
stable infrastructure, while embedding in-house expertise. Planned future
developments for Helix include support for depositing large (multi-terabyte)
datasets, sensitive data, and domain-specific metadata formats.
This presentation will explore Imperial’s wider strategic objectives for
supporting FAIR data principles, the service delivery model for Helix, key
technical implementation aspects, and the supporting policies designed to
ensure the long-term sustainability and accessibility of research data.
Title: Developing a sustainable data infrastructure for physical sciences
Authors: Nicola Knight, Samantha Pearman-Kanza, Louise Saul and Cerys Willoughby
Type: Regular talk
Time/track: 14:45-15:05, Research data track
Abstract:
As in many academic disciplines, challenges exist for researchers and research
enablers in the physical sciences in ensuring that data and associated
attributes are FAIR and published for re-use. Whilst the field is diverse and
encapsulates many practices, the identified challenges include: data
interoperability, loss of data when converting between different scientific
data formats, heavy reliance on data generation through facilities,
heterogeneity in naming, lack of uptake of tools to support best practice such
as Electronic Lab Notebooks, and lack of training to support the required
change in research culture.
The Physical Sciences Data Infrastructure (PSDI) initiative is creating a data
infrastructure that connects existing experimental and computational facilities
within Physical Sciences and beyond to enhance research efficiency. PSDI looks
to improve data handling across the research data lifecycle through the creation of tools and services that can be deployed within the researcher’s workflow. These are developed from community requirements and incorporate strategies for the skills and training, best practice, and standardisation activities needed alongside.
Following an initial pilot and development phase which focused on connecting
existing infrastructures, data stewardship practices, and the best use of people and technology, PSDI has recently launched an initial set of resources for the
physical sciences community. These include services, software tools, data
sources and guidance. PSDI enables researchers to use reference quality data
from commercial and open sources; share resources within the community; make
use of technologies such as AI to explore data; and learn how to make their
research open and FAIR.
In this session, we will talk about how PSDI developed workflows to promote
community engagement and the activities undertaken in the initiative. We will
also describe how PSDI intends to promote the uptake of data stewardship
practices, as well as increase disciplinary awareness of why these processes
are essential.
Title: CaSDaR (The Careers and Skills for Data-Driven Research) Network+: Empowering Data Stewards for Research Excellence
Authors: Samantha Pearman-Kanza, Simon Coles, James Baker, Simon Hettrick and Isobel Stark
Type: Regular talk
Time/track: 15:45-16:05, Research data track
Abstract:
The amount of data generated by research is growing at an exponential rate. And yet, so much of this data is unusable due to the lack of expertise, tools,
and resources for effective data management. Data Stewards are the key to
bridging the gap between data generation and reuse, as they have a fundamental
role that ensures the quality, accuracy, accessibility and longevity of data
across the entire data lifecycle. We place great value in data, but the current
investment in the time and resources to drive forward data excellence is sorely
lacking, and best practice like FAIR cannot be implemented without investing in
data stewards. So, this is where CaSDaR comes in! We are a brand new UKRI-funded Network+, started in April 2025, and our goal is to establish a diverse,
inclusive, self-sustaining community of Data Stewards and to create a model for
data steward support systems within research intensive institutions, thereby
clarifying their role and integration within the research data lifecycle. This
talk will discuss the important role that data stewards play across the entire
data lifecycle, introduce CaSDaR and our plans for the next four years, and
explain how you can get involved!
Title: Shoehorning Interoperability in Astronomical Science Data Metadata Model Mapping
Authors: Michael Johnson and Erin Brassfield Bourke
Type: Regular talk
Time/track: 16:05-16:25, Research data track
Abstract:
In a world of bespoke astrophysical data archives supplying unique data
products described by roll-your-own metadata models, we as data stewards are
tasked with evolving the data archive culture towards a FAIR (Findable,
Accessible, Interoperable, Reusable) data representation.
Looking toward a future of distributed data services and multi-wavelength
astronomy, we employed a case study harvesting metadata from e-MERLIN
interferometric radio telescope science data products into a relational
database archive designed with the Canadian Astrophysical Data Centre’s CAOM
(Common Archive Observation Model) schema and IVOA TAP (Table Access Protocol)
service to test the feasibility of using an interoperable data model to accurately represent varied data products.
We present lessons learned in mapping these hierarchical multi-target data
products into a rich observational astronomy data discovery metadata model
originally designed around single-telescope optical astronomy observations and
weigh the benefits and detriments of a common model versus a roll-your-own
approach.
Title: Building a Production-Ready Barts Health Secure Data Environment: Tooling, Access Control, and Cost Governance
Authors: Idowu Samuel Bioku, Evan Hann, Tony Wildish, Steven Newhouse, Benjamin Eaton, Ruzena Uddin and Francene Clarke-Walden
Type: Regular talk
Time/track: 14:15-14:35, Computing infrastructure/HPC track
Slides: Presentation slides (via Slideshare)
Abstract:
The Barts Health Data Platform (BHDP) is a Secure Data Environment (SDE) based
on Azure Trusted Research Environment (TRE). It provides researchers with
secure access to health data and scalable data analysis environments hosted on
Microsoft Azure. However, additional technical requirements must be addressed
before production deployment. These include customised VM images for complex health data research, and a custom-built cost management tool to provide cost granularity for billing purposes.
The audience will learn about the enhancement of the core AzureTRE product with
production-level tooling. These improvements provide robust data integration,
cost transparency, and the ability to handle complex analytical workflows.
Key enhancements included the development of bespoke virtual machines tailored
for machine learning, medical image analysis, and data workloads, all with
automated configuration. Secure project-specific storage space is automatically
provisioned for internal data pipelines to securely transfer data into the
platform.
To improve cost oversight, we implemented a custom-built cost management tool
capable of granular tracking and long-term cost attribution by research
project. Transparent billing has also been achieved through the integrated cost
management solution. In parallel, we are introducing identity management and a dynamic service catalogue that integrates with role-based access control (RBAC), allowing SDE and workspace administrators to control the set of tools available to researchers and making upgrades and deprecation of tools easier to manage.
The enhanced platform is in production as the default analysis environment for
all new approved projects at Barts Health and researchers have benefited from
the integration of customised virtual machines.
In summary, the BHDP demonstrates production-ready tooling that includes
secure data connectivity, customized compute resources, and robust cost
governance to meet the NHS and research user requirements, offering a practical
blueprint for similar platforms across the UK health data research landscape.
Title: Harnessing the power of AIRR supercomputers for trusted research
Authors: Jim Madge, Matt Craddock and Martin O’Reilly
Type: Regular talk
Time/track: 14:35-14:55, Computing infrastructure/HPC track
Slides: Presentation slides (via Zenodo)
Abstract:
How can we use the country’s most powerful supercomputers for research on
sensitive data while keeping that data secure?
Complex AI models trained on large datasets are having an enormous impact in
many research domains. However, training and applying such models requires high
performance hardware and specialist accelerators such as GPUs. The new AI
Research Resource (AIRR) has greatly expanded the availability of GPU-enabled
compute to support large scale AI research in the UK.
Working with sensitive data requires high levels of security to ensure that the
data is only accessible by approved researchers, and only for approved
research. Trusted Research Environments (TREs) provide secure analysis
environments for working safely with sensitive data. However, TREs do not
provide the computational power and scaling that is required for the
development and application of large models. Conversely, high performance
computing (HPC) platforms do not generally support TRE capabilities, and
therefore cannot provide sufficient security for working with sensitive
data.
In FRIDGE we are building a SATRE- and NHS-standards-compliant, cross-platform TRE on AIRR, unlocking the power of these systems for AI-driven research using sensitive data.
In our talk we will show our progress in enabling trusted research on AIRR and
discuss:
- The unique challenges of creating a secure enclave on a shared resource
- How FRIDGE can be used to add new capabilities to existing TREs
- How we solve governance when responsibility is shared between the HPC site and the TRE operator
Title: No Secrets, Just Trust: Securely Deploying Infrastructure Without Persistent Credentials
Authors: Brian Maher
Type: Regular talk
Time/track: 14:55-15:15, Computing infrastructure/HPC track
Abstract:
As we transition towards a cloud-native world, Infrastructure as Code and other
Continuous Integration/Continuous Delivery tools continue to become more
ubiquitous. Whilst clearly an incredible benefit, enabling more agile
infrastructures, these bring new challenges with regards to secrets management.
Infrastructure deployments that may have previously been run by a lone sysadmin
using a Kerberos ticket on their laptop once everyone had gone home for the
night are now often run automatically using multi-user cloud-based tools. Tools
which often hold the keys to multiple kingdoms.
These long-lived credentials are a multi-faceted evil. They present an
opportunity for stolen credentials to be hoarded and used at a more opportune
moment, cause a headache when staff members join or leave, and often expire at
the worst possible time. Despite our best intentions, credential rotation
remains an often-overlooked chore – especially given the sprawling nature of
modern infrastructure.
This talk will explain how we can improve the situation by creating trust
relationships between various tools in a deployment pipeline, replacing
long-lived keys with just-in-time dynamic credentials. It will demonstrate how
secrets management tools such as HashiCorp Vault can be used to build these
trust relationships, give an overview of how JSON Web Token (JWT)
authentication works and demonstrate some of the tooling that makes this
possible.
Finally, whilst the primary point of this talk is theoretical, it will
demonstrate a modern take on a classic sysadmin workflow: deploying an
application to a virtual machine via SSH from a cloud-based tool with no
secrets in sight.
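As a rough illustration of the trust-relationship pattern the talk describes, the sketch below exchanges a CI-issued identity token for a short-lived Vault token using the hvac Python client. The environment variable names, the "ci-deployer" role and the secret path are placeholders, and this is not the speaker's configuration.

# A minimal sketch of the JWT-based trust relationship: a CI job presents its
# identity token to Vault's JWT auth method and receives a short-lived Vault
# token, so no long-lived secret ever needs to be stored. Assumes a Vault
# server and a pre-configured JWT auth role; all names are placeholders.
import os
import hvac

client = hvac.Client(url=os.environ["VAULT_ADDR"])

# The CI platform injects a signed, short-lived identity token (placeholder
# environment variable name; e.g. issued via OIDC).
ci_jwt = os.environ["CI_JOB_JWT"]

# Exchange the identity token for a Vault token scoped to the "ci-deployer" role.
login = client.auth.jwt.jwt_login(role="ci-deployer", jwt=ci_jwt)
client.token = login["auth"]["client_token"]

# Fetch the deployment credential just in time; it expires with the Vault lease.
secret = client.secrets.kv.v2.read_secret_version(path="deploy/ssh")
print(sorted(secret["data"]["data"].keys()))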
Title: Developing the next generation of dRTPs
Authors: Stephanie E.M. Thompson
Type: Lightning talk
Time/track: 15:45-15:50, Computing infrastructure/HPC track
Abstract:
The Advanced Research Computing team at the University of Birmingham has been
running the annual supercomputing ‘BEAR Challenge’ for several years. This
increasingly popular 3-day event, exclusively for taught students, features a
range of challenges such as agent-based modelling and designing a compute
cluster. The students work in teams, often interdisciplinary, developing their
social skills, as well as knowledge of supercomputing in relation to real-world
problems. A key draw for students from computer science is the access that they
are given to a Tier 2 (national) supercomputer. Popular features of the event
are talks on careers in the HPC field from industry and the University, and
tours of our innovative data centre – now mostly virtual due to numbers but
with the top teams getting a full tour. The challenge has expanded from 5 teams
to 15 but demand still cannot be met, with 28 teams registering interest for
this year’s challenge so far, including from economics and engineering. In this
talk I will give some tips on how to reach students outside of computer science
and describe how combining teaching, research and careers not only benefits the
students, but can raise your profile with the most senior levels of your
institution.
Title: Addressing the HPC Skills Shortage Through Learning Pathways and Visible Infrastructure
Authors: Jeremy Cohen, Weronika Filinger, Eirini Zormpa and Michael Bearpark
Type: Regular talk
Time/track: 15:50-16:10, Computing infrastructure/HPC track
Abstract:
High Performance Computing (HPC) and large-scale research computing
infrastructure are becoming ever more important parts of the research
lifecycle. They enable researchers to process the huge volumes of data, run the
high-resolution simulations and train the next-generation AI models that
represent such an important aspect of much modern research. AI is, of course,
critical since the AI revolution is a primary driver of the massive demand for
specialist computing infrastructure in both research and industry. This is
great for the HPC community; however, demand for skilled HPC professionals is outstripping the capacity of existing approaches to inspire and train people to take up roles in this field.
What can we do about this and how can we change existing unsustainable methods
to address the HPC skills shortage?
This talk will provide an overview of two key areas: 1) the development of
training pathways that can help to effectively deliver technical skills to
existing or new research computing professionals and 2) ways to inspire the
next generation of research computing professionals by making infrastructure
more visible, helping us to build a much more diverse and inclusive community
of technical professionals.
In the context of training pathways, we’ll look at work undertaken in the
UNIVERSE-HPC project, and through related activities and groups, to develop an
understanding of the different routes to grow HPC skills, and the specific
skill sets that are required. In the context of inspiring the next generation
of HPC practitioners, we’ll look at our developing “visible HPC” programme
which aims to address the fact that people rarely have the opportunity to see
large-scale computing infrastructure. They generally interact with it through a
terminal on a laptop or desktop computer where there is little visible
difference between connecting to a local server or the world’s largest
supercomputers!
Title: EasyBob, the friendly software installation bot
Authors: Jörg Saßmannshausen
Type: Regular talk
Time/track: 16:10-16:30, Computing infrastructure/HPC track
Abstract:
Installation of software on a heterogeneous cluster consisting of several CPU and GPU architectures can be quite a challenge when done manually. The
risks are:
- the installation is not reproducible
- the installation is only done for one CPU micro-architecture (common denominator)
- required dependency versions vary from installation to installation
This is not in line with a modern approach to software installation, which requires installations to be reproducible.
At Imperial College London, we were facing this issue as our clusters consist
of several Intel CPU micro-architectures, some AMD nodes and a healthy mixture
of GPU nodes as well. To make the mix more interesting, the cluster internally
has an IPv6 network only.
So we came up with an automatic installation program called EasyBob [1].
EasyBob not only allows us to make installations micro-architecture-specific, taking into account different CPU/GPU nodes; it could also serve as the ‘backend’ of a future self-service portal where users can request software to be installed via our ticket system. To enable this, we wrote a robot which uses EasyBuild and only requires the name of the easyconfig file of the software to be installed. That name can conveniently be derived from a user-facing web interface. Furthermore, first-line support could kick off the bot without having any knowledge of EasyBuild or the bot itself.
This presentation gives an insight into the mechanics of the bot, how it can be
configured and how it is used at Imperial.
[1] https://github.com/sassy-crick/easybob
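To make the "only requires the name of the easyconfig file" idea concrete, here is a toy sketch of that pattern: validate a requested easyconfig name and hand it to EasyBuild's eb command with --robot for dependency resolution. This is a generic illustration under those assumptions, not the EasyBob code linked above.

# Toy sketch: take an easyconfig name (e.g. from a ticket or web form) and
# hand it to EasyBuild, which resolves and builds dependencies via --robot.
import re
import subprocess
import sys

def install(easyconfig: str) -> int:
    # Accept only plausible easyconfig file names, to avoid passing arbitrary
    # ticket-system input straight to a command.
    if not re.fullmatch(r"[A-Za-z0-9._+-]+\.eb", easyconfig):
        raise ValueError(f"not a valid easyconfig name: {easyconfig!r}")
    # --robot lets EasyBuild resolve and build missing dependencies itself.
    result = subprocess.run(["eb", easyconfig, "--robot"], check=False)
    return result.returncode

if __name__ == "__main__":
    sys.exit(install(sys.argv[1]))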
Title: Shaping Research Culture Through Communities: Lessons from Open Science
Authors: Malvika Sharan, The Alan Turing Institute and OLS
Type: Closing keynote, General track
Slides: Presentation slides (via Zenodo)
Abstract:
Ever been in a research team or community where you felt truly welcome,
empowered, and excited to engage? This experience is rarely accidental. The
secret sauce lies in intentional facilitation, genuine spaces for
collaboration, and inclusive community management. My talk will explore lessons
(the ingredients) from both a community member’s and a community builder’s
perspective. Drawing from my experience participating in and building Open
Science communities, specifically The Turing Way and Open Life Science (OLS), I
will highlight key aspects and actionable strategies for engaging and
supporting research communities. Attendees, whether in ‘formal’ or informal
roles, will leave with valuable insights and familiar reminders on fostering
inclusive communities, improving research culture, and preserving the inherent
joy of collaboration. Ultimately, it’s about investing in communities to
achieve research goals that serve our society.
Poster abstracts
Title: Designing support strategies for enabling cultural change in data practices for the physical sciences
Authors: Nicola Knight, Samantha Pearman-Kanza, Louise Saul and Cerys Willoughby
Type: Poster
Abstract:
Support strategies, such as training and resource provision, are essential in
the process of improving the uptake of practices aligned with research
excellence. We will describe how the Physical Sciences Data Infrastructure
(PSDI), an initiative developed in response to a disciplinary need, has
developed resources and infrastructure to promote a cultural shift towards
implementing sustainable data practices. We will describe how input was sought
from key stakeholders to identify the areas where training and resources were
needed, and how this led to the development of asynchronous training modules, a knowledge base collecting the essential information, and seminars that provide the opportunity for peer discussion and learning. Furthermore, we will
detail the collection of feedback from the community and how this informed
evaluation processes.
Title: Research Data Stewardship at UCL
Authors: Katarina Buntic, Mahmoud Abdelrazek, Martin Donnelly, Daniel
Delargy, James A J Wilson, Michelle Harricharan, Nicholas Owen, Preeti Matharu,
Sulyman Abdulkareem, Shipra Suman, Victor Olago, Victoria Yorke-Edwards,
Angharad Green, Farzan Ramzan, Georg Otto, Murat Halit, Socrates Varakliotis
and Jack Hindley
Type: Poster
Abstract:
This poster defines the data stewardship model adopted by Advanced Research
Computing (ARC) at University College London (UCL), focusing on the
formalisation and professionalisation of the research data steward role/job
family. UCL’s Research Data group is actively engaged in supporting research
services and collaborating within UCL and with other institutions on various
research projects.
ARC is both a research centre and a professional services provider,
collaborating on numerous projects across various UCL departments, taking a
holistic approach and providing tailored solutions. We showcase examples of
these projects and the services we offer, including the Research Data Storage
Service, Research Data Repository, and Electronic Research Notebook. Some
flagship collaborative projects include Harbour, where we assist in organising
and streamlining datasets collected by UCL researchers, and E-Child, where our
data stewards contribute to integrating health, education, and social care
information for all children in England.
Title: Building a Scalable, Open Research Data Repository with Ceph and Customised InvenioRDM
Authors: Irufan Ahmed
Type: Poster
Abstract:
This poster presents our experience developing a scalable, institution-hosted
research data repository by integrating a customised instance of InvenioRDM
with Ceph object storage. The repository is designed to support HPC simulation
and experimental datasets, emphasising open infrastructure and institutional
control.
We outline key architectural decisions, including our evaluation of Ceph
deployment strategies (bare-metal vs. containerised cephadm) and storage
interfaces (CephFS vs. RGW S3). We also present the integration of Ceph with a
customised InvenioRDM instance that enables partial file retrieval and
institutional LDAP authentication. A key aspect of our work involved designing
custom metadata schemas to support domain-specific workflows. One example draws
on NASA’s PDS4 data modelling approach, providing a structured and extensible
framework for describing large-scale simulation outputs. We share lessons
learned in configuring Ceph for research data access, extending InvenioRDM’s
functionality, and resolving integration challenges between complex open-source
components.
This work will interest research computing professionals seeking sustainable,
extensible alternatives to commercial cloud platforms. It aims to support wider
discussions around digital research infrastructure and promote community-driven
development.
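As a small illustration of the partial file retrieval mentioned above, the sketch below performs a ranged read against an S3-compatible Ceph RGW endpoint using boto3. The endpoint URL, credentials, bucket and object key are placeholders, not the institution's actual configuration.

# Fetch only the first kilobyte of a large object from an S3-compatible
# Ceph RGW endpoint via an HTTP Range request, rather than downloading the
# whole multi-gigabyte file. All names and credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.ac.uk",  # placeholder Ceph RGW endpoint
    aws_access_key_id="ACCESS_KEY",            # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

resp = s3.get_object(
    Bucket="research-data",
    Key="simulations/run-042/output.h5",
    Range="bytes=0-1023",
)
header = resp["Body"].read()
print(len(header), "bytes retrieved")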