Department of Statistics

Oxford University

The Department of Statistics at Oxford is a world leader in research, including computational statistics and statistical methodology, applied probability, bioinformatics and mathematical genetics. In the 2014 Research Excellence Framework (REF), Oxford's Mathematical Sciences submission was ranked overall best in the UK.

This is an exciting time for the Department. We have now moved into our new home on St Giles and are currently settling in. The new building provides improved lecture and teaching space and a variety of interaction areas, and brings together researchers in Probability and Statistics, creating a highly visible centre for the Department in Oxford. Since 2010, the Department has been awarded over forty research grants with a total value of £9M, not counting several very large EPSRC- and MRC-funded awards for Centres for Doctoral Training. The main sponsors are the European Commission, EPSRC, the Medical Research Council and the Wellcome Trust.

We offer an undergraduate degree (BA or MMath) in Mathematics and Statistics, jointly with the Mathematical Institute. At postgraduate level there is an MSc course in Applied Statistics, as well as a lively and stimulating environment for postgraduate research (DPhil or MSc by Research). Our graduates are employed in a wide range of occupational sectors throughout the world, including the university sector. The Department co-hosts the EPSRC and MRC Centre for Doctoral Training (CDT) in Next-Generational Statistical Science, the Oxford-Warwick Statistics Programme (OxWaSP).

  • 1 hour 3 minutes
    A Theory of Weak-Supervision and Zero-Shot Learning
    A lecture exploring alternatives to using labeled training data. Labeled training data is often scarce, unavailable, or very costly to obtain. To circumvent this problem, there is growing interest in developing methods that can exploit sources of information other than labeled data, such as weak supervision and zero-shot learning. While these techniques have achieved impressive accuracy in practice, in both vision and language domains, they come with no theoretical characterization of their accuracy. In a sequence of recent works, we develop a rigorous mathematical framework for constructing and analyzing algorithms that combine multiple sources of related data to solve a new learning task. Our learning algorithms provably converge to models that have minimum empirical risk with respect to an adversarial choice over feasible labelings for a set of unlabeled data, where the feasibility of a labeling is computed through constraints defined by estimated statistics of the sources. Notably, these methods do not require the related sources to have the same labeling space as the multiclass classification task. We demonstrate the effectiveness of our approach with experiments on various image classification tasks.
    9 June 2022, 1:40 pm
  • 50 minutes 33 seconds
    Victims of Algorithmic Violence: An Introduction to AI Ethics and Human-AI Interaction
    A high-level overview of key areas of AI ethics and not-ethics, exploring the challenges of algorithmic decision-making, kinds of bias, and interpretability, and linking these issues to problems of human-system interaction. Much attention is now focused on AI ethics and safety, with the EU AI Act and other emerging legislation proposed to identify and curb "AI risks" worldwide. Are such ethical concerns unique to AI systems, or do they apply to digital systems in general?
    6 April 2022, 9:59 am
  • 52 minutes 45 seconds
    The practicalities of academic research ethics - how to get things done
    A brief introduction to various legal and procedural ethical concepts and their applications within and beyond academia. It's all very well to talk about truth, beauty and justice for academic research ethics. But how do you do these things at a practical level? If you have a big idea, or stumble across something with important implications, what do you do with it? How do you make sure you've got appropriate safeguards without drowning in bureaucracy?
    5 April 2022, 4:05 pm
  • 56 minutes 11 seconds
    Statistics, ethical and unethical: Some historical vignettes
    David Steinsaltz gives a lecture on ethical issues in statistics, illustrated through historical examples.
    5 April 2022, 3:56 pm
  • 55 minutes 11 seconds
    Joining Bayesian submodels with Markov melding
    This seminar explains and illustrates the approach of Markov melding for joint analysis. Integrating multiple sources of data into a joint analysis provides more precise estimates and reduces the risk of biases introduced by using only partial data. However, it can be difficult to conduct a joint analysis in practice. Instead, each data source is typically modelled separately, but this results in uncertainty not being fully propagated. We propose to address this problem using a simple, general method, which requires only small changes to existing models and software. We first form a joint Bayesian model based upon the original submodels using a generic approach we call "Markov melding". We show that this model can be fitted in submodel-specific stages, rather than as a single, monolithic model. We also show that the concept can be extended to "chains of submodels", in which submodels relate to neighbouring submodels via common quantities. The relationship to the "cut distribution" will also be discussed. We illustrate the approach using examples from an A/H1N1 influenza severity evidence synthesis; integrated population models in ecology; and modelling uncertain-time-to-event data in hospital intensive care units.
    5 April 2022, 3:23 pm
  • 55 minutes 17 seconds
    Neural Networks and Deep Kernel Shaping
    Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping. Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training quickly and generalizing to unseen data, and show how these can be avoided by carefully controlling the "shape" of the network's initialization-time kernel function. We then develop a method called Deep Kernel Shaping (DKS), which accomplishes this using a combination of precise parameter initialization, activation function transformations, and small architectural tweaks, all of which preserve the model class. In our experiments we show that DKS enables SGD training of residual networks without normalization layers on ImageNet and CIFAR-10 classification tasks at speeds comparable to standard ResNetV2 and Wide-ResNet models, with only a small decrease in generalization performance. When using K-FAC as the optimizer, we achieve similar results for networks without skip connections. Our results apply to a large variety of activation functions, including those which traditionally perform very badly, such as the logistic sigmoid. In addition to DKS, we contribute a detailed analysis of skip connections, normalization layers, special activation functions like ReLU and SELU, and various initialization schemes, explaining their effectiveness as alternative (and ultimately incomplete) ways of "shaping" the network's initialization-time kernel.
    5 April 2022, 11:28 am
  • 48 minutes 40 seconds
    Introduction to Advanced Research Computing at Oxford
    Andy Gittings and Dai Jenkins deliver a graduate lecture on Advanced Research Computing (ARC).
    5 April 2022, 11:05 am
  • 39 minutes 49 seconds
    Ethics from the perspective of an applied statistician
    Professor Denise Lievesley discusses ethical issues and codes of conduct relevant to applied statisticians. Statisticians work in a wide variety of political and cultural environments, which influence their autonomy and their status, and which in turn shape the ethical frameworks they employ. The need for a UN-led fundamental set of principles governing official statistics became apparent at the end of the 1980s, when countries in Central Europe began to change from centrally planned economies to market-oriented democracies. It was essential to ensure that national statistical systems in such countries would be able to produce appropriate and reliable data that adhered to certain professional and scientific standards. Alongside the UN initiative, a number of professional statistical societies adopted codes of conduct. Do such sets of principles and ethical codes remain relevant over time? Or do changes in the way statistics are compiled and used mean that we need to review and adapt them? For example, as combining data sources becomes more prevalent, record linkage in particular poses privacy and ethical challenges. Similarly, obtaining informed consent from units for access to and linkage of their data from non-survey sources continues to be challenging. Denise draws on her earlier role as a statistician in the United Nations, working with some 200 countries, to discuss some of the ethical issues she encountered then and how these might change over time.
    31 March 2022, 3:50 pm
  • 40 minutes 19 seconds
    A Day in the Life of a Statistics Consultant
    Maria Christodoulou and Mariagrazia Zottoli share what a standard day is like for a statistics consultant.
    31 March 2022, 3:39 pm
  • 56 minutes
    Metropolis Adjusted Langevin Trajectories: a robust alternative to Hamiltonian Monte-Carlo
    Lionel Riou-Durand gives a talk on sampling methods. Sampling approximations for high-dimensional statistical models often rely on so-called gradient-based MCMC algorithms. It is now well established that these samplers scale better with the dimension than other state-of-the-art MCMC samplers, but they are also more sensitive to tuning. Among these, Hamiltonian Monte Carlo is a widely used sampling method shown to achieve gold-standard d^{1/4} scaling with respect to the dimension. However, it is also known that its efficiency is quite sensitive to the choice of integration time. This problem is related to periodicity in the autocorrelations induced by the deterministic trajectories of Hamiltonian dynamics. To tackle this issue, we develop a robust alternative to HMC built upon Langevin diffusions (namely Metropolis Adjusted Langevin Trajectories, or MALT), inducing randomness in the trajectories through a continuous refreshment of the velocities. We study the optimal scaling problem for MALT and recover the d^{1/4} scaling of HMC without additional assumptions. Furthermore, we highlight the fact that the autocorrelations of MALT can be controlled by a uniform and monotone bound, thanks to the randomness induced in the trajectories, so that MALT is robust to tuning. Finally, we compare our approach to Randomized HMC and establish quantitative contraction rates for the 2-Wasserstein distance that support the choice of Langevin dynamics. This is joint work with Jure Vogrinc, University of Warwick.
    31 March 2022, 3:19 pm
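    The Langevin proposal at the heart of this line of work can be illustrated with the classic Metropolis-adjusted Langevin algorithm (MALA), which MALT extends to longer trajectories with continuously refreshed velocities. Below is a minimal sketch in Python; the standard-Gaussian target and the step size are illustrative assumptions, not details from the talk.

    ```python
    import numpy as np

    def mala_step(x, log_prob, grad_log_prob, step):
        """One Metropolis-adjusted Langevin step targeting exp(log_prob)."""
        noise = np.random.standard_normal(x.shape)
        # Euler-Maruyama discretization of the Langevin diffusion
        y = x + 0.5 * step**2 * grad_log_prob(x) + step * noise

        def log_q(b, a):
            # Log density (up to a constant) of proposing b from a
            diff = b - a - 0.5 * step**2 * grad_log_prob(a)
            return -np.sum(diff**2) / (2 * step**2)

        # Metropolis-Hastings correction makes the chain exact
        log_alpha = (log_prob(y) + log_q(x, y)) - (log_prob(x) + log_q(y, x))
        if np.log(np.random.rand()) < log_alpha:
            return y
        return x

    # Illustrative target: standard 2-D Gaussian
    log_p = lambda x: -0.5 * np.sum(x**2)
    grad_log_p = lambda x: -x

    np.random.seed(0)
    x = np.zeros(2)
    samples = []
    for _ in range(5000):
        x = mala_step(x, log_p, grad_log_p, step=0.9)
        samples.append(x.copy())
    samples = np.array(samples)
    ```

    The Metropolis correction is what makes the sampler exact despite the discretization error; MALT applies the same correction to whole Langevin trajectories rather than single steps.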
  • 59 minutes 22 seconds
    Modelling infectious diseases: what can branching processes tell us?
    Professor Samir Bhatt gives a talk on the mathematics underpinning infectious disease models. Mathematical descriptions of infectious disease outbreaks are fundamental to understanding how transmission occurs. Broadly speaking, two approaches are used: individual-based simulators and governing-equation models, and both have a multitude of pros and cons. This talk connects these two worlds via general branching processes and discusses (at a high level) the rather beautiful mathematics that arises from them and how they can help us understand the assumptions underpinning mathematical models for infectious disease. The talk explains how this mathematics can help us understand uncertainty better, and shows some simple examples. It is somewhat technical, but focuses as much as possible on intuition and the big picture.
    31 March 2022, 1:58 pm
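    The simplest branching-process outbreak model, the Galton-Watson process, already illustrates one thing these models capture that deterministic equations miss: an outbreak can die out by chance even when R0 > 1. A minimal sketch in Python (the Poisson offspring distribution, R0 value, generation cap and trial count are illustrative assumptions, not from the talk):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def outbreak_dies_out(r0, max_gen=50, cap=10_000):
        """Galton-Watson outbreak: each case independently infects
        Poisson(r0) new cases in the next generation. Returns True if
        the outbreak goes extinct within max_gen generations."""
        cases = 1
        for _ in range(max_gen):
            if cases == 0:
                return True
            if cases > cap:  # epidemic has almost surely taken off
                return False
            cases = rng.poisson(r0, size=cases).sum()
        return cases == 0

    r0 = 2.0
    trials = 2000
    extinct_frac = sum(outbreak_dies_out(r0) for _ in range(trials)) / trials
    # Theory: the extinction probability q solves q = exp(r0 * (q - 1)),
    # which for r0 = 2 gives q ≈ 0.203, so roughly a fifth of outbreaks
    # seeded by one case fizzle out despite R0 = 2.
    ```

    The fixed-point equation q = G(q), with G the offspring probability generating function, is the standard branching-process result the simulation approximates.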