This special semester was organised by Joel Gibson, Georg Gottwald, and Geordie Williamson, and took place in Semester 1, 2022.
The Machine Learning for the Working Mathematician seminar is an introduction to ways in which machine learning (and in particular deep learning) has been used to solve problems in mathematics. The seminar is an initiative of the Sydney Mathematical Research Institute.
We aim for a toolbox of simple examples, where one can get a grasp of what machine learning can and cannot do. We want to emphasise techniques in machine learning as tools that can be used in mathematics research, rather than as a source of problems in themselves. The first six weeks or so will be introductory, and the second six weeks will feature talks from experts on applications.
Two nice examples of recent work that give the ‘flavour’ of the seminar are:
- Advancing mathematics by guiding human intuition with AI, a collaboration of Geordie Williamson (an organiser of MLWM), Mark Lackenby and Andras Juhasz (at the University of Oxford) with the team at Google DeepMind, which led them to a new theorem in knot theory and a new conjecture in representation theory. This was recently featured in the 2022 State of AI report.
- Constructions in combinatorics via neural networks, where Adam Zsolt Wagner (our Week 7 speaker) uses machine learning to find counterexamples to several conjectures in graph theory and other combinatorial problems.
Seminars
- Week 1 Seminar: Thursday 24th February, Carslaw 273, 3pm – 5pm (world clock)
- Geordie Williamson, Basics of Machine Learning: classic problems in machine learning, kernel methods, deep neural networks, supervised learning, and basic examples (Lecture notes, lecture recording).
- Week 1 Workshop: Friday 25th February, Carslaw 273, 3pm – 4pm. Introduction to PyTorch (Notebook).
- Week 2: Thursday 3rd March, Carslaw 273, 3pm – 5pm (world clock)
- Joel Gibson, What can and can’t neural networks do: Universal approximation theorem and convolutional neural networks (Lecture notes, lecture recording).
- Lecture links: Tensorflow playground, approximation by bumps/ridges, convolution filters.
- Week 2 Workshop: Friday 4th March, Carslaw 273, 3pm – 4pm. Predicting the Möbius function (Notebook).
- Week 3: Thursday 10th March, Carslaw 273, 3pm – 5pm (world clock)
- Georg Gottwald, How to think about machine learning: borrowing from statistical mechanics, dynamical systems and numerical analysis to better understand deep learning. (Lecture notes, Lecture recording).
- Week 3 Workshop: Friday 11th March, Carslaw 273, 3pm – 4pm. Continuation of Week 3 (Notebook).
- Week 4: Thursday 17th March, Carslaw 273, 3pm – 5pm (world clock)
- Joel Gibson and Georg Gottwald, Regularisation and Recurrent Neural Nets. (Lecture notes 1, Lecture notes 2, Lecture recording 1, Lecture recording 2).
- Week 4 Workshop: Friday 18th March, Carslaw 273, 3pm – 4pm. (Notebook).
- Week 5: Thursday 24th March, Carslaw 273, 3pm – 5pm (world clock)
- Geordie Williamson, Geometric Deep Learning (Lecture notes, Lecture recording)
- Week 5 Workshop: Friday 25th March, 3pm – 4pm, online. (Use the usual zoom link). (Notebook)
- Week 6: Thursday 31st March, Carslaw 273, 3pm – 5pm (world clock)
- Georg Gottwald, Geometric Deep Learning II, and Geordie Williamson, Saliency and Combinatorial Invariance (Lecture notes 1, Lecture notes 2, Lecture recording 1, Lecture recording 2)
- Week 6 Workshop: Friday 1st April, 3pm – 4pm, online (Use the usual zoom link) (Notebook).
- Week 7: Thursday 7th April, Online, 3pm – 5pm (world clock)
- Adam Zsolt Wagner: A simple RL setup to find counterexamples to conjectures in mathematics (Lecture notes, Lecture recording).
- Abstract: In this talk we will leverage a reinforcement learning method, specifically the cross-entropy method, to search for counterexamples to several conjectures in graph theory and combinatorics. We will present a very simple setup, in which only minimal changes need to be made (namely the reward function used for RL) in order to successfully attack a wide variety of problems. As a result we will resolve several open problems, and find more elegant counterexamples to previously disproved ones. (A toy illustration of the cross-entropy search loop is sketched after the schedule below.)
- Related paper: Constructions in combinatorics via neural networks.
- Week 7 workshop: Friday 8th April, 3pm – 4pm, Online! (Use the usual zoom link) (Notebook, Notebook V2).
- Week 8: Thursday 14th April, Online, 11am (world clock)
- Bamdad Hosseini: Perspectives on graphical semi-supervised learning
- Abstract: Semi-supervised learning (SSL) is the problem of extending label information from a small subset of a data set to the entire set. In low-label regimes the geometry of the unlabelled set is a crucial aspect that should be leveraged in order to obtain algorithms that outperform standard supervised learning. In this talk I will introduce graphical SSL algorithms that rely on manifold regularization in order to incorporate this geometric information. I will discuss interesting connections to linear algebra and matrix perturbations, kernel methods, and the theory of elliptic partial differential equations. (A minimal label-propagation sketch, one classical graphical SSL baseline, appears after the schedule below.)
- Week 8 workshop: Easter Friday, no workshop!
- (Thursday 21st April: Midsemester break, no seminar!)
- Week 9: Thursday 28th April, Online, 3pm – 5pm (world clock)
- Carlos Simpson, Machine learning for optimizing certain kinds of classification proofs for finite structures (Lecture notes, Lecture recording).
- Abstract: We’ll start by looking at the structure of classification proofs for finite semigroups and how to program these in PyTorch. (That could be the subject of the tutorial.) A proof by cuts generates a proof tree (think of solving Sudoku). Its size depends on the choice of cut locations at each stage. This leads to the question of how to choose the cuts in an optimal way. We’ll discuss the Value-Policy approach to RL for this, and discuss some of the difficulties, notably in sampling. Then we’ll look at another approach, somewhat more heuristic, that aims to provide a faster learning process with the goal of obtaining an overall gain in time when the training plus the proof are counted together.
- Related paper: Learning proofs for the classification of nilpotent semigroups
- Week 9 workshop: Friday 29th April, 3pm – 4pm, Online and in S225 in the Quadrangle. (Notebook)
- Week 10: Thursday 5th May, 4pm Sydney time (one hour later than usual!), online (world clock)
- Alex Davies, A technical history of AlphaZero (Lecture recording).
- Abstract: In 2016 AlphaGo defeated the world champion Go player Lee Sedol in a historic five-game match. In this lecture we will discuss the research behind this system and the innovations that ultimately led to AlphaZero, which can learn to play multiple board games, including Go, from scratch without human knowledge.
- Week 10 workshop: Friday 6th May, 3pm – 4pm, Online and in S225 in the Quadrangle. (Notebook)
- Week 11: Thursday 12th May (Note: 9am Sydney time, online) (world clock)
- Daniel Halpern-Leistner, Learning selection strategies in Buchberger’s algorithm (Lecture recording)
- Abstract: Studying the set of exact solutions of a system of polynomial equations largely depends on a single iterative algorithm, known as Buchberger’s algorithm. Optimized versions of this algorithm are crucial for many computer algebra systems (e.g., Mathematica, Maple, Sage). After discussing the problem and what makes it challenging, I will discuss a new approach to Buchberger’s algorithm that uses reinforcement learning agents to perform S-pair selection, a key step in the algorithm. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation.
- Week 11 workshop: Friday 13th May, 3pm – 4pm, Online and in S225 in the Quadrangle. (Notebook)
- Week 12: Thursday 19th May, 3pm – 4pm (world clock)
- Gitta Kutyniok, Deep Learning meets Shearlets: Explainable Hybrid Solvers for Inverse Problems in Imaging Science (Lecture recording)
- Abstract: Pure model-based approaches are today often insufficient for solving complex inverse problems in medical imaging. At the same time, methods based on artificial intelligence, in particular, deep neural networks, are extremely successful, often quickly leading to state-of-the-art algorithms. However, pure deep learning approaches often neglect known and valuable information from the modeling world and suffer from a lack of interpretability.
In this talk, we will develop a conceptual approach towards inverse problems in imaging sciences by combining the model-based method of sparse regularization by shearlets with the data-driven method of deep learning. Our solvers pay particular attention to the singularity structures of the data. Focussing then on the inverse problem of (limited-angle) computed tomography, we will show that our algorithms significantly outperform previous methodologies, including methods entirely based on deep learning. Finally, we will also touch upon the issue of how to interpret the results of such algorithms, and present a novel, state-of-the-art explainability method based on information theory.
- Week 12 workshop: Friday 20th May, 3pm – 4pm, Online and in S225 in the Quadrangle. (Notebook)
- Week 13: Thursday 26th May, Online, 3pm – 5pm (world clock)
- Qianxiao Li, Deep learning for sequence modelling. (Lecture notes, Lecture recording)
- Abstract: In this talk, we introduce some deep learning based approaches for modelling sequence to sequence relationships that are gaining popularity in many applied fields, such as time-series analysis, natural language processing, and data-driven science and engineering. We will also discuss some interesting mathematical issues underlying these methodologies, including approximation theory and optimization dynamics.
- Qianxiao has provided some notebooks on GitHub.
- There will be no workshop this week.
- Week 14: Thursday 2nd June, 4pm Sydney time, online (world clock)
- Lars Buesing, Searching for Formulas and Algorithms: Symbolic Regression and Program Induction (Lecture recording, Lecture notes)
- Abstract: In spite of their enormous success as black box function approximators in many fields such as computer vision, natural language processing and automated decision making, Deep Neural Networks often fall short of providing interpretable models of data. In applications where aiding human understanding is the main goal, describing regularities in data with compact formulae promises improved interpretability and better generalization. In this talk I will introduce the resulting problem of Symbolic Regression and its generalization to Program Induction, highlight some learning methods from the literature and discuss challenges and limitations of searching for algorithmic descriptions of data.
- There will be no workshop this week.
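Two of the techniques mentioned in the abstracts above lend themselves to short, self-contained illustrations. First, the cross-entropy method from Adam Zsolt Wagner's Week 7 talk: sample candidate constructions from a parameterised distribution, keep the best-scoring fraction, and move the distribution towards those elites. The sketch below is a toy illustration only, not the speaker's code: the paper generates graphs with a neural network, whereas here the distribution is simply a vector of independent edge probabilities, and the reward (many edges, penalising triangles) is invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10                              # vertices; a graph is a bit string over the n*(n-1)//2 potential edges
n_edges = n * (n - 1) // 2
pop_size, n_elite, smoothing = 200, 20, 0.7
p = np.full(n_edges, 0.5)           # one independent Bernoulli parameter per potential edge

def to_adjacency(bits):
    A = np.zeros((n, n))
    A[np.triu_indices(n, k=1)] = bits
    return A + A.T

def reward(bits):
    # Toy score: reward edges, but each triangle costs three edges' worth of reward.
    A = to_adjacency(bits)
    triangles = np.trace(A @ A @ A) / 6
    return bits.sum() - 3 * triangles

for step in range(200):
    samples = (rng.random((pop_size, n_edges)) < p).astype(float)
    scores = np.array([reward(s) for s in samples])
    elite = samples[np.argsort(scores)[-n_elite:]]             # keep the top-scoring constructions
    p = smoothing * p + (1 - smoothing) * elite.mean(axis=0)   # move the distribution towards the elites

best = samples[np.argmax(scores)]
print("best score in final population:", reward(best))
```

Swapping in a different reward function is all that is needed to aim the same loop at a different conjecture, which is exactly the flexibility the abstract emphasises.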
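Second, the graphical semi-supervised learning of Bamdad Hosseini's Week 8 talk. The sketch below is a minimal version of one classical baseline (the harmonic-function, or label-propagation, method): build a similarity graph on all points and extend the labels harmonically with respect to the graph Laplacian. The data set, kernel width and threshold are invented, and the talk's algorithms are not necessarily this one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two noisy clusters in the plane, with a single labelled point in each.
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(100, 2)),
               rng.normal([3.0, 0.0], 0.3, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
labelled = np.array([0, 100])
unlabelled = np.setdiff1d(np.arange(len(X)), labelled)

# Gaussian similarity graph and its Laplacian L = D - W.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (2 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# Harmonic extension: minimise f^T L f subject to f matching the known labels,
# i.e. solve L_uu f_u = -L_ul f_l for the values on the unlabelled points.
f_l = y[labelled].astype(float)
f_u = np.linalg.solve(L[np.ix_(unlabelled, unlabelled)],
                      -L[np.ix_(unlabelled, labelled)] @ f_l)

pred = np.zeros(len(X))
pred[labelled], pred[unlabelled] = f_l, f_u
print("accuracy with 2 labels out of 200 points:", ((pred > 0.5) == y).mean())
```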
References
The main reference for the first half of the course will be Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, freely available online. More specific references given in lectures are also listed above, in the seminar schedule.
Additional colabs
Here we provide links to a few additional colabs which might be interesting.
- Classifying descents in Sₙ: Here we train a simple neural network to classify the descent sets of a permutation.
- Playing with parity: Here we train a simple neural net to learn the parity function, and look at how well it generalizes (a similar experiment is sketched below).
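The parity experiment can be reproduced in a few lines of PyTorch. The sketch below is not the linked colab; the network size, train/test split and optimiser settings are invented, and the interesting part is comparing the train and test accuracies at the end.

```python
import torch
from torch import nn

torch.manual_seed(0)
n_bits = 10

# All 2^n bit strings and their parities.
X = torch.tensor([[(i >> k) & 1 for k in range(n_bits)] for i in range(2 ** n_bits)],
                 dtype=torch.float32)
y = X.sum(dim=1).remainder(2).long()

# Train on a random half of the inputs and test generalisation on the unseen half.
perm = torch.randperm(len(X))
train, test = perm[: len(X) // 2], perm[len(X) // 2:]

model = nn.Sequential(nn.Linear(n_bits, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):            # full-batch gradient descent on the training half
    opt.zero_grad()
    loss = loss_fn(model(X[train]), y[train])
    loss.backward()
    opt.step()

with torch.no_grad():
    train_acc = (model(X[train]).argmax(dim=1) == y[train]).float().mean().item()
    test_acc = (model(X[test]).argmax(dim=1) == y[test]).float().mean().item()
print(f"train accuracy {train_acc:.2%}, test accuracy {test_acc:.2%}")
```

Comparing the two printed numbers shows how much (or how little) the trained network has learned about parity itself, as opposed to memorising the training inputs.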