Learning Machines 101

Richard M. Golden, Ph.D., M.S.E.E., B.S.E.E.

  • 35 minutes 29 seconds
    LM101-086: Ch8: How to Learn the Probability of Infinitely Many Outcomes

    This 86th episode of Learning Machines 101 discusses the problem of assigning probabilities to a possibly infinite set of outcomes in a space-time continuum which characterizes our physical world. Such a set is called an “environmental event”. The machine learning algorithm uses information about the frequency of environmental events to support learning. If we want to study statistical machine learning, then we must be able to discuss how to represent and compute the probability of an environmental event. It is essential that we have methods for communicating probability concepts to other researchers, methods for calculating probabilities, and methods for calculating the expectation of specific environmental events. This episode discusses the challenges of assigning probabilities to events when we allow for the case of events comprised of an infinite number of outcomes. Along the way we introduce essential concepts for representing and computing probabilities using measure theory mathematical tools such as sigma fields, and the Radon-Nikodym probability density function. Near the end we also briefly discuss the intriguing Banach-Tarski paradox and how it motivates the development of some of these special mathematical tools. Check out: www.learningmachines101.com and www.statisticalmachinelearning.com for more information!!!

    20 July 2021, 11:23 pm
  • 30 minutes 51 seconds
    LM101-085:Ch7:How to Guarantee your Batch Learning Algorithm Converges

    This 85th episode of Learning Machines 101 discusses formal convergence guarantees for a broad class of machine learning algorithms designed to minimize smooth non-convex objective functions using batch learning methods. In particular, a broad class of unsupervised, supervised, and reinforcement machine learning algorithms which iteratively update their parameter vector by adding a perturbation based upon all of the training data. This process is repeated, making a perturbation of the parameter vector based upon all of the training data until a parameter vector is generated which exhibits improved predictive performance. The magnitude of the perturbation at each learning iteration is called the “stepsize” or “learning rate” and the identity of the perturbation vector is called the “search direction”. Simple mathematical formulas are presented based upon research from the late 1960s by Philip Wolfe and G. Zoutendijk that ensure convergence of the generated sequence of parameter vectors. These formulas may be used as the basis for the design of artificially intelligent smart automatic learning rate selection algorithms. The material in this podcast is designed to provide an overview of Chapter 7 of my new book “Statistical Machine Learning” and is based upon material originally presented in Episode 68 of Learning Machines 101! Check out: www.learningmachines101.com for the show notes!!!

     

    21 May 2021, 12:59 am
  • 33 minutes 13 seconds
    LM101-084: Ch6: How to Analyze the Behavior of Smart Dynamical Systems

    In this episode of Learning Machines 101, we review Chapter 6 of my book “Statistical Machine Learning” which introduces methods for analyzing the behavior of machine inference algorithms and machine learning algorithms as dynamical systems. We show that when dynamical systems can be viewed as special types of optimization algorithms, the behavior of those systems even when they are highly nonlinear and high-dimensional can be analyzed. Learn more by visiting: www.learningmachines101.com and www.statisticalmachinelearning.com .

    5 January 2021, 8:10 pm
  • 34 minutes 22 seconds
    LM101-083: Ch5: How to Use Calculus to Design Learning Machines

    This particular podcast covers the material from Chapter 5 of my new book “Statistical Machine Learning: A unified framework” which is now available! The book chapter shows how matrix calculus is very useful for the analysis and design of both linear and nonlinear learning machines with lots of examples. We discuss how to use the matrix chain rule for deriving deep learning descent algorithms and how it is relevant to software implementations of deep learning algorithms.  We also discuss how matrix Taylor series expansions are relevant to machine learning algorithm design and the analysis of generalization performance!!

    For additional details check out: www.learningmachines101.com and www.statisticalmachinelearning.com

     

    29 August 2020, 1:21 pm
  • 29 minutes 5 seconds
    LM101-082: Ch4: How to Analyze and Design Linear Machines

    The main focus of this particular episode covers the material in Chapter 4 of my new forthcoming book titled “Statistical Machine Learning: A unified framework.”  Chapter 4 is titled “Linear Algebra for Machine Learning.

    Many important and widely used machine learning algorithms may be interpreted as linear machines and this chapter shows how to use linear algebra to analyze and design such machines. In addition, these same techniques are fundamentally important for the development of techniques for the analysis and design of nonlinear machines. 

    This podcast provides a brief overview of Linear Algebra for Machine Learning for the general public as well as information for students and instructors regarding the contents of Chapter 4 of Statistical Machine Learning. For more details, check out: www.statisticalmachinelearning.com

    23 July 2020, 10:57 pm
  • 37 minutes 20 seconds
    LM101-081: Ch3: How to Define Machine Learning (or at Least Try)

    This particular podcast covers the material in Chapter 3 of my new book “Statistical Machine Learning: A unified framework” with expected publication date May 2020. In this episode we discuss Chapter 3 of my new book which discusses how to formally define machine learning algorithms. Briefly, a learning machine is viewed as a dynamical system that is minimizing an objective function. In addition, the knowledge structure of the learning machine is interpreted as a preference relation graph which is implicitly specified by the objective function. In addition, this week we include in our book review section a new book titled “The Practioner’s Guide to Graph Data”  by Denise Gosnell and Matthias Broecheler. To find out more information visit the website: www.learningmachines101.com .

    9 April 2020, 2:30 pm
  • 31 minutes 43 seconds
    LM101-080: Ch2: How to Represent Knowledge using Set Theory

    This particular podcast covers the material in Chapter 2 of my new book “Statistical Machine Learning: A unified framework” with expected publication date May 2020. In this episode we discuss Chapter 2 of my new book, which discusses how to represent knowledge using set theory notation. Chapter 2 is titled “Set Theory for Concept Modeling”.

    29 February 2020, 10:56 pm
  • 26 minutes 7 seconds
    LM101-079: Ch1: How to View Learning as Risk Minimization

    This particular podcast covers the material in Chapter 1 of my new (unpublished) book “Statistical Machine Learning: A unified framework”. In this episode we discuss Chapter 1 of my new book, which shows how supervised, unsupervised, and reinforcement learning algorithms can be viewed as special cases of a general empirical risk minimization framework. This is useful because it provides a framework for not only understanding existing algorithms but also for suggesting new algorithms for specific applications.

    24 December 2019, 12:06 am
  • 39 minutes 18 seconds
    LM101-078: Ch0: How to Become a Machine Learning Expert

    This particular podcast (Episode 78 of Learning Machines 101) is the initial episode in a new special series of episodes designed to provide commentary on a new book that I am in the process of writing. In this episode we discuss books, software, courses, and podcasts designed to help you become a machine learning expert! For more information, check out: www.learningmachines101.com

    24 October 2019, 1:07 am
  • 24 minutes 15 seconds
    LM101-077: How to Choose the Best Model using BIC

    In this 77th episode of www.learningmachines101.com , we explain the proper semantic interpretation of the Bayesian Information Criterion (BIC) and emphasize how this semantic interpretation is fundamentally different from AIC (Akaike Information Criterion) model selection methods. Briefly, BIC is used to estimate the probability of the training data given the probability model, while AIC is used to estimate out-of-sample prediction error. The probability of the training data given the model is called the “marginal likelihood”.  Using the marginal likelihood, one can calculate the probability of a model given the training data and then use this analysis to support selecting the most probable model, selecting a model that minimizes expected risk, and support Bayesian model averaging. The assumptions which are required for BIC to be a valid approximation for the probability of the training data given the probability model are also discussed.

    2 May 2019, 3:03 am
  • 28 minutes 17 seconds
    LM101-076: How to Choose the Best Model using AIC and GAIC

    In this episode, we explain the proper semantic interpretation of the Akaike Information Criterion (AIC) and the Generalized Akaike Information Criterion (GAIC) for the purpose of picking the best model for a given set of training data.  The precise semantic interpretation of these model selection criteria is provided, explicit assumptions are provided for the AIC and GAIC to be valid, and explicit formulas are provided for the AIC and GAIC so they can be used in practice. Briefly, AIC and GAIC provide a way of estimating the average prediction error of your learning machine on test data without using test data or cross-validation methods. The GAIC is also called the Takeuchi Information Criterion (TIC).

    23 January 2019, 4:27 am
  • More Episodes? Get the App
© MoonFM 2024. All rights reserved.