Data Skeptic

Kyle Polich

40 minutes 26 seconds

Auditing LLMs and Twitter

Our guests, Erwan Le Merrer and Gilles Tredan, are long-time collaborators in graph theory and distributed systems. They share their expertise on applying graph-based approaches to understanding both large language model (LLM) hallucinations and shadow banning on social media platforms.

In this episode, listeners will learn how graph structures and metrics can reveal patterns in algorithmic behavior and platform moderation practices.

Key insights include the use of graph theory to evaluate LLM outputs, uncovering patterns in hallucinated graphs that might hint at the underlying structure and training data of the models, and applying epidemic models to analyze the uneven spread of shadow banning on Twitter.

-------------------------------

Want to listen ad-free? Try our Graphs Course? Join Data Skeptic+ for $5 / month or $50 / year

https://plus.dataskeptic.com

29 January 2025, 6:53 pm
37 minutes 23 seconds

Fraud Detection with Graphs

In this episode, Šimon Mandlík, a PhD candidate at the Czech Technical University, talks with us about leveraging machine learning and graph-based techniques for cybersecurity applications.

We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets.

This will include the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs and the advantages of analyzing connections between entities (such as clients and domains).

Our guest shows that while other graph methods (such as GNNs or label propagation) either lack scalability or have trouble with heterogeneous graphs, his method can handle both because of the "locality assumption": fraud is a local phenomenon in the graph. Relying on this assumption yields faster and more accurate results.

22 January 2025, 3:04 am
38 minutes 4 seconds

Optimizing Supply Chains with GNN

Thibaut Vidal, a professor at Polytechnique Montreal, specializes in leveraging advanced algorithms and machine learning to optimize supply chain operations. In this episode, listeners will learn how graph-based approaches can transform supply chains by enabling more efficient routing, districting, and decision-making in complex logistical networks.

Key insights include the application of Graph Neural Networks to predict delivery costs, which has the potential to improve districting strategies for companies like UPS or Amazon and to overcome the limitations of traditional heuristic methods.

Thibaut’s work underscores the potential for GNNs to reduce costs, enhance operational efficiency, and provide better working conditions for teams through improved route familiarity and workload balance.

15 January 2025, 3:34 pm
47 minutes 47 seconds

The Mystery Behind Large Graphs

Our guest in this episode is David Tench, a Grace Hopper postdoctoral fellow at Lawrence Berkeley National Laboratory, who specializes in scalable graph algorithms and compression techniques to tackle massive datasets.

In this episode, we will learn how his techniques enable real-time analysis of large datasets, such as particle tracking in physics experiments or social network analysis, by reducing storage requirements while preserving critical structural properties.

David also challenges the common belief that giant graphs are sparse by pointing to a potential bias: perhaps we only see datasets of sparse graphs because large dense graphs are so hard to analyze. The truth is out there…
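For listeners who want to pin down what "sparse" means here, the short sketch below computes the density of a simple undirected graph, that is, the share of all possible edges that are actually present. It is only an illustration: the function name `graph_density` and the graph sizes are hypothetical and do not come from the episode.

```python
def graph_density(num_nodes, num_edges):
    """Density of a simple undirected graph: edges present / edges possible."""
    if num_nodes < 2:
        # With fewer than two nodes no edge is possible; report 0.0 by convention.
        return 0.0
    possible_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / possible_edges


# Hypothetical sizes, chosen only to contrast the two regimes.
sparse = graph_density(1_000_000, 5_000_000)  # a million nodes, few edges each
dense = graph_density(1_000, 400_000)         # small graph, most pairs connected

print(f"sparse example: {sparse:.2e}")
print(f"dense example:  {dense:.2f}")
```

A density close to 0 is what people usually mean by a sparse graph; a density close to 1 means most pairs of nodes are connected.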

David encourages you to reach out to him if you have a large-scale graph application that your current methods and hardware cannot handle. He promises to "look for the hammer that might help you with your nail".

10 January 2025, 1:49 am
38 minutes 7 seconds

Customizing a Graph Solution

In this episode, Dave Bechberger, principal Graph Architect at AWS and author of "Graph Databases in Action", brings deep insights into the field of graph databases and their applications.

Together we delve into specific scenarios in which graph databases provide unique solutions, such as fraud detection, and learn how to optimize a database for questions about connections, such as "How are these entities related?" or "What patterns of interaction indicate anomalies?"

This discussion sheds light on when organizations should consider adopting graph databases, particularly for cases that require scalable analysis of highly interconnected data. It also provides practical insights into leveraging graph databases for performance improvements in tasks that traditional relational databases struggle with.

16 December 2024, 11:03 pm
32 minutes 48 seconds

Graph Transformations

Our guest in this episode is Adam Machowczyk, a PhD student at the University of Leicester who specializes in graph rewriting and its intersection with machine learning, particularly Graph Neural Networks.

Adam explains how graph rewriting provides a formalized method to modify graphs using rule-based transformations, allowing for tasks like graph completion, attribute prediction, and structural evolution.

Bridging the worlds of graph rewriting and machine learning, Adam's work aspires to open new possibilities for creating adaptive, scalable models capable of solving challenges that traditional methods struggle with, such as handling heterogeneous graphs or incorporating incremental updates efficiently.

Real-life applications discussed include using graph transformations to improve recommender systems in social networks, support molecular research in chemistry, and enhance IoT network analysis.

9 December 2024, 6:10 am
36 minutes 42 seconds

Networks for AB Testing

In this episode, data scientist Wentao Su shares his experience in AB testing on social media platforms like LinkedIn and TikTok.

We talk about how network science can enhance AB testing by accounting for complex social interactions, especially in environments where users are both viewers and content creators. These interactions can cause a "spillover effect", meaning that influence leaks across experimental groups and distorts the results.

To mitigate this effect, our guest presents the heuristics and algorithms they developed ("one-degree label propagation"), which give good results on big data with minimal running time and so help optimize user experience and advertiser performance on social media platforms.
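To make the spillover idea concrete, here is a minimal sketch that measures how many social connections cross the boundary between experimental groups. It is not the guest's "one-degree label propagation" algorithm, only a simple diagnostic; the function name, user IDs, and group labels are hypothetical.

```python
def cross_group_edge_fraction(edges, assignment):
    """Share of edges whose two endpoints sit in different experiment groups.

    edges: iterable of (user_a, user_b) pairs.
    assignment: dict mapping each user to a group label.
    """
    edges = list(edges)
    if not edges:
        return 0.0
    crossing = sum(1 for a, b in edges if assignment[a] != assignment[b])
    return crossing / len(edges)


# Hypothetical toy network: four users, two per group.
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u1", "u4")]
assignment = {
    "u1": "treatment",
    "u2": "treatment",
    "u3": "control",
    "u4": "control",
}

fraction = cross_group_edge_fraction(edges, assignment)
print(f"cross-group edges: {fraction:.0%}")
```

The higher this fraction, the more opportunities treated users have to influence the control group, which is the kind of leakage that network-aware group assignment tries to reduce.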

25 November 2024, 3:00 pm
37 minutes 52 seconds

Lessons from eGamer Networks

Alex Bisberg, a PhD candidate at the University of Southern California, specializes in network science and game analytics, with a focus on understanding social and competitive success in multiplayer online games.

In this episode, listeners can expect to learn about player interactions and patterns of behavior from a network perspective. Through his research on games, Alex sheds light on how network analysis and statistical tests might explain positive contagious behaviors, such as generosity, and explores the dynamics of collaboration and competition in gaming environments. These insights offer valuable lessons not only for game developers in enhancing player experience, engagement and retention, but also for anyone interested in understanding the ways that virtual interactions shape social networks and behavior.

18 November 2024, 3:00 pm
42 minutes 24 seconds

GitHub Collaboration Network

In this episode we discuss the GitHub Collaboration Network with Behnaz Moradi-Jamei, assistant professor at James Madison University. As a network scientist, Behnaz created and analyzed a network of about 700,000 contributors to GitHub repositories. The network was built by treating developers as nodes and linking them with edges based on shared contributions: if two developers contributed to the same repository, an edge (connection) was formed between them, representing a collaborative relationship. The resulting network contains 32 million such connections. Using community detection algorithms, Behnaz's analysis reveals how developer communities form, function, and evolve, insights that can guide OSS community managers.
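The construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the input format (a mapping from repository name to contributor names), the function name, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_collaboration_graph(repo_contributors):
    """Link every pair of developers who contributed to the same repository.

    repo_contributors: dict mapping a repository name to a list of developers.
    Returns a dict mapping each developer to the set of their collaborators.
    """
    adjacency = defaultdict(set)
    for contributors in repo_contributors.values():
        # set() drops repeated names within one repository; sorted() keeps
        # the pair order deterministic.
        for dev_a, dev_b in combinations(sorted(set(contributors)), 2):
            adjacency[dev_a].add(dev_b)
            adjacency[dev_b].add(dev_a)
    return dict(adjacency)


# Hypothetical toy data, not taken from the study.
repos = {
    "repo-a": ["alice", "bob", "carol"],
    "repo-b": ["bob", "dave"],
    "repo-c": ["erin"],
}

graph = build_collaboration_graph(repos)
# Each undirected edge is stored twice, once per endpoint.
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
print(f"{len(graph)} developers, {edge_count} edges")
```

Note that in this sketch a developer who never shares a repository with anyone (here "erin") gets no edges and does not appear in the graph at all; whether such isolated contributors should count as nodes is a modeling choice.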

11 November 2024, 8:49 pm
41 minutes 59 seconds

Graphs and ML for Robotics

We are joined by Abhishek Paudel, a PhD student at George Mason University whose research focuses on robotics, machine learning, and planning under uncertainty, using graph-based methods to enhance robot behavior. He explains how graph-based approaches can model environments, capture spatial relationships, and provide a framework for integrating multiple levels of planning and decision-making.

4 November 2024, 2:00 pm
52 minutes 8 seconds

Graphs for HPC and LLMs

We are joined by Maciej Besta, a senior researcher of sparse graph computations and large language models at the Scalable Parallel Computing Lab (SPCL). In this episode, we explore the intersection of graph theory and high-performance computing (HPC), Graph Neural Networks (GNNs) and LLMs.

29 October 2024, 3:13 pm