Scale Cast – A podcast about big data, distributed systems, and scalability

Unknown

A podcast about big data, distributed systems, and scalability

  • An Introduction to ZooKeeper Video

    In 2006 we were building distributed applications that needed a master, aka coordinator, aka controller to manage the sub processes of the applications. It was a scenario that we had encountered before and something that we saw repeated over and over again inside and outside of Yahoo!.

    For example, we have an application that consists of a bunch of processes. Each process needs be aware of other processes in the system. The processes need to know how requests are partitioned among the processes. They need to be aware of configuration changes and failures. Generally an application specific central control process manages these needs, but generally these control programs are specific to applications and thus represent a recurring development cost for each distributed application. Because each control program is rewritten it doesn’t get the investment of development time to become truly robust, making it an unreliable single point of failure.

    link to podcast

    26 April 2008, 6:57 pm
  • More Optimal Bloom Filters

    The Bloom filter, conceived by Burton H. Bloom in 1970, is a
    space-efficient probabilistic data structure that is used to test
    whether an element is a member of a set. False positives are possible,
    but false negatives are not. Elements can be added to the set, but not
    removed (though this can be addressed with a counting filter). The
    more elements that are added to the set, the larger the probability of
    false positives.

    For example, one might use a Bloom filter to do spell-checking in a
    space-efficient way. A Bloom filter to which a dictionary of correct
    words has been added will accept all words in the dictionary and
    reject almost all words which are not, which is good enough in some
    cases. Depending on the false positive rate, the resulting data
    structure can require as little as a byte per dictionary word.

    In the last few years Bloom filter become hot topic again and there
    were several modifications and improvements. In this talk I will
    present my last few improvements in this topic.

    Speaker: Ely Porat
    Ely Porat received his Doctorate from Bar-Ilan University in 2000.
    Following that, he fulfilled his military service and, in parallel,
    worked as a faculty member at Bar-Ilan University. Having spent the
    spring 2007 semester as a Visiting Scientist in Google, he is now back
    at Bar-Ilan University.

    The main body of Ely Porat’s work concerns matching problems: string
    matching, pattern matching, subset matching. He also worked on the
    nearest pair problem in high-dimensional spaces as well as sketching
    and edit distance.

    link

    18 April 2008, 5:13 pm
  • An Overview of High Performance Computing and Challenges for the Future

    In this talk we examine how high performance computing has changed
    over the last 10-year and look toward the future in terms of trends.
    These changes have had and will continue to have a major impact on our
    software. A new generation of software libraries and algorithms are
    needed for the effective and reliable use of (wide area) dynamic,
    distributed and parallel environments. Some of the software and
    algorithm challenges have already been encountered, such as management
    of communication and memory hierarchies through a combination of
    compile–time and run–time techniques, but the increased scale of
    computation, depth of memory hierarchies, range of latencies, and
    increased run–time environment variability will make these problems
    much harder.

    Link to video

    8 April 2008, 12:33 am
  • Disk-Based Parallel Computation, Rubik’s Cube, and Checkpointin

    This talk takes us on a journey through three varied, but interconnected
    topics. First, our research lab has engaged in a series of disk-based
    computations extending over five years. Disks have traditionally
    been used for filesystems, for virtual memory, and for databases.
    Disk-based computation opens up an important fourth use: an abstraction
    for multiple disks that allows parallel programs to treat them in a
    manner similar to RAM. The key observation is that 50 disks have
    approximately the same parallel bandwidth as a _single_ RAM subsystem.
    This leaves latency as the primary concern. A second key is the use
    of techniques like delayed duplicate detection to avoid latency

    link to video

    29 March 2008, 5:36 am
  • Announcing Scalecast – A Meta Podcast about Designing Scalable Systems

    Today, I’m announcing a new meta podcast about designing scalable systems named Scalecast.

    I’ve been seeing more and more conference interesting presentations online but I can’t get them to work with my iphone/ipod since they require streaming flash video.

    I now have a simple script that can fetch the flash video from youtube, transcode it to mp4 video, including AAC audio, and publish it to WordPress. It’s then available for use on any Apple device including legacy ipods and more modern iphones.

    If you have a suggestion for a video to include in this podcast, just add it as a comment on this post and I’ll try to transcode for you.

    I primarily did this just for myself. I need to be able to view these videos for my work and my primary mechanism for doing so is my iphone.

    4 January 2008, 8:35 am
  • Lecture 5: Cluster Computing and MapReduce
    3 January 2008, 3:26 am
  • Lecture 3: Cluster Computing and MapReduce
    3 January 2008, 1:12 am
  • Lecture 4: Cluster Computing and MapReduce
    3 January 2008, 1:11 am
  • Lecture 2: Cluster Computing and MapReduce
    3 January 2008, 1:10 am
  • Lecture 1: Cluster Computing and MapReduce

    Lecture 1 in a five part series introducing mapreduce and cluster computing. See http://code.google.com/edu/… for slides and other resources.

    Link to video

    3 January 2008, 1:08 am
  • More Episodes? Get the App
© MoonFM 2025. All rights reserved.