Tales From the Ops Side is a fascinating, behind-the-scenes look at the running of an Internet Ops company. Hosted by Hany Fahim – owner of Toronto-based stack.io – Tales From The Ops Side is a must-listen for anyone working in or interested in Interne...
It all started as a simple error message from a piece of software that had run seamlessly thousands of times over the past two years.
And yet, once a week in late 2016, this error message would rear its ugly head. At first, it was thought to be a user error. But after some testing, it was clear that we were dealing with a software bug. Thus began a month-long hunt for this elusive bug, becoming as much a personal vendetta as a solution to an intractable problem.
Software bugs are a fact of life, born out of the ever-increasing complexity of our world. Humans work hard at reducing their frequency and effect, but they can never be eliminated.
In our case, what complicated matters was our inability to replicate the problem, which would have allowed us to reliably test and fix the issue. No matter how many times we tried, we simply could not reproduce this bug, and yet, it continued to haunt us week after week.
The problematic code was only eight lines long. After several weeks of chasing, these eight lines were rewritten and simplified. This should have been the end of the hunt, but rather than switch to the rewritten code, we decided, perhaps foolishly, that the hunt must go on.
Eventually, the whole company was pulled in to aid in the hunt. The source of the bug was eventually discovered, but at what cost?
Listen to your bug-hunter host Hany Fahim explain how it all went down.
Connect with Hany at his company stack.io and LinkedIn.
Don't forget to leave us a review and subscribe to our channel to keep up with the latest episodes!
Most companies don’t discover they have been hacked until months later. But imagine catching a hacker red-handed. Your course of action would be very different.
Instead of dealing with the aftermath, your focus would be on urgently identifying the source and then blocking the hack.
This is exactly what stack.io CEO and Founder Hany Fahim did when dealing with his first hack, back when he was a young Systems Administrator.
Yes, this episode tells a tale from some years ago. So long ago that our host Hany was using a Blackberry and struggled to remember the course of events. This slight memory haze introduces a humorous interlude in the episode, with current-day Hany and his younger self in a quibble about the true course of events.
The hack started with an internal search tool used company-wide to retrieve important information. Its speed had slowed to a crawl. Hany investigated and discovered a huge backlog of searches. The culprit? The task at the head of the queue was trying to back up the database.
Hany terminated the task only to have it pop up again minutes later. He traced the request to a web server, then sent an email to the hardware team and to the entire company about the issue. The situation was resolved, or so he thought.
Suddenly, monitoring alarms started sounding from a highly secured backup system deep inside the data centre. The hack was coming from inside the office! Hany had the hardware team track it, and the trail led directly to a vice president’s office. Knowing the VP was on vacation, Hany went into his office and discovered the hackers were remotely logging into the VP’s terminal and using it to back up the company database. Hany unplugged the computer. It turned out that this was only the beginning.
We won’t give the entire story away here . . . otherwise, that would spoil the episode for you!
Hany notes that humans, not machines, are generally the weak link when it comes to security. Similar to this tale, the SolarWinds hack of December 2020, which breached many Fortune 500 companies and US government agencies (including Homeland Security and the National Nuclear Security Administration), was also connected to human error rather than to the machines.
Connect with Hany at his company stack.io and LinkedIn.
Don't forget to leave us a review and subscribe to our channel to keep up with the latest episodes!
stack.io CEO and show host Hany Fahim puts on his Sherlock Holmes hat in this episode of Tales From the Ops Side.
He's investigating a missing file of sensitive data, while his client and their partner point fingers at each other over fault.
This client had an agreement to provide sensitive data on a nightly basis to a partner. The transfer was automated by a scheduled process known as a cron job. Hany's lunch was interrupted in late 2017 when an urgent request came in from this client.
The problem: One night the data was never received by the partner.
Hany shifted into troubleshooting mode, working with his client and examining the guts of their system. It appeared the job did run as scheduled, even though the partner did not receive the file.
What happened?
Digging deeper into alert files, database logs and network graphs didn’t shed any more light on the problem. After examining all the evidence and chasing down logical leads, Hany was no further ahead.
That night at home, he was distracted by the problem. After a late night of research, he was no further ahead. Over a cup of strong coffee the next morning, he spotted an obscure forum post he had disregarded the night before. The post gave his investigation a new avenue to explore: Linux Process Accounting.
With more sleuthing, hiking through the historical bowels of the internet, and combing through over 1000 lines of code, Hany was rewarded with an answer and knew what the client's problem was and who was at fault.
For the whole story, all the technical details, and Hany's insider view, listen to the episode.
Connect with Hany at his company stack.io and LinkedIn.
If you enjoyed this episode, please share it with anyone you think will enjoy it. And if you can give us a review on Apple Podcasts, we’ll be grateful!
Host Hany Fahim tells an intriguing tale of the global ripple effects of a political event and the phenomenal power of social media.
Hany was at home one evening in 2015, somewhat successfully assembling a chair. Suddenly, he received news of a huge spike in traffic. Normally, this would point to a number of things, such as a marketing event or even an attack, but nothing seemed to add up.
Hunting for answers, Hany traced the users’ IP addresses. Interestingly, they all seemed to lead back to Brazil. That seemed like a red flag; was this a calculated attack? Rather than trying to puzzle it out on his own, Hany consulted the social media grapevine for answers: Twitter.
When Hany learned of the Brazilian government’s 48-hour ban on WhatsApp, the puzzle pieces began to fall into place. Brazil’s citizens were circumventing the popular app’s ban by using VPNs. Luckily, Hany was able to use this information to find ways to accommodate the spike in traffic, rather than defending against it.
How Brazil Kept Us Up At Night is a thought-provoking look at how business, politics, and technology are deeply intertwined in our social lives. Hany reflects on human ingenuity in the face of government censorship and the positive impacts of social media when it comes to problem solving for ops.
Connect with Hany at his company stack.io and LinkedIn.
If you enjoyed this episode, please share it with anyone you think will enjoy it. And if you can give us a review on Apple Podcasts, we’ll be grateful!
Welcome to Tales From the Ops Side, a new podcast show exploring Internet Ops.
Tales From the Ops Side is produced by stack.io and narrated by stack.io’s founder, Hany Fahim.
The show is an exploration of Hany’s adventures in running stack.io. Hany delves into the latest technologies, trends, and challenges in the business of building and operating web applications in the public cloud.
If you work in DevOps or Internet Ops, you’ll know exactly what Ops Side means. For those who don’t, DevOps is a culture of development and IT operations teams working together.
And before you stop reading, assuming that Tales From the Ops Side will be too technical or dry, hold on! Hany is an outstanding storyteller and uses everyday examples as analogies to make his episodes accessible and engaging.
He takes listeners on some incredible journeys as he describes the extent to which he and his team sometimes go to resolve client issues. Be prepared to hear about leap-seconds, earthquakes and the moon, repressive regimes in Brazil, and Hany’s favourite errors. Yes! All these disparate topics help make up Tales From The Ops Side.
Based in Toronto, Canada, stack.io helps clients implement DevOps practices and adopt a DevOps culture, so that they can focus on their core business services, web application development.
In this trailer episode Hany discusses stack.io’s objectives, its origin story, the latest trends in DevOps, and what it’s been like launching, growing, and running a tech start-up.
Hany attributes his success to the fact that he’s an incredibly curious guy, and a heads-down kind of guy, too. He focuses intensely on the problem before him. This allows him to maintain composure when others could feel overwhelmed.
For example, in Episode One, Denial of Sleep, Hany digs into some pretty vicious and sustained DoS attacks, along with a ransom note for thousands of dollars. Hany’s calmness, curiosity, and ability to dig deep saw him through.
If you work in Internet Ops or appreciate well-told stories full of surprising facts told with intrigue and humour, Tales From The Ops Side is for you!
And now, let’s hear some Tales From the Ops Side with Hany Fahim, interviewed by Sheelagh Caygill.
Host Hany Fahim tells the fascinating and sometimes frightening story of some massive DoS attacks that knocked clients offline for extended periods of time, along with ransom letters demanding thousands of dollars.
Hany and his team spent hundreds of hours dealing with the attacks, which led to a Denial of Sleep, too!
This is a desperate, edge-of-the-seat tale that makes it clear that only those with nerves of steel should get into the Internet Ops business! And of course, the work demands ingenuity, creativity, and a penchant for research.
Should Internet Ops professionals pay ransom fees? How did Hany keep focused and come up with viable solutions? How did he maintain his composure when, on the inside, he was a ball of stress, unable to eat or sleep, having already spent thousands of dollars dealing with these attacks? These are just a few of the questions Hany asks and answers in this episode.
Denial of Sleep is an accessible story, thanks to Hany’s use of an analogy - the coffee shop. This shop represents a web application, and the loyal coffee drinkers are the end users. Meanwhile, the sidewalks, roads, and highways serve as the various networks that people can use to get to the shop. This is the World Wide Web we all know.
Connect with Hany at his company stack.io and LinkedIn.
If you enjoyed this episode, please share it with anyone you think will enjoy it. And if you can give us a review on Apple Podcasts, we’ll be grateful!
Host Hany Fahim tells a roller-coaster tale of an emergency situation that occurred one beautiful summer evening in 2012.
When multiple client websites his company supported mysteriously went down without warning, Hany raced against the clock—quite literally, it turned out—to get everything back up and running.
Although his alerts indicated that high load was at the heart of the issue, when Hany checked things out, he was in for a surprise. Nothing seemed to indicate anything should be wrong. So what could possibly have caused all these different systems to go down at once?
After finding an oddly simple solution, Hany traced the root of the problem back to an unlikely place: timekeeping. This discovery took Hany down a research rabbit hole, one that changed his understanding of space and time, and the meaning of interconnectedness. Hany reflects on the unpredictability of the universe itself in this episode, and his tenacious spirit that summer night shows how important it is to rise to the occasion in Ops when things go awry.
Earthquakes and The Moon is an intriguing deep-dive into science, time, and the physics of the universe. Hany’s story twists and turns to show how a single second, the Moon, and an earthquake half a world away can change the course of history.
Hany is founder and CEO of stack.io and you can connect with him there, or on LinkedIn.
If you enjoyed this episode, please share it with anyone you think will enjoy it. And if you can give us a review on Apple Podcasts, we’ll be grateful!