Speaking Of Reliability: Friends Discussing Reliability Engineering Topics | Warranty | Plant Maintenance

Reliability.FM: Accendo Reliability, focused on improving your reliability program and career

Gain the experience of your peers to accelerate improvement of your program and career. Improve your product development process, reliability or warranty performance; or your plant uptime or asset performance. Learn about reliability and maintenance engineering practical approaches, skills, and techniques. Join the conversation today.

  • Why is PoF so Hard?

    Why is PoF so Hard?

    Abstract

    Chris and Fred discuss why the Physics of Failure (PoF) is hard to model … or is it?

    Key Points

    Join Chris and Fred as they discuss how the Physics of Failure (PoF) is seen as hard to use to model time to failure of something. It usually needs a detailed equation or formula to model how long it takes for something to fail based on physical parameters like grain size, modulus, strain exponent and so on. Sounds hard!

    Topics include:

    • What does PoF mean? Instead of testing products until failure to see the spread of times to failure (as in, how the probability is distributed), we use an ‘accurate’ model, often with lots of parameters based on material properties, to model time to failure quickly and accurately without testing.
    • So what’s the problem? It can be really, really hard to know which of the thousands of complex equations is the one (or ones) that describes how your product fails. There are resources out there that have huge lists of PoF models (and their detailed equations) for you to pick from. But then … how do you know which one perfectly captures the way your thing fails?
    • Then there are the parameters. Some PoF models require tens of parameters to be known. But if you don’t know what these parameters are … you are in trouble. Some people just ‘guess’ these parameters based on similar materials or scenarios. The problem is that you are now modeling someone else’s failure, which may or may not be similar to yours.
    • But we do use PoF more than we might think. When we do Accelerated Life Testing (ALT), we often use what we call ‘Arrhenius Plots.’ These are charts that make it really easy for us to see and model how increasing the temperature of a product speeds up the failure process. This allows us to ‘accelerate’ testing by increasing temperature, so we don’t have to spend 10 years testing products to understand service reliability. But … ‘Arrhenius Plots’ only work for failure mechanisms that are based on chemical reactions (like corrosion, dendritic growth and so on). And many people try to use ‘Arrhenius Plots’ for things that are not chemical reactions.
    • Again … work out what decision you are trying to inform. This will help you see if you need to understand PoF, do your own test, use expert judgment or anything else!
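
    As a rough illustration of the Arrhenius point above, here is a minimal sketch of the standard Arrhenius acceleration factor. The activation energy and both temperatures are hypothetical values chosen for the example, not figures from the episode, and the model only makes sense for thermally driven, chemical-reaction-type mechanisms.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Ratio of time to failure at use temperature to time to failure at stress temperature."""
    t_use_k = use_temp_c + 273.15
    t_stress_k = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


# Hypothetical values: 0.7 eV activation energy, 40 C in service, 85 C in the chamber.
af = arrhenius_acceleration_factor(ea_ev=0.7, use_temp_c=40.0, stress_temp_c=85.0)

# Chamber hours that would represent 10 years of service, if the model applies at all.
chamber_hours = 10 * 8760 / af

print(f"acceleration factor: {af:.1f}")
print(f"chamber hours equivalent to 10 years in service: {chamber_hours:.0f}")
```

    The result is very sensitive to the activation energy, which is exactly the kind of parameter people are tempted to ‘guess.’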

    Enjoy an episode of Speaking of Reliability. Where you can join friends as they discuss reliability topics. Join us as we discuss topics ranging from design for reliability techniques to field data analysis approaches.

    Show Notes

    The post SOR 966 Why is PoF so Hard? appeared first on Accendo Reliability.

    17 May 2024, 10:28 am
  • MTBF, Really?

    MTBF, Really?

    Abstract

    Chris and Fred discuss the MTBF … again. And again. People don’t (want to) get it. So here we go again …

    Key Points

    Join Chris and Fred as they discuss the MTBF and why it should virtually never be used. Why?

    Topics include:

    • What’s wrong with the MTBF when it comes to reliability? When we assume that the only thing we need to understand is the MTBF, we can never use reliability models that include any form of early wear-in or late wear-out. So, it means we assume a constant hazard rate, which means your thing never stays young and never gets old. That’s right, a 100-year-old product that is somehow still working is just as likely to survive the next day as one that comes out of the box.
    • But when I assume (just) the MTBF, I get better results than when we do more detailed analysis. A Toyota Corolla has a 1.6 Litre engine. So does an F1 race car. Now let’s say that you measured the top speeds of both cars. For the F1 race car, we get 372.499 km/h or 231.46 mph. For the Toyota Corolla, we get 188.3 km/h or 117.0 mph. But let’s now say that we don’t like the top speed of the Toyota Corolla, and would like it to be higher. What you could do is pretend you didn’t measure the top speed of the Toyota Corolla, and then assume that because its engine is the same size as the F1 race car’s engine … it has the same top speed as the F1 race car. Crazy right? … just as crazy as assuming an MTBF or constant hazard rate because you like the number you get better.
    • Ostriches don’t actually put their heads in the sand … but many ‘reliability engineers’ do. When we ask some organizations and reliability engineers why they still use nothing but the MTBF, they say things like ‘we’ve never seen it be anything else.’ And when we ask what, if anything, they have done to look for evidence to the contrary … ‘we just assume we are in the bottom of the bathtub curve.’ Some people don’t know that no system actually has the ‘bathtub curve’ that we see beautifully traced out in a textbook. So why are we still here?
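
    The constant-hazard point above can be sketched numerically. This is a minimal example, assuming a Weibull model in which shape beta = 1 reproduces the MTBF-only (constant hazard) assumption and beta = 3 stands in for a wear-out mechanism. The characteristic life, ages and shape values are hypothetical.

```python
import math


def weibull_reliability(t, eta, beta):
    """Probability of surviving to time t; beta == 1 is the constant-hazard case."""
    return math.exp(-((t / eta) ** beta))


def conditional_survival(age, mission, eta, beta):
    """Probability of surviving `mission` more hours, given survival to `age`."""
    return weibull_reliability(age + mission, eta, beta) / weibull_reliability(age, eta, beta)


ETA = 10_000.0   # hypothetical characteristic life in hours (equals the MTBF when beta == 1)
MISSION = 24.0   # "the next day"

for age in (0.0, 5_000.0, 20_000.0):
    constant = conditional_survival(age, MISSION, ETA, beta=1.0)
    wear_out = conditional_survival(age, MISSION, ETA, beta=3.0)
    print(f"age {age:>8.0f} h  constant hazard: {constant:.6f}  wear-out: {wear_out:.6f}")
```

    Under the constant-hazard assumption the chance of surviving the next day is the same at every age, while the wear-out model gives the old unit visibly worse odds.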

    Show Notes

    The post SOR 965 MTBF, Really? appeared first on Accendo Reliability.

    13 May 2024, 10:26 am
  • Finding Failures and Firefighting

    Finding Failures and Firefighting

    Abstract

    Kirk and Fred discuss the schedule pressure of releasing new products to market, and how the actual firefighting begins once customers start finding reliability issues. Often the firefighters, those who can quickly fix the causes of failures, get many more accolades than those who find and mitigate product weaknesses during design and development, before they become failures.

    Key Points

    Join Kirk and Fred as they discuss the common excuses for not doing enough analysis and testing to discover latent defects before market release, if it does happen. Many products are robust designs and the latent defects are introduced during assembly and final testing.
    Topics include:

    • Suppose HALT reveals significant differences in environmental step-stress limits within a small group of 3 to 5 samples. That likely indicates a wide distribution in a component or subsystem, and some percentage of the weakest units in that distribution will intersect with the worst-case end-use stress environment, even though there are not enough samples for a statistical analysis.
    • CAD systems can very well analyze how component variations will affect the functions of the circuits, but they are based on ideal averages and not the real parametric variations in high-volume production.
    • Understanding the root cause of failure is of utmost importance. Even the most robust designs can only be reliable if the manufacturing processes are capable and consistent. This underscores the weight of the responsibility as manufacturing professionals to ensure the quality and reliability of our products. 
    • Real firefighters saving a person from a burning building generally get much more publicity and accolades than the inventors of the smoke detector, which has saved orders of magnitude more people by alerting them at the beginning of a fire in time to escape.

    Show Notes

    Please click on this link to access a relatively new article from the US Army and CALCE analyzing traditional reliability prediction methods, titled “Reliability Prediction – Continued Reliance on a Misleading Approach”. It is in the public domain, so please distribute freely. Trying to predict reliability for development is a misleading and costly approach.

    You can now purchase the most recent recording of Kirk Gray’s 8-hour (two 4-hour sessions) Hobbs Engineering webinar “Rapid and Robust Reliability Development 2022 HALT & HASS Methodologies Online Seminar” from this link.

    For more information on the newest discovery testing methodology here is a link to the book “Next Generation HALT and HASS: Robust design of Electronics and Systems” written by Kirk Gray and John Paschkewitz.

    The post SOR 964 Finding Failures and Firefighting appeared first on Accendo Reliability.

    10 May 2024, 10:07 am
  • Proving HALT Works

    Proving HALT Works

    Abstract

    Kirk and Fred discuss the challenge of showing those new to limit discovery using HALT that it does find relevant future field issues, whether they have already occurred in a newly released product or exist in a product under development.

    Key Points

    Join Kirk and Fred as they discuss finding potential weaknesses in a new or established product using HALT, and how we can connect those weaknesses to field reliability: first, when the field issue has already been corrected and all products have been retrofitted with a fix, and second, when weaknesses in development are found in “conditions the product will never experience in the field (HALT)”.
    Topics include:

    • The challenge of proving the relevance of a failure under HALT is very dependent on the weakness found. Failures such as component spacing and shorting are typically catastrophic, and most engineers will quickly correct them in the design. Other failures, such as a significant repeating transient voltage spike that damages an I/O interface, will be more challenging to link to field issues if they have not already been observed.
    • Comparing limits and observing large distributions of those limits among the three or more samples used in HALT can help establish the case for the lot or manufacturing variation leading to weak products.
    • Many rush to HALT in new product development before known failures and weaknesses are corrected. Before HALT can be useful, all the prototypes must function correctly, and all time-zero failures must be corrected.

    Show Notes

    Please click on this link to access a relatively new article from the US Army and CALCE analyzing traditional reliability prediction methods, titled “Reliability Prediction – Continued Reliance on a Misleading Approach”. It is in the public domain, so please distribute freely. Trying to predict reliability for development is a misleading and costly approach.

    Here is a link to Kirk’s article “Thermal HALT A Tool for Discovery of Signal Integrity and Software Reliability Issues”

    You can now purchase the most recent recording of Kirk Gray’s 8-hour (two 4-hour sessions) Hobbs Engineering webinar “Rapid and Robust Reliability Development 2022 HALT & HASS Methodologies Online Seminar” from this link.

    For more information on the newest discovery testing methodology here is a link to the book “Next Generation HALT and HASS: Robust design of Electronics and Systems” written by Kirk Gray and John Paschkewitz.

    The post SOR 963 Proving HALT Works appeared first on Accendo Reliability.

    6 May 2024, 10:05 am
  • Limits of Block Diagrams

    Limits of Block Diagrams

    Abstract

    Chris and Fred discuss how we go about modeling the reliability of systems … particularly with things called ‘block diagrams.’ Might this help you?

    Key Points

    Join Chris and Fred as they discuss how we can go about modeling a system, mainly in response to a listener question. The question revolves around modeling a ‘complex’ system that involves a relief valve (which means it only needs to work at certain times), and other valves that redirect things in pipes to three different processes. Where do we start?

    Topics include:

    • What are you trying to achieve? As in … what decision are you trying to inform? Is this to optimize maintenance? … or see if you meet reliability requirements? … or to minimize downtime? … what is it?
    • So what is a Reliability Block Diagram (RBD)? It’s like a fault tree (if you have heard of that) which essentially tells us what combinations of components need to work for the system to work. Now RBDs can’t of themselves tell us if a system is (for example) a parallel system. An RBD might look the same for a two-component load-sharing system as it does for a two-component parallel system. It’s up to you to work out how to model it.
    • And it can be a little complicated. If your emergency relief valve has failed, then your system could still be ‘happily’ working. Until an emergency comes along. So is your system that is still working with a failed relief valve … failed? Your system will only fail when an ‘emergency’ comes along (if nothing else fails). So you need to know how often those emergencies come along … There is nothing wrong with an RBD. It’s just that it can’t do all the thinking.
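
    To make the RBD arithmetic concrete, here is a minimal sketch of series and parallel blocks, plus one simple way to treat a relief valve that only matters on demand. All numbers are hypothetical. The parallel formula assumes independent blocks, so it would not describe a load-sharing pair, and the demand model (Poisson emergencies, each independently finding the valve failed with a fixed probability) is an assumption made for illustration, not something stated in the episode.

```python
import math
from functools import reduce


def series(*reliabilities):
    """All blocks must work: multiply the reliabilities."""
    return reduce(lambda acc, r: acc * r, reliabilities, 1.0)


def parallel(*reliabilities):
    """At least one independent block must work: one minus the product of unreliabilities."""
    return 1.0 - reduce(lambda acc, r: acc * (1.0 - r), reliabilities, 1.0)


def prob_unprotected_emergency(demand_rate_per_year, valve_unavailability, years):
    """Chance that at least one emergency arrives while the relief valve is failed.

    Assumes emergencies arrive as a Poisson process and that each one
    independently finds the valve failed with probability `valve_unavailability`.
    """
    return 1.0 - math.exp(-demand_rate_per_year * valve_unavailability * years)


# Hypothetical one-year reliabilities: a pump feeding two redundant valves.
pump, valve_a, valve_b = 0.98, 0.90, 0.90
system = series(pump, parallel(valve_a, valve_b))
print(f"system reliability: {system:.4f}")

# Hypothetical relief valve: failed 5% of the time, one emergency every two years.
risk = prob_unprotected_emergency(demand_rate_per_year=0.5, valve_unavailability=0.05, years=10)
print(f"chance of an unprotected emergency in 10 years: {risk:.3f}")
```

    The block diagram supplies the structure. Whether the relief valve counts as ‘failed’ depends on the demand rate, which has to come from outside the diagram.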

    Show Notes

    The post SOR 962 Limits of Block Diagrams appeared first on Accendo Reliability.

    3 May 2024, 10:28 am
  • Where do Confidence Bounds Come From

    Where do Confidence Bounds Come From

    Abstract

    Chris and Fred discuss where the ideas of ‘confidence bounds’ come from … and perhaps what they mean.

    Key Points

    Join Chris and Fred as they discuss how we come up with things we call ‘confidence bounds.’ What are they? … and how do they help?

    Topics include:

    • What are ‘confidence bounds’? Confidence bounds are usually explained as limits on what we believe some actual value is. A simple example might be when we judge a distance. For example, if you are standing in a field and see a tree, you might think to yourself that the tree might be 70 – 100 meters away. Your best guess might be around 85 meters, but the values 70 and 100 represent the ‘confidence bounds’ on this best guess because you know there is uncertainty involved.
    • So how do we get ‘confidence bounds’? … it starts with ‘likelihoods.’ Let’s say that you find a size 6.5 shoe in the street (adjusting for the difference in how manufacturers calculate their shoe sizes for male and female shoes). We also know that women’s feet tend to need shoes of sizes 6.5 to 7.5. We also know that men’s feet tend to need shoes of sizes 9 to 10. Some women will randomly have large feet that can exceed sizes 10, 11, 12 and so on. And likewise, some men will randomly have small feet that are less than sizes 6.5, 6, 5.5 and so on. But that said, we know that the shoe with size 6.5 that we found is more likely to be worn by a woman. And that is the basis of everything!
    • Gammas, Chi-squareds, Student’s t … what are we talking about? Some really smart people have been able to apply the concept of likelihood to things like the mean (time to failure) of random processes. Probability distributions have been developed to help us get confidence bounds based on how each thing fails. These probability distributions quantify the likelihood that potential mean (time to failure) values are ‘true.’ Which can be really helpful … sometimes.
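
    The step from likelihoods to bounds can be sketched in a few lines. This is one common approach (likelihood-ratio bounds), not necessarily the one discussed in the episode. It assumes exponentially distributed times to failure, and the failure times are hypothetical. Candidate mean lives are kept if their log-likelihood is within 1.9207 of the best value, which is half of 3.841, the 95% point of a chi-squared distribution with one degree of freedom.

```python
import math

# Hypothetical times to failure in hours (illustrative only).
times = [410.0, 980.0, 1250.0, 1900.0, 3100.0]


def log_likelihood(mean_life, data):
    """Log-likelihood of an exponential model with the given mean time to failure."""
    return -len(data) * math.log(mean_life) - sum(data) / mean_life


best_mean = sum(times) / len(times)            # maximum-likelihood estimate of the mean
best_ll = log_likelihood(best_mean, times)

DROP = 1.9207  # half the 95% chi-squared quantile with 1 degree of freedom (3.841 / 2)

# Scan candidate means from 10 h to 20,000 h and keep the plausible ones.
candidates = [10.0 * k for k in range(1, 2001)]
plausible = [m for m in candidates if best_ll - log_likelihood(m, times) <= DROP]

lower, upper = plausible[0], plausible[-1]
print(f"best estimate of mean life: {best_mean:.0f} h")
print(f"approximate 95% bounds: {lower:.0f} h to {upper:.0f} h")
```

    This is the shoe argument again: each candidate mean is judged by how likely it makes the data we actually saw, and the bounds mark where candidates stop being plausible.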

    Show Notes

    The post SOR 961 Where do Confidence Bounds Come From appeared first on Accendo Reliability.

    29 April 2024, 10:25 am
  • FMEA Approaches Debate

    Differing FMEA Approaches

    Abstract

    Carl and Fred discuss their overall approach to FMEA, what works and doesn’t work.

    Key Points

    Join Carl and Fred as they discuss how they approach FMEA to keep it lean, effective and workable. Topics include:

    • Does a longer FMEA make for a better FMEA?
    • If you have the right FMEA team and no one is concerned about a problem, no need to include it in the FMEA.
    • Not seeing forest for trees is high risk
    • Bottom-up FMEA vs Top-down FMEA
    • Problems with “Bottom-up FMEA”
    • Even lower-level FMEAs begin with functions
    • FMEAs use engineering judgment
    • Human judgment of FMEA team members is critical part of process
    • There is not the time or bandwidth to FMEA everything
    • Benefits of starting with System FMEA
    • FMEA “nesting”
    • Tracing component failure propagation to next level and up to the system is critical to assessing risk
    • If one person on an FMEA team is concerned about an issue, the team needs to discuss the issue

    Show Notes

     

    The post SOR 960 Differing FMEA Approaches appeared first on Accendo Reliability.

    26 April 2024, 10:13 am
  • Knotty Detection

    Knotty Detection

    Abstract

    Carl and Fred discuss reader questions on FMEA detection, a subject which can be challenging and confusing. Detection is a key part of FMEA during product development as well as in operation. This podcast will discuss some of the “knottiest” challenges with understanding detection in FMEA.

    Key Points

    Join Carl and Fred as they discuss when and how to use detection in FMEAs. Topics include:

    • Where and when are we detecting the problem?
    • Detection scales can appear reversed: high likelihood of detection is low score
    • MIL-STD-1629A does not use Detection scale during product development
    • There is risk from lack of detection during product development
    • Subject of detection during product development vs detection in operation
    • Example of oil light in vehicle
    • Monitoring and System Response (MSR)
    • Case studies where confusion exists with detection with tests and detection in operation
    • How to detect intermittent problems
    • What to do when conducting an FMEA and the “answer is not in the room”
    • Detection scale can be 1 to 5 or 1 to 10, the key is prioritizing risk
    • You want to detect the problem early in product development, if possible
    • Keep focus on creating a better product

    Show Notes

     

    The post SOR 959 Knotty Detection appeared first on Accendo Reliability.

    22 April 2024, 10:10 am
  • Learning Weibull Analysis

    Learning Weibull Analysis

    Abstract

    Chris and Fred discuss Weibull Analysis and how you can take your first ‘tentative’ steps to learn more about it.

    Key Points

    Join Chris and Fred as they discuss Weibull analysis. This is perhaps one of the most talked-about forms of analysis in reliability engineering. And so for some people who are first starting to do reliability stuff, it can be a little intimidating to not know about this analysis methodology that everyone else seems to use. So where do you start?

    Topics include:

    • Usually we start by saying ‘find your decision’ … but perhaps this is not the thing you need to do when it comes to trying to learn about what Weibull analysis does. How can you know if your decision can even be helped by Weibull analysis?
    • Humans are visual creatures who can see patterns in things. This is really important. Computers aren’t close to what we can do when it comes to finding corners in straight lines and things like that.
    • Weibull analysis (at its best) is all about turning things like failure data into visual patterns we can see. Data starts looking like a table of numbers in a spreadsheet or something similar. Weibull analysis turns these numbers into points that can be visualized as curves. And these patterns can tell you things like … what should my servicing interval be? … what percentage of products experience infant mortality? … what is the likely dominant failure mechanism?
    • … and it’s not all about software/numbers. Reliability engineering isn’t about force-feeding numbers into Weibull analysis plotting software. If you put numbers in and get numbers out, and don’t know what those numbers mean (or if they are relevant) you will make bad decisions. All the time.
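
    For readers who want to see what the plotting software is doing, here is a minimal sketch of a Weibull fit by median rank regression. It assumes complete failure data (no suspensions or censoring) and uses Benard's approximation for the plotting positions. The failure times are hypothetical.

```python
import math


def median_ranks(n):
    """Benard's approximation to the median rank of each ordered failure."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull(times):
    """Least-squares line through the Weibull-plot points; returns (beta, eta)."""
    ordered = sorted(times)
    xs = [math.log(t) for t in ordered]
    ys = [math.log(-math.log(1.0 - f)) for f in median_ranks(len(ordered))]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # On a Weibull plot, y = beta * ln(t) - beta * ln(eta).
    beta = slope
    eta = math.exp(-intercept / slope)
    return beta, eta


# Hypothetical failure times in hours.
failures = [620.0, 1150.0, 1480.0, 1900.0, 2300.0, 2950.0]
beta, eta = fit_weibull(failures)
print(f"shape beta = {beta:.2f}, characteristic life eta = {eta:.0f} h")
```

    A shape below 1 points toward infant mortality, a shape near 1 toward a roughly constant hazard, and a shape above 1 toward wear-out. The numbers only help if you also look at the plotted points and ask whether a straight line is a sensible description of them.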

    Show Notes

    The post SOR 958 Learning Weibull Analysis appeared first on Accendo Reliability.

    19 April 2024, 10:04 am
  • Learning From Those Closest

    Learning From Those Closest

    Abstract

    Kirk and Fred discuss the fact that many times those on the assembly and production lines are the ones that have the most information for assembly issues and causes of failures, yet the information they have is not heard by the engineers and management that could improve it.

    Key Points

    Join Kirk and Fred as they discuss getting the information on reliability issues from those workers and technicians assembling the product or running production equipment to the engineers who made the assembly procedures.
    Topics include:

    • Getting engineers to sit on the production lines and perform the procedure they wrote can be difficult even though watching the challenges and potential difficulty of the procedure and failures can be extremely beneficial and can help them relate to the assembly issues.
    • Management by walking around is a common method for knowing the real issues on the production floor, and allows managers and engineers to have a more macro perspective of the entire manufacturing process.
    • Fred tells of his experience finding a solution from a line worker for floating components in a wave solder using a ceramic bead bag that was very cost-effective, even though the engineers had come up with a much more expensive fixture.

    Show Notes

    Please click on this link to access a relatively new article from the US Army and CALCE analyzing traditional reliability prediction methods, titled “Reliability Prediction – Continued Reliance on a Misleading Approach”. It is in the public domain, so please distribute freely. Trying to predict reliability for development is a misleading and costly approach.

    You can now purchase the most recent recording of Kirk Gray’s 8-hour (two 4-hour sessions) Hobbs Engineering webinar “Rapid and Robust Reliability Development 2022 HALT & HASS Methodologies Online Seminar” from this link.

    For more information on the newest discovery testing methodology here is a link to the book “Next Generation HALT and HASS: Robust design of Electronics and Systems” written by Kirk Gray and John Paschkewitz.

    The post SOR 957 Learning From Those Closest appeared first on Accendo Reliability.

    15 April 2024, 10:44 am
  • Getting Failure Feedback

    Getting Failure Feedback

    Abstract

    Kirk and Fred discuss the many required tests before market release and post market ongoing reliability testing and why testing is so necessary.

    Key Points

    Join Kirk and Fred as they discuss why we have to do so many tests to get feedback on failures, sometimes continuing long after those tests have gone long periods without finding any failures.
    Topics include:

    • Some companies have big investments in chambers and processes to perform “burn-in” testing, which may have a poor ROI, but they found a reliability issue months ago and that justifies it forever.
    • Testing to find margins and improve them where it is possible (HALT) is the most cost-effective early testing and helps products withstand component and vendor variations. The test should always be compared to what failures are occurring in the field and if not relevant to the field should be eliminated.
    • Field failures are the best and most valuable data on reliability issues, but getting failed parts back for failure analysis can be extremely difficult: field service engineers are rewarded for quick repairs, so sending back failed parts is a low priority.
    • Sometimes a company has an issue with a particular component type, such as aluminum (Al) electrolytic capacitors, which drives it to develop ongoing, highly focused test requirements for every vendor that makes that component type. Even when no failures occur, past fears keep the testing going regardless of the fact that 100% pass.

    Show Notes

    Please click on this link to access a relatively new article from the US Army and CALCE analyzing traditional reliability prediction methods, titled “Reliability Prediction – Continued Reliance on a Misleading Approach”

    You can now purchase the most recent recording of Kirk Gray’s 8-hour (two 4-hour sessions) Hobbs Engineering webinar “Rapid and Robust Reliability Development 2022 HALT & HASS Methodologies Online Seminar” from this link.

    For more information on the newest discovery testing methodology here is a link to the book “Next Generation HALT and HASS: Robust design of Electronics and Systems” written by Kirk Gray and John Paschkewitz.

    The post SOR 956 Getting Failure Feedback appeared first on Accendo Reliability.

    12 April 2024, 10:41 am