How reliable is your IoT platform?

Surprisingly, reliability is not the top priority for most IoT platform vendors, but it’s incredibly important for enterprise customers. The Merriam-Webster online dictionary defines reliability as “the extent to which an experiment, test, or measuring procedure yields the same results on repeated trials.”

Testing, quantifying, fixing, and guaranteeing ongoing reliability takes excellent performance testing software, knowledge about IoT testing, and an IoT platform vendor that considers reliability one of its top priorities. Based on the IoT performance testing that MachNation completes for IoT platform vendors, only the highest-quality vendors are willing to disclose quantitative metrics proving their platforms’ reliability.

If you care about IoT solution reliability, these three questions will help you determine whether your platform provider feels the same. You’ll know very quickly whether your vendor focuses strongly on IoT platform reliability and can prove it with independently verified test metrics.

Question #1: Can you prove that your IoT platform offers at least 99.9% reliability for data queried out of your digital twin?

Why ask this question? Let me share a little secret from the IoT world: getting data into a digital twin is easy, but querying data out is consistently one of the poorest-performing parts of an IoT platform. IoT platform vendors spend time and money making sure that data can flow from devices into their IoT platforms reliably. However, they almost never devote enough time and technical resources to guaranteeing that you can get data out reliably.

Why is this an important question to ask? The value of an IoT solution isn’t driven by IoT device data sitting in a database. The value of an IoT solution is driven by the business insights derived from the data. Therefore, there needs to be a reliable way to get IoT data out of an IoT platform. For a production-scale solution, this is often accomplished by querying device digital twins in the IoT platform to extract the data you need.

What reliability data should your vendor give you? Your IoT platform vendor should be able to provide you with query failure rates and latencies for the top 5 API query methods for their IoT platform. Your vendor should know how often API queries fail at varying levels of scale, from tens to thousands of queries per second. And for these API query methods, your vendor should know query latency statistics (i.e., how long successful queries take to return) at various scales.

What will your vendor’s answer tell you about IoT platform reliability? If an IoT vendor’s platform is capable of at least 99.9% reliability for digital twin queries with average latency less than 300 milliseconds, you can be fairly confident that properly implemented IoT applications will perform well on this platform.
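To make these two thresholds concrete, here is a minimal sketch of how a batch of query test results could be checked against them. The QueryResult record and the function names are hypothetical and are not part of any vendor's API or MachNation's tooling; the sketch also assumes that average latency is computed over successful queries only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryResult:
    """One digital twin query from a test run (hypothetical record shape)."""
    ok: bool            # True if the query returned successfully
    latency_ms: float   # time to response in milliseconds


def summarize(results):
    """Return (reliability, average latency in ms of successful queries)."""
    if not results:
        raise ValueError("no query results to summarize")
    latencies = [r.latency_ms for r in results if r.ok]
    reliability = len(latencies) / len(results)
    # Assumption: failed queries are excluded from the latency average.
    avg_latency = mean(latencies) if latencies else float("inf")
    return reliability, avg_latency


def meets_bar(results, min_reliability=0.999, max_avg_latency_ms=300.0):
    """Check the article's thresholds: at least 99.9% success, average under 300 ms."""
    reliability, avg_latency = summarize(results)
    return reliability >= min_reliability and avg_latency < max_avg_latency_ms
```

For example, a run of 10,000 queries with 8 failures has a reliability of 99.92% and clears the bar if the average latency is under 300 milliseconds, while a run with 20 failures (99.8%) does not. Running the same check separately at each tested load level would show whether reliability holds as query volume grows.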

Question #2: Can you prove that your IoT platform can remotely update firmware and software on my entire fleet of connected devices within 24 hours with an extremely low failure rate?

Why ask this question? Firmware- and software-over-the-air (FOTA and SOTA) updates are taxing workflows on an IoT platform. A high-quality IoT device management platform should be able to handle bulk device updates efficiently, while successfully dealing with errors caused by interrupted connectivity to the cloud, partially completed updates, firmware version incompatibility, and more.

Why is this an important question to ask? Enterprises update IoT devices to keep them secure from bad actors and malware. When serious security vulnerabilities are discovered, enterprises have precious little time to apply firmware and software patches and confirm that they were applied correctly on every device. Exposed IoT devices are a threat to the well-being of an enterprise and can result in compromises to IoT, OT, and IT technology and networks.

What reliability data should your vendor give you? Your IoT platform vendor should be able to provide you with independently produced metrics showing average and maximum speeds for FOTA and SOTA updates at production-level scale. They should also be able to give you data showing performance metrics for devices that do not update the first time (i.e., error handling metrics) and mean-time-to-complete delta updates on these devices.

What will your vendor’s answer tell you about IoT platform reliability? If an IoT vendor’s platform is capable of production-scale device updates approaching 100 updates per second with less than a 0.3% failure rate, you can be fairly confident that you can patch the most serious security vulnerabilities in time and that the platform offers enterprise-grade performance.
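A quick back-of-the-envelope calculation shows what these numbers mean for a fleet. The sketch below assumes the update rate is sustained for the whole rollout and ignores the extra time needed to retry failed devices; the function names and the 1,000,000-device fleet are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def hours_to_update(fleet_size, updates_per_second):
    """Hours needed to update the whole fleet at a sustained rate (no retries)."""
    return fleet_size / updates_per_second / 3600


def max_fleet_in_window(updates_per_second, window_seconds=SECONDS_PER_DAY):
    """Largest fleet that can be updated inside the window at a sustained rate."""
    return updates_per_second * window_seconds


def expected_failures(fleet_size, failure_rate):
    """Number of devices expected to need a retry at the given failure rate."""
    return round(fleet_size * failure_rate)


# Illustrative fleet of 1,000,000 devices at the article's thresholds.
fleet = 1_000_000
print(hours_to_update(fleet, 100))      # roughly 2.8 hours
print(max_fleet_in_window(100))         # 8,640,000 devices in 24 hours
print(expected_failures(fleet, 0.003))  # 3,000 devices to retry
```

Under these assumptions, 100 updates per second covers up to 8.64 million devices in 24 hours, and a 0.3% failure rate on a million-device fleet still leaves about 3,000 devices that the platform's error handling has to bring up to date.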

Question #3: Can you provide performance metrics proving your platform’s reliability during a worst-case IoT scenario, like a massive disconnect of IoT devices?

Why ask this question? The best IoT platform vendors have created their platforms to gracefully recover from worst-case events that might happen to their customers’ IoT solutions. Besides a DDoS event or other security breach, some of the worst events are when large numbers of IoT devices unexpectedly disconnect from an IoT platform and then simultaneously try to reconnect. MachNation calls this a mass disconnect event.

Why is this an important question to ask? Enterprises deploying IoT solutions want to quantify and plan for worst-case risks. When large numbers of IoT devices unexpectedly disconnect from an IoT platform and then reconnect to it, such as during the loss of an availability zone, enormous pressure is put on the platform. On an untested IoT platform, this often leads to an entire IoT solution failing. During IoT performance testing, MachNation is able to simulate this type of mass disconnect event. In these simulations, we’ve seen how severely such an event can affect an unprepared IoT platform.

What reliability data should your vendor give you? Your IoT platform vendor should be able to provide independently produced performance metrics from a simulated mass disconnect event. The data should show:

  • Number of devices disconnecting from, and remaining connected to, the IoT platform
  • Percentage of devices able to reconnect on their first, second, third, or more attempts
  • Maximum time required for devices to reconnect
  • Message and query failure rates during the mass disconnect event for devices still connected to the IoT platform

What will your vendor’s answer tell you about IoT platform reliability? If an IoT vendor’s platform is capable of gracefully recovering from a large mass disconnect event with 100% of devices reconnecting within 5 minutes and message and query failure rates below 0.5% during the event, you can be fairly confident that the vendor’s platform can withstand very heavy, unplanned processing stress.
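The metrics listed above can be rolled up into a single pass/fail summary. The following is a minimal sketch under stated assumptions: the Reconnect record, its fields, and summarize_event are hypothetical names, and message and query failures are combined into one count for simplicity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reconnect:
    """Reconnect outcome for one disconnected device (hypothetical record shape)."""
    attempts: int              # number of reconnect attempts made
    seconds: Optional[float]   # time until reconnected, or None if it never did


def summarize_event(records, requests_sent, requests_failed,
                    deadline_s=300.0, max_failure_rate=0.005):
    """Summarize a simulated mass disconnect against the article's thresholds."""
    if not records or requests_sent <= 0:
        raise ValueError("need at least one device record and one request")
    total = len(records)
    reconnected = [r for r in records if r.seconds is not None]
    # Percentage of all disconnected devices that reconnected on attempt 1, 2, 3...
    counts = Counter(r.attempts for r in reconnected)
    pct_by_attempt = {k: 100.0 * v / total for k, v in sorted(counts.items())}
    max_seconds = max((r.seconds for r in reconnected), default=None)
    all_in_time = (len(reconnected) == total
                   and max_seconds is not None
                   and max_seconds <= deadline_s)
    # Failure rate for messages and queries from devices that stayed connected.
    failure_rate = requests_failed / requests_sent
    return {
        "pct_by_attempt": pct_by_attempt,
        "max_reconnect_s": max_seconds,
        "failure_rate": failure_rate,
        "passes": all_in_time and failure_rate < max_failure_rate,
    }
```

With this shape, a single device that never reconnects, or one that takes longer than 5 minutes, is enough to fail the check, which matches the 100% requirement stated above.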

Conclusion

Reliability is a critical yet rarely quantified aspect of an IoT solution. If you’re an enterprise that cares about IoT solution reliability, you should insist on independently produced performance metrics. There are many more aspects of scalability and performance that are specific to each IoT use case. We’ll cover more of these aspects in a separate article soon.
