
Predictive Insight, used correctly, is invaluable

How do you import data into Predictive Insight (PI), and in which cases does it add the most value?

We at Compose IT have for many years collaborated with IBM and worked with their products. In 2018, we did a deep dive into IBM’s Predictive Insight (PI) to see how PI could complement our other products.

In our last article on Predictive Insight we described what PI can do and how it works. We also mentioned two test cases we have worked on, and touched on how to transfer data to PI. In this article we go deeper into how data is transferred to PI, tell you more about the two test cases, and describe both a case where PI's results were invaluable and a case that PI is not optimized for.

Predictive Insight supports transferring data from many different types of systems. IBM provides several out-of-the-box mediation packages that transfer data from a source into PI, including packages for Splunk, Nagios, SCOM and ITNM. Data can also be transferred to PI directly from a relational database, from CSV or JSON files, or via JSON HTTP POSTs.
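To give a feel for the HTTP option, here is a minimal Python sketch of posting a JSON metric record to PI. The endpoint URL and the field names in the payload are hypothetical stand-ins; the actual values depend on how the PI mediation interface is configured in your environment.

```python
import requests

# Hypothetical endpoint and payload layout -- check your PI mediation
# configuration for the actual URL and expected field names.
payload = {
    "timestamp": 1546300800000000,  # epoch time in microseconds
    "resource": "server01",
    "metric": "cpu_load",
    "value": 0.73,
}

response = requests.post("http://pi-host:9080/metrics", json=payload, timeout=10)
response.raise_for_status()  # fail loudly if the post is rejected
```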

When CSV or JSON is used as the import method, the data may need to be processed into a suitable format first. Perhaps only a selection of fields should be exported to PI, or a field needs to be reformatted. The most common processing step before data is imported into PI is formatting the timestamp, since PI requires timestamps expressed in microseconds. If processing is needed, IBM recommends Logstash from the Elastic Stack, an open source solution that we at Compose IT also have experience with. Since Logstash is open source, many modules for collecting, filtering and exporting data are available. IBM has created Logstash modules for PI that format the data and write it to a CSV file in PI's structure. With the help of Logstash, it becomes possible to format and import data into PI regardless of the source's data format.
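As an illustration of the timestamp step, here is a small Python sketch that converts a plain timestamp string to epoch microseconds and writes it to a CSV row. The column layout is a hypothetical stand-in, not the actual structure produced by IBM's Logstash modules.

```python
import csv
from datetime import datetime, timezone

def to_epoch_micros(ts: str) -> int:
    """Convert a 'YYYY-MM-DD HH:MM:SS' string (assumed UTC) to epoch microseconds."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000

# Hypothetical column layout -- the real structure is defined by the
# Logstash modules IBM provides for PI.
with open("pi_import.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "resource", "metric", "value"])
    writer.writerow([to_epoch_micros("2019-01-01 02:00:00"), "server01", "cpu_load", 0.73])
```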

When data is imported into PI from a relational database, PI's mediation tool is used.

When we imported data into PI for the two test cases mentioned in our previous article, we used two different import and processing methods, and in both test cases we used production data.

In the first test case, we had a known issue with a script that intermittently did not execute correctly. The problem occurred every night between 1 a.m. and 3 a.m. To try to find the root cause, we started collecting performance data from the server running the script, along with information about how the script executed, and sending it to PI. This data was collected, compiled into JSON format, written to Kafka, read from Kafka and imported into PI in real time. In this case a script was used to extract the data from Kafka and import it into PI, but Logstash works just as well. We recommend the combination Kafka -> Logstash, as we think Kafka is a very good product and it is easy to read from Kafka using Logstash.
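For the Kafka-reading step, a sketch along these lines would do the job; it uses the open source kafka-python client, and the topic name, broker address and message fields are assumptions for illustration.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical topic and broker address.
consumer = KafkaConsumer(
    "server-metrics",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    record = message.value  # e.g. {"timestamp": "...", "cpu_load": 0.42}
    # From here the record is formatted (timestamp in microseconds,
    # PI's CSV structure) and handed over to PI. A Logstash pipeline
    # with a Kafka input replaces all of this with configuration.
    print(record)
```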

The idea was that PI would be able to find relationships between the data values that would help us find the root cause of the problem. However, since the problem already existed and occurred on a regular basis when we started feeding PI this data, the problem was considered part of normal behavior. PI only shows correlations between data values when there are two active deviations that relate to each other. Because this problem was considered part of normal behavior, there was no deviation and no relationships were shown. PI found some other interesting things, but nothing related to the known problem. AI products like PI are in many cases very good, but this is not one of the cases they are optimized for.

In the second test case, we weren't looking for anything in particular, but were curious to see what PI would find if it were allowed to analyze irregular non-time series data. To do this, we collected events that had taken place over a period of six months from the event manager Netcool. This data was written to CSV files that were processed using PI's mediation tool and imported into PI. PI's analysis of the data values took a few hours; we let it run overnight and the next morning it was ready.

Analyzing non-time series data in PI can be difficult and in some cases completely useless, because PI needs a large enough data set to establish normal behavior and thus find deviations. If there are too few data points during an interval, the system considers missing data to be normal, and each data record received becomes an anomaly. For example, if there are very few events, then having no events is considered normal, and as soon as an event occurs it is seen as an anomaly. This can be mitigated somewhat with the aggregation interval setting, but the longest possible interval is 60 minutes. Given enough data, PI can be just as valuable for non-time series data as it is for time series data.
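A quick density check before importing can show whether a data set is at risk of this. The sketch below, which assumes event timestamps are available as epoch seconds, counts events per 60-minute bucket (PI's longest aggregation interval) and reports how many buckets are empty.

```python
from collections import Counter

INTERVAL_SECONDS = 60 * 60  # PI's longest aggregation interval is 60 minutes

def events_per_interval(epoch_seconds):
    """Count events per 60-minute bucket."""
    return Counter(ts // INTERVAL_SECONDS for ts in epoch_seconds)

counts = events_per_interval([1546300800, 1546301100, 1546308000])
first, last = min(counts), max(counts)
total = last - first + 1
empty = total - len(counts)
# If most intervals are empty, PI will learn "no events" as normal
# behavior and flag nearly every event as an anomaly.
print(f"{empty} of {total} intervals contain no events at all")
```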

In this case, there was enough data, and PI's results were invaluable.
