IoT Sensor Data aka Time Series Data Analysis

Sachindra Shekhar
Jun 24, 2021

In the IoT world, things equipped with sensors and an internet connection can periodically share a snapshot of their state.

We could collect raw sensor data frequently, say every 10 seconds, and send it over the internet for computation in a central cloud. Across millions of appliances, however, this would not be a cost-optimized solution: the memory required for concurrent connections, the network cost, and the I/O cost would all grow by a factor of a million. Hence you would most likely prefer to compute at the edge and send the results at a less frequent interval. Even so, it is still IoT data, and even optimized, it still poses challenges.
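A quick back-of-the-envelope calculation shows why the sending interval matters. The fleet size and the 5-minute edge interval below are illustrative assumptions, not figures from a real deployment.

```python
def messages_per_second(devices: int, interval_seconds: float) -> float:
    """Average inbound message rate if every device sends one message per interval."""
    return devices / interval_seconds


# Hypothetical fleet of one million devices.
FLEET = 1_000_000

raw_rate = messages_per_second(FLEET, 10)       # raw samples every 10 seconds
edge_rate = messages_per_second(FLEET, 5 * 60)  # edge-computed batch every 5 minutes (assumed)

print(f"raw:  {raw_rate:,.0f} msg/s")
print(f"edge: {edge_rate:,.0f} msg/s")
print(f"reduction: {raw_rate / edge_rate:.0f}x")
```

Sending every 5 minutes instead of every 10 seconds cuts the inbound message rate 30-fold (from 100,000 to about 3,333 messages per second). Each message carries more data, but the per-message network and I/O overhead is paid far less often.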

What is Time Series Data Analysis and why is it challenging?

Time series data is a stream of data that keeps arriving at a certain frequency and has no defined end. In the IoT world, devices with sensors keep reading measurements at a certain frequency. Each data sample is recorded and mostly processed on the device itself before a batch of samples is sent to the cloud server over the internet.
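On the device side, this record-then-batch flow can be sketched as a small buffer that collects samples and hands over a batch once it is full. `Sample`, `EdgeBuffer`, and the batch size are hypothetical names chosen for illustration; a real device would also flush on a timer and persist unsent batches across restarts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    device_id: str
    timestamp: int  # seconds since epoch
    value: float


@dataclass
class EdgeBuffer:
    """Hypothetical device-side buffer: collects samples, emits them in batches."""
    batch_size: int
    _pending: List[Sample] = field(default_factory=list, init=False)

    def record(self, sample: Sample) -> Optional[List[Sample]]:
        """Store a sample; return a full batch ready to upload, else None."""
        self._pending.append(sample)
        if len(self._pending) >= self.batch_size:
            batch, self._pending = self._pending, []
            return batch
        return None


buffer = EdgeBuffer(batch_size=3)
uploads = []
for t in range(0, 70, 10):  # one reading every 10 seconds, 7 readings in total
    batch = buffer.record(Sample("meter-1", t, 20.0 + t / 10))
    if batch is not None:
        uploads.append(batch)  # stand-in for an actual HTTP/MQTT upload

print(len(uploads), [s.timestamp for s in uploads[0]])
```

Seven readings produce two uploads of three samples each, with one sample still waiting in the buffer for the next batch.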

In the cloud we can use several approaches to extract meaningful insight from this data. In most scenarios it could be a simple aggregation over the time dimension (hourly/daily/weekly/monthly) at some level of a pre-defined hierarchy. Given the volume of data, these aggregations, especially at coarser time granularities such as weekly, monthly, or yearly, can be high-latency, resource-intensive operations.
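A minimal sketch of such a time-dimension rollup, assuming readings keyed by a two-level hierarchy (site, device). Hourly buckets keep a `sum` and a `count` rather than an average, so daily and site-level values can be derived by merging buckets without rereading raw data. The hierarchy, tuple layout, and function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

HOUR = 3600
DAY = 24 * HOUR

Reading = Tuple[str, str, int, float]  # (site, device, unix_ts, value)


def rollup(readings: Iterable[Reading], bucket_seconds: int) -> Dict[tuple, list]:
    """Aggregate raw readings into (site, device, bucket_start) -> [sum, count]."""
    buckets: Dict[tuple, list] = defaultdict(lambda: [0.0, 0])
    for site, device, ts, value in readings:
        key = (site, device, ts - ts % bucket_seconds)
        buckets[key][0] += value
        buckets[key][1] += 1
    return dict(buckets)


def coarsen(buckets: Dict[tuple, list], bucket_seconds: int, by_site: bool = False) -> Dict[tuple, list]:
    """Merge finer buckets into coarser ones (e.g. hourly -> daily), optionally up to site level."""
    merged: Dict[tuple, list] = defaultdict(lambda: [0.0, 0])
    for (site, device, start), (total, count) in buckets.items():
        key = (site, None if by_site else device, start - start % bucket_seconds)
        merged[key][0] += total
        merged[key][1] += count
    return dict(merged)


readings = [
    ("plant-a", "m1", 0, 10.0),
    ("plant-a", "m1", 1800, 20.0),
    ("plant-a", "m1", 3600, 30.0),
    ("plant-a", "m2", 7200, 40.0),
]
hourly = rollup(readings, HOUR)
daily_site = coarsen(hourly, DAY, by_site=True)
total, count = daily_site[("plant-a", None, 0)]
print(total / count)  # daily average for the whole site
```

Keeping `sum` and `count` is the key design choice: averaging pre-averaged hourly values would weight an hour with one reading the same as an hour with hundreds.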

To preserve the user experience, such high-latency aggregates are pre-computed and stored through batch processing. Each query response is then generated by looking up the pre-computed aggregates for the older data and running a live aggregation over the smaller current data set (the unbounded data stream).
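The lookup-plus-live pattern can be sketched as follows, assuming a store of pre-computed daily `[sum, count]` pairs produced by a batch job and a list of today's raw readings that have not been rolled up yet. `precomputed_daily`, `live_today`, and `average_over` are hypothetical stand-ins for a real aggregate table, stream buffer, and query path.

```python
from typing import Dict, List, Tuple

DAY = 86_400

# Hypothetical output of the nightly batch job: day_start -> [sum, count]
precomputed_daily: Dict[int, List[float]] = {
    0 * DAY: [480.0, 24],
    1 * DAY: [504.0, 24],
}

# Hypothetical raw readings for the current, still-open day: (unix_ts, value)
live_today: List[Tuple[int, float]] = [(2 * DAY + 60, 22.0), (2 * DAY + 3660, 26.0)]


def average_over(start: int, end: int) -> float:
    """Average over [start, end): stored aggregates for closed days, live scan for today.

    Closed days only partly inside the range are skipped in this sketch;
    a real system would fall back to finer (e.g. hourly) aggregates.
    """
    total, count = 0.0, 0
    for day_start, (s, c) in precomputed_daily.items():
        if start <= day_start and day_start + DAY <= end:  # fully covered closed day
            total += s
            count += c
    for ts, value in live_today:  # small, unbounded tail aggregated on demand
        if start <= ts < end:
            total += value
            count += 1
    return total / count if count else float("nan")


print(average_over(0, 3 * DAY))
```

Only the current day is scanned at query time; everything older costs one lookup per day, which is what keeps long-range queries fast.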

However, given the unreliable nature of the internet, we receive plenty of delayed (late-arriving) data, which forces us to redo aggregations to correct previously stored values. It goes without saying that storing the state of the computation is important here, and we must be careful not to break the inter-dependencies between tasks when a corrected value feeds other aggregates.
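One way to make late data cheap to correct is to keep the aggregation state (here `sum` and `count` per hourly bucket) instead of only the final values. A late reading then updates just its own bucket, and the dependent daily aggregate is marked for recomputation from hourly state rather than from raw data. `HourlyState` and its methods are hypothetical; a production system would keep this state in a durable store.

```python
from collections import defaultdict
from typing import Dict, List, Set

HOUR = 3600
DAY = 24 * HOUR


class HourlyState:
    """Hypothetical keyed state: hour_start -> [sum, count], plus days needing recompute."""

    def __init__(self) -> None:
        self.buckets: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0])
        self.dirty_days: Set[int] = set()

    def add(self, ts: int, value: float) -> None:
        """Apply one reading (on time or late) to its hourly bucket."""
        hour = ts - ts % HOUR
        self.buckets[hour][0] += value
        self.buckets[hour][1] += 1
        # Any daily aggregate derived from this hour is now stale.
        self.dirty_days.add(hour - hour % DAY)

    def recompute_day(self, day_start: int) -> float:
        """Rebuild one day's average from the 24 hourly buckets."""
        total = count = 0
        for hour in range(day_start, day_start + DAY, HOUR):
            if hour in self.buckets:  # avoid creating empty buckets
                total += self.buckets[hour][0]
                count += self.buckets[hour][1]
        self.dirty_days.discard(day_start)
        return total / count if count else float("nan")


state = HourlyState()
state.add(100, 10.0)
state.add(4000, 20.0)
daily = {0: state.recompute_day(0)}  # initial daily value

state.add(200, 30.0)  # late reading for hour 0 arrives after the day was computed
print(state.dirty_days)
daily[0] = state.recompute_day(0)
print(daily)
```

The correction touches one hourly bucket and one daily value; nothing else has to be reprocessed.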

An equally challenging aspect of time series data is duplicate data. Due to transport-level QoS settings (for example, at-least-once delivery, where an unacknowledged message is sent again) or the device’s fault-tolerance implementation, we also receive quite a lot of duplicate data. When the stream is windowed in parallel, filtering duplicates out of the computation is another design challenge to consider. Mind the different types of windowing requirements, to name a few: tumbling windows (no overlapping data) and sliding windows (with overlap).
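A minimal sketch of duplicate filtering ahead of windowing, assuming each message carries a `device_id` and a device-side `timestamp` that together identify a sample (an assumption; some devices send explicit message IDs instead). The deduplicated samples are then counted in 60-second tumbling windows and in 60-second sliding windows that advance every 30 seconds. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Msg = Tuple[str, int, float]  # (device_id, timestamp, value)


def dedupe(messages: Iterable[Msg]) -> List[Msg]:
    """Keep the first message seen for each (device_id, timestamp) key.

    The `seen` set grows without bound here; a real stream would expire keys
    older than the allowed lateness.
    """
    seen = set()
    unique = []
    for device_id, ts, value in messages:
        key = (device_id, ts)
        if key not in seen:
            seen.add(key)
            unique.append((device_id, ts, value))
    return unique


def tumbling_counts(msgs: List[Msg], size: int) -> Dict[int, int]:
    """Each sample falls into exactly one window [start, start + size)."""
    return dict(Counter(ts - ts % size for _, ts, _ in msgs))


def sliding_counts(msgs: List[Msg], size: int, step: int) -> Dict[int, int]:
    """Windows [start, start + size) begin every `step` seconds from t=0; a sample can land in several."""
    counts: Counter = Counter()
    for _, ts, _ in msgs:
        first = (ts - size) // step * step + step  # earliest window start that still contains ts
        for start in range(max(first, 0), ts + 1, step):
            counts[start] += 1
    return dict(counts)


raw = [("cam-1", 10, 1.0), ("cam-1", 10, 1.0), ("cam-1", 45, 2.0), ("cam-1", 70, 3.0)]
clean = dedupe(raw)
print(len(clean))
print(tumbling_counts(clean, 60))
print(sliding_counts(clean, 60, 30))
```

Deduplicating before windowing matters most for sliding windows: a duplicate that slipped through would be counted once in every overlapping window that contains it.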

Other challenges, though less frequent, are managing distributed transactional data and interpolating missing data.
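For the interpolation case, a simple linear fill onto a fixed sampling grid might look like the following. It assumes numeric readings with timestamps in seconds; linear interpolation is only one choice, and forward-fill is often more appropriate for counters or states. `linear_fill` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import List, Tuple


def linear_fill(points: List[Tuple[int, float]], start: int, end: int, step: int) -> List[Tuple[int, float]]:
    """Resample sorted (ts, value) points onto start, start+step, ..., end.

    Missing grid points between two observations are linearly interpolated;
    grid points outside the observed range are skipped, not extrapolated.
    """
    times = [t for t, _ in points]
    filled = []
    for t in range(start, end + 1, step):
        i = bisect_left(times, t)
        if i < len(times) and times[i] == t:
            filled.append((t, points[i][1]))  # exact observation
        elif 0 < i < len(times):
            (t0, v0), (t1, v1) = points[i - 1], points[i]
            filled.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return filled


# Readings every 10 s, with the samples at t=20 and t=30 lost in transit.
observed = [(0, 20.0), (10, 21.0), (40, 24.0), (50, 25.0)]
print(linear_fill(observed, 0, 50, 10))
```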

Hopefully that gives some idea of the kind of problems we are dealing with.

What are different types of IoT sensor data?

Let’s talk about sensor measurement data, which we also call IoT data. It could be raw measurements sampled at a defined interval, or values computed over data samples at a defined interval.

Broadly, I would categorize the types of sensor data as follows:

1. Periodic sum: Here the measurement is simply an incrementing counter that gets reset at a fixed interval. For example, a smart camera that counts, every 1 minute, the number of people coming out of a metro station without a face mask. An industry example could be the count of screws produced by a machine every 1 minute.

People-counting smart camera output

2. Rolling sum: Let’s start with an example. A vehicle odometer reading is a rolling sum: it always gives the total distance covered so far, and it only ever increases. So you always need a window to know the distance covered within it; here the window could be the start and end of a trip.

Vehicle Odometer reading

3. Absolute measurement: Here the measurement is a sensor reading taken at a fixed interval, for example a temperature sensor read every 10 seconds. We get a series of snapshot values at a fixed interval, and usually the average of the readings in each tumbling (non-overlapping) window of, say, 1 minute is computed.

4. State: Here the measurement takes one value from a given set. For example, an appliance status could be ON or OFF, and the state of an inverter battery could be CHARGING, DISCHARGING, or IDLE. We can read the battery state every 1 minute.

5. Computed measurement values: Continuing the inverter battery example, let’s say we only collect the timestamp ranges during which the battery is DISCHARGING. This type of data clearly does not arrive at any fixed frequency, so we need interpolation before running aggregations. Another example could be unplanned machine downtime. Analyzing this type of data is especially challenging when some of it arrives late.

Duration in red represents machine unplanned downtime
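To make the differences between the five types concrete, here is a sketch of how each might be reduced to a single value for one window. The function names and the once-per-minute state sampling are assumptions for illustration. The rolling-sum helper assumes the counter never resets or wraps around, and for computed intervals the sketch brings the irregular data onto a regular grid by clipping each interval to a fixed one-hour window, which is one simple way to do it.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def periodic_sum_total(counts: List[int]) -> int:
    """1. Periodic sum: each value already covers its own interval, so just add them."""
    return sum(counts)


def rolling_sum_delta(start_reading: float, end_reading: float) -> float:
    """2. Rolling sum: the window value is end minus start (e.g. odometer over a trip)."""
    return end_reading - start_reading


def absolute_average(readings: List[float]) -> float:
    """3. Absolute measurement: average the snapshots in the tumbling window."""
    return mean(readings)


def state_minutes(states: List[str]) -> Dict[str, int]:
    """4. State sampled once per minute: minutes spent in each state."""
    return dict(Counter(states))


def interval_minutes_per_hour(intervals: List[Tuple[int, int]], hour_start: int) -> float:
    """5. Computed intervals (start_ts, end_ts) in seconds, e.g. DISCHARGING or downtime:
    clip each interval to [hour_start, hour_start + 3600) and total the overlap in minutes."""
    hour_end = hour_start + 3600
    overlap = 0
    for start, end in intervals:
        overlap += max(0, min(end, hour_end) - max(start, hour_start))
    return overlap / 60


print(periodic_sum_total([3, 0, 5, 2]))
print(rolling_sum_delta(15230.0, 15262.5))
print(absolute_average([21.0, 22.0, 23.0]))
print(state_minutes(["CHARGING", "CHARGING", "IDLE", "DISCHARGING"]))
print(interval_minutes_per_hour([(3000, 4200), (6000, 9000)], 3600))
```

The first four types arrive at a fixed frequency, so the window is known up front; only the fifth needs its intervals split across window boundaries before any aggregation can run, which is also why late intervals force recomputation of every window they overlap.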

Conclusion

I hope I was able to convey the challenges of time series data analysis and the types of data in the IoT context. I feel the real challenge is handling the volume and variety of data all at once: each device has more than one type of sensor sending a stream of data, and there are millions of such devices from different vendors with different specifications.
