Abstract
Standard-of-practice approaches to time series cluster analysis involve careful feature engineering, often relying on expert input to tune and select features by hand. In many cases, expert input may not be readily available, or the community may not yet have reached consensus on the ideal features for a given application. This paper compares the results of several cluster analysis methods, using both hand-selected features and automatically extracted features, applied to large geospatial time series telematics data from commercial trucking fleets. It explores how feature selection, dimensionality reduction, and choice of clustering algorithm affect the quality of clustering results. The analysis confirms prior findings that domain-agnostic features are competitive with hand-engineered features with respect to clustering quality metrics. The results also provide new insight into the most successful strategies for identifying structure in large unstructured vehicle telematics data, and suggest that time series clustering with automatic feature extraction can be an effective way to extract structure from large-scale geospatial time series data when hand-selected features are not available.
Original language | American English |
---|---|
Number of pages | 26 |
State | Published - 2020 |
NREL Publication Number
- NREL/TP-2C00-74212
Keywords
- clustering
- data analysis
- domain agnostic features
- feature engineering
- scalable
- segmentation
- time series