【Academic Lecture】Divide-and-Conquer: A Distributed Hierarchical Factor Approach to Modeling Large-Scale Time Series Data

Posted by: School of Statistics and Data Science    Posted on: 2022-10-10

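【Speaker Bio】Zhaoxing Gao (高照省) is a "Hundred Talents Program" Research Fellow at the Center for Data Science, Zhejiang University, and a doctoral supervisor. He received his Ph.D. from the Department of Mathematics at the Hong Kong University of Science and Technology in 2016. He then worked as a postdoctoral researcher at the London School of Economics and Political Science (2016-2017) and at the University of Chicago Booth School of Business (2017-2019). Before returning to China, he was an Assistant Professor in the Department of Mathematics at Lehigh University (2019-2021). His research focuses on data analysis methods for economic and financial markets, statistical analysis of high-dimensional and large-scale data and time series, factor learning, and financial asset pricing models. His work has appeared in statistics and economics journals including the Journal of the American Statistical Association, Journal of Econometrics, International Journal of Forecasting, Statistica Sinica, and Econometrics and Statistics.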

【Abstract】In this talk, I will introduce a hierarchical approximate-factor approach to analyzing high-dimensional, large-scale heterogeneous time series data using distributed computing. The new method employs a multiple-fold dimension reduction procedure based on Principal Component Analysis (PCA) and shows great promise for modeling large-scale data that can be neither stored nor analyzed on a single machine. Each computer at the basic level performs a PCA to extract common factors among the time series assigned to it and transfers those factors to exactly one node at the second level. Each second-level computer collects the common factors from its subordinates and performs another PCA to select the second-level common factors. This process is repeated until the central server is reached; the server collects the factors from its direct subordinates and performs a final PCA to select the global common factors. The noise terms of the second-level approximate factor model are the unique common factors of the first-level clusters. Our theoretical derivations focus on the two-level case, but the idea generalizes easily to any finite number of levels, and the proposed method also applies to data with heterogeneous and multilevel subcluster structures that are stored and analyzed on a single machine. We introduce a new diffusion index approach to forecasting based on the global and group-specific factors, and discuss several clustering methods for the case in which group memberships are unknown. We further extend the analysis to unit-root nonstationary time series. Asymptotic properties of the proposed method are derived as both the dimension of the data in each computing unit and the sample size T diverge. We use both simulated and real examples to assess the finite-sample performance of the proposed method and compare it with methods commonly used in the literature in terms of the forecasting ability of the extracted factors.
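To make the two-level procedure described in the abstract concrete, here is a minimal sketch in NumPy on simulated data. It assumes known group memberships, stationary data, a fixed number of factors at each level, and it simulates the "machines" with a plain list of data blocks. The function names (`pca_factors`, `hierarchical_factors`, `diffusion_index_forecast`), the factor counts, and the step that obtains group-specific factors by projecting the global factors out of each group's local factors are hypothetical illustrations, not the speaker's estimator.

```python
import numpy as np


def pca_factors(X, r):
    """PCA factor estimates for a T x N panel X (rows = time, columns = series).

    Normalization F'F/T = I_r: F is sqrt(T) times the top-r left singular
    vectors of the column-demeaned panel.
    """
    T = X.shape[0]
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return np.sqrt(T) * U[:, :r]


def hierarchical_factors(groups, r_local, r_global, r_group):
    """Hypothetical two-level sketch: local PCA per group, then PCA on stacked local factors."""
    # Level 1: each "machine" sees only its own block of series.
    local = [pca_factors(X, r_local) for X in groups]
    # Level 2: the central node receives only the T x r_local factor matrices.
    global_f = pca_factors(np.hstack(local), r_global)
    # Assumed step: group-specific factors = PCA of what remains of each group's
    # local factors after regressing out the global factors (second-level "noise").
    group_f = []
    for F in local:
        coef, *_ = np.linalg.lstsq(global_f, F, rcond=None)
        group_f.append(pca_factors(F - global_f @ coef, r_group))
    return global_f, group_f


def diffusion_index_forecast(y, Z):
    """OLS of y_{t+1} on [1, Z_t]; returns the one-step-ahead forecast of y_{T+1}."""
    X = np.column_stack([np.ones(len(Z) - 1), Z[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return np.r_[1.0, Z[-1]] @ beta


# Simulated example (illustrative values): 3 groups of 50 series, T = 200.
rng = np.random.default_rng(0)
T, n_groups, N = 200, 3, 50
g = rng.standard_normal((T, 1))                      # one global factor
groups = []
for _ in range(n_groups):
    h = rng.standard_normal((T, 1))                  # one group-specific factor
    groups.append(g @ rng.standard_normal((1, N))
                  + h @ rng.standard_normal((1, N))
                  + 0.5 * rng.standard_normal((T, N)))
y = np.r_[0.0, 0.8 * g[:-1, 0]] + 0.1 * rng.standard_normal(T)  # y_{t+1} driven by g_t

G, H = hierarchical_factors(groups, r_local=2, r_global=1, r_group=1)
Z = np.hstack([G] + H)                               # global + group-specific factors
print(abs(np.corrcoef(G[:, 0], g[:, 0])[0, 1]))      # near 1; the sign of a factor is not identified
print(diffusion_index_forecast(y, Z))
```

The key design point the sketch illustrates is that the central node never sees the raw series, only each group's T x r_local factor matrix, so communication scales with the number of factors rather than with the number of series. Factors are identified only up to rotation and sign, which is why the check above uses the absolute correlation.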

Tencent Meeting ID: 821-349-577

Time: October 13, 2022, 10:00-11:00