[Academic Lecture] Statistics: Optimal Subsampling for Data Streams with Categorical Responses

Posted by: School of Statistics and Data Science | Date posted: 2023-12-22

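[Speaker Bio]: Mingyao Ai (艾明要) is a Grade-2 Professor of Statistics and doctoral supervisor at the School of Mathematical Sciences, Peking University. Ai also serves as a member of the National Steering Committee for Graduate Education in the Professional Degree of Applied Statistics and head of its training group, Vice President of the Chinese Association for Applied Statistics, Secretary-General of the 11th Council of the Chinese Society of Probability and Statistics (under the Chinese Mathematical Society), and Executive Council Member of the National Statistical Society of China. Ai sits on the editorial boards of four major international SCI-indexed journals (Stat Sinica, JSPI, SPL, and Stat), of the Chinese core journals Journal of Systems Science and Mathematical Sciences, Journal of Applied Statistics and Management, and Advances in Mathematics (China), and of the Science Press book series Statistics and Data Science. Ai's teaching and research focus on sampling theory and algorithms for big data, design and analysis of experiments, computer simulation experiments and modeling, and applied statistics, with more than eighty papers published in leading domestic and international journals including AOS, JASA, Biometrika, and Science China. Ai has led multiple projects funded by the National Natural Science Foundation of China, including a Key Program project, a Key Program sub-project, and General Program projects, and has taken part in several completed projects under the Ministry of Science and Technology's Key R&D Program. Honors include Peking University Outstanding Doctoral Dissertation Advisor, Lecturer of a Peking University General Education Core Course, and a Second Prize for Outstanding Teaching Achievements in Beijing Higher Education.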

[Abstract]: Timely analysis of categorical data that arrive quickly in large chunks is in high demand, especially when storing or accessing the historical data is not always possible or desirable. This work introduces an efficient subsampling procedure for online data streams under a multinomial logistic model, which sequentially updates the parameter estimator. The proposed online subsampling and estimation algorithm is computationally efficient, requires minimal storage, and accommodates the scenario in which data labels are expensive to measure and are not all available initially. Theoretical properties that quantify the asymptotic behavior of the proposed estimator are established. Optimal subsampling probabilities are derived under the $A$-optimality criterion. An adaptive subsampling algorithm is suggested for ease of practical implementation. The advantages of the proposed method are illustrated through numerical studies on both simulated and real data sets.

Time: 2023-12-27, 15:00

Venue: Room 110, Chongzhen Building (崇真楼)