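Seminar title: Feature screening for clustering analysis
Speaker: Ruibin Xi (席瑞斌), Peking University
Time: April 3, 2023, 3:00 PM
Venue: Room 311, Mathematics and Statistics Building (数统楼), Liangxiang Campus
About the speaker:
Ruibin Xi is a researcher at the School of Mathematical Sciences and the Center for Statistical Science at Peking University, where he is a tenured associate professor and doctoral supervisor. He graduated from Washington University in St. Louis (USA) in 2009 and joined Harvard Medical School the same year as an assistant researcher, working on biomedical informatics. He joined Peking University in September 2012. His main research interests are bioinformatics, high-dimensional statistics, network analysis, Bayesian statistics, biomedical big data, genomic big data, and precision medicine for cancer. In recent years he has published more than 40 papers in high-profile journals such as PNAS and Science Translational Medicine. He has led or participated in a number of funded research projects, including a Ministry of Science and Technology 973 Program project, a National Key R&D Program project, and Key Program and General Program projects of the National Natural Science Foundation of China.
Abstract: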
We consider feature screening for ultrahigh-dimensional clustering analyses. Based on the observation that the marginal distribution of any given feature is a mixture of its conditional distributions in different clusters, we propose screening clustering features by independently evaluating the homogeneity of each feature’s mixture distribution. Important clustering-relevant features have heterogeneous components in their mixture distributions, whereas unimportant features have homogeneous components. The well-known EM-test statistic is used to evaluate the homogeneity. Under general parametric settings, we establish tail probability bounds of the EM-test statistic for the homogeneous and heterogeneous features, and further show that the proposed screening procedure can achieve the sure independence screening property and even selection consistency. The limiting distribution of the EM-test statistic is also obtained for general parametric distributions. The proposed method is computationally efficient, accurately screens for important clustering-relevant features, and helps to significantly improve clustering, as demonstrated in our extensive simulations and real data analyses.
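The per-feature screening idea in the abstract can be illustrated with a small sketch. This is not the speaker's method: in place of the EM-test statistic it uses a plain likelihood-ratio-style statistic that compares a single Gaussian against a two-component Gaussian mixture with a common variance, fitted by a fixed number of EM iterations. The Gaussian model, the median-split initialisation, and all function names (`homogeneity_stats`, `screen_features`, `simulate`) are assumptions made purely for illustration.

```python
import math
import random


def single_gaussian_loglik(x):
    """Maximised log-likelihood of one Gaussian (the homogeneous model)."""
    n = len(x)
    mu = sum(x) / n
    var = max(sum((v - mu) ** 2 for v in x) / n, 1e-12)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def two_component_loglik(x, n_iter=50):
    """Log-likelihood of a 2-component, common-variance Gaussian mixture after EM."""
    n = len(x)
    xs = sorted(x)
    lo, hi = xs[: n // 2], xs[n // 2:]
    # Assumed initialisation: split the sorted sample at the median.
    mu1, mu2 = sum(lo) / len(lo), sum(hi) / len(hi)
    var = max((sum((v - mu1) ** 2 for v in lo)
               + sum((v - mu2) ** 2 for v in hi)) / n, 1e-12)
    p = 0.5

    def log_parts(v):
        a = math.log(p) - 0.5 * (v - mu1) ** 2 / var
        b = math.log(1.0 - p) - 0.5 * (v - mu2) ** 2 / var
        return a, b

    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each observation.
        r = []
        for v in x:
            a, b = log_parts(v)
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            r.append(ea / (ea + eb))
        # M-step: update mixing proportion, means, and the common variance.
        s = min(max(sum(r), 1e-8), n - 1e-8)
        p = s / n
        mu1 = sum(ri * v for ri, v in zip(r, x)) / s
        mu2 = sum((1.0 - ri) * v for ri, v in zip(r, x)) / (n - s)
        var = max(sum(ri * (v - mu1) ** 2 + (1.0 - ri) * (v - mu2) ** 2
                      for ri, v in zip(r, x)) / n, 1e-12)

    loglik = 0.0
    for v in x:
        a, b = log_parts(v)
        m = max(a, b)
        loglik += m + math.log(math.exp(a - m) + math.exp(b - m))
    return loglik - 0.5 * n * math.log(2 * math.pi * var)


def homogeneity_stats(columns):
    """One statistic per feature; larger values suggest a heterogeneous mixture."""
    return [max(0.0, 2.0 * (two_component_loglik(c) - single_gaussian_loglik(c)))
            for c in columns]


def screen_features(columns, top_k):
    """Indices of the top_k features ranked by the homogeneity statistic."""
    stats = homogeneity_stats(columns)
    return sorted(range(len(columns)), key=lambda j: stats[j], reverse=True)[:top_k]


def simulate(n=200, n_signal=2, n_noise=8, shift=3.0, seed=0):
    """Toy data: signal features differ in mean across two clusters, noise does not."""
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]
    cols = [[rng.gauss(shift if z else -shift, 1.0) for z in labels]
            for _ in range(n_signal)]
    cols += [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_noise)]
    return cols


columns = simulate()
print(sorted(screen_features(columns, top_k=2)))
```

Each feature is scored independently, so the loop over columns parallelises trivially, which is in the spirit of the computational efficiency claimed in the abstract. The theoretical guarantees described there concern the EM-test statistic, not this simplified stand-in.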