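Using mathematical-statistics models to mine information effectively from massive data sets is drawing growing attention in industry. When a model is first built, as many independent variables as possible are usually included, to minimize the bias that arises from omitting an important one. The modeling process, however, has to find the set of independent variables that best explains the dependent variable; that is, it relies on variable selection (also called indicator selection or field selection) to improve the model's interpretability and predictive accuracy. Variable selection is therefore a critically important problem in statistical modeling, and the Lasso is an estimation method that can pare the variable set down.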
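Tibshirani (1996) proposed the Lasso (Least Absolute Shrinkage and Selection Operator). The method obtains a more parsimonious model by adding a penalty function; because it ends up setting the coefficients of some variables to exactly zero, the Lasso reduces the variable set. It is a biased estimator suited to data with multicollinearity. The basic idea is to minimize the residual sum of squares subject to the constraint that the sum of the absolute values of the regression coefficients does not exceed a constant. This constraint can drive some regression coefficients to exactly 0, which yields a model that is easier to interpret. In R, the lars package provides the Lasso. Depending on how the model needs to be improved, data-mining practitioners can use the Lasso together with the AIC and BIC criteria to trim the variable set of a statistical model and so reduce its dimensionality. The Lasso is therefore a practical algorithm for data mining.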
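To make the "exactly zero coefficients" point concrete, here is a minimal sketch in pure Python. It is not the LARS algorithm used by the R package mentioned above; it swaps in cyclic coordinate descent on the equivalent penalized form, minimizing (1/(2n)) * RSS + lam * (sum of absolute coefficients). The function names, the synthetic data, and the values of lam are all assumptions made for illustration, and the model has no intercept.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * sum(|b_j|) by cyclic coordinate descent.

    X is a list of n rows with p values each, y a list of n values.
    Illustrative only: no intercept, no standardization, fixed iteration count.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j's own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Synthetic data (an assumption for this sketch): only the first two of
# five variables actually influence y.
random.seed(0)
n, p = 200, 5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0.0, 0.1) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.2)
beta_big = lasso_coordinate_descent(X, y, lam=10.0)
print("lam=0.2 :", [round(b, 3) for b in beta])
print("lam=10.0:", [round(b, 3) for b in beta_big])
```

With a moderate penalty the three irrelevant variables should be dropped (coefficient exactly 0) while the two relevant ones are kept, slightly shrunk toward zero; that shrinkage is the bias the post refers to. With a very large penalty every coefficient is set to zero. In practice the penalty would be chosen by a criterion such as AIC or BIC, or by cross-validation, rather than fixed by hand as it is here.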
(Just a few notes from my own experience; corrections from the experts are welcome O(∩_∩)O~)
Source: http://bbs.pinggu.org/thread-1415582-1-1.html