Multi-Label Classification

=The Multi-Label Classification Problem=

In the traditional single-label classification problem, each instance is associated with exactly one class label.
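To make the contrast with the multi-label setting concrete, a multi-label training set can be represented by per-instance label sets or, equivalently, by a binary indicator matrix. A minimal Python sketch (the label names and the helper function are invented for illustration):

```python
# A toy multi-label dataset: each instance carries a *set* of labels,
# unlike single-label classification, where exactly one label is allowed.
LABELS = ["sports", "politics", "economy"]  # illustrative label space

# Per-instance label sets.
y_sets = [
    {"sports"},                 # behaves like a single-label instance
    {"politics", "economy"},    # a genuinely multi-label instance
    {"sports", "economy"},
]

def to_indicator_matrix(label_sets, labels):
    """Convert label sets to a binary indicator matrix (1 = label present)."""
    return [[1 if lab in s else 0 for lab in labels] for s in label_sets]

Y = to_indicator_matrix(y_sets, LABELS)
# Y == [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
```

Problem transformation methods typically work on such an indicator matrix one column at a time, while algorithm adaptation methods consume the label sets jointly.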
Today's multi-label classification, by contrast, allows each instance to carry several labels simultaneously. Typical applications of the multi-label classification problem include:

* Gene function classification: discovering the unknown functions of genes; the typical data sets here are gene data sets.
* Automatic multimedia annotation: classifying, tagging, and interpreting photo images, video detection, and so on.
* Text classification: categorizing news reports, sentiment and attitude recognition, and spam filtering.

My new idea.

Approaches to the multi-label classification problem:

* Problem transformation methods
* Algorithm adaptation methods

Comparing the two: problem transformation methods generally need several models and a preprocessed data set, do not consider the correlations between labels, and are relatively inefficient. Algorithm adaptation methods are just the opposite: a single model is trained on the original data set, the relationships among the labels are taken into account, and they are comparatively efficient.

The first concrete algorithm is ML-KNN. Multi-label learning originated from the investigation of the text categorization problem, where each document may belong to several predefined topics simultaneously. In multi-label learning, the training set is composed of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances by analyzing training instances with known label sets. ML-KNN is a multi-label lazy learning approach derived from the traditional K-nearest neighbor (KNN) algorithm: for each unseen instance, its K nearest neighbors in the training set are first identified. After that, based on statistical information gained from the label sets of these neighboring instances, i.e. the number of neighboring instances belonging to each possible class, the maximum a posteriori (MAP) principle is utilized to determine the label set for the unseen instance. Experiments on three different real-world multi-label learning problems, i.e. Yeast gene functional analysis, natural scene classification, and automatic web page categorization, show that ML-KNN achieves performance superior to some well-established multi-label learning algorithms.

Below we describe the concrete ideas, starting with the evaluation metrics.

First, the Hamming loss, which measures the average fraction of the Q labels on which the prediction <math>h(x_{i})</math> and the true label set <math>Y_{i}</math> disagree (<math>\Delta</math> denotes the symmetric difference); the smaller, the better:

<math>hloss_{S}(h)=\frac{1}{p}\sum_{i=1}^{p}\frac{1}{Q}|h(x_{i})\,\Delta\, Y_{i}|</math>

Next is the one-error: it evaluates how many times the top-ranked label is not in the instance's set of proper labels. The performance is perfect when the value equals zero; the smaller, the better.
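These first two metrics can be sketched in plain Python; the function names and the toy data below are my own, not from the paper:

```python
def hamming_loss(pred_sets, true_sets, Q):
    """Mean size of the symmetric difference h(x_i) Δ Y_i, normalised by the Q labels."""
    p = len(true_sets)
    return sum(len(h ^ y) for h, y in zip(pred_sets, true_sets)) / (p * Q)

def one_error(scores, true_sets):
    """Fraction of instances whose top-scoring label is not a proper label.

    `scores` is a list of dicts mapping each label y to the score f(x_i, y).
    """
    p = len(true_sets)
    return sum(1 for f, y in zip(scores, true_sets)
               if max(f, key=f.get) not in y) / p

# Toy check over the label space {a, b, c}, i.e. Q = 3:
truth = [{"a", "b"}, {"b"}]
preds = [{"a"}, {"b", "c"}]
print(hamming_loss(preds, truth, Q=3))          # (1 + 1) / (2 * 3) = 0.333...
f_scores = [{"a": 0.1, "b": 0.2, "c": 0.9},     # top label "c" is wrong
            {"a": 0.2, "b": 0.7, "c": 0.1}]     # top label "b" is right
print(one_error(f_scores, truth))               # 1 / 2 = 0.5
```
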
Formally, with <math>[[\pi]]</math> equal to 1 when the predicate <math>\pi</math> holds and 0 otherwise:

<math>one\mbox{-}error_{S}(f)=\frac{1}{p}\sum_{i=1}^{p}[[\arg\max_{y\in Y}f(x_{i},y)\notin Y_{i}]]</math>

The third metric is coverage: it evaluates how far we need, on average, to go down the ranked list of labels in order to cover all the proper labels of the instance. It is loosely related to precision at the level of perfect recall; the smaller, the better.

<math>coverage_{S}(f)=\frac{1}{p}\sum_{i=1}^{p}\max_{y\in Y_{i}}rank_{f}(x_{i},y)-1</math>

Next is the ranking loss: it evaluates the average fraction of label pairs that are reversely ordered for the instance, where <math>\bar{Y}_{i}</math> denotes the complement of <math>Y_{i}</math> in <math>Y</math>. The performance is perfect when <math>rloss_{S}(f)=0</math>; the smaller, the better.

<math>rloss_{S}(f)=\frac{1}{p}\sum_{i=1}^{p}\frac{1}{|Y_{i}||\bar{Y}_{i}|}\left|\{(y_{1},y_{2})\mid f(x_{i},y_{1})\le f(x_{i},y_{2}),\,(y_{1},y_{2})\in Y_{i}\times\bar{Y}_{i}\}\right|</math>

Finally, average precision: it evaluates, for each proper label <math>y\in Y_{i}</math>, the average fraction of labels ranked above it which actually are in <math>Y_{i}</math>. It was originally used in information retrieval (IR) systems to evaluate the document ranking performance for query retrieval. The performance is perfect when <math>avgprec_{S}(f)=1</math>; the bigger, the better.

<math>avgprec_{S}(f)=\frac{1}{p}\sum_{i=1}^{p}\frac{1}{|Y_{i}|}\sum_{y\in Y_{i}}\frac{|\{y'\mid rank_{f}(x_{i},y')\le rank_{f}(x_{i},y),\,y'\in Y_{i}\}|}{rank_{f}(x_{i},y)}</math>
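The three ranking-based metrics can be sketched the same way. Ranks here are 1-based over decreasing scores <math>f(x_{i},\cdot)</math>; all names below are mine (a sketch under these assumptions, not a reference implementation):

```python
def ranks(f):
    """1-based rank of each label under decreasing score f(x_i, y)."""
    ordered = sorted(f, key=f.get, reverse=True)
    return {lab: i + 1 for i, lab in enumerate(ordered)}

def coverage(scores, true_sets):
    """Average depth in the ranking needed to cover all proper labels."""
    return sum(max(ranks(f)[y] for y in Y) - 1
               for f, Y in zip(scores, true_sets)) / len(true_sets)

def ranking_loss(scores, true_sets, labels):
    """Average fraction of (proper, non-proper) pairs that are reversely ordered."""
    total = 0.0
    for f, Y in zip(scores, true_sets):
        y_bar = [lab for lab in labels if lab not in Y]  # complement of Y_i
        bad = sum(1 for y1 in Y for y2 in y_bar if f[y1] <= f[y2])
        total += bad / (len(Y) * len(y_bar))
    return total / len(true_sets)

def average_precision(scores, true_sets):
    """For each proper label y, the fraction of proper labels ranked at or above y."""
    total = 0.0
    for f, Y in zip(scores, true_sets):
        r = ranks(f)
        total += sum(sum(1 for y2 in Y if r[y2] <= r[y]) / r[y] for y in Y) / len(Y)
    return total / len(true_sets)

# Toy check: one instance, label space {a, b, c}, proper labels {a, c}.
labels = ["a", "b", "c"]
f_scores = [{"a": 0.9, "b": 0.5, "c": 0.1}]     # induced ranking: a, b, c
truth = [{"a", "c"}]
print(coverage(f_scores, truth))                 # rank of "c" is 3, so 3 - 1 = 2
print(ranking_loss(f_scores, truth, labels))     # only (c, b) is reversed: 1/2 = 0.5
print(average_precision(f_scores, truth))        # (1/1 + 2/3) / 2 = 5/6 ≈ 0.833
```

Note that, as in the formulas, ranking loss and average precision depend only on the induced ranking, not on the absolute score values.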
=Introduction to MULAN=