A Survey of Deep Learning + an Application: Entity-Relationship Extraction in the Interaction Context of Social Manufacturing

Leng Jiewu
2016-5-17, Xi'an Jiaotong University
2019-4-19, Guangdong University of Technology

The Goal of this Talk?
To provide you with an intuitive (but not entirely mathematical) understanding of what Deep Learning is and why it works. I'm not a Deep Learning expert, nor a mathematician, so I reserve the right to say "That's a good question - I don't know."

Overview
1. Overview of Deep Learning
   1. Background (industry and academia)
   2. History of the machine-learning concept
   3. Origins of the deep-learning concept
   4. Review of artificial neural networks
   5. Deep-learning strategies
2. Key techniques
3. Implementation details
4. Deep Learning research progress

Summary of deep learning: the chief advantage of a deep network is that it can express a far larger set of functions than a shallow network in a more compact and concise form; in other words, it has stronger representational power.

[Figure: Overall framework. Unstructured text from the social interaction context is captured and sampled into semi-structured labeled relationship data (e.g., S2 = {"outsource to"}). After pre-processing, the training vectors X (S1 ... Sn) and their label vectors X_label (L1 ... Ln) train a deep neural network (input layer x1 ... xN, Hidden Layers 1-3, output layer z1 ... zM); the unlabeled test vectors T (T1 ... Tn) are then fed to the trained network, whose test result is the set of extracted relationships R (R1 ... Rn).]

3. Model
3.1 Input

The annotation format of the entity-relationship corpus for the task studied here is as follows. Each line corresponds to one pair of entities in the context together with a description of their information, including the start position of each entity in the current sentence and the relationship type between the two entities:

E="ENTITY text1" position1 || r="relationship" || E="ENTITY text2" position2

Here E denotes an entity, the first and second occurrences being the first and second entity of the pair. Because the words in a line are separated by spaces, the position information is given as the line number plus the word index within the current line; line numbers start at 1 and word indices start at 0. r denotes the relationship type of the entity pair, one of the relationship categories illustrated in the table below.

Because of the particular nature of the problem and of how relationship types are expressed, extraction performance is evaluated by computing accuracy and computation time separately for each of the seven entity-relationship categories.

[Figure: Input construction by window processing. The initial sentence ("0 are / 1 specialized / 2 in / ...") is transformed into word vectors: each word i of an m-word window contributes a one-hot word feature WF_i and a one-hot position feature PF_i, and the concatenated rows form the input data X. In parallel, the key relationship words are extracted and transformed into the label data X_label.]

3.1 Input: pre-processing of the raw data

A. Data pre-processing: the words exhibit inconsistent capitalization, inflected forms, tenses, and domain abbreviations, all of which affect information processing and the work that follows.

B1. Basic feature selection: to convert text into a feature set, the same feature extraction is applied to the training data and the test data of the annotated corpus, producing feature-vector representations for the subsequent classification task.

Category | Example (X) | X_Labeled relationship
Collaborate | We will outsource a deep-hole machining job to Qinya. | outsource machining job to
Membership | This gravure printing machine has a major component named compression roller. | has component
Call-for | We have a gear box grinding task to outsource. | have to outsource
Capable-of | We can provide specialized computer-controlled glass milling capability. | provide capability
Manufacture | These components are the upgraded version of the coating machine which is already produced by Beiren. | produced by
Associate | We have an assembly demand from the part of the gravure printer to outsource. | from part of
Match | Our assembly part has the capacity to cover the requirement from Junye. | cover requirement from

3.1 Input: from bag-of-words to word embeddings

Word2vec addresses the "lexical gap" problem: the similarity between words can be expressed by computing the distance between their vectors (Euclidean distance, cosine distance, and so on).
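To make the input construction above concrete, the following is a minimal Python/NumPy sketch of the window-processing encoding and of vector-distance word similarity. The toy vocabulary, the window, and all names are illustrative assumptions; the slides do not show the original implementation.

```python
import numpy as np

# Minimal sketch of Section 3.1's input encoding. The toy vocabulary and all
# names here are illustrative assumptions, not the original implementation.
VOCAB = ["we", "are", "specialized", "in", "glass", "milling"]
WORD2ID = {w: i for i, w in enumerate(VOCAB)}

def one_hot(index, size):
    """One-hot row vector with a 1 at the given index."""
    v = np.zeros(size)
    v[index] = 1.0
    return v

def encode_window(words):
    """Concatenate, for each word i of an m-word window, a one-hot word
    feature WF_i (over the vocabulary) and a one-hot position feature PF_i
    (over offsets 0..m-1 within the window)."""
    m = len(words)
    rows = [np.concatenate([one_hot(WORD2ID[w.lower()], len(VOCAB)),  # WF_i
                            one_hot(i, m)])                           # PF_i
            for i, w in enumerate(words)]
    return np.vstack(rows)

X = encode_window(["are", "specialized", "in"])
print(X.shape)  # (3, len(VOCAB) + 3): one WF_i + PF_i row per window word

# Word2vec-style similarity: with dense embeddings, word relatedness is the
# cosine of the angle between vectors (random vectors stand in here for
# trained embeddings).
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in VOCAB}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["glass"], emb["milling"]))
```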
3.2 Relationship-pattern learning based on SDAE

[Figure: Single-layer DAE and the stacked architecture. Left: the initial input text-based vector x is corrupted by noise injection (denoising), encoded by f_θe(i) into the code z, and decoded by g_θd(i) into a reconstructed vector, which the loss function with L1-norm regularization compares against x; x_label is mapped to the supervised target y. Right: stacked encoders f_θe(1), f_θe(2), ..., f_θe(i) map the input through Hidden Layers 1-3 to low-dimension abstract relationship vectors, and a supervised layer f_θe(sup) ties the learned relationships to the pre-identified relationships during supervised fine-tuning.]

3.3 SDAE algorithm details

Encoding and decoding:

$$\mathbf{z} = f_{\theta_e}(\mathbf{x}) = s(\mathbf{W}_e \mathbf{x} + \mathbf{b}_e), \qquad \hat{\mathbf{x}} = g_{\theta_d}(\mathbf{z}) = s(\mathbf{W}_d \mathbf{z} + \mathbf{b}_d)$$

Kullback-Leibler loss function:

$$L(\mathbf{x}, \hat{\mathbf{x}}) = \sum_i x_i \log\frac{x_i}{\hat{x}_i}$$

Regularization penalty (L1 norm) against overfitting:

$$J(\theta_e, \theta_d) = L(\mathbf{x}, \hat{\mathbf{x}}) + \delta \sum_j \big(|\theta_{e,j}| + |\theta_{d,j}|\big)$$

Weight updates by stochastic gradient descent (a minimal NumPy sketch of these rules appears after Section 4.4 below):

$$\theta_e \leftarrow \theta_e - l \cdot \frac{\partial J(\theta_e, \theta_d)}{\partial \theta_e}, \qquad \theta_d \leftarrow \theta_d - l \cdot \frac{\partial J(\theta_e, \theta_d)}{\partial \theta_d}$$

4. Implementation and case study
4.1 Pre-processing

Example entries of the relationship-pattern dictionary, where 1 and 2 mark the first and second entity slots:

dict: a manufacturer capable of 1 has accepted the demand 2 $
      a manufacturer capable of 1 has accepted the requirement 2 $

4.1 Input data statistics

Items | All | Collaborate | Membership | Call-for | Capable-of | Manufacture | Associate | Match
Samples (S) | 6950 | 1140 | 830 | 1005 | 960 | 1005 | 980 | 1030
N | 1426 | 298 | 301 | 219 | 475 | 567 | 343 | 282
M | 97 | 37 | 28 | 42 | 28 | 40 | 35 | 35
(N-M)/N | 0.93 | 0.87 | 0.91 | 0.81 | 0.94 | 0.93 | 0.90 | 0.88
Dimension | high | low | relatively low | low | high | high | relatively high | low
Sparsity | high | relatively low | relatively high | low | high | high | relatively high | relatively low

Items | All | Collaborate | Membership | Call-for | Capable-of | Manufacture | Associate | Match
L1 | 1500 | 320 | 320 | 240 | 500 | 580 | 360 | 300
L2 | 800 | 150 | 150 | 120 | 240 | 280 | 180 | 150
L3 | 300 | 60 | 50 | 60 | 80 | 90 | 60 | 60

4.2 Algorithm parameter settings

Algorithm parameters were set on the basis of sensitivity experiments.

[Figure: Sensitivity experiments. Panels plot the prediction error and the computation time (min) against the number of hidden layers, the learning rate, the noise level, and the weight decay.]

4.2 Algorithm execution process

4.3 Comparison results (BP, DAE, SAE, SDAE)

Average error (SD):

Category | Scale (S×N) | BP | DAE | SAE | SDAE
All | 9910700 | 7.85 (0.42) | 16.87 (0.93) | 4.67 (0.02) | 2.18 (0.04)
Collaborate | 339720 | 3.54 (0.36) | 5.85 (0.43) | 1.64 (0.03) | 0.55 (0.03)
Membership | 249830 | 2.61 (0.46) | 4.64 (0.36) | 0.82 (0.02) | 0.27 (0.01)
Call-for | 220095 | 1.53 (0.24) | 6.83 (0.41) | 0.59 (0.03) | 0.10 (0.01)
Capable-of | 456000 | 3.19 (0.34) | 9.58 (0.54) | 2.53 (0.01) | 1.13 (0.02)
Manufacture | 569835 | 4.23 (0.31) | 7.36 (0.55) | 2.16 (0.03) | 0.87 (0.03)
Associate | 336140 | 2.47 (0.30) | 4.46 (0.40) | 1.34 (0.03) | 0.82 (0.02)
Match | 290460 | 3.32 (0.28) | 6.17 (0.46) | 0.89 (0.04) | 0.41 (0.02)

Computation time (min):

Category | BP | DAE | SAE | SDAE
All | 69.4 | 176.3 | 53.7 | 53.9
Collaborate | 6.9 | 2.2 | 13.3 | 14.4
Membership | 4.1 | 1.5 | 9.6 | 10.3
Call-for | 3.5 | 1.1 | 7.8 | 8.7
Capable-of | 8.7 | 3.1 | 17.9 | 18.2
Manufacture | 11.6 | 4.7 | 20.2 | 20.4
Associate | 5.3 | 1.6 | 13.3 | 13.8
Match | 4.6 | 1.5 | 10.4 | 10.6

[Figures: Prediction error of the four approaches for the eight categories; computation time variation with the problem scale (220095 ... 569835) for BP, DAE, SAE, and SDAE.]

4.3 Result comparison

Conclusions:
1. The deep-learning algorithms are more robust to problem scale.
2. Performance varies with the dimensionality and sparsity of the data; deep learning performs better on high-dimension, high-sparsity data.
3. A shortcoming of this study is that it requires NER pre-processing and has no ability to filter out erroneous data; OpenRefine may serve as a reference.
4. The next object of pre-research is deep learning on human-machine and machine-machine interaction data from the manufacturing process.

4.4 Relationship-extraction software interface

Other business social-media analytics software:
http://www.radian6.com/
http://www.sas.com/software/customer-intelligence/social-media-analytics/
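As a closing illustration before the references, here is a minimal single-layer denoising autoencoder sketch in Python/NumPy instantiating the equations of Section 3.3: masking noise, a sigmoid encoder/decoder, the KL-style reconstruction loss, the L1 penalty, and SGD updates. All sizes, rates, and names are illustrative assumptions, not the settings of the case study; stacking and supervised fine-tuning (Section 3.2) are only indicated in the closing comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoencoder:
    """Single-layer DAE following Section 3.3; sizes, noise level, and
    learning rate are illustrative assumptions."""

    def __init__(self, n_in, n_hidden, noise=0.3, lr=0.1, delta=1e-4):
        self.We = rng.normal(0.0, 0.01, (n_hidden, n_in))  # encoder weights
        self.Wd = rng.normal(0.0, 0.01, (n_in, n_hidden))  # decoder weights
        self.be = np.zeros(n_hidden)
        self.bd = np.zeros(n_in)
        self.noise, self.lr, self.delta = noise, lr, delta

    def step(self, x):
        """One stochastic gradient step on a single binary input vector x."""
        x_tilde = x * (rng.random(x.size) > self.noise)  # masking noise
        z = sigmoid(self.We @ x_tilde + self.be)         # z  = s(We x~ + be)
        x_hat = sigmoid(self.Wd @ z + self.bd)           # x^ = s(Wd z  + bd)
        # KL-style loss L(x, x^) = sum_i x_i log(x_i / x^_i); its gradient
        # w.r.t. the decoder pre-activation is x * x^ - x (zero entries of x
        # contribute no gradient under this loss).
        ga_d = x * x_hat - x
        ga_e = (self.Wd.T @ ga_d) * z * (1.0 - z)        # backprop to encoder
        # SGD with the L1 penalty's subgradient delta * sign(theta):
        self.Wd -= self.lr * (np.outer(ga_d, z) + self.delta * np.sign(self.Wd))
        self.bd -= self.lr * ga_d
        self.We -= self.lr * (np.outer(ga_e, x_tilde) + self.delta * np.sign(self.We))
        self.be -= self.lr * ga_e

# Toy run on sparse binary vectors resembling the WF/PF inputs of Section 3.1.
X = (rng.random((100, 20)) > 0.8).astype(float)
dae = DenoisingAutoencoder(n_in=20, n_hidden=8)
for _ in range(5):
    for x in X:
        dae.step(x)
# Stacking (Section 3.2): train the next DAE on this layer's codes z, then
# attach a supervised output layer and fine-tune the whole stack with X_label.
```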
References (Foundations)

Liu Jianwei, Liu Yuan, Luo Xionglin, et al. (2014), "Research progress of deep learning" (in Chinese), Application Research of Computers, 31(7), pp. 1921-1931.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015), "Deep learning," Nature, 521(7553), pp. 436-444.
Chen, M., et al. (2012), "Marginalized stacked denoising autoencoders," Learning Workshop.
Bengio, Y., et al. (2007), "Greedy layer-wise training of deep networks," NIPS, pp. 153-160.
Vincent, P., et al. (2008), "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, pp. 1096-1103.
Vincent, P., et al. (2010), "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, 11(12), pp. 3371-3408.

References (Machine Learning)

Sunila Gollapudi (2016), Practical Machine Learning, Packt Publishing.
Sebastian Raschka (2015), Python Machine Learning, Packt Publishing.
TensorFlow: https://www.tensorflow.org/
Rajat Monga (2016), TensorFlow: Machine Learning for Everyone, https://www.youtube.com/watch?v=wmw8Bbb_eIE
Jeff Dean (2016), Large-Scale Deep Learning For Building Intelligent Computer Systems, The 9th ACM International Conference on Web Search and Data Mining (WSDM 2016), San Francisco, California, USA, February 22-25, 2016. http://www.wsdm-conference.org/2016/slides/WSDM2016-Jeff-Dean.pdf
Deep Learning Basics: Neural Networks Demystified, https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU
Deep Learning SIMPLIFIED, https://www.youtube.com/playlist?list=PLjJh1vlSEYgvGod9wWiydumYl8hOXixNu
Theano: http://deeplearning.net/software/theano/
Keras: http://keras.io/

References (Web Mining)

Bing Liu (2011), Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2nd Edition, Springer. http://www.cs.uic.edu/~liub/WebMiningBook.html
Bing Liu (2013), Opinion Spam Detection: Detecting Fake Reviews and Reviewers, http://www.cs.uic.edu/~liub/FBS/fake-reviews.html
Bo Pang and Lillian Lee (2008), "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval, 2(1-2), pp. 1-135.
Wiltrud Kessler (2012), Introduction to Sentiment Analysis, http://www.ims.uni-stuttgart.de/~kesslewd/lehre/sentimentanalysis12s/introduction_sentimentanalysis.pdf
Z. Zhang, X. Li, and Y. Chen (2012), "Deciphering word-of-mouth in social media: Text-based metrics of consumer reviews," ACM Trans. Manage. Inf. Syst., (3:1), pp. 1-23.
Efraim Turban, Ramesh Sharda, Dursun Delen (2011), Decision Support and Business Intelligence Systems, Ninth Edition, Pearson.
Guandong Xu, Yanchun Zhang, Lin Li (2011), Web Mining and Social Networking: Techniques and Applications, Springer.

References (Text Analytics)

Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts (2013), "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), vol. 1631, p. 1642. http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf
Kumar Ravi and Vadlamani Ravi (2015), "A survey on opinion mining and sentiment analysis: tasks, approaches and applications," Knowledge-Based Systems, 89, pp. 14-46.
Vishal Kharde and Sheetal Sonawane (2016), "Sentiment Analysis of Twitter Data: A Survey of Techniques," International Journal of Computer Applications, vol. 139, no. 11, pp. 5-15.
Jesus Serrano-Guerrero, Jose A. Olivas, Francisco P. Romero, and Enrique Herrera-Viedma (2015), "Sentiment analysis: A review and comparative analysis of web services," Information Sciences, 311, pp. 18-38.
Steven Struhl (2015), Practical Text Analytics: Interpreting Text and Unstructured Data for Business Intelligence (Marketing Science), Kogan Page.
Bing Liu (2015), Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, Cambridge University Press.

References (NLP)

Abdel-Hamid, O., Mohamed, A., Jiang, H., and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," Proc. ICASSP, 2012.
Arel, I., Rose, C., and Karnowski, T., "Deep Machine Learning - A New Frontier in Artificial Intelligence," IEEE Computational Intelligence Mag., Nov. 2010.
Bengio, Y., Courville, A., and Vincent, P., "Representation learning: A review and new perspectives," IEEE Trans. PAMI, 2013.
Bengio, Y., "Learning deep architectures for AI," Foundations and Trends in Machine Learning, Vol. 2, No. 1, 2009, pp. 1-127.
Bengio, Y., De Mori, R., Flammia, G., and Kompe, F., "Global optimization of a neural network-Hidden Markov model hybrid," Proc. Eurospeech, 1991.
Bergstra, J. and Bengio, Y., "Random search for hyper-parameter optimization," J. Machine Learning Research, Vol. 13, pp. 281-305, 2012.
Bouvrie, J., "Hierarchical Learning: Theory with Applications in Speech and Vision," Ph.D. thesis, MIT, 2009.
Bridle, J., L. Deng, J. Picone, H. Richards, J. Ma, T. Kamm, M. Schuster, S. Pike, and R. Reagan, "An investigation of segmental hidden dynamic models of speech coarticulation for automatic speech recognition," Final Report for 1998 Workshop on Language Engineering, CLSP, Johns Hopkins, 1998.
Ciresan, D., Giusti, A., Gambardella, L., and Schmidhuber, J., "Deep neural networks segment neuronal membranes in electron microscopy images," Proc. NIPS, 2012.
Collobert, R., "Deep learning for efficient discriminative parsing," Proc. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2010.
Collobert, R. and Weston, J., "A unified architecture for natural language processing: Deep neural networks with multitask learning," Proc. ICML, 2008.
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P., "Natural language processing (almost) from scratch," J. Machine Learning Research, Vol. 12, pp. 2493-2537, 2011.