Using R for Scientometrics Data Visualization (Second Edition)

This book gives a detailed introduction to the BIBLIOMETRIX package, developed in R by Massimo Aria and Corrado Cuccurullo of the University of Naples Federico II, Italy. The package covers most of the functions needed for scientometrics and knowledge visualization, and will suit readers who like R and want to use it for scientometric and knowledge mapping analysis. In addition, the book briefly introduces several other R packages related to scientometrics and knowledge mapping, including rAltmetric, wordcloud2, gender, and tidytext.

Preface

We first heard about bibliometrics 10 years ago. In 2008 Corrado was writing a monograph on fast-growing firms, a niche topic that was new to him. The scientific literature was fairly limited. Scholars came from different disciplines, with a variety of approaches and methods that made it difficult to accumulate findings. We talked about this research problem once during a football match among scholars. Our discussion of the various techniques for systematic literature analysis continued for several days. We enjoyed the exchange and concluded that bibliometrics was an interesting method and that it would be fun to explore it together.

Our goal became to examine the intellectual structure of research on fast-growing firms. We analyzed all the scientific production published in English-language academic journals. The analysis was complex because it required several steps and a range of analysis and mapping software tools, which were often available only under commercial licenses. The whole process was unwieldy, from data collection to data visualization. Massimo contributed greatly with his statistical and coding skills. Our collaboration continued in moments of fun, such as our frequent football matches. While analyzing data, we discovered that we enjoyed working together. In short, our friendship soon turned into a scientific collaboration that still lasts.

Within our departments and academic communities, the reaction to our work was positive. At that time, few people talked about bibliometrics in Italy, even from the point of view of research evaluation.
Years later we presented a bibliometric analysis paper on performance management at the Annual Conference of the Academy of Management, the largest international management meeting. On that occasion too, we received positive feedback that pushed us to persist. In the same years, young Italian colleagues asked us for suggestions for their literature reviews and their research. Massimo ran some statistical analysis laboratories in R, and together we presented bibliometric analysis at some workshops. We are telling this story because without this feedback and these stimuli we would not have published bibliometrix release 0.1 in 2016. A year later we are at version 1.7, thanks to our growing passion for bibliometrics and to the suggestions that now come from scholars all around the world.

R-bibliometrix is currently a free tool for quantitative research in scientometrics and bibliometrics. It includes all the main bibliometric methods of analysis and is easy to use even for those who have no coding skills. Bibliometrix is a unique tool, developed in R, the language for statistical computing and graphics, according to a logical bibliometric workflow. R is highly extensible because it is an object-oriented and functional programming language, so it is fairly easy to automate analyses and create new functions. Because R is open source, it is also easy to get help from the user community, which is composed mainly of prominent statisticians. Bibliometrix is therefore flexible, can be upgraded rapidly, and can be integrated with other statistical R packages. That is why it is useful in a constantly changing science such as bibliometrics.

Today bibliometrix is more than just a statistical tool. It is becoming a community of international developers and users who exchange questions, impressions, opinions, and examples within an open source project.
For this reason, we are very honored that Dr. Jie Li of the Research Center for Safety and Security SciTech Trends at the Department of Safety Science and Engineering, Shanghai Maritime University, gave us the opportunity to tell you this story and to write an English preface for his book “Using R for Scientometrics Data Visualization”, which mainly introduces the BIBLIOMETRIX package to scholars and students.

We said that Bibliometrix includes all the main bibliometric methods of analysis, but we use it especially for science mapping and not for measuring science, scientists, or scientific productivity. Synthesizing past research findings is one of the most important tasks in advancing a line of research. Various methods exist to summarize the amount of scientific activity in a domain, but bibliometrics has the potential to introduce a systematic, transparent, and reproducible review process. This is very relevant in an age when the number of academic publications is rising at a very fast pace and it is increasingly unfeasible to keep track of everything that is being published, and when the emphasis on empirical contributions is resulting in voluminous and fragmented research streams and a contested field.

Literature reviews are increasingly playing a crucial role in synthesizing past research findings to use the existing knowledge base effectively, advance a line of research, and give evidence-based insights into the practice of exercising and sustaining professional judgment and expertise. The overwhelming volume of new information, conceptual developments, and data is the milieu in which bibliometrics becomes useful: it provides a structured analysis of a large body of information in order to infer trends over time, identify the themes researched and shifts in the boundaries of disciplines, detect the most prolific scholars and institutions, and show the “big picture” of extant research.
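Naples, Italy
July 2017
Massimo Aria and Corrado Cuccurullo

Foreword

We are now in the era of big data in scientific literature. Faced with a massive body of literature, how can we quickly grasp the overall landscape and future trends of a research field, direction, or topic? Against this background, the timely development of the scientometric theories, methods, and techniques that bear directly on this question has become an effective way to address it. Mastering scientometric techniques and methods has also become a basic skill requirement for researchers working in the new era. Over the past decade or so, the theories and methods of scientometric data visualization have permeated the research practice of many other disciplines. In China, this research field, which takes scientific text data as its object and uses visualization techniques to reveal the structure, evolution, and interaction of disciplines, is collectively known as “scientific knowledge mapping”.

Scientometric data visualization rests on a large body of basic theory from scientometrics (including bibliometrics, webometrics, and informetrics), such as the distribution of author productivity, co-citation, coupling, topic co-occurrence, and author collaboration. It also draws on techniques and methods from statistics and network science, such as multidimensional scaling, cluster analysis, complex network analysis, natural language processing, and text mining. These theories and methods form the knowledge base for scientometric data visualization and are the prerequisite for knowledge mapping analysis. Supported by them, scholars in China and abroad have developed dozens of software tools and packages for mining scientific and technical text; well-known examples include HistCite, BibExcel, CiteSpace, SCI2, and VOSviewer. These tools make the work feasible for scholars who want to understand the research landscape and dynamics of a discipline by analyzing its literature.

Over the past five years of practical research in scientometrics and knowledge mapping, I have written books on CiteSpace, VOSviewer, and BibExcel, mainly to help scholars outside scientometrics quickly apply these methods in support of their research. Since 2016, I have organized four events related to scientometrics and knowledge mapping and have exchanged ideas with several hundred knowledge mapping enthusiasts from across China. In these exchanges, the most common question, and the one that has made me reflect most, is: “How should I interpret the maps I obtained?” In my view, scientometric and knowledge mapping methods merely give us a new way of understanding the world of knowledge, and this way of understanding requires practitioners to combine their own professional background with the theories and methods of knowledge mapping. When conducting such an analysis, readers must be clear about what problem they want to solve, why knowledge mapping can solve it, what advantages it has over other methods, and so on. In other words, first determine the research question, and only then choose the form of knowledge map that addresses it.

This book is a companion volume to "CiteSpace: Text Mining and Visualization in Scientific Literature", "Scientometrics and Knowledge Network Analysis: Practice Based on BibExcel and Other Software", and "Principles and Applications of Scientific Knowledge Mapping: A Beginner's Guide to VOSviewer and CitNetExplorer". Unlike the applications covered in those books, this book details the BIBLIOMETRIX package, developed in R by Massimo Aria and Corrado Cuccurullo of the Department of Economics and Statistics, University of Naples Federico II, Italy. Readers are advised to use the link provided to check that they have the latest version of BIBLIOMETRIX and, in actual research, to analyze data with the latest version whenever possible (BIBLIOMETRIX: R Package for Bibliometric and Co-Citation Analysis, http://www.bibliometrix.org/). The package covers most of the functions needed for scientometrics and knowledge visualization (Figure 0.1), and will suit readers who like R and want to use it for scientometric and knowledge mapping analysis. In addition, the book introduces several other R packages related to scientometrics and knowledge mapping, such as rAltmetric, wordcloud2, gender, and tidytext. The book says little about full-text mining of English text with R and does not yet cover full-text mining of Chinese text. Future updates will expand the coverage of full-text mining with R.

Figure 0.1 Overview of bibliometrix functions

To help readers become familiar with the bibliometrix package, most examples in this book use the data sets bundled with the package; for some examples, Web of Science and Scopus data sets were downloaded and analyzed specifically. The examples present the analysis results but do not describe them or interpret them for a particular research purpose. By studying these results, readers can think for themselves about what they might do, or at least use the method to understand the basic state of the field they care about.

This book uses the following conventions:
> is followed by code
# marks an explanation of the code
## marks the output of the code

Thanks to Massimo Aria and Corrado Cuccurullo, who gave great help during the writing of this book and wrote its English preface. Thanks to Ling Yang, president of Capital University of Economics and Business Press, for strong support in publishing the scientometrics and knowledge mapping book series; to Dr. Binbin Li of the Chinese Academy of Sciences for help with extracting submatrices; to Miao Yu, postdoctoral fellow at the University of Waterloo, for suggested revisions to the manuscript; and to the book's managing editor Xiaohong Xue and graduate student Ping Li for editing and careful proofreading.

Looking back on my experience in scientometric and knowledge mapping research and practice, I have mixed feelings. I sincerely hope that this book and the related series will further promote the development and spread of scientometric and knowledge mapping practice in China, and benefit every reader.

Jie Li
May 2018, Beijing

About the Author

Jie Li, PhD with postdoctoral experience, was born in Shaanxi in 1987 and is currently an associate research fellow at the National Science Library, Chinese Academy of Sciences, with research interests in scientometrics and safety science. Li serves as co-editor-in-chief of the Journal of Integrated Security and Safety Science, deputy director of the youth editorial board of the Journal of Safety and Environment, editorial board member of Safety Science and other journals, and member of the National Committee on Scientometrics and Informetrics. Li has published more than 60 academic papers and six books, including "CiteSpace: Text Mining and Visualization in Scientific Literature", "Principles and Applications of Scientific Knowledge Mapping", "Scientometrics and Knowledge Network Analysis", and "Using R for Scientometrics Data Visualization".

Contents

Lecture 1 R Basics
1.1 Downloading R
1.2 Installing R
1.3 Installing RStudio
1.4 Installing packages
1.5 Loading packages
1.6 Package help
1.7 Citing packages
1.8 Loading package data
1.9 Loading user data
1.10 Programming errors

Lecture 2 Collecting Scientometric Data
2.1 WoS data
2.2 Scopus data
2.3 PubMed data

Lecture 3 Basics of Scientometric Analysis in R
3.1 Data conversion in R
3.2 Meaning of data column names
3.3 Merging data sets
3.4 Removing duplicates
3.5 Slicing data
3.6 Editing data
3.7 Descriptive analysis
3.8 Statistical visualization
3.9 Citation information analysis
3.10 Altmetric information
3.11 Author ranking analysis
3.12 Author gender identification
3.13 h-type indices
3.14 Lotka analysis
3.15 Temporal distribution of knowledge units
3.16 LCS calculation for documents and authors
3.17 Citation count normalization
3.18 Term extraction

Lecture 4 Scientific Data Visualization in R
4.1 Knowledge unit membership matrix
4.2 Knowledge unit co-occurrence matrix
4.3 Submatrices of the membership matrix
4.4 Submatrices of the co-occurrence matrix
4.5 Normalizing the co-occurrence matrix
4.6 Network visualization
4.7 Visualization with VOSviewer
4.8 Collaboration network visualization
4.9 Coupling network visualization
4.10 Co-citation network visualization
4.11 Historical citation network analysis
4.12 Co-word network visualization
4.13 Conceptual structure map of terms
4.14 Semantic map analysis
4.15 Thematic evolution visualization
4.16 Word cloud visualization
4.17 PubMed data visualization
4.18 Full-text mining and visualization
4.19 Dynamics of highly productive authors
4.20 Strategic diagram of coupling networks
4.21 Temporal visualization of references
4.22 Split network plots

Lecture 5 Web-Based R: biblioshiny
5.1 Data import and format conversion (Data)
5.2 Data filtering (Filter)
5.3 Main information about the data set (Dataset)
5.4 Source information (Sources)
5.5 Author information (Authors)
5.6 Document information (Documents)
5.7 Clustering
5.8 Conceptual Structure
5.9 Intellectual Structure
5.10 Social Structure

Lecture 6 Hands-On Experiments
6.1 Scientometrics of a specific author's papers
6.2 Scientometrics of a specific paper
6.3 Scientometrics of a specific institution's papers
6.4 Comparative scientometrics of specific journals
6.5 Scientometrics of specific conference papers
6.6 Scientometrics of literature on a specific topic
6.7 Scientometrics of literature on a specific method

References

Appendices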
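Appendix 1 Core R code for scientometrics
Appendix 2 Meaning of core Web of Science fields
Appendix 3 Common scientometric data visualization tools
Appendix 4 R packages for scientometric data visualization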