Contributor biographies

Karam Abdulahhad is a postdoctoral researcher at GESIS – Leibniz Institute for the Social Sciences in Germany. He is engaged in the ExploreData project to build an advanced search engine for social science data. He has a PhD degree in computer science from Grenoble-Alpes University in France, where he tackled the problem of term mismatch. He proposed a new information retrieval (IR) model by adapting an idea from physics. His research interests include IR theory, logical/conceptual/semantic IR, machine learning, and text mining. He has recently been studying the usefulness of modern embedding techniques in IR. He has taught at several universities and developed several tools.

Palakorn Achananuparp is a senior research scientist at the Living Analytics Research Centre (LARC), Singapore Management University. He is interested in developing and applying machine learning, natural language processing, and crowdsourcing techniques to solve problems in a variety of domains, including online social networks, politics, and public health.

Daniel Acuna is an assistant professor in the School of Information Studies at Syracuse University, Syracuse, NY. He runs the Science of Science and Computational Discovery Lab, which is supported by grants from NSF, DDHS, and DARPA and has been featured in Nature Podcast, Chronicle of Higher Education, NPR, and The Scientist. The goal of his current research is to predict future academic success and to remove potential biases that scientists and funding agencies exhibit during peer review. He has created tools to improve literature search and peer review and to detect scientific fraud.

Bob Allen is developing a model-oriented approach to information organization. His previous work ranged from recommender systems to neural networks. Bob studied at Reed College and UCSD. He joined the research organization at Bell Laboratories. He then joined the Bellcore Applied Research group in information science and digital libraries. He was editor-in-chief of ACM Transactions on Information Systems and later chair of the ACM Publications Board. Since 1998 he has been a faculty member at a number of universities around the world, including Maryland, Drexel, Victoria (New Zealand), Tsukuba (Japan), and Yonsei (Korea).

Waleed Ammar is a senior research scientist at Google, where he works on NLP-related problems in biomedical and clinical applications. Before joining Google, Waleed was a research scientist at the Allen Institute for Artificial Intelligence where he led the Semantic Scholar research team. In 2016, he received a PhD degree in artificial intelligence from Carnegie Mellon University. Before pursuing the PhD, Waleed was a software developer at Microsoft Research, web developer at eSpace Technologies, and teaching assistant at Alexandria University.

Iz Beltagy is a research scientist at the Allen Institute for AI (AI2), working on natural language processing, with a focus on scientific documents. He received his PhD in computer science from the University of Texas at Austin in 2016.

Stefan Bender is head of the Research Data and Service Center of the Deutsche Bundesbank and, since 2018, honorary professor at the School of Social Sciences, University of Mannheim. In his role at the Deutsche Bundesbank, he was chair of INEXDA (the Granular Data Network) and vice-chair of the German Data Forum (www.ratswd.de). Before joining the Deutsche Bundesbank, Stefan was head of the Research Data Center (RDC) of the Federal Employment Agency at the Institute for Employment Research (IAB), which established an international research data centre, including access to IAB data in the US (for example, at Berkeley and Harvard). His research interests are data access, data quality, merging administrative data, survey data and/or big data, record linkage, management quality, and the mobility of inventors. He has published over 100 articles, in journals including the American Economic Review and The Quarterly Journal of Economics.

Christine Betts is a software engineer working on human computation at Google AI. She graduated with honours in computer science from the University of Washington. While there, she was an intern at the Allen Institute for AI, and before that at Facebook and Google.

Katarina Boland is a research associate in the Knowledge Technologies for the Social Sciences department at GESIS – Leibniz Institute for the Social Sciences. She joined GESIS in August 2011 after earning her Magistra Artium degree in computational linguistics, computer science and psychology at Heidelberg University. Katarina has been part of the DFG projects InFoLiS I and InFoLiS II, which addressed the automatic linking of research data and scientific publications. Katarina’s main research interests lie in the field of natural language processing and text mining. Currently, she is primarily involved in research on information extraction, NLP and journalism, and automatic fact-checking.

Minh-Son Cao is a master’s student in the School of Computing at KAIST, under the supervision of Professor Sung-Hyon Myaeng at the Information Retrieval and Natural Language Processing Laboratory. Previously, he received his bachelor’s degree from the University of Engineering and Technology, Vietnam National University (VNU-UET) in June 2017. He was a member of the Data Mining and Knowledge Technology Laboratory from August 2015 to June 2017, under the supervision of Associate Professor Xuan-Hieu Phan. His research focuses on the application of deep learning in natural language processing, mainly on embedding problems.

Stefan Dietze is full professor of data and knowledge engineering at the Institute for Computer Science at Heinrich-Heine-University Düsseldorf, scientific director of the Knowledge Technologies for the Social Sciences department at GESIS – Leibniz Institute for the Social Sciences, and affiliated member at the L3S Research Center of the Leibniz University Hanover, Germany. His research interests are at the intersection of information retrieval, semantic technologies and artificial intelligence, and in particular the extraction, fusion and search of knowledge and data on the Web. Stefan’s work has been published at major scientific venues, such as WWW/The Web Conference, SIGIR, CHI and ISWC, where he also frequently serves as PC and/or OC member.

Dimitar Dimitrov is a postdoctoral researcher at GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany. He obtained a PhD from the University of Koblenz-Landau, Koblenz, Germany. Before that, he studied software engineering at the University of Applied Sciences Konstanz, Konstanz, Germany, where he also obtained his master’s degree in computer science. At GESIS, Dimitar Dimitrov is working on the da|ra project aimed at delivering the software infrastructure for assigning DOI names to social and economic datasets. His research focuses on applying statistical and machine learning techniques to study user behaviour in web-based systems.

Hendrik Christian Doll is a data scientist at the Research Data and Service Centre of Deutsche Bundesbank, where his responsibilities include advancing various data science projects. He is a lecturer for exploratory data analysis at EBS Universität für Wirtschaft und Recht. His professional interests include applying supervised machine learning techniques for record linkage, using text extraction to find data citations in text, and automating data visualization in corporate design. He holds an MSc in Econometrics from the University of Geneva. When not in the office, you will find him rock climbing.

Behnam Ghavimi is a research fellow in the Knowledge Technologies for the Social Sciences department at GESIS – Leibniz Institute for the Social Sciences. He graduated from the University of Bonn with a master’s degree in computer science. His master’s thesis, under the supervision of Professor Sören Auer and Dr Philipp Mayr, was about detecting dataset references in texts. Since September 2016, he has been involved in different projects focused on NLP (text analysis and text mining) and recommender systems. One of his projects was the EXCITE project, jointly run by WeST at the University of Koblenz-Landau and GESIS, to extract citations from publications and make more citation data openly available.

Andrew Gordon is a senior data engineer at Columbia University Information Technology. Previously, Andrew was a research information scientist with the Coleridge Initiative at New York University. There, Andrew served as an information specialist, programmer, and ETL engineer supporting the full research and administrative data life cycle: ingesting and curating data, facilitating data discovery, and providing access to sensitive administrative data for academics and policy analysts. Andrew has an MS degree in information from the University of Michigan School of Information and a BA degree in cultural anthropology from the University of Michigan.

Suchin Gururangan is a predoctoral young investigator at the Allen Institute for AI (AI2). His research interests involve model evaluation and robustness in NLP, especially in low-resource settings and distant domains. Before joining AI2, Suchin was a master’s student in NLP at the University of Washington, and before graduate school, Suchin was a data scientist at various companies in Boston and Seattle.

Mark Hahnel is the CEO and founder of Figshare, which he created while completing his PhD in stem cell biology at Imperial College London. Figshare provides research data infrastructure for institutions, publishers and funders globally. Mark is passionate about open science and the potential it has to revolutionize the research community. Since 2012, Mark has led the development of research data infrastructure, with the aim of making academic data reusable and interoperable. Mark sits on the board of DataCite and the advisory board of DOAJ, was a judge for the National Institutes of Health (NIH) and Wellcome Trust Open Science Prize, and has acted as an advisor for the Springer Nature masterclasses.

Christian Herzog is CEO of Dimensions and chief portfolio officer at Digital Science. A medical doctor by training, Christian also studied economics and in 2005 started Collexis, a software company focused on text-mining-based software applications for the research space. In 2010, Collexis was acquired by Elsevier, where Christian spent the following two years as VP for Product Management SciVal. In 2013, Christian and his co-founders started ÜberResearch as part of Digital Science, which led to the launch of Dimensions as a large-scale research information infrastructure in 2018.

Christian Hirsch works at the Research Data and Service Centre of Deutsche Bundesbank. Before coming to the Bundesbank, he was a postdoctoral researcher and head of the data center at the research institute Sustainable Architecture for Finance in Europe (SAFE), at Goethe University Frankfurt. He earned his PhD in Finance at Goethe University. Christian’s research interests include financial intermediation, monetary policy, and corporate governance. His work has been published in The Review of Financial Studies and the Journal of Banking & Finance.

Daniel W. Hook is CEO of Digital Science. Daniel co-founded Symplectic, a research information management company, in 2003, with his PhD officemates. Symplectic received investment from Digital Science in 2010 and since then Daniel has worked across Digital Science’s portfolio. In 2015, he became CEO. Daniel continues to be an active researcher and holds visiting positions in Physics at Imperial College London and Washington University in St Louis, is a Policy Fellow at CSaP at the University of Cambridge, and is Co-Chair of the Research on Research Institute.

Giwon Hong is a master’s student in the School of Computing at KAIST and research assistant in the IR&NLP Lab at KAIST. He graduated from Sungkyunkwan University with a degree in computer science in February 2018. His research lies in the area of natural language processing, specifically in question answering and relation extraction.

Rricha Jalota is a developer in the Computer Science department of Paderborn University. She works in the areas of data access and knowledge extraction. Her interests lie in the application of machine learning/deep learning approaches to solve NLP problems in the domain of question answering, conversational AI and information retrieval.

Daniel King is an applied research scientist on the Semantic Scholar team at the Allen Institute for AI. He previously interned at Microsoft and Facebook, and received his BS degree in computer science from Harvey Mudd College in May 2018. His research interests are in natural language processing and using AI techniques to make useful tools for humans.

Sebastian Kohlmeier is the senior manager of programme management and business operations at the Allen Institute for AI, where he leads programme management for applied research, business intelligence and data science and partner development. Prior to joining the Allen Institute for AI, Sebastian worked as a technical programme manager and engineering manager in a variety of roles at Amazon and Microsoft. Sebastian graduated with honours from Western Washington University in 2007.

Philips Kokoh Prasetyo is a principal research engineer at the Living Analytics Research Centre (LARC) at Singapore Management University. He enjoys analysing data from many different perspectives, and his current interests include machine learning, natural language processing, text mining, and deep learning. He received a master’s degree from National Cheng Kung University in Taiwan, and a bachelor’s degree from Sekolah Tinggi Teknik Surabaya in Indonesia. He has received several awards, including an ACLCLP thesis award in 2009, and a DPU scholarship from 2007 to 2009.

Stacy Konkiel is the director of research relations at Dimensions and Altmetric. Stacy’s research interests include incentive systems in academia and informetrics, and she has written and presented widely about altmetrics, Open Science, and research data services. Previously, Stacy worked with teams at Impactstory, Indiana University & PLOS. You can learn more about Stacy at stacykonkiel.org.

Ekaterina Levitskaya is an associate research scientist at the Coleridge Initiative, New York University. She utilizes computational approaches to social science research, with a special focus on text analysis and natural language processing. Her background is in computational linguistics and applied data science. She is interested in applying computational skills in projects with social impact and utilizing text as data in a variety of applications for social science research.

Ee-Peng Lim is the Lee Kong Chian professor of information systems and director of the Living Analytics Research Center at Singapore Management University. He received his PhD degree in computer science from the University of Minnesota. His research expertise covers social media mining, social/urban data analytics, and information retrieval. He has published more than 90 international journal papers, many of them in top ACM and IEEE journals, and presented 280 conference papers. He received the Distinguished Contribution Award at the 2019 Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD).

Jonathan Morgan is a doctoral candidate at the University of Mannheim. Jonathan has worked as a senior research scientist at New York University; a senior data scientist at the United States Census Bureau; a programmer, designer, and product manager for higher education systems integrations and data governance applications at various companies and institutions; and as an online producer for the New York Times and multiplatform editor for the Detroit News. He has a BA in computer science from Wittenberg University, an MA in journalism from NYU, and was a University Enrichment Fellow at Michigan State University.

Sung-Hyon Myaeng is a professor of Computer Science in the School of Computing and the head of Web Science and Technology Division at Korea Advanced Institute of Science and Technology (KAIST). He is also the Director of KAIST-Microsoft Research Collaboration Center (KMCC). Previously he was on the faculty at Syracuse University, USA, where he was granted tenure in 1994. He earned his MS and PhD from Southern Methodist University, Texas, USA in 1985 and 1987, respectively. His research has been in the intersection between lexical and semantic aspects in natural language processing and unconventional search techniques in information retrieval.

Axel-Cyrille Ngonga Ngomo is a full professor in the Computer Science department of Paderborn University. In his work he focuses on the life cycle of knowledge graphs. He has been involved in the development of approaches for the extraction, storage, querying, integration, fusion and exploitation of knowledge graphs. One core usage of knowledge graphs he explores is the development of explainable and responsible active machine learning algorithms. Axel is a proponent of open data, open research and open science, with a keen interest in paradigms and frameworks for reproducible scientific research.

Wolfgang Otto is a postgraduate and research associate at GESIS – Leibniz Institute for the Social Sciences in Germany. As part of the Knowledge Technologies for the Social Sciences department under Stefan Dietze, he applies NLP techniques to text and data corpora in the social sciences. Since finishing his master’s degree at the NLP Group at Leipzig University (Prof. Dr. Gerhard Heyer), he has been part of a team in a third-party funded project (German Research Foundation) to build up a specialized information service for political scientists (pollux-fid.de), a project of the State and University Library Bremen (SuUB) in cooperation with GESIS. During his studies, he collaborated on projects on digital humanities, applied text mining, and data science.

Sophie Rand is an associate research scientist working on the Rich Context project at the Coleridge Initiative. Previously, she was a public health data analyst at the New York City Department of Health and Mental Hygiene, first in the Bureau of Primary Care and Prevention, where she worked with data from health information exchanges and electronic health records in support of clinical–community public health programmes, and later in the Division of Disease Control, where she worked with real-time emergency department, reportable infectious disease, and school health data. Sophie holds a Bachelor of Science in Engineering degree from the Cooper Union and a Master’s in Public Health from the CUNY School of Public Health.

Michael Röder is a research associate and a PhD candidate in the Computer Science department of Paderborn University. His research focuses on data gathering, data analysis and benchmarking of linked data systems. He has been involved in several research projects and reviewed papers for various scientific journals and conferences.

Haritz Puerto San Roman is a master’s student in the School of Computing at KAIST and research assistant in the IR&NLP Lab at KAIST. He graduated from the University of Malaga with a degree in computer science in July 2017. His research lies in the application of machine learning to natural language processing, specifically to solve the problem of question answering.

Amila Silva graduated from the University of Moratuwa, Sri Lanka, with a first-class honours degree in electronics and telecommunication engineering, where he was placed second in a graduating class of 110 students. He is currently working towards a PhD degree at the Department of Computing and Information Systems, University of Melbourne, Australia. He was awarded the Melbourne Graduate Research Scholarship to support his studies. He was also awarded the Rowden White Scholarship, a prestigious scholarship provided by the University of Melbourne to talented PhD students. His research interests include continual learning, graph analytics, and data mining.

René Speck is a research associate and a PhD candidate in the data processing service center (Research and Development Department II) at Leipzig University. His work and research focus on knowledge extraction, knowledge graphs, natural language processing, and machine learning. René Speck has been involved in several projects at Leipzig University since 2013, and since then he has also been a reviewer for several conferences and journals.

Nikit Srivastava is a master’s student and a student research assistant in the Computer Science department at Paderborn University. His research mainly focuses on data science chatbots and word embeddings. He has been involved in the development of many proofs of concept and prototype demonstrations for different scientific research papers and conferences.

Narges Tavakolpoursaleh is a postgraduate and research fellow at GESIS – Leibniz Institute for the Social Sciences in Germany. At the moment, as a part of a team, she is involved in a third-party funded project (STELLA) that aims to create an evaluation infrastructure for search and recommendation services within productive web-based search systems with real users.

Ricardo Usbeck is a senior (guest) researcher at Paderborn University focusing on data extraction and information retrieval. His main interest is in the combination of machine learning, statistics, and linked data. Ricardo is leading and executing several national and international research projects concerned with searching large amounts of heterogeneous and small, specific datasets using natural language.

Daniel Vollmers is a research associate and a PhD candidate in the Computer Science department of Paderborn University. His research focuses on question answering, knowledge extraction and machine learning. He has been involved in several research projects in these domains.

Alex D. Wade is programme manager for knowledge graphs and open science at the Chan Zuckerberg Initiative. Alex earned his master’s degree in library science from the University of Washington and has worked for the libraries at the University of California at Berkeley, the University of Michigan, and the University of Washington. Alex has spent his post-academic career working on problems in information retrieval, knowledge representation, and open science at Microsoft, Amazon, and Facebook, and currently works on the Meta service and the Open Science group at the Chan Zuckerberg Initiative.

Duane E. Williams is vice president for US Government at Digital Science. Duane earned his doctorate in theoretical chemistry from the Quantum Theory Project at the University of Florida. His work focuses on improving strategic research investment decisions by using new data sets, tools and metrics to gain greater insight into the global research landscape. Prior to joining Digital Science, he served as a senior scientific analyst and project manager for the IP and Science division of Thomson Reuters (now Clarivate Analytics). There he designed and led custom analyses and software development to facilitate data-driven objective assessments of research programs.

Tong Zeng is a PhD candidate in the School of Information Management at Nanjing University and a visiting scholar in the School of Information Studies at Syracuse University, working with Professor Daniel Acuna in the Science of Science and Computational Discovery Lab. Tong’s research interests lie within text mining and scientometrics. In particular, he is interested in applying natural language processing and network science techniques on scientific literature to investigate, understand, and facilitate various aspects of scientific communication. His recent projects involve detecting dataset mentions in full text, assigning credit to datasets, and disambiguating authors at scale.

Andrea Zielinski is a senior research scientist at the Fraunhofer Institute for Systems and Innovation Research (ISI), Karlsruhe, Germany and conducts applied research in machine learning and text mining at the Innovation System Data Excellence Center (ISDEC). She studied computer science with a focus on artificial intelligence and linguistics at the University of Hamburg. In 2002, she received her PhD in computational linguistics from Saarland University. Since 2008 she has also served as a lecturer on text mining at the Department of Computational Linguistics, Heidelberg University, Germany. Her research interests lie at the intersection of natural language processing and machine learning, particularly on areas relating to text mining and semantics.

Madeleine van Zuylen is a data science analyst at the Allen Institute for AI (AI2) on the Semantic Scholar team. Before joining AI2, Madeleine graduated from the University of Notre Dame in 2017 with a BS in Biochemistry and Applied Computational Mathematics and Statistics.

Rich Search and Discovery for Research Datasets. Edited by Julia I. Lane, Ian Mulvany and Paco Nathan.