Exponential Random Graph Models; Big data; Patent citations; Snowball sampling; Social networks; Dynamic topic modeling; Knowledge networks; Data Retrieval
Borysenko Oleksandr, Byshkin Maksym (2021), CoolMomentum: a method for stochastic optimization by Langevin dynamics with simulated annealing, in Scientific Reports, 11(1), 10705.
Chakraborty Manajit, Crestani Fabio (2021), Old is Not Always Gold: Early Identification of Milestone Patents Employing Network Flow Metrics, in Proceedings of the Swiss Text Analytics Conference 2021, Winterthur, Switzerland, CEUR-WS.org, online.
Chakraborty Manajit, Byshkin Maksym, Crestani Fabio (2020), Patent citation network analysis: A perspective from descriptive statistics and ERGMs, in PLOS ONE, 15(12), e0241797.
Chakraborty Manajit, Bahrainian Seyed Ali, Crestani Fabio (2020), Forecasting Patent Growth by Combining Time-Series Signals Using Covariance Patterns, in Proceedings of the First Joint Conference of the Information Retrieval Communities in Europe, Samatan, Gers, CEUR-WS.org, online.
Stivala Alex, Robins Garry, Lomi Alessandro (2020), Exponential random graph model parameter estimation for very large directed networks, in PLOS ONE, 15(1), e0227804.
Byshkin Maksym, Stivala Alex, Mira Antonietta, Robins Garry, Lomi Alessandro (2018), Fast Maximum Likelihood Estimation via Equilibrium Expectation for Large Network Data, in Scientific Reports, 8(1), 11509.
Recent research recognizes the social character of knowledge production: ideas are embedded in complex networks connecting them to other ideas. The large space spanned by knowledge networks is not homogeneous: certain areas are more likely than others to produce innovation, that is, unexpected combinations of existing ideas. Understanding how new knowledge is produced by recombination of existing knowledge is the key to understanding and predicting technological, scientific and social innovation. This argument finds wide applicability in the analysis of patent data, a clear example of how flows of knowledge exchange, transfer and sharing may become observable and amenable to quantitative analysis. Against this general background, this project starts by merging four separate large datasets made available by the Organisation for Economic Co-operation and Development (OECD). By tracking patent citations, the data cover the last 35 years of formalized knowledge production worldwide. The outcome will take the form of a very large patent citation network containing detailed information on patents and inventors. The merged dataset will then be combined with information on corporate entities in order to link patents to firm-specific information. Financial data and data on international trade will also be merged into a single dataset.
The dataset will be used to develop and test new information retrieval techniques and new network-analytic models for the analysis of very large datasets characterized by complex micro-relational structures. One objective of the project is to make the next generation of statistical models for the analysis of “big” network data available to the community of data scientists interested in the analysis of network structures at a very large scale. The project is organized into three distinct but related subprojects.
The first subproject (Constructing knowledge networks) employs contemporary database management techniques and technologies to create and manage data produced by the largest, and possibly the most comprehensive, knowledge network available. The second subproject (Developing new computational approaches to the analysis of knowledge structures: Exploring the role of Switzerland) applies innovative data retrieval algorithms and topic modeling techniques to identify and delineate the different subnetworks linking Switzerland to the global structure of knowledge networks, and to represent the evolutionary development of the knowledge structures in which Switzerland is embedded. The third subproject (Developing new computational statistics models for representing and analyzing large-scale knowledge networks) advances the latest generation of statistical models for social networks and scales up currently available models to make them applicable to the analysis of networks of arbitrary size. Together, the three subprojects articulate possible answers to questions about the origins, development and change in the global structure of knowledge networks. The project may be classified in Module 3 of NRP 75 (Applications) because it presents an application that may clearly benefit from the use of big data analytics. However, the project is unique in that it focuses on the scientific, rather than the immediately practical, contribution of big data analysis. As such, the project addresses issues of societal relevance (Module 2) in the context of an analysis of patent data, and it does so by developing new computational techniques and inferential technologies for the analysis of big data (Module 1). The interdisciplinary nature of the project is demonstrated by its general relevance to NRP 75 “Big Data.” We believe that the project contributes significantly to building the bridge between technology and society through data that NRP 75 was clearly designed to encourage.
The research will be carried out at the Interdisciplinary Institute of Data Science (IDIDS), directed by Alessandro Lomi, at the Università della Svizzera italiana (USI). The research team combines multidisciplinary competences spanning computer science, social network analysis, management science, economics, data science, and statistics. The research project will benefit from collaboration with the Institute of Computational Science (ICS) at USI, from the support of the Swiss National Supercomputing Centre (CSCS) in Lugano, and from the concrete support of Microsoft Research, a global commercial research company.