Doina Bucur

Assistant Professor in network data science at UTwente (The Netherlands). The name is pronounced: 'Doy-nah 'Boo-koor (and has a complicated etymology).

I learn empirical models for complex systems from observational data, and make decisions from these models. The models are networks (such as interaction or causal graphs) that describe diverse systems across disciplines: social, ecological, information, historical, and cultural. My methods are machine learning, network science, and evolutionary algorithms.

PI on the NWO Perspectief project Soil biodiversity analysis for sustainable production systems (SoilProS) (2023-2028), on functional ecological networks. Co-PI on the NWA ORC project DECIDE: Democratizing AI, Empowering Citizens through Transparent Decision-making (2025-2031), on ethical AI design. FAIR Data Fund obtained (2022) to curate a dataset of constellation line figures from many world astronomies. Vice-chair of the Ethics Committee on Computer and Information Science at UTwente, where I handle AI ethics. Guest research staff at the Netherlands Institute of Ecology. I serve on various Dutch (NWO) grant committees.

Social networks | Ecological networks | Information networks | Constellation line figures | AI ethics

Social networks: influence and network location

Many methods exist for influence maximisation; we were among the first to propose metaheuristics based on evolutionary algorithms: single-objective (2016), then multi-objective (2017, best paper award). This was followed by more efficient fitness functions and genetic operators, and by a method using the downscaling of communities (2022), which drastically scales up the case studies if the network is modular. These metaheuristics lack explainability: it is not clear why a certain network node has good spreading ability, by itself or in a group. Some explanations could be found for single influencers by linking models of influence diffusion with network statistics: one's influence can be predicted well by combinations of node centrality metrics, first in small synthetic networks (2020), then also in large empirical networks (2020), and there is some common pattern of influential network positions across networks. I gave a keynote on this topic at Parallel Problem Solving from Nature (PPSN 2022, slides). (Image: network of Facebook pages with top influencers marked.)

Network of Facebook pages with top influencers marked
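To make the notion of "spreading ability" concrete, here is a minimal sketch of how the influence of a seed set can be estimated by simulation. It assumes the independent cascade model with one uniform activation probability `p`, which is one common diffusion model and an assumption of this example; the function names and the toy star network are hypothetical and not taken from the papers above.

```python
import random


def simulate_cascade(adjacency, seeds, p, rng):
    """Run one independent-cascade diffusion and return the activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                # Each newly active node gets one chance to activate each
                # inactive neighbour, succeeding with probability p.
                if neighbour not in active and rng.random() < p:
                    active.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return active


def expected_spread(adjacency, seeds, p=0.1, runs=1000, seed=0):
    """Monte Carlo estimate of the mean number of nodes a seed set activates."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(adjacency, seeds, p, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy network: a hub (node 0) linked to five leaves.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}

hub_spread = expected_spread(star, [0], p=0.3)
leaf_spread = expected_spread(star, [1], p=0.3)
print(f"hub: {hub_spread:.2f}, leaf: {leaf_spread:.2f}")
```

An estimate such as `expected_spread` could serve as the fitness of a candidate seed set in an evolutionary search, and the hub-versus-leaf comparison hints at why node centrality can help predict a single node's influence.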

Ecological networks: from co-occurrence to functional models

In the SoilProS NWO Perspectief project we aim to learn functional (or: causal) ecological networks over soil biotics (species), abiotics (physico-chemical properties), management, and interventions. The input is: observational and interventional data, plus domain knowledge. This is challenging, because (1) the organisms are microscopic, so their interactions cannot be observed accurately, and (2) there are thousands of taxa, so we must first reduce dimensionality by learning functional groups of taxa. Ultimately, we aim to optimise interventions to best restore and stabilise the biology of the soil. (Image: correlation network of soil fungal species.)

We contribute to science as follows. EleMi (CompleNet 2024 and the authors' accepted version) is a new method (still correlational) to infer spatial co-occurrence networks. To better find the community structure, EleMi does not compute pairwise interactions, but does multi-regression with shared parameters. It is more robust and provides clearer community structure than the existing methods. Then, gFlora (BIOKDD 2024) learns functional co-response groups: groups of taxa whose total co-response effect associates well with a soil function. The novelty is in using the spatial co-occurrence network of taxa as well as the abundance of the biota, such that taxa with sparse abundance (but which do co-occur in the network, so may have important roles in the functional group) are also considered. To validate how far co-occurrence networks are from the reality of soil biota as seen from soil samples, we also developed a soil simulator. Here are slides from the Netherlands Soil Ecology conference (Oct 2024), which show the limitations of modelling soil with co-occurrence networks, how valid these are, and how we can move towards learning functional networks.

Correlation network of fungi species in soil

Information networks: predictive models of human and machine behaviour

Books form networks by their readers' co-buying habits. This provides information about readers: they are expected to prefer authors of their own gender, but how large is the bias, and with what consequences? In Gender homophily in online book networks (Information Sciences, 2019), I find that author gender assortativity reaches 0.50: gender segregation is present but not uniform, and is stronger in certain genres. Since female authors are a minority (33% of all authors), readers (likely female) with a positive bias to female authors end up reading equally from both genders; readers with a bias against female authors end up reading on median only 10-11% female authors. I gave a keynote on such intangible information networks at Network Traffic Measurement and Analysis (TMA 2022, slides). (Image: A community of books on sale on Amazon.com.)
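As an illustration of what an assortativity value measures, the sketch below computes Newman's assortativity coefficient for a categorical node attribute on an undirected network. That the paper uses exactly this definition is an assumption of this example, and the five-book network with its gender labels is made up for illustration only.

```python
from collections import Counter


def attribute_assortativity(edges, label):
    """Newman's assortativity coefficient for a categorical node attribute.

    edges: iterable of undirected (u, v) pairs; label: dict node -> category.
    """
    mixing = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions so the matrix is symmetric.
        mixing[(label[u], label[v])] += 1
        mixing[(label[v], label[u])] += 1
    total = sum(mixing.values())
    categories = {category for pair in mixing for category in pair}
    # Fraction of edge ends that join two nodes of the same category.
    same = sum(mixing[(c, c)] for c in categories) / total
    # Fraction expected if edge ends were paired at random.
    share = {c: sum(mixing[(c, d)] for d in categories) / total for c in categories}
    expected = sum(s * s for s in share.values())
    if expected == 1:
        raise ValueError("assortativity is undefined when all edge ends share one category")
    return (same - expected) / (1 - expected)


# Made-up co-buying network: books a-e, labelled by (hypothetical) author gender.
gender = {"a": "F", "b": "F", "c": "M", "d": "M", "e": "M"}
co_bought = [("a", "b"), ("c", "d"), ("d", "e"), ("b", "c")]
r = attribute_assortativity(co_bought, gender)
print(f"assortativity: {r:.2f}")
```

The coefficient is 1 when links only join same-category nodes, 0 when linking is independent of the category, and negative when links mostly join different categories.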

In Learning the mechanisms of network growth (Scientific Reports, 2024) we learn which model of network growth (combinations of preferential attachment, fitness, aging) fits real-world citation networks best. We find that the growth models themselves are easy to discriminate from observed dynamics, but the diagnosis of real-world citation networks is inconclusive, so citation networks are not accurately described by any of these typical models.
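For readers unfamiliar with these growth mechanisms, the following is a rough sketch of a generator that combines them. The functional forms are assumptions of this example, not the forms used in the paper: attachment weight is taken as (in-degree + 1) raised to `alpha`, multiplied by an optional per-node fitness and an optional exponential aging factor with time scale `tau`. All names and parameters are hypothetical.

```python
import math
import random


def grow_citation_network(n_nodes, m=2, alpha=1.0, tau=None, fitness=None, seed=0):
    """Grow a directed network where each new node cites up to m older nodes.

    Assumed attachment weight of an older node:
    (in_degree + 1) ** alpha * fitness * exp(-age / tau).
    fitness (if given) must be a list of positive numbers, one per node.
    """
    rng = random.Random(seed)
    in_degree = [0] * n_nodes
    edges = []
    for new in range(1, n_nodes):
        weights = []
        for old in range(new):
            w = (in_degree[old] + 1) ** alpha      # preferential attachment
            if fitness is not None:
                w *= fitness[old]                  # intrinsic node fitness
            if tau is not None:
                w *= math.exp(-(new - old) / tau)  # aging: older nodes fade
            weights.append(w)
        # Draw distinct targets by rejection sampling; this assumes enough
        # candidates keep a positive weight (tau not extremely small).
        targets = set()
        while len(targets) < min(m, new):
            targets.add(rng.choices(range(new), weights=weights)[0])
        for old in targets:
            in_degree[old] += 1
            edges.append((new, old))
    return edges, in_degree


# Pure preferential attachment versus preferential attachment with aging.
edges_pa, indeg_pa = grow_citation_network(200, m=2, seed=1)
edges_aging, indeg_aging = grow_citation_network(200, m=2, tau=10.0, seed=1)
print(max(indeg_pa), max(indeg_aging))
```

Setting `alpha=0` with no fitness and no aging gives uniform random attachment, so the same function covers several of the model combinations by switching mechanisms on and off.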

In Understanding Sparse Neural Networks from their topology via multipartite graph representations (Transactions on Machine Learning Research, 2024) we do a topological analysis of sparse neural networks (SNNs) with both linear and convolutional layers, with (i) a new input-aware Multipartite Graph Encoding (MGE), and (ii) new end-to-end topological metrics over the MGE. We show that these topological metrics are much better predictors of the accuracy drop than metrics computed from current input-agnostic single-layer encodings, and that the set of important topological metrics varies across sparsity levels and architectures.

I started modelling human-played games with piece captures from an ecological point of view. Here's a summary of empirical chess food webs (poster, CCS'24).

Community of comic books, linked by co-buying relationships

Constellation line figures: network structure, semantics, and geometry

Star constellations may be represented as line figures (spatial graphs in spherical coordinates). I digitised a dataset of constellation line figures (GitHub, .json format in progress) from scholarly literature, extending the sky cultures of the astronomical software Stellarium to 1900+ constellations from 75 cultures (from tribes to empires). Part 1 of the analysis measures the association between the type of culture and the network structure of constellations: The network signature of constellation line figures (PLOS ONE 2022, or ArXiv). This shows that the constellations cluster globally by network typology, as do the cultures. There is great diversity among the topologies drawn around the same root star, with only a minority being universal (those characterised by linear star patterns). Part 2 looks at the association between the semantics (symbolism) of constellations and the star pattern: The semantics of constellation line figures (ArXiv). This shows where in the world semantic similarities occur in the same regions of the sky, or over the same types of star clusters, finding many more semantic parallels than previously documented; I hypothesise which are natural effects of the star pattern, and which are likely cultural effects. Part 3, on the association between star pattern and line geometry, is work in progress. I gave a talk at The Artificial Sky meeting on data in cultural astronomy (slides, 2023). (Image: the reconstructed Golden Feline of the Inca in South America.)

The constellation Golden Feline of the Inca in South America
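To show what "spatial graph in spherical coordinates" can mean in practice, here is a small sketch that stores a line figure as stars (nodes) with sky coordinates plus lines (edges), then computes a few network and geometric descriptors. The data layout, the function names, and the star coordinates are invented for illustration; they do not reflect the format of the dataset on GitHub or any real constellation.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle angle in degrees between two sky positions (haversine formula).

    Inputs are right ascension and declination, both in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def describe_figure(stars, lines):
    """Summarise a line figure: size, maximum star degree, total line length."""
    degree = {name: 0 for name in stars}
    total_length = 0.0
    for a, b in lines:
        degree[a] += 1
        degree[b] += 1
        total_length += angular_separation(*stars[a], *stars[b])
    return {
        "stars": len(stars),
        "lines": len(lines),
        "max_degree": max(degree.values()),
        "total_length_deg": total_length,
    }


# Made-up figure: four stars (RA, Dec in degrees) joined in a chain.
stars = {"s1": (10.0, 0.0), "s2": (20.0, 0.0), "s3": (20.0, 10.0), "s4": (30.0, 10.0)}
lines = [("s1", "s2"), ("s2", "s3"), ("s3", "s4")]
summary = describe_figure(stars, lines)
print(summary)
```

A chain such as this one has maximum degree 2, which corresponds to the linear star patterns mentioned above; branching or cyclic figures would show up as higher degrees or as more lines than stars minus one.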