Abstract:
Data measured and collected by embedded sensors often contain faults, i.e.,
data points that do not accurately represent the physical phenomenon
monitored by the sensor. These data faults may be caused by deployment conditions
outside the node's operational bounds, or by short- or long-term hardware,
software, or communication problems. Applications, on the other hand, expect
accurate sensor data, and recent literature proposes algorithmic solutions
for fault detection and classification in sensor data.
However, the field lacks benchmark sensor datasets against which the performance
of such solutions can be evaluated. A benchmark dataset ideally satisfies the
following criteria: (a) it is based on real-world raw sensor data from various
following criteria: (a) it is based on real-world raw sensor data from various
types of sensor deployments; (b) it contains (natural or artificially injected)
faulty data points reflecting various problems in the deployment, including
missing data points; and (c) all data points are annotated with the ground truth,
i.e., whether or not the data point is accurate, and, if faulty, the type of fault.
We prepare and publish three such benchmark datasets, together with the algorithmic
methods used to create them: a dataset of 280 subsets of temperature and light data
from 10 indoor Intel Lab sensors, a dataset of 140 subsets of outdoor temperature data
from SensorScope sensors, and a dataset of 224 subsets of outdoor temperature data from
16 Smart Santander sensors. The three benchmark datasets total 5,783,504 data points
and contain injected data faults of the following types known from the literature: random,
malfunction, bias, drift, polynomial drift, and combinations thereof. We present the algorithmic procedures
and a software tool for preparing further such benchmark datasets.
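As a minimal illustration of two of the fault types named above (bias and drift), the sketch below injects such faults into a synthetic temperature series and records ground-truth labels. All function names, parameters, and label values here are hypothetical choices for illustration and do not describe the published tool's actual interface.

```python
import numpy as np

def inject_bias(values, start, length, offset):
    """Add a constant offset to a window of readings (bias fault).

    All parameters are hypothetical; the published tool's interface may differ.
    """
    faulty = values.copy()
    faulty[start:start + length] += offset
    return faulty

def inject_drift(values, start, length, slope):
    """Add a linearly growing offset to a window of readings (drift fault)."""
    faulty = values.copy()
    faulty[start:start + length] += slope * np.arange(length)
    return faulty

# Synthetic "raw" temperature readings around 20 degrees C.
readings = 20.0 + np.random.normal(0.0, 0.2, size=1000)
labels = np.zeros(1000, dtype=int)   # 0 = accurate data point (ground truth)

# Inject a bias fault over samples 100..149 and label them.
faulty = inject_bias(readings, start=100, length=50, offset=3.0)
labels[100:150] = 1                   # 1 = bias fault (hypothetical label)

# Inject a drift fault over samples 400..599 and label them.
faulty = inject_drift(faulty, start=400, length=200, slope=0.02)
labels[400:600] = 2                   # 2 = drift fault (hypothetical label)
```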