We propose a framework to infer influences between agents in a network using only observed time series.The framework is general---it does not require any particular class of models for the dynamics.It includes graphical models to depict influences in the network, algorithms to identify and approximate the graphs, and techniques to estimate directed information, an information theoretic quantity that measures influence, from data.We demonstrate the utility of the methods by identifying influences between neurons in a primate as well as between news agencies and users in the Twitter network.We introduce two graphical models to concisely represent causal influences between agents in a network.The first, the minimal generative model graph, reflects a minimal state space description of relationships.The second, the directed information graph, is a statistical approach similar to conventional graphical models and uses directed information to generalize Granger causality.Although they are motivated differently, we show that under minimal assumptions, the graphs are equivalent. In order to identify the underlying graph, we present several algorithms.In general, joint statistics of the whole network are needed.An algorithm that uses the minimal-dimension statistics necessary when upper bounds on the in-degrees are known is presented.In the event that the upper bounds are not valid, the result is nonetheless an optimal approximation.An adaptive algorithm is introduced that uses near minimal-dimension statistics but does not require assumptions on the in-degree bound.Several algorithms to optimally approximate the graph are proposed.The quality of an approximation is measured by Kullback-Leibler divergence between the full joint distribution and the distribution induced by the approximation.The first class of approximations are directed trees.We then discuss algorithms to find the best connected and unconstrained approximations that have user-specified in-degrees to incorporate more dynamics.A greedy search algorithm is shown to identify near-optimal approximations of these classes.The algorithms require calculations of directed information.For the setting when directed information is estimated from data, we characterize the sample-complexity of two plug-in directed information estimators.Their performance is similar to standard results for statistical estimation with i.i.d.~data.When point estimates of directed information are not reliable, we compute confidence intervals.Furthermore, we propose algorithms that use confidence intervals to identify graph approximations that are robust to estimation error.We also propose a consistent parametric estimation technique analogous to the asymptotic equipartition property.Last, we demonstrate the effectiveness of the proposed algorithms through simulations and data analysis.The methods identify influences between neurons in a primate which give rise to observed regional information transfer observed by a collaborator.The framework also identifies which news agencies influence which users in the Twitter network by analyzing only tweet times.The algorithms determine influences with high precision.
【 预 览 】
附件列表
Files
Size
Format
View
Identification and approximation of the structure of networks of stochastic processes