## Foundations of Data Science

March 2019, Volume 1, Issue 1

**Abstract:**

For data sampled from an arbitrary density on a manifold embedded in Euclidean space, the *Continuous k*-*Nearest Neighbors* (CkNN) graph construction is introduced. It is shown that CkNN is geometrically consistent in the sense that under certain conditions, the unnormalized graph Laplacian converges to the Laplace-Beltrami operator, spectrally as well as pointwise. It is proved for compact (and conjectured for noncompact) manifolds that CkNN is the unique unweighted construction that yields a geometry consistent with the connected components of the underlying manifold in the limit of large data. Thus CkNN produces a single graph that captures all topological features simultaneously, in contrast to persistent homology, which represents each homology generator at a separate scale. As applications we derive a new fast clustering algorithm and a method to identify patterns in natural images topologically. Finally, we conjecture that CkNN is topologically consistent, meaning that the homology of the Vietoris-Rips complex (implied by the graph Laplacian) converges to the homology of the underlying manifold (implied by the Laplace-de Rham operators) in the limit of large data.
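The abstract does not spell out the edge rule, so the following is only a minimal sketch. It assumes the rule usually given for CkNN: connect points x and y when d(x, y) < δ · sqrt(d_k(x) · d_k(y)), where d_k is the distance to the k-th nearest neighbor. The function names, the parameters `k` and `delta`, and the use of connected components as the clustering step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cknn_adjacency(points, k=5, delta=1.0):
    """Unweighted adjacency under the assumed CkNN rule:
    i ~ j  iff  d(i, j) < delta * sqrt(d_k(i) * d_k(j))."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    d_k = np.sort(dist, axis=1)[:, k]
    adj = dist < delta * np.sqrt(np.outer(d_k, d_k))
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def connected_components(adj):
    """Label each node with the index of its connected component (DFS)."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Two 1-D clusters whose sampling densities differ by a factor of ten.
dense = np.arange(10) * 0.1
sparse = 100.0 + np.arange(10) * 1.0
pts = np.concatenate([dense, sparse]).reshape(-1, 1)
labels = connected_components(cknn_adjacency(pts, k=2, delta=1.5))
print(labels)
```

Because the threshold scales with the local neighbor distances, the same `delta` serves both the dense and the sparse cluster, which is the property the abstract relies on when it speaks of a single graph for all scales.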

**Abstract:**

The aim of this paper is to bring together recent developments in Bayesian generalised linear mixed models and geostatistics. We focus on approximate methods in both areas. A technique known as full-scale approximation, proposed by Sang and Huang (2012) to alleviate the computational burden of large geostatistical data sets, is incorporated into the INLA methodology, which is used for approximate Bayesian inference. We also discuss how INLA can be used to approximate the posterior distribution of transformations of parameters, which is useful in practical applications. Issues regarding the choice of the parameters of the approximation, such as the knots and the taper range, are also addressed. Emphasis is given to applications in disease mapping, illustrated by modelling the prevalence of *Loa loa* in Cameroon and of malaria in The Gambia.
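To make the roles of the knots and the taper range concrete, here is a rough sketch of a full-scale covariance approximation in one dimension. It assumes the usual form of the method: a low-rank part built from the knots, plus the residual covariance multiplied by a compactly supported taper. The exponential covariance, the spherical taper, and all function and parameter names are illustrative choices, and the INLA step is not reproduced.

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance between 1-D location arrays a and b."""
    d = np.abs(a[:, None] - b[None, :])
    return sigma2 * np.exp(-d / phi)


def spherical_taper(d, gamma):
    """Spherical taper: equals 1 at distance 0 and 0 beyond the range gamma."""
    r = np.clip(d / gamma, 0.0, 1.0)
    return (1.0 - r) ** 2 * (1.0 + r / 2.0)


def full_scale_cov(sites, knots, gamma, sigma2=1.0, phi=1.0):
    """Low-rank part from the knots plus a tapered residual (assumed form)."""
    c_sk = exp_cov(sites, knots, sigma2, phi)
    c_kk = exp_cov(knots, knots, sigma2, phi)
    # Low-rank (predictive-process style) component: C_sk C_kk^{-1} C_ks.
    low_rank = c_sk @ np.linalg.solve(c_kk, c_sk.T)
    # Whatever the knots miss is kept only within the taper range.
    residual = exp_cov(sites, sites, sigma2, phi) - low_rank
    d = np.abs(sites[:, None] - sites[None, :])
    return low_rank + residual * spherical_taper(d, gamma)


sites = np.linspace(0.0, 10.0, 21)
knots = np.array([0.0, 5.0, 10.0])
approx = full_scale_cov(sites, knots, gamma=2.0)
exact = exp_cov(sites, sites)
print(np.abs(approx - exact).max())
```

In this sketch more knots improve the large-scale part, while a larger `gamma` keeps more of the short-range residual at the cost of a denser matrix, which is the trade-off behind the choice of knots and taper range.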

**Abstract:**

Multivariate stochastic volatility models are a popular and well-known class of models for the analysis of financial time series because of their ability to capture the important stylized facts of financial returns data. We consider the problems of estimating the filtering distribution and calculating the marginal likelihood for multivariate stochastic volatility models with cross-leverage effects in the high-dimensional case, that is, when the number of financial time series that we analyze simultaneously (denoted by

**Abstract:**

Kidney Paired Donation (KPD) is a system whereby incompatible patient-donor pairs (PD pairs) are entered into a pool to find compatible cyclic kidney exchanges in which each pair gives and receives a kidney. The donation allocation decision problem for a KPD pool has traditionally been viewed within an economic theory and integer-programming framework. While previous allocation schemes work well to donate the maximum number of kidneys at a specific time, certain subgroups of patients are rarely matched in such an exchange. Consequently, these methods lead to systematic inequity in the exchange, where many patients are repeatedly denied a kidney. Our goal is to investigate inequity in the distribution of kidney allocation among patients, and to present an algorithm which minimizes allocation disparities. The method presented is inspired by cohomology and describes the cyclic structure in a kidney exchange efficiently; this structure is then used to search for an equitable kidney allocation. Another key result of our approach is a score function defined on PD pairs which measures cycle disparity within a KPD pool; i.e., this function measures the relative chance for each PD pair to take part in the kidney exchange if cycles are chosen uniformly. Specifically, we show that PD pairs with underdemanded donors or highly sensitized patients have lower scores than typical PD pairs. Furthermore, our results demonstrate that PD pair score and the chance to obtain a kidney are positively correlated when allocation is done by utility-optimal integer programming methods. In contrast, the chance to obtain a kidney through our method is independent of score, and thus unbiased in this regard.
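The score described above can be illustrated by brute force on a toy pool. The abstract gives no formula, so this sketch takes one plain reading: the score of a PD pair is the fraction of feasible exchange cycles that contain it. The graph encoding (an edge i → j means the donor of pair i is compatible with the patient of pair j), the cycle-length cap `max_len`, and the function names are assumptions made for illustration. The cohomology-inspired method itself is not reproduced here.

```python
def simple_cycles(graph, max_len):
    """List directed simple cycles with at most max_len pairs.

    Each cycle is reported once, as a tuple starting at its smallest node.
    Assumes integer node labels and no self-loops.
    """
    cycles = []

    def extend(path):
        for nxt in graph.get(path[-1], ()):
            if nxt == path[0]:
                cycles.append(tuple(path))  # the path closes into a cycle
            elif nxt > path[0] and nxt not in path and len(path) < max_len:
                extend(path + [nxt])

    for start in sorted(graph):
        extend([start])
    return cycles


def cycle_scores(graph, max_len=3):
    """Fraction of enumerated cycles containing each pair (illustrative score)."""
    cycles = simple_cycles(graph, max_len)
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    if not cycles:
        return {n: 0.0 for n in nodes}
    return {n: sum(n in c for c in cycles) / len(cycles) for n in nodes}


# Toy pool: no donor is compatible with the patient of pair 3,
# so pair 3 cannot take part in any cycle.
pool = {0: [1], 1: [0, 2], 2: [0], 3: [0]}
print(simple_cycles(pool, max_len=3))
print(cycle_scores(pool, max_len=3))
```

Exhaustive enumeration like this only scales to small pools; it is meant to show what a cycle-based score measures, not how it would be computed in practice.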
