@knn_create(cloud:tab)
@knn_rebuild(cloud:tab)
@knn_delete(cloud:tab)
@knn_rnd(dim:int, max_range:numeric, N:int)
knnr_search(cloud:tab, point:tab, n:float)
@knn_rsearch(cloud:tab, point:tab, radius:float)
@knn_scan(cloud:tab, point:tab, n:float, f:fun)
@knn_rscan(cloud:tab, point:tab, radius:float, f:fun)
@knn_combine(cloud:tab, point:tab, n:float, g:fun, v:value)
@knn_rcombine(cloud:tab, point:tab, radius:float, g:fun, v:value)
These functions provides efficient ways to search in a cloud of points the Nearest Neighbors of an arbitrary point.
The cloud is given as a tab of points. A points is represented by a tab
listing its coordinates (floats). Points must live all in the same
euclidean space i.e., the have the same number of coordinates. This
number is the dimension of the underlying space. Functions @knn_...
can handle any space of dimension grether than 1. The distance
used in this space is the _euclidean distance.
The function @knn_rnd(d, b, N)
generates a random cloud
of N points in d dimension in the bounding box [0, b]^d.
Function @knn_create
builds an internal k-d
tree which then used by the
others functions. The argument is returned if the k-d tree can be built,
and <undef>
otherwise. The construction may fails if the
cloud is empty, if the points have not the same dimension or if the
point coordinates are not floats.
The internal k-d tree must be rebuild if the point coordinates are
updated, or if some points are added or deleted to the cloud. This is
done using function @knn_rebuild
or by calling again
@knn_create
on the same tab argument.
The internal k-d tree can be released when there is no more nearest
neighbors queries to process. This is done by calling @knn_delete
on the corresponding cloud.
The six remaining functions are used to retrieve efficiently the nearest neighbors of a point given as the second argument :
-
The search variants return the result in a pait (a tab of two elements). The first element of the pair is a tab listing the indices of teh neighbors in the cloud tab. The second elemnt is a tab listing the distance of the corresponding neighbor to point.
-
The scan variants apply a function f on each neighbor. The number of neigbors is returned. The function f is applyed for each neighbor and takes two arguments: the index of the neigbor in the cloud tab and the distance of this tneighbor to point.
-
The combine variants apply a function g to combine the neighbors., The function g takes three arguments: an accumulator initialzed with value v0, the index of the neigbor in the cloud tab and the distance of this tneighbor to point.
-
The r... variants specify the neighborhood by a radius around a given poiunt.
-
The variants without r specify the neighborhood by looking for the n closest neigbors.
Notices:
-
Search radius and all returned distances are actually squared distances.
-
All functions leave their argument untouched.
-
k-d trees are used to avoid an exhaustive search of the neighbor. They are not always suitable for efficiently finding the nearest neighbour in high-dimensional spaces. As a general rule, the number N of points in the data should be N ≫ 2^d where d is the space dimension. Otherwise, when k-d trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search.
For example
$cloud := @knn_rnd(2, 100, 456)
$cloud.knn_create()
generates a cloud of 456 points in \mathbb{R}^2 with coordinates is in [0, 100]. Then, the k-d tree is build internally.
It is then possible to make several queries:
$cloud.knn_search([5., 62.], 10) ; returns the 10 closest neigbors of [5., 62.]
$cloud.knn_rsearch([5., 62.], 9.) ; returns neigbors in a radius of 3 around [5., 62.]
remarks than a radious of 3.0 is specified through its square 9.0. The distances to point (5., 62.) given in the result are also squared.
The four others functions can be used to avoid the explicit construction of the query's result:
$cloud.knn_scan([0., 0.],
10,
\$index, $dist.(print "neigborh " $index " is at distance " (@sqrt($dist))))
Using the rcombine function, we can for instance computes the barycenter of the points within a radius of 5 of the origin. The accumulator is a pair [n, p] where n is the number of points in the radius and p sums up the coordinates.
$p := $cloud.knn_rcombine([0., 0.],
25.,
\$v, $index, $dist.([$v[0]+1, $v[1] + $cloud[$index]]),
[0, [0., 0.]])
$barycenter := ($p[0] == 0 ? [0, 0] : $p[1]/$p[0])
If there is no points near enough the origin, we set the barycenter to [0, 0] by convention.
See also Mathematical Functions @abs, @acos, @asin, @atan, @atan2, @between, @bit_and, @bit_or, @bit_shiftl, @bit_shiftr, @ceil, @clear, @cos, @cosh, @exp, @floor, @knn_create, @knn_rebuild, @knn_delete, @knn_rnd, @knn_search, @knn_rsearch, @knn_scan, @knn_rscan, @knn_combine, @knn_rcombine, @log, @log10, @log2, @max, @min, @ode_solve, @pow, @rand, @rand_int, @random, @rnd_bernoulli, @rnd_binomial, @rnd_exponential, @rnd_gamma, @rnd_geometric, @rnd_normal, @rnd_uniform_float, @rnd_uniform_int, @round, @sin, @sinh, @sqrt, @tan, @y0, @y1