Jackknife

class Jackknife.Jackknife(delete_fraction: float, number_samples: int, seed: int = 42)

Delete-d Jackknife class for computing uncertainty estimates, such as confidence intervals, for many different statistics. The Jackknife method is similar to the bootstrap, but instead of resampling the data with replacement, it randomly deletes a fraction of the data and computes the statistic on each reduced data set.
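For orientation, the delete-d jackknife variance estimator as given by Shao and Tu is shown below. Here n is the number of data points along axis 0, d is the number deleted in each sample, r = n - d is the number kept, and \hat{\theta}_s is the statistic evaluated on the s-th reduced data set. Using all subsets gives N = \binom{n}{d}. Drawing only N = number_samples random subsets replaces the full sum by a Monte Carlo average. That this class uses exactly this normalization and reports the square root of the variance is an assumption here, not something stated above.

```latex
\hat{v}_{\mathrm{J},d} = \frac{r}{d\,N} \sum_{s=1}^{N} \left( \hat{\theta}_s - \bar{\theta} \right)^2,
\qquad
\bar{\theta} = \frac{1}{N} \sum_{s=1}^{N} \hat{\theta}_s,
\qquad
r = n - d,
\qquad
\hat{\sigma}_{\mathrm{J},d} = \sqrt{\hat{v}_{\mathrm{J},d}}
```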

Literature on the method can be found in:

Avery McIntosh, “The Jackknife Estimation Method”, arXiv:1606.00497.
J. Shao and D. Tu, “The Jackknife and Bootstrap”, Springer Series in Statistics, 0172-739, 1995.

The code supports parallel processing for the computation of the Jackknife samples.
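A minimal sketch of how the Jackknife samples could be evaluated concurrently is given below. The helper name `parallel_reduced_statistics` is hypothetical, and it swaps in a thread pool (`concurrent.futures.ThreadPoolExecutor`) for simplicity; the class itself may use processes or another backend. All random index sets are drawn up front from one seeded generator, so the result does not depend on the number of workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

import numpy as np


def parallel_reduced_statistics(
    data: np.ndarray,
    function: Callable[[np.ndarray], float],
    delete_n_points: int,
    number_samples: int,
    seed: int = 42,
    max_workers: Optional[int] = None,
) -> np.ndarray:
    """Hypothetical helper: evaluate `function` on randomly reduced copies of `data`.

    Threads stand in for the multi-core processing of the real class (assumption).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Draw every index set in the main thread so scheduling cannot change the result.
    kept_indices = [
        np.sort(rng.choice(n, size=n - delete_n_points, replace=False))
        for _ in range(number_samples)
    ]

    def evaluate(indices: np.ndarray) -> float:
        # Keep only the selected entries along axis 0 and apply the statistic.
        return float(function(data[indices]))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so sample s always lands at position s.
        return np.array(list(pool.map(evaluate, kept_indices)))
```

Because the randomness is consumed before any work is dispatched, running with one worker or many should give identical samples for the same seed.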

If the function applied to the data requires multi-dimensional input, data points are deleted along axis 0, i.e., whole rows of the data array are removed.
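A short sketch of what deleting along axis 0 means for a two-dimensional array is shown below. The helper `delete_rows` is hypothetical and not part of the class; it simply removes randomly chosen rows with `np.delete(..., axis=0)`, so every surviving data point keeps all of its columns.

```python
import numpy as np


def delete_rows(data: np.ndarray, delete_n_points: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: return a copy of `data` with random entries removed along axis 0."""
    n = data.shape[0]
    drop = rng.choice(n, size=delete_n_points, replace=False)
    # axis=0 removes whole rows; trailing axes are left untouched and row order is preserved.
    return np.delete(data, drop, axis=0)


rng = np.random.default_rng(42)
points = np.arange(20.0).reshape(10, 2)  # 10 two-dimensional data points
reduced = delete_rows(points, delete_n_points=4, rng=rng)
print(reduced.shape)  # (6, 2)
```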

Note

It is the user’s responsibility to ensure that the function applied to the data is well defined and that the data array has the correct input format for that function. Data points are always deleted along axis 0. The class only checks that the function is callable and that its return value is a single number (int or float).

Parameters:

delete_fraction (float): The fraction of the data to be randomly deleted.
number_samples (int): The number of Jackknife samples to be computed.
seed (int, optional): The random seed used for the Jackknife computation. The default is 42.

Raises:

ValueError: If delete_fraction is less than 0 or greater than or equal to 1, or if number_samples is less than 1.
TypeError: If delete_fraction is not a float, if number_samples is not an integer, or if seed is not an integer.
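The constructor checks listed above can be restated as a small standalone function. This is a sketch that mirrors the documented conditions; the name `validate_jackknife_parameters` and the order in which the checks run are assumptions, not the class's actual code.

```python
def validate_jackknife_parameters(delete_fraction: float, number_samples: int, seed: int = 42) -> None:
    """Hypothetical restatement of the documented constructor checks."""
    # Type checks first, so a wrongly typed argument is reported as a TypeError.
    if not isinstance(delete_fraction, float):
        raise TypeError("delete_fraction must be a float")
    if not isinstance(number_samples, int):
        raise TypeError("number_samples must be an integer")
    if not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    # Value checks: 0 <= delete_fraction < 1 and at least one Jackknife sample.
    if delete_fraction < 0.0 or delete_fraction >= 1.0:
        raise ValueError("delete_fraction must satisfy 0 <= delete_fraction < 1")
    if number_samples < 1:
        raise ValueError("number_samples must be at least 1")
```

For example, `validate_jackknife_parameters(0.4, 100)` passes silently, while `validate_jackknife_parameters(1, 100)` raises a TypeError because `1` is an int, not a float.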

Examples

A demonstration of how to use the Jackknife class:

>>> import numpy as np
>>> from sparkx.Jackknife import Jackknife
>>>
>>> data_Gaussian = np.random.normal(0, 1, 100)
>>> jackknife = Jackknife(delete_fraction=0.4, number_samples=100)
>>> jackknife.compute_jackknife_estimates(data_Gaussian, function=np.mean)

Attributes:

delete_fraction (float): The fraction of the data to be randomly deleted.
number_samples (int): The number of Jackknife samples to be computed.
seed (int): The random seed used for the Jackknife computation.

Methods

compute_jackknife_estimates:

Compute the Jackknife estimates.

Jackknife.compute_jackknife_estimates(data: numpy.ndarray, function: Callable[..., Any] = numpy.mean, num_cores: int | None = None, *args: tuple, **kwargs: Any) → float

Compute the Jackknife uncertainty estimate for a function applied to a data array. The default function is np.mean, but any other function that accepts a numpy array as its first argument can be used. Additional positional and keyword arguments are passed on to the function via *args and **kwargs.
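The forwarding of extra arguments can be illustrated with a hypothetical helper, `apply_to_reduced`, that calls `function(reduced, *args, **kwargs)` on one reduced data set. It is a sketch of the calling convention described above, not the class's internal code.

```python
from typing import Any, Callable

import numpy as np


def apply_to_reduced(
    reduced: np.ndarray, function: Callable[..., Any] = np.mean, *args: Any, **kwargs: Any
) -> float:
    """Hypothetical helper: apply the statistic to one reduced data set, forwarding extras."""
    result = function(reduced, *args, **kwargs)
    return float(result)


sample = np.array([1.0, 2.0, 3.0, 4.0])
print(apply_to_reduced(sample))                      # default np.mean: 2.5
print(apply_to_reduced(sample, np.percentile, 50))   # positional extra (q=50): 2.5
print(apply_to_reduced(sample, np.std, ddof=1))      # keyword extra: about 1.291
```

Note that in the documented signature `*args` follows `num_cores`, so positional extras can only be supplied after `function` and `num_cores` have themselves been passed positionally; keyword extras do not have this restriction.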

Parameters:

data (np.ndarray): The data used to compute the Jackknife samples. Data points are deleted along axis 0.
function (callable, optional): The function applied to each reduced data set. It can accept additional positional and keyword arguments. The default is np.mean. It must return a single number (int or float).
num_cores (int, optional): The number of cores used for parallel processing. The default is None, which means that all available cores are used.
*args (tuple): Additional positional arguments passed to the function.
**kwargs (dict): Additional keyword arguments passed to the function.

Returns:

float: The Jackknife estimate for the standard deviation of the function applied to the data.
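A serial sketch of the whole estimate is given below for readers who want to see the computation end to end. The function name `delete_d_jackknife_std` is hypothetical, and two details are assumptions rather than documented behaviour: the number of deleted points is taken as `d = int(delete_fraction * n)`, and the variance uses the delete-d normalization r / (d N) from Shao and Tu with r = n - d and N = number_samples.

```python
from typing import Any, Callable

import numpy as np


def delete_d_jackknife_std(
    data: np.ndarray,
    function: Callable[..., Any] = np.mean,
    delete_fraction: float = 0.4,
    number_samples: int = 100,
    seed: int = 42,
) -> float:
    """Hypothetical serial sketch of a delete-d Jackknife standard-deviation estimate."""
    n = data.shape[0]
    d = int(delete_fraction * n)  # assumed rounding rule for the deleted count
    if d < 1:
        raise ValueError("delete_fraction is too small for this data size")
    r = n - d
    rng = np.random.default_rng(seed)
    estimates = np.empty(number_samples)
    for s in range(number_samples):
        # Keep r randomly chosen entries along axis 0, i.e. delete the other d.
        kept = rng.choice(n, size=r, replace=False)
        estimates[s] = float(function(data[kept]))
    # Delete-d variance, with the sum over all subsets replaced by N random subsets.
    variance = r / (d * number_samples) * np.sum((estimates - estimates.mean()) ** 2)
    return float(np.sqrt(variance))


data = np.random.default_rng(0).normal(0.0, 1.0, 1000)
# For np.mean this should be close to data.std(ddof=1) / sqrt(len(data)).
print(delete_d_jackknife_std(data, np.mean, delete_fraction=0.4, number_samples=400))
```

For the sample mean, the r / d factor exactly compensates for the reduced spread of means computed on subsets drawn without replacement, which is why the result approximates the familiar standard error.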

Raises:

ValueError: If the number of data points to be deleted (delete_n_points) is less than 1, i.e., delete_fraction is too small for the size of the data along axis 0.
TypeError: If data is not a numpy array or if function is not callable.
TypeError: If the function does not return a single number (float or int).
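The method-level checks listed above can be sketched as two small functions. Both names are hypothetical, and the way `delete_n_points` is derived from `delete_fraction` (truncation with `int`) is an assumption. NumPy scalar results such as `np.float64` and `np.int64` are accepted as single numbers, while arrays are rejected.

```python
from typing import Any

import numpy as np


def check_inputs(data: Any, function: Any, delete_fraction: float) -> int:
    """Hypothetical restatement of the method checks; returns the number of points to delete."""
    if not isinstance(data, np.ndarray):
        raise TypeError("data must be a numpy array")
    if not callable(function):
        raise TypeError("function must be callable")
    # Assumed rule: the deleted count is the fraction of axis-0 entries, truncated to an int.
    delete_n_points = int(delete_fraction * data.shape[0])
    if delete_n_points < 1:
        raise ValueError("delete_fraction is too small for the size of data")
    return delete_n_points


def check_result(value: Any) -> float:
    """Reject results that are not a single int or float (NumPy scalars included)."""
    if not isinstance(value, (int, float, np.integer, np.floating)):
        raise TypeError("function must return a single number (int or float)")
    return float(value)
```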