Jackknife

class Jackknife.Jackknife(delete_fraction: float, number_samples: int, seed: int = 42)

Delete-d Jackknife class for computing confidence intervals for a wide range of statistics. The Jackknife method is similar to the bootstrap, but instead of resampling the data with replacement, it randomly deletes a fraction of the data and computes the statistic on the reduced data.

Literature on the method can be found in:

The code supports parallel processing for the computation of the Jackknife samples.

If the function applied to the data needs multi-dimensional data, the data array is truncated along axis 0.
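The procedure described above can be sketched in plain NumPy. This is a minimal illustration, not the sparkx implementation: the helper name `delete_d_jackknife_std`, the truncating rule for the number of deleted points, and the use of the standard Monte Carlo form of the delete-d variance estimator are all assumptions made for this example.

```python
import numpy as np


def delete_d_jackknife_std(data, function=np.mean, delete_fraction=0.4,
                           number_samples=100, seed=42):
    """Illustrative delete-d jackknife (hypothetical helper, not sparkx code)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    d = int(delete_fraction * n)  # assumed rule: truncate to an integer
    if d < 1:
        raise ValueError("delete_fraction is too small for this data size")

    estimates = np.empty(number_samples)
    for i in range(number_samples):
        # Keep n - d randomly chosen entries along axis 0, without replacement.
        keep = rng.choice(n, size=n - d, replace=False)
        estimates[i] = function(data[keep])

    # Monte Carlo form of the delete-d jackknife variance estimator:
    # (n - d) / (d * N) * sum_i (theta_i - mean(theta))^2
    variance = (n - d) / (d * number_samples) * np.sum(
        (estimates - estimates.mean()) ** 2
    )
    return float(np.sqrt(variance))


data = np.random.default_rng(0).normal(0.0, 1.0, 100)
std_estimate = delete_d_jackknife_std(data)
```

For the mean of 100 standard-normal draws, the result should be close to the familiar standard error of roughly 1/sqrt(100). Because the index selection acts on axis 0 only, the same sketch works for multi-dimensional arrays, where whole rows are kept or dropped together.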

Note

It is the user’s responsibility to ensure that the function applied to the data is well-defined and that the data array is in the correct input format for the specific function. This code always deletes data points along axis 0. This class only checks that the function is callable and that the return value is a single number (int or float).
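As an illustration of a function that fits these requirements for two-dimensional input, the sketch below computes a single correlation value from an array of (x, y) rows. The function name `column_correlation` and the test data are hypothetical; converting the result to a built-in float is a cautious choice here, not a documented requirement.

```python
import numpy as np


def column_correlation(samples: np.ndarray) -> float:
    """Pearson correlation between column 0 and column 1 of a 2D array."""
    # np.corrcoef returns a 2x2 matrix; take the off-diagonal element and
    # convert the NumPy scalar to a built-in float.
    return float(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1])


rng = np.random.default_rng(1)
x = rng.normal(size=200)
pairs = np.column_stack([x, 2.0 * x + rng.normal(scale=0.1, size=200)])

# Deleting along axis 0 removes whole rows, so each (x, y) pair stays intact.
reduced = pairs[rng.choice(200, size=120, replace=False)]
value = column_correlation(reduced)
```

Arranging the data so that each observation occupies one row is what makes deletion along axis 0 meaningful for such a function.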

Parameters:
delete_fraction : float

The fraction of the data to be randomly deleted.

number_samples : int

The number of Jackknife samples to be computed.

seed : int, optional

The random seed to be used for the Jackknife computation. The default is 42.

Raises:
ValueError

If delete_fraction is less than 0 or greater than or equal to 1. If number_samples is less than 1.

TypeError

If delete_fraction is not a float. If number_samples is not an integer. If seed is not an integer.

Examples

A demonstration of how to use the Jackknife class:

>>> import numpy as np
>>> from sparkx.Jackknife import Jackknife
>>>
>>> data_Gaussian = np.random.normal(0, 1, 100)
>>> jackknife = Jackknife(delete_fraction=0.4, number_samples=100)
>>> jackknife.compute_jackknife_estimates(data_Gaussian, function=np.mean)
Attributes:
delete_fraction : float

The fraction of the data to be randomly deleted.

number_samples : int

The number of Jackknife samples to be computed.

seed : int

The random seed to be used for the Jackknife computation.

Methods

compute_jackknife_estimates:

Compute the Jackknife estimates

Jackknife.compute_jackknife_estimates(data: numpy.ndarray, function: Callable[[...], Any] = numpy.mean, num_cores: int | None = None, *args: tuple, **kwargs: Any) -> float

Compute the Jackknife uncertainty estimate for a function applied to a data array. The default function is np.mean, but it can be replaced by any other function that accepts a numpy array as input. Additional positional and keyword arguments are forwarded to the function via args and kwargs.
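The forwarding of extra arguments can be pictured with a small stand-alone sketch. The helper `apply_statistic` is hypothetical and only mimics the general pattern of passing args and kwargs through to the user-supplied function; it is not part of sparkx.

```python
import numpy as np


def apply_statistic(reduced, function=np.mean, *args, **kwargs):
    """Hypothetical helper: call function on the reduced data with extras."""
    return function(reduced, *args, **kwargs)


reduced = np.arange(10.0)

# Positional extra: 90 is passed on as the q argument of np.percentile.
p90 = apply_statistic(reduced, np.percentile, 90)

# Keyword extra: ddof=1 is passed on to np.std.
sample_std = apply_statistic(reduced, np.std, ddof=1)
```

In the documented signature, num_cores sits between function and args, so positional extras have to follow an explicit num_cores value. Passing the extras as keyword arguments may therefore be the less error-prone route.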

Parameters:
data : np.ndarray

The data to be used to compute the Jackknife samples. The data is truncated along axis 0.

function : function, optional

The function to be applied to the reduced data. The function can accept additional arguments and keyword arguments. The default is np.mean. It has to return a single number (int or float).

num_cores : int, optional

The number of cores to be used for parallel processing. The default is None, which means that all available cores will be used.

*args : tuple

Additional arguments to be passed to the function.

**kwargs : dict

Additional keyword arguments to be passed to the function.

Returns:
float

The Jackknife estimate for the standard deviation of the function applied to the data.

Raises:
ValueError

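If delete_n_points, the number of data points to delete as derived from delete_fraction and the size of the data, is less than 1 (delete_fraction too small for the given data).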

TypeError

If data is not a numpy array or if function is not callable.

TypeError

If the function does not return a single number (float or int).