Jackknife
- class Jackknife.Jackknife(delete_fraction: float, number_samples: int, seed: int = 42)
Delete-d Jackknife class for computation of confidence intervals for many different statistics. The Jackknife method is similar to the bootstrap, but instead of resampling the data with replacement, it randomly deletes a fraction of the data and computes the statistic on the reduced data.
Literature on the method can be found in:
Avery McIntosh “The Jackknife Estimation Method” arXiv:1606.00497.
J. Shao and D. Tu “The Jackknife and Bootstrap” Springer Series in Statistics, 0172-739, 1995.
The code supports parallel processing for the computation of the Jackknife samples.
If the function applied to the data needs multi-dimensional data, entries are deleted along axis 0 only, so the remaining axes of the data array are left intact.
Note
It is the user’s responsibility to ensure that the function applied to the data is well-defined and that the data array is in the correct input format for the specific function. This code always dilutes the data along axis 0. This class only checks that the function is callable and that the return value is a single number (int or float).
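The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the sparkx source: the function name `delete_d_jackknife_std`, the rounding of the number of deleted points, and the variance scaling factor (n - d) / d (the textbook delete-d form) are all assumptions, and the library may differ in these details.

```python
import numpy as np


def delete_d_jackknife_std(data, function=np.mean, delete_fraction=0.4,
                           number_samples=100, seed=42):
    """Illustrative delete-d jackknife standard error (not the sparkx source)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    d = int(delete_fraction * n)  # assumed rounding: truncate towards zero
    if d < 1:
        raise ValueError("delete_fraction is too small: nothing would be deleted")

    estimates = np.empty(number_samples)
    for i in range(number_samples):
        # Keep n - d randomly chosen entries along axis 0, without replacement.
        keep = rng.choice(n, size=n - d, replace=False)
        estimates[i] = function(data[keep])

    # Textbook delete-d variance (assumed scaling): (n - d) / d times the mean
    # squared deviation of the subsample estimates from their average.
    variance = (n - d) / d * np.mean((estimates - estimates.mean()) ** 2)
    return float(np.sqrt(variance))


data = np.random.default_rng(0).normal(0.0, 1.0, 100)
std_error = delete_d_jackknife_std(data)
```

For the mean of 100 standard-normal values, the result should land near the familiar standard error of the mean, roughly 1/sqrt(100) = 0.1, which is a quick sanity check for any function plugged in here.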
- Parameters:
- delete_fractionfloat
The fraction of the data to be randomly deleted.
- number_samplesint
The number of Jackknife samples to be computed.
- seedint, optional
The random seed to be used for the Jackknife computation. The default is 42.
- Raises:
- ValueError
If delete_fraction is less than 0 or greater than or equal to 1. If number_samples is less than 1.
- TypeError
If delete_fraction is not a float. If number_samples is not an integer. If seed is not an integer.
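The documented constructor checks can be mirrored in a small stand-alone helper, which makes it easy to see which arguments are rejected and why. Both `validate_jackknife_args` and `raised` are hypothetical names introduced for this sketch, and the order of the checks (types first, then values) is an assumption rather than the library's confirmed behaviour.

```python
def validate_jackknife_args(delete_fraction, number_samples, seed=42):
    """Hypothetical helper reproducing the documented constructor checks."""
    if not isinstance(delete_fraction, float):
        raise TypeError("delete_fraction must be a float")
    if not isinstance(number_samples, int):
        raise TypeError("number_samples must be an integer")
    if not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if delete_fraction < 0.0 or delete_fraction >= 1.0:
        raise ValueError("delete_fraction must satisfy 0 <= delete_fraction < 1")
    if number_samples < 1:
        raise ValueError("number_samples must be at least 1")


def raised(*args):
    """Return the name of the exception raised for the arguments, or None."""
    try:
        validate_jackknife_args(*args)
    except (TypeError, ValueError) as err:
        return type(err).__name__
    return None


# A valid call, an out-of-range fraction, and an integer passed as the fraction.
outcomes = [raised(0.4, 100), raised(1.0, 100), raised(1, 100)]
```

Note that an integer such as `1` is rejected as a type error before its value is ever inspected, so fractions should always be written as floats (for example `0.4`).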
Examples
A demonstration of how to use the Jackknife class:
>>> import numpy as np
>>> from sparkx.Jackknife import Jackknife
>>>
>>> data_Gaussian = np.random.normal(0, 1, 100)
>>> jackknife = Jackknife(delete_fraction=0.4, number_samples=100)
>>> jackknife.compute_jackknife_estimates(data_Gaussian, function=np.mean)
- Attributes:
- delete_fractionfloat
The fraction of the data to be randomly deleted.
- number_samplesint
The number of Jackknife samples to be computed.
- seedint
The random seed to be used for the Jackknife computation.
Methods
compute_jackknife_estimates:
Compute the Jackknife estimates
- Jackknife.compute_jackknife_estimates(data: numpy.ndarray, function: Callable[[...], Any] = <function mean>, num_cores: int | None = None, *args: tuple, **kwargs: Any) → float
Compute the Jackknife uncertainty estimates for a function applied to a data array. The default function is np.mean, but it can be changed to any other function that accepts a numpy array as input. Multiple other arguments can be passed to the function as args and kwargs.
- Parameters:
- datanp.ndarray
The data to be used to compute the Jackknife samples. The data is truncated along axis 0.
- functionfunction, optional
The function to be applied to the reduced data. The function can accept additional arguments and keyword arguments. The default is np.mean. It has to return a single number (int or float).
- num_coresint, optional
The number of cores to be used for parallel processing. The default is None, which means that all available cores will be used.
- *argstuple
Additional arguments to be passed to the function.
- **kwargsdict
Additional keyword arguments to be passed to the function.
- Returns:
- float
The Jackknife estimate for the standard deviation of the function applied to the data.
- Raises:
- ValueError
If delete_n_points is less than 1 (delete_fraction too small).
- TypeError
If data is not a numpy array or if function is not callable.
- TypeError
If the function does not return a single number (float or int).
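To illustrate how extra arguments are forwarded, how multi-dimensional data is reduced along axis 0, and when the errors listed above would occur, here is a sketch of a single Jackknife sample. The helper `one_jackknife_sample` is hypothetical, and computing `delete_n_points` as `int(delete_fraction * n)` is an assumption about the rounding, not the confirmed library behaviour.

```python
import numbers

import numpy as np


def one_jackknife_sample(data, function, delete_fraction, rng, *args, **kwargs):
    """Hypothetical helper: evaluate function once on data reduced along axis 0."""
    if not isinstance(data, np.ndarray) or not callable(function):
        raise TypeError("data must be a numpy array and function must be callable")
    n = data.shape[0]
    delete_n_points = int(delete_fraction * n)  # assumed rounding
    if delete_n_points < 1:
        raise ValueError("delete_n_points < 1: delete_fraction too small")
    # Randomly keep n - delete_n_points rows; other axes are left untouched.
    keep = rng.choice(n, size=n - delete_n_points, replace=False)
    result = function(data[keep], *args, **kwargs)
    if not isinstance(result, numbers.Real):
        raise TypeError("function must return a single number (int or float)")
    return float(result)


rng = np.random.default_rng(42)

# Extra positional argument: the percentile rank q is forwarded to np.percentile.
values = rng.normal(0.0, 1.0, 100)
p90 = one_jackknife_sample(values, np.percentile, 0.4, rng, 90)

# Two-dimensional data: whole rows are deleted, so the column pairs stay aligned.
pairs = rng.normal(size=(100, 2))
corr = one_jackknife_sample(
    pairs, lambda a: np.corrcoef(a[:, 0], a[:, 1])[0, 1], 0.4, rng
)
```

A function that returns an array, such as `np.mean` called with `axis=0` on the two-dimensional input, would fail the single-number check and raise a `TypeError`. With the class itself, keyword arguments such as `q=90` would be passed after `function=np.percentile`, following the signature above.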