Histogram

class Histogram.Histogram(bin_boundaries: Tuple[float, float, int] | List[float] | ndarray)[source]

Defines a histogram object.

The histograms can be initialized either with a tuple (hist_min,hist_max,num_bins) or a list/numpy.ndarray containing the bin boundaries, which allows for different bin widths. Multiple histograms can be added and averaged.

Note

Each bin contains its left edge but not its right edge. When a value is added to a histogram, the bin assignment therefore follows:

bins[i-1] <= value < bins[i]
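The half-open rule above can be illustrated with a small standard-library sketch. It is not the sparkx implementation: find_bin is a hypothetical helper, bins are indexed from 0, and returning None for values outside the range (including a value exactly on the last edge) is an assumption, since the handling of such values is not described here.

```python
from bisect import bisect_right


def find_bin(edges, value):
    """Return i with edges[i] <= value < edges[i + 1], or None if out of range."""
    if value < edges[0] or value >= edges[-1]:
        return None  # assumption: out-of-range values are simply skipped
    # bisect_right counts the edges that are <= value, so subtracting 1
    # gives the index of the bin whose left edge is included.
    return bisect_right(edges, value) - 1


edges = [0.0, 1.0, 2.0, 5.0]
on_edge = find_bin(edges, 1.0)       # falls into the bin starting at 1.0
below_edge = find_bin(edges, 0.999)  # still in the first bin
print(on_edge, below_edge)
```

A value that sits exactly on an inner edge is therefore counted in the bin to its right.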

Parameters:

bin_boundaries: tuple: A tuple with three values (hist_min, hist_max, num_bins). This creates a histogram with num_bins equally wide bins spanning the range from hist_min to hist_max.
bin_boundaries: list/numpy.ndarray: A list or numpy array that contains all bin edges. This allows for non-uniform bin sizes.
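How the two input forms relate can be sketched as follows. expand_bin_boundaries is a hypothetical helper written for illustration only; the library may build its edges differently (for example with numpy.linspace).

```python
def expand_bin_boundaries(bin_boundaries):
    """Turn either accepted input form into an explicit list of bin edges."""
    if isinstance(bin_boundaries, tuple):
        hist_min, hist_max, num_bins = bin_boundaries
        width = (hist_max - hist_min) / num_bins
        # num_bins bins need num_bins + 1 edges
        return [hist_min + i * width for i in range(num_bins + 1)]
    # A list (or array) is taken as the edges themselves
    return [float(edge) for edge in bin_boundaries]


uniform = expand_bin_boundaries((0, 10, 10))
custom = expand_bin_boundaries([0, 1, 5, 10])
print(len(uniform), custom)
```

The tuple form always yields equally wide bins, while the list form keeps whatever spacing is supplied.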

Notes

Two Histogram objects can be added together. The resulting Histogram object will have the same bin edges as the original objects, and the values are added bin-wise. The errors are added in quadrature. All relevant arrays must have the same shape, otherwise a ValueError is raised. The scaling array is reset to ones.
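The bin-wise addition described above can be sketched on plain lists. add_binwise is a hypothetical stand-in for what happens when two Histogram objects are added, not the library's code; it only shows the arithmetic (values summed, errors combined in quadrature) and the shape check.

```python
import math


def add_binwise(values_a, errors_a, values_b, errors_b):
    """Add two histograms bin by bin and combine their errors in quadrature."""
    lengths = {len(values_a), len(errors_a), len(values_b), len(errors_b)}
    if len(lengths) != 1:
        raise ValueError("all arrays must have the same shape")
    values = [a + b for a, b in zip(values_a, values_b)]
    # Quadrature: sqrt(err_a**2 + err_b**2) for each bin
    errors = [math.hypot(ea, eb) for ea, eb in zip(errors_a, errors_b)]
    return values, errors


values, errors = add_binwise([1.0, 2.0], [3.0, 0.0], [4.0, 5.0], [4.0, 1.0])
print(values, errors)
```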

Examples

Creating histograms

To create multiple histograms with the same number and size of bins, initialize the Histogram object with a tuple (otherwise, initialize it with a list containing all bin edges). Store the object in a variable: add_value(value) fills the latest histogram, add_histogram() starts a new one, and histogram() returns the current contents.

>>> from sparkx.Histogram import Histogram
>>>
>>> # Initialize a histogram in the range [0,10] with 10 bins
>>> histObj = Histogram((0, 10, 10))
>>>
>>> # Add the values [1, 1.2, 3, 5, 6.5, 9, 9] to the histogram
>>> histObj.add_value([1, 1.2, 3, 5, 6.5, 9, 9])
>>> print(histObj.histogram())
[0. 2. 0. 1. 0. 1. 1. 0. 0. 2.]
>>>
>>> # Add a second histogram and add the values [2, 7, 7.2, 9]
>>> histObj.add_histogram()
>>> histObj.add_value([2, 7, 7.2, 9])
>>> print(histObj.histogram())
[[0. 2. 0. 1. 0. 1. 1. 0. 0. 2.]
 [0. 0. 1. 0. 0. 0. 0. 2. 0. 1.]]
>>>
>>> # Store the histograms in hist as numpy.ndarray
>>> hist = histObj.histogram()

Attributes:

number_of_bins_: int: Number of histogram bins.
bin_edges_: numpy.ndarray: Array containing the edges of the bins.
number_of_histograms_: int: Number of created histograms.
histograms_: numpy.ndarray: Array containing the histograms (might be scaled).
histograms_raw_count_: numpy.ndarray: Array containing the raw counts of the histograms.
error_: numpy.ndarray: Array containing the histogram error.
scaling_: numpy.ndarray: Array containing scaling factors for each bin.

Methods

histogram:	Get the created histogram(s)
histogram_raw_counts:	Get the raw bin counts of the histogram(s).
number_of_histograms:	Get the number of current histograms.
bin_centers:	Get the bin centers.
bin_width:	Get the bin widths.
bin_bounds_left:	Extract the lower bounds of the individual bins.
bin_bounds_right:	Extract the upper bounds of the individual bins.
bin_boundaries:	Get the bin boundaries.
add_value:	Add one or multiple values to the latest histogram.
add_histogram:	Add a new histogram.
make_density:	Create probability density from last histogram.
average:	Averages over all histograms.
average_weighted:	Performs a weighted average over all histograms.
standard_error:	Get the standard deviation for each bin.
statistical_error:	Statistical error of all histogram bins in all histograms.
set_systematic_error:	Sets the systematic histogram error by hand.
scale_histogram:	Multiply latest histogram with a factor.
set_error:	Set the error for the histogram by hand.
print_histogram:	Print the histogram(s) to the terminal.
write_to_file:	Write the histogram(s) to a CSV file.

Histogram.histogram() → ndarray[source]

Get the current histogram(s).

Returns:

histograms_: numpy.ndarray: Array containing the histogram(s).

Histogram.histogram_raw_counts() → ndarray[source]

Get the raw bin counts of the histogram(s), even after the original histograms are scaled or averaged. If weights are used, then they affect the raw counts histogram.

Returns:

histograms_raw_count_: numpy.ndarray: Array containing the raw counts of the histogram(s)

Histogram.number_of_histograms() → int[source]

Get the number of current histograms.

Returns:

number_of_histograms_: int: Number of histograms.

Histogram.bin_centers() → ndarray[source]

Get the bin centers.

Returns:

numpy.ndarray: Array containing the bin centers.

Histogram.bin_width() → ndarray[source]

Get the bin widths.

Returns:

numpy.ndarray: Array containing the bin widths.

Histogram.bin_bounds_left() → ndarray[source]

Extract the lower bounds of the individual bins.

Returns:

numpy.ndarray: Array containing the lower bin boundaries.

Histogram.bin_bounds_right() → ndarray[source]

Extract the upper bounds of the individual bins.

Returns:

numpy.ndarray: Array containing the upper bin boundaries.

Histogram.bin_boundaries() → ndarray[source]

Get the bin boundaries.

Returns:

numpy.ndarray: Array containing the bin boundaries.

Histogram.add_value(value, weight=None)

Add value(s) to the latest histogram.

Accepts either a single number or a whole list/array of numbers.

Parameters:

value: int, float, np.number, list, numpy.ndarray: Value(s) which are supposed to be added to the histogram instance.
weight: int, float, np.number, list, numpy.ndarray, optional: Weight(s) associated with the value(s). If provided, it should have the same length as the value parameter.

Raises:

TypeError: if the input is not a number or numpy.ndarray or list
ValueError: if an input value is np.nan
ValueError: if the input weight does not have the same dimension as value
ValueError: if a weight value is np.nan

Histogram.add_histogram() → Histogram[source]

Add a new histogram to the Histogram class instance.

If new values are added to the histogram afterwards, these are added to the last histogram.

Histogram.make_density() → None[source]

Make a probability density from the last histogram. The result represents the probability density function in a given bin. It is normalized such that the integral over the whole histogram yields 1. This behavior is similar to the one of numpy histograms.

Raises:

ValueError: if there is no histogram available
ValueError: if the integral over the histogram is zero
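The normalization can be written out explicitly. This sketch assumes the usual definition, in which each count is divided by the integral (the sum of count times bin width); make_density here is a standalone illustration on lists, not the method itself.

```python
def make_density(counts, edges):
    """Normalize counts so that the sum of density * bin width equals 1."""
    if not counts:
        raise ValueError("no histogram available")
    widths = [high - low for low, high in zip(edges[:-1], edges[1:])]
    integral = sum(c * w for c, w in zip(counts, widths))
    if integral == 0:
        raise ValueError("integral over the histogram is zero")
    return [c / integral for c in counts]


edges = [0.0, 1.0, 3.0]  # bin widths 1 and 2
density = make_density([1.0, 3.0], edges)
check = density[0] * 1.0 + density[1] * 2.0
print(density, check)
```

With non-uniform bins the density values do not sum to 1 on their own; only the width-weighted sum does.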

Histogram.average() → Histogram[source]

Average over all histograms.

When this function is called, the previously generated histograms are averaged with unit weights and overwritten by the averaged histogram. The standard error of the averaged histograms is computed. histogram_raw_counts is summed over all histograms. scaling_ is set to the value of the first histogram.

Returns:

Histogram: Returns a Histogram object.

Histogram.average_weighted(weights: ndarray) → Histogram[source]

Weighted average over all histograms.

When this function is called, the previously generated histograms are averaged with the given weights and overwritten by the averaged histogram. The weighted standard error of the histograms is computed. histogram_raw_counts is summed over all histograms. scaling_ is set to the value of the first histogram.

Parameters:

weights: numpy.ndarray: Array containing a weight for each histogram.

Returns:

Histogram: Returns a Histogram object.
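The averaging step itself can be sketched per bin as sum(w_k * h_k) / sum(w_k). The function below is a hypothetical list-based illustration; it leaves out the error computation, the raw-count summation and the scaling handling that the method also performs. Unit weights reproduce the plain average.

```python
def average_weighted(histograms, weights):
    """Return the per-bin weighted mean over several equally binned histograms."""
    if len(weights) != len(histograms):
        raise ValueError("need exactly one weight per histogram")
    total = sum(weights)
    n_bins = len(histograms[0])
    return [
        sum(w * hist[i] for w, hist in zip(weights, histograms)) / total
        for i in range(n_bins)
    ]


hists = [[0.0, 2.0, 4.0], [2.0, 2.0, 0.0]]
plain = average_weighted(hists, [1.0, 1.0])     # same as an unweighted average
weighted = average_weighted(hists, [3.0, 1.0])  # first histogram counts 3x
print(plain, weighted)
```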

Histogram.standard_error() → ndarray[source]

Get the standard deviation over all histogram counts for each bin.

Returns:

numpy.ndarray: Array containing the standard deviation for each bin.

Histogram.statistical_error() → ndarray[source]

Compute the statistical error of all histogram bins for all histograms. This assumes Poisson distributed counts in each bin and independent draws.

Returns:

numpy.ndarray: 2D Array containing the statistical error (standard deviation) for each bin and histogram.
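Under the Poisson assumption, the standard deviation of a bin with N entries is sqrt(N). A minimal sketch on nested lists follows; poisson_errors is a hypothetical helper, and whether the library additionally applies the scaling factors to this error is not stated here.

```python
import math


def poisson_errors(raw_counts):
    """Return sqrt(N) for every bin of every histogram (2D in, 2D out)."""
    return [[math.sqrt(n) for n in hist] for hist in raw_counts]


errors = poisson_errors([[0, 4, 9], [1, 16, 25]])
print(errors)
```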

Histogram.set_systematic_error(own_error: List[float] | ndarray) → None[source]

Sets the systematic histogram error of the last created histogram by hand.

Parameters:

own_error: list, numpy.ndarray: Values for the systematic uncertainties of the individual bins.

Histogram.scale_histogram(value: int | float | number | List[float] | ndarray) → None[source]

Scale the latest histogram by a factor.

Multiplies the latest histogram by a number or a list/numpy array with a scaling factor for each bin.

The standard deviation of the histogram(s) is also rescaled by the same factor.

Parameters:

value: int, float, np.number, list, numpy.ndarray: Scaling factor for the histogram.
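Both accepted forms of the scaling factor can be illustrated on plain lists. The scale function is a hypothetical sketch rather than the library's code; it multiplies values and errors by the same factor, as described above.

```python
def scale(values, errors, factor):
    """Scale bin values and their errors by one factor or one factor per bin."""
    if isinstance(factor, (int, float)):
        factors = [factor] * len(values)  # a single number applies to every bin
    else:
        factors = list(factor)
    if len(factors) != len(values):
        raise ValueError("need one scaling factor per bin")
    scaled_values = [v * f for v, f in zip(values, factors)]
    scaled_errors = [e * f for e, f in zip(errors, factors)]
    return scaled_values, scaled_errors


uniform = scale([1.0, 2.0], [0.5, 1.0], 2)
per_bin = scale([1.0, 2.0], [0.5, 1.0], [2.0, 0.5])
print(uniform, per_bin)
```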

Histogram.set_error(own_error: List[float] | ndarray) → None[source]

Sets the histogram error by hand. This is helpful for weighted histograms where the weight also has an uncertainty.

This function has to be called after averaging, otherwise the error will be overwritten by the standard error.

Parameters:

own_error: list, numpy.ndarray: Values for the uncertainties of the individual bins.

Histogram.print_histogram() → None[source]

Print the histograms to the terminal.

Histogram.write_to_file(filename: str, hist_labels: List[Dict[str, str]], comment: str = '', columns: List[str] | None = None) → None[source]

Write multiple histograms to a CSV file along with their headers.

Parameters:

filename: str

Name of the output CSV file.

hist_labels: list of dicts

List containing dictionaries with header labels for each histogram. Provide a list of dictionaries, where each dictionary contains the header labels for the corresponding histogram. If the list contains only one dictionary, then this is used for all histograms.

The dictionaries should have the following keys:

‘bin_center’: Label for the bin center column.
‘bin_low’: Label for the lower boundary of the bins.
‘bin_high’: Label for the upper boundary of the bins.
‘distribution’: Label for the histogram/distribution.
‘stat_err+’: Label for the statistical error (positive).
‘stat_err-’: Label for the statistical error (negative).
‘sys_err+’: Label for the systematic error (positive).
‘sys_err-’: Label for the systematic error (negative).

Other keys are possible if non-default columns are used.

comment: str, optional

Additional comment to be included at the beginning of the file. It is possible to give a multi-line comment where each line should start with a ‘#’.

columns: list of str, optional

List of columns to include in the output. If None, all columns are included, namely:

‘bin_center’: Bin center.
‘bin_low’: Lower bin boundary.
‘bin_high’: Upper bin boundary.
‘distribution’: Histogram value.
‘stat_err+’: Statistical error (positive).
‘stat_err-’: Statistical error (negative).
‘sys_err+’: Systematic error (positive).
‘sys_err-’: Systematic error (negative).

Raises:

TypeError: If hist_labels is not a list of dictionaries, or if columns is not a list of strings or contains keys which are not present in ‘hist_labels’.
ValueError: If the number of histograms is greater than 1 and the number of dictionaries in hist_labels does not match the number of histograms. The one exception is when exactly one dictionary is provided; it is then used for all histograms.
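A hist_labels argument for a single histogram could be assembled as in the sketch below. The label strings are made up for illustration, and missing_labels is a hypothetical pre-check (not part of sparkx) that lists selected columns without a label, mirroring the rule that every requested column needs a key in the dictionary.

```python
ALL_COLUMNS = [
    "bin_center", "bin_low", "bin_high", "distribution",
    "stat_err+", "stat_err-", "sys_err+", "sys_err-",
]


def missing_labels(hist_labels, columns=None):
    """For each label dictionary, list the selected columns that lack a label."""
    selected = ALL_COLUMNS if columns is None else columns
    return [[col for col in selected if col not in labels] for labels in hist_labels]


# One dictionary; the header texts are hypothetical examples.
hist_labels = [{
    "bin_center": "pT",
    "bin_low": "pT_low",
    "bin_high": "pT_high",
    "distribution": "dN/dpT",
    "stat_err+": "stat_err+",
    "stat_err-": "stat_err-",
    "sys_err+": "sys_err+",
    "sys_err-": "sys_err-",
}]

complete = missing_labels(hist_labels)
partial = missing_labels([{"bin_center": "pT"}], ["bin_center", "distribution"])
print(complete, partial)

# With a filled Histogram instance histObj, the call would then read:
# histObj.write_to_file("hist.csv", hist_labels, comment="# example output")
```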