Histogram

class Histogram.Histogram(bin_boundaries: Tuple[float, float, int] | List[float] | ndarray)

Defines a histogram object.

A histogram can be initialized either with a tuple (hist_min, hist_max, num_bins) or with a list/numpy.ndarray containing the bin boundaries; the latter allows for different bin widths. Multiple histograms can be added to the same object and averaged.

Note

Keep in mind that each bin contains its left edge but not its right edge. When a value is added to a histogram, the bin assignment therefore follows:

bins[i-1] <= value < bins[i]
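The half-open rule above can be illustrated with a small standard-library sketch. The helper `find_bin` is hypothetical and not part of the library; it returns the index of the left edge (so `i - 1` in the notation above), and returning `None` for values outside the range is an assumption made only for this illustration.

```python
from bisect import bisect_right


def find_bin(edges, value):
    """Return the index k with edges[k] <= value < edges[k + 1].

    Hypothetical helper for illustration only. Returns None when the
    value lies outside the half-open range [edges[0], edges[-1]).
    """
    if value < edges[0] or value >= edges[-1]:
        return None
    # bisect_right counts the edges that are <= value
    return bisect_right(edges, value) - 1


edges = [0.0, 1.0, 2.0, 3.0]
print(find_bin(edges, 1.0))    # a value on a left edge falls into that bin
print(find_bin(edges, 0.999))  # just below the edge stays in the previous bin
print(find_bin(edges, 3.0))    # the right-most edge is excluded
```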

Parameters:
bin_boundaries: tuple

A tuple with three values (hist_min, hist_max, num_bins). This creates an evenly sized histogram in the range [hist_min, hist_max] divided into num_bins bins.

bin_boundaries: list/numpy.ndarray

A list or numpy array that contains all bin edges. This allows for non-uniform bin sizes.
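As a rough sketch of what the two input forms describe, the tuple form corresponds to `num_bins + 1` evenly spaced edges, while the list form gives the edges directly. The helper `uniform_edges` is hypothetical and only mirrors the description above; it is not the library's own construction.

```python
def uniform_edges(hist_min, hist_max, num_bins):
    """Build num_bins + 1 evenly spaced edges (hypothetical helper)."""
    width = (hist_max - hist_min) / num_bins
    return [hist_min + i * width for i in range(num_bins + 1)]


# Tuple form (0, 10, 10): 10 bins of width 1 need 11 edges
edges = uniform_edges(0, 10, 10)
print(len(edges), edges[0], edges[-1])

# List form: the edges are given explicitly and may be unevenly spaced
non_uniform_edges = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]
print(len(non_uniform_edges) - 1)  # number of bins
```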

Examples

Creating histograms

To create multiple histograms with the same number and size of bins, initialize the Histogram object with a tuple (otherwise, initialize it with a list containing all bin edges). To fill these histograms and store them in a variable, use the methods add_value(value), add_histogram() and histogram().

>>> from sparkx.Histogram import Histogram
>>>
>>> # Initialize a histogram in the range [0,10] with 10 bins
>>> histObj = Histogram((0, 10, 10))
>>>
>>> # Add the values [1, 1.2, 3, 5, 6.5, 9, 9] to the histogram
>>> histObj.add_value([1, 1.2, 3, 5, 6.5, 9, 9])
>>> print(histObj.histogram())
[0. 2. 0. 1. 0. 1. 1. 0. 0. 2.]
>>>
>>> # Add a second histogram and add the values [2, 7, 7.2, 9]
>>> histObj.add_histogram()
>>> histObj.add_value([2, 7, 7.2, 9])
>>> print(histObj.histogram())
[[0. 2. 0. 1. 0. 1. 1. 0. 0. 2.]
 [0. 0. 1. 0. 0. 0. 0. 2. 0. 1.]]
>>>
>>> # Store the histograms in hist as numpy.ndarray
>>> hist = histObj.histogram()

Attributes:
number_of_bins_: int

Number of histogram bins.

bin_edges_: numpy.ndarray

Array containing the edges of the bins.

number_of_histograms_: int

Number of created histograms.

histograms_: numpy.ndarray

Array containing the histograms (might be scaled).

histograms_raw_count_: numpy.ndarray

Array containing the raw counts of the histograms.

error_: numpy.ndarray

Array containing the histogram error.

scaling_: numpy.ndarray

Array containing scaling factors for each bin.

Methods

histogram:

Get the created histogram(s)

histogram_raw_counts:

Get the raw bin counts of the histogram(s).

number_of_histograms:

Get the number of current histograms.

bin_centers: numpy.ndarray

Get the bin centers.

bin_width: numpy.ndarray

Get the bin widths.

bin_bounds_left: numpy.ndarray

Extract the lower bounds of the individual bins.

bin_bounds_right: numpy.ndarray

Extract the upper bounds of the individual bins.

bin_boundaries: numpy.ndarray

Get the bin boundaries.

add_value:

Add one or multiple values to the latest histogram.

add_histogram:

Add a new histogram.

make_density:

Create probability density from last histogram.

average:

Averages over all histograms.

average_weighted:

Performs a weighted average over all histograms.

standard_error:

Get the standard deviation for each bin.

statistical_error:

Statistical error of all histogram bins in all histograms.

set_systematic_error:

Sets the systematic histogram error by hand.

scale_histogram:

Multiply latest histogram with a factor.

set_error:

Set the error for the histogram by hand.

print_histogram:

Print the histogram(s) to the terminal.

write_to_file:

Write the histogram(s) to a CSV file.

Histogram.histogram() -> ndarray

Get the current histogram(s).

Returns:
histograms_: numpy.ndarray

Array containing the histogram(s).

Histogram.histogram_raw_counts() -> ndarray

Get the raw bin counts of the histogram(s), even after the original histograms are scaled or averaged. If weights are used, then they affect the raw counts histogram.

Returns:
histograms_raw_count_: numpy.ndarray

Array containing the raw counts of the histogram(s)

Histogram.number_of_histograms() -> int

Get the number of current histograms.

Returns:
number_of_histograms_: int

Number of histograms.

Histogram.bin_centers() -> ndarray

Get the bin centers.

Returns:
numpy.ndarray

Array containing the bin centers.

Histogram.bin_width() -> ndarray

Get the bin widths.

Returns:
numpy.ndarray

Array containing the bin widths.

Histogram.bin_bounds_left() -> ndarray

Extract the lower bounds of the individual bins.

Returns:
numpy.ndarray

Array containing the lower bin boundaries.

Histogram.bin_bounds_right() -> ndarray

Extract the upper bounds of the individual bins.

Returns:
numpy.ndarray

Array containing the upper bin boundaries.

Histogram.bin_boundaries() -> ndarray

Get the bin boundaries.

Returns:
numpy.ndarray

Array containing the bin boundaries.
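The relation between the bin accessors above can be sketched from a list of edges alone. This is a plain-Python illustration, not the library's code; in particular, taking the bin center as the arithmetic midpoint of the two edges is an assumption.

```python
# Example bin edges with non-uniform spacing
edges = [0.0, 1.0, 2.0, 4.0, 8.0]

# Lower and upper bounds of the individual bins
left = edges[:-1]
right = edges[1:]

# Width and (assumed) midpoint of every bin
widths = [hi - lo for lo, hi in zip(left, right)]
centers = [(lo + hi) / 2 for lo, hi in zip(left, right)]

print(left)
print(right)
print(widths)
print(centers)
```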

Histogram.add_value(value: float | List[float] | ndarray, weight: float | List[float] | ndarray | None = None) -> None

Add value(s) to the latest histogram.

Accepts either a single number or a whole list/array of numbers.

Parameters:
value: int, float, np.number, list, numpy.ndarray

Value(s) which are supposed to be added to the histogram instance.

weight: int, float, np.number, list, numpy.ndarray, optional

Weight(s) associated with the value(s). If provided, it should have the same length as the value parameter.

Raises:
TypeError

if the input is not a number or numpy.ndarray or list

ValueError

if an input value is np.nan

ValueError

if the input weight does not have the same dimension as value

ValueError

if a weight value is np.nan
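The weighted filling and the error conditions listed above can be sketched with the standard library. The function `fill` is hypothetical and not part of the library; silently skipping values outside the bin range is an assumption of this sketch, since the text above does not state how such values are handled.

```python
import math
from bisect import bisect_right


def fill(counts, edges, values, weights=None):
    """Add values to counts, each contributing its weight (default 1).

    Hypothetical helper for illustration only.
    """
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("weight must have the same length as value")
    for value, weight in zip(values, weights):
        if math.isnan(value) or math.isnan(weight):
            raise ValueError("NaN is not allowed as value or weight")
        # Half-open bins: left edge included, right edge excluded
        if edges[0] <= value < edges[-1]:
            counts[bisect_right(edges, value) - 1] += weight
    return counts


edges = [0.0, 1.0, 2.0, 3.0]
counts = fill([0.0, 0.0, 0.0], edges, [0.5, 1.5, 1.7], weights=[2.0, 0.5, 0.5])
print(counts)
```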

Histogram.add_histogram() -> Histogram

Add a new histogram to the Histogram class instance.

If new values are added to the histogram afterwards, these are added to the last histogram.

Histogram.make_density() -> None

Make a probability density from the last histogram. The result represents the probability density function in a given bin. It is normalized such that the integral over the whole histogram yields 1. This behavior is similar to the one of numpy histograms.

Raises:
ValueError

if there is no histogram available

ValueError

if the integral over the histogram is zero
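The normalization described above can be sketched as follows. The formula follows the NumPy `density=True` convention (count divided by total count times bin width), which the text above calls similar; whether the library uses exactly this expression is an assumption.

```python
counts = [2.0, 1.0, 1.0]
edges = [0.0, 1.0, 2.0, 4.0]
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]

total = sum(counts)
if total == 0:
    raise ValueError("the integral over the histogram is zero")

# Density per bin: count / (total count * bin width)
density = [c / (total * w) for c, w in zip(counts, widths)]

# Integrating the density over all bins gives 1
integral = sum(d * w for d, w in zip(density, widths))
print(density, integral)
```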

Histogram.average() -> Histogram

Average over all histograms.

When this function is called, the previously generated histograms are averaged with unit weights and overwritten by the averaged histogram. The standard error of the averaged histograms is computed. histogram_raw_counts is summed over all histograms. scaling_ is set to the value of the first histogram.

Returns:
Histogram

Returns a Histogram object.
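A minimal sketch of a bin-wise average with unit weights is given below. The standard error is computed here as the sample standard deviation divided by the square root of the number of histograms; this particular estimator (including the n - 1 normalization) is an assumption and may differ from what the library computes.

```python
import math
import statistics

# Three histograms with three bins each
histograms = [
    [0.0, 2.0, 4.0],
    [2.0, 2.0, 0.0],
    [1.0, 2.0, 2.0],
]
n = len(histograms)

# Regroup so that each tuple holds one bin across all histograms
per_bin = list(zip(*histograms))

mean = [sum(values) / n for values in per_bin]
# Assumed estimator: sample standard deviation / sqrt(n)
std_err = [statistics.stdev(values) / math.sqrt(n) for values in per_bin]

print(mean)
print(std_err)
```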

Histogram.average_weighted(weights: ndarray) -> Histogram

Weighted average over all histograms.

When this function is called the previously generated histograms are averaged with the given weights and they are overwritten by the averaged histogram. The weighted standard error of the histograms is computed. histogram_raw_counts is summed over all histograms. scaling_ is set to the value of the first histogram.

Parameters:
weights: numpy.ndarray

Array containing a weight for each histogram.

Returns:
Histogram

Returns a Histogram object.
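For illustration, a bin-wise weighted mean with one weight per histogram could look like the sketch below. It only shows the mean; the weighted standard error is left out because its exact definition is not given here.

```python
# Two histograms with three bins each, and one weight per histogram
histograms = [
    [0.0, 2.0, 4.0],
    [4.0, 2.0, 0.0],
]
weights = [3.0, 1.0]

total_weight = sum(weights)
number_of_bins = len(histograms[0])

# Weighted mean of every bin across the histograms
weighted_mean = [
    sum(w * hist[i] for w, hist in zip(weights, histograms)) / total_weight
    for i in range(number_of_bins)
]
print(weighted_mean)
```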

Histogram.standard_error() -> ndarray

Get the standard deviation over all histogram counts for each bin.

Returns:
numpy.ndarray

Array containing the standard deviation for each bin.

Histogram.statistical_error() -> ndarray

Compute the statistical error of all histogram bins for all histograms. This assumes Poisson distributed counts in each bin and independent draws.

Returns:
numpy.ndarray

2D Array containing the statistical error (standard deviation) for each bin and histogram.
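For Poisson-distributed counts, the standard deviation of a bin with N entries is sqrt(N). The sketch below applies this to the raw counts of two histograms; how the library combines this with scaling or weights is not stated above, so this is only the textbook relation.

```python
import math

# Raw counts: one row per histogram, one column per bin
raw_counts = [
    [0, 4, 9],
    [1, 16, 25],
]

# Poisson standard deviation per bin and histogram
stat_err = [[math.sqrt(count) for count in hist] for hist in raw_counts]
print(stat_err)
```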

Histogram.set_systematic_error(own_error: List[float] | ndarray) -> None

Sets the systematic histogram error of the last created histogram by hand.

Parameters:
own_error: list, numpy.ndarray

Values for the systematic uncertainties of the individual bins.

Histogram.scale_histogram(value: int | float | number | List[float] | ndarray) -> None

Scale the latest histogram by a factor.

Multiplies the latest histogram by a number or a list/numpy array with a scaling factor for each bin.

The standard deviation of the histogram(s) is also rescaled by the same factor.

Parameters:
value: int, float, np.number, list, numpy.ndarray

Scaling factor for the histogram.
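The two accepted forms of the scaling factor (a single number or one factor per bin) and the matching rescaling of the error can be sketched as below. The function `scale` is hypothetical and only mirrors the description above.

```python
def scale(hist, err, factor):
    """Scale bin contents and errors by a scalar or per-bin factor.

    Hypothetical helper for illustration only.
    """
    if isinstance(factor, (list, tuple)):
        factors = list(factor)
    else:
        factors = [factor] * len(hist)
    if len(factors) != len(hist):
        raise ValueError("need one scaling factor per bin")
    scaled_hist = [h * f for h, f in zip(hist, factors)]
    scaled_err = [e * f for e, f in zip(err, factors)]
    return scaled_hist, scaled_err


hist = [1.0, 2.0, 3.0]
err = [1.0, 0.5, 0.25]

# One factor for all bins
print(scale(hist, err, 2.0))

# One factor per bin
print(scale(hist, err, [1.0, 0.5, 2.0]))
```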

Histogram.set_error(own_error: List[float] | ndarray) -> None

Sets the histogram error by hand. This is helpful for weighted histograms where the weight also has an uncertainty.

This function has to be called after averaging, otherwise the error will be overwritten by the standard error.

Parameters:
own_error: list, numpy.ndarray

Values for the uncertainties of the individual bins.

Histogram.print_histogram() -> None

Print the histograms to the terminal.

Histogram.write_to_file(filename: str, hist_labels: List[Dict[str, str]], comment: str = '', columns: List[str] | None = None) -> None

Write multiple histograms to a CSV file along with their headers.

Parameters:
filename: str

Name of the output CSV file.

hist_labels: list of dicts

List containing dictionaries with header labels for each histogram. Provide a list of dictionaries, where each dictionary contains the header labels for the corresponding histogram. If the list contains only one dictionary, then this is used for all histograms.

The dictionaries should have the following keys:
  • ‘bin_center’: Label for the bin center column.

  • ‘bin_low’: Label for the lower boundary of the bins.

  • ‘bin_high’: Label for the upper boundary of the bins.

  • ‘distribution’: Label for the histogram/distribution.

  • ‘stat_err+’: Label for the statistical error (positive).

  • ‘stat_err-’: Label for the statistical error (negative).

  • ‘sys_err+’: Label for the systematic error (positive).

  • ‘sys_err-’: Label for the systematic error (negative).

Other keys are possible if non-default columns are used.

comment: str, optional

Additional comment to be included at the beginning of the file. It is possible to give a multi-line comment where each line should start with a ‘#’.

columns: list of str, optional

List of columns to include in the output. If None, all columns are included, which are:
  • ‘bin_center’: Bin center.

  • ‘bin_low’: Lower bin boundary.

  • ‘bin_high’: Upper bin boundary.

  • ‘distribution’: Histogram value.

  • ‘stat_err+’: Statistical error (positive).

  • ‘stat_err-’: Statistical error (negative).

  • ‘sys_err+’: Systematic error (positive).

  • ‘sys_err-’: Systematic error (negative).

Raises:
TypeError

If hist_labels is not a list of dictionaries, or if columns is not a list of strings or contains keys which are not present in ‘hist_labels’.

ValueError

If the number of histograms is greater than 1 and the number of provided headers in hist_labels does not match the number of histograms. The one exception is when exactly one dictionary is provided for more than one histogram; in that case the same dictionary is used for all histograms.
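The sketch below shows what a hist_labels list with the default keys might look like, together with the rule that a single dictionary is reused for all histograms. The label strings and the number of histograms are made-up example values, and the checks only mirror the conditions described above; they are not the library's validation code.

```python
# One dictionary with the default keys; the label texts are example values
hist_labels = [
    {
        "bin_center": "x",
        "bin_low": "x_low",
        "bin_high": "x_high",
        "distribution": "dN/dx",
        "stat_err+": "stat err +",
        "stat_err-": "stat err -",
        "sys_err+": "sys err +",
        "sys_err-": "sys err -",
    }
]
number_of_histograms = 3  # example value

# A single dictionary is reused for every histogram
if len(hist_labels) == 1:
    labels_per_histogram = hist_labels * number_of_histograms
elif len(hist_labels) == number_of_histograms:
    labels_per_histogram = hist_labels
else:
    raise ValueError("number of label dictionaries does not match")

# A column subset must only use keys present in the label dictionaries
columns = ["bin_center", "distribution", "stat_err+", "stat_err-"]
missing = [c for c in columns if c not in hist_labels[0]]
print(len(labels_per_histogram), missing)
```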