Histogram
- class Histogram.Histogram(bin_boundaries)[source]
Defines a histogram object.
The histograms can be initialized either with a tuple (hist_min,hist_max,num_bins) or a list/numpy.ndarray containing the bin boundaries, which allows for different bin widths. Multiple histograms can be added and averaged.
Note
Keep in mind that each bin contains its left edge but not its right edge. When a value is added to a histogram, the bin assignment therefore follows:
bins[i-1] <= value < bins[i]
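The half-open rule above can be illustrated with a short pure-Python sketch. It uses the standard `bisect` module and is not the library's implementation; `find_bin` is a hypothetical helper, and returning `None` for out-of-range values is an assumption of this sketch.

```python
from bisect import bisect_right


def find_bin(edges, value):
    """Return the index k of the bin with edges[k] <= value < edges[k+1].

    Returns None when the value lies outside [edges[0], edges[-1]).
    """
    if value < edges[0] or value >= edges[-1]:
        return None  # assumption: out-of-range values are simply not binned
    # bisect_right counts the edges that are <= value; subtracting one
    # gives the index of the bin whose left edge is the closest one below.
    return bisect_right(edges, value) - 1


edges = [0, 1, 2, 3]
print(find_bin(edges, 0.5))  # 0
print(find_bin(edges, 1))    # 1: a value on a left edge belongs to that bin
print(find_bin(edges, 3))    # None: the right edge of the last bin is excluded
```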
- Parameters:
- bin_boundaries: tuple
A tuple with three values (hist_min, hist_max, num_bins). This creates an evenly sized histogram in the range [hist_min, hist_max] divided into num_bins bins.
- bin_boundaries: list/numpy.ndarray
A list or numpy array that contains all bin edges. This allows for non-uniform bin sizes.
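As a rough illustration of how the two input forms relate, the sketch below expands a (hist_min, hist_max, num_bins) tuple into explicit bin edges and passes a list through unchanged. `edges_from_spec` is a hypothetical helper written in plain Python; the class itself may build its edges differently.

```python
def edges_from_spec(spec):
    """Return a list of bin edges for a tuple spec or an explicit edge list."""
    if isinstance(spec, tuple):
        hist_min, hist_max, num_bins = spec
        width = (hist_max - hist_min) / num_bins
        # num_bins bins are delimited by num_bins + 1 edges
        return [hist_min + i * width for i in range(num_bins + 1)]
    # Explicit edges are used as given, so the bins may differ in width.
    return list(spec)


print(edges_from_spec((0, 10, 5)))     # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(edges_from_spec([0, 1, 5, 10]))  # [0, 1, 5, 10]
```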
Examples
Creating histograms
To create multiple histograms with the same number and size of bins, initialize the Histogram object with a tuple (otherwise initialize it with a list containing all bin edges). To fill and retrieve the histograms, use the methods add_value(value), add_histogram() and histogram().
>>> from sparkx.Histogram import Histogram
>>>
>>> # Initialize a histogram in the range [0,10] with 10 bins
>>> histObj = Histogram((0, 10, 10))
>>>
>>> # Add the values [1, 1.2, 3, 5, 6.5, 9, 9] to the histogram
>>> histObj.add_value([1, 1.2, 3, 5, 6.5, 9, 9])
>>> print(histObj.histogram())
[0. 2. 0. 1. 0. 1. 1. 0. 0. 2.]
>>>
>>> # Add a second histogram and add the values [2, 7, 7.2, 9]
>>> histObj.add_histogram()
>>> histObj.add_value([2, 7, 7.2, 9])
>>> print(histObj.histogram())
[[0. 2. 0. 1. 0. 1. 1. 0. 0. 2.]
 [0. 0. 1. 0. 0. 0. 0. 2. 0. 1.]]
>>>
>>> # Store the histograms in hist as numpy.ndarray
>>> hist = histObj.histogram()
- Attributes:
- number_of_bins_: int
Number of histogram bins.
- bin_edges_: numpy.ndarray
Array containing the edges of the bins.
- number_of_histograms_: int
Number of created histograms.
- histograms_: numpy.ndarray
Array containing the histograms (might be scaled).
- histograms_raw_count_: numpy.ndarray
Array containing the raw counts of the histograms.
- error_: numpy.ndarray
Array containing the histogram error.
- scaling_: numpy.ndarray
Array containing scaling factors for each bin.
Methods
histogram:
Get the created histogram(s)
histogram_raw_counts:
Get the raw bin counts of the histogram(s).
number_of_histograms:
Get the number of current histograms.
bin_centers: numpy.ndarray
Get the bin centers.
bin_width: numpy.ndarray
Get the bin widths.
bin_bounds_left: numpy.ndarray
Extract the lower bounds of the individual bins.
bin_bounds_right: numpy.ndarray
Extract the upper bounds of the individual bins.
bin_boundaries: numpy.ndarray
Get the bin boundaries.
add_value:
Add one or multiple values to the latest histogram.
add_histogram:
Add a new histogram.
make_density:
Create probability density from last histogram.
average:
Averages over all histograms.
average_weighted:
Performs a weighted average over all histograms.
standard_error:
Get the standard deviation for each bin.
statistical_error:
Statistical error of all histogram bins in all histograms.
set_systematic_error:
Sets the systematic histogram error by hand.
scale_histogram:
Multiply latest histogram with a factor.
set_error:
Set the error for the histogram by hand.
print_histogram:
Print the histogram(s) to the terminal.
write_to_file:
Write the histogram(s) to a CSV file.
- Histogram.histogram()[source]
Get the current histogram(s).
- Returns:
- histograms_: numpy.ndarray
Array containing the histogram(s).
- Histogram.histogram_raw_counts()[source]
Get the raw bin counts of the histogram(s), even after the original histograms are scaled or averaged. If weights are used, then they affect the raw counts histogram.
- Returns:
- histograms_raw_count_: numpy.ndarray
Array containing the raw counts of the histogram(s)
- Histogram.number_of_histograms()[source]
Get the number of current histograms.
- Returns:
- number_of_histograms_: int
Number of histograms.
- Histogram.bin_centers()[source]
Get the bin centers.
- Returns:
- numpy.ndarray
Array containing the bin centers.
- Histogram.bin_width()[source]
Get the bin widths.
- Returns:
- numpy.ndarray
Array containing the bin widths.
- Histogram.bin_bounds_left()[source]
Extract the lower bounds of the individual bins.
- Returns:
- numpy.ndarray
Array containing the lower bin boundaries.
- Histogram.bin_bounds_right()[source]
Extract the upper bounds of the individual bins.
- Returns:
- numpy.ndarray
Array containing the upper bin boundaries.
- Histogram.bin_boundaries()[source]
Get the bin boundaries.
- Returns:
- numpy.ndarray
Array containing the bin boundaries.
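The relationship between the bin boundaries and the quantities returned by bin_bounds_left(), bin_bounds_right(), bin_width() and bin_centers() can be sketched in plain Python. This is only an illustration with made-up edges; in particular, taking the bin center as the arithmetic midpoint of the two edges is an assumption, and the class returns numpy arrays rather than lists.

```python
# Hypothetical, non-uniform bin edges used only for this illustration
edges = [0.0, 1.0, 3.0, 7.0]

left = edges[:-1]   # lower bound of each bin
right = edges[1:]   # upper bound of each bin
widths = [hi - lo for lo, hi in zip(left, right)]
centers = [(lo + hi) / 2 for lo, hi in zip(left, right)]  # assumed midpoint

print(left)     # [0.0, 1.0, 3.0]
print(right)    # [1.0, 3.0, 7.0]
print(widths)   # [1.0, 2.0, 4.0]
print(centers)  # [0.5, 2.0, 5.0]
```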
- Histogram.add_value(value, weight=None)[source]
Add value(s) to the latest histogram.
Both a single number and a whole list/array of numbers are accepted.
- Parameters:
- value: int, float, np.number, list, numpy.ndarray
Value(s) which are supposed to be added to the histogram instance.
- weight: int, float, np.number, list, numpy.ndarray, optional
Weight(s) associated with the value(s). If provided, it should have the same length as the value parameter.
- Raises:
- TypeError
if the input is not a number or numpy.ndarray or list
- ValueError
if an input value is np.nan
- ValueError
if the input weight does not have the same dimension as value
- ValueError
if a weight value is np.nan
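The interplay of values, weights and the documented exceptions can be sketched as follows. This is a simplified pure-Python stand-in, not the library code: `fill` and `as_list` are hypothetical helpers, numpy inputs are not handled, and silently skipping values outside the histogram range is an assumption of the sketch.

```python
import math
from bisect import bisect_right


def as_list(x):
    """Wrap a single number in a list; reject unsupported types."""
    if isinstance(x, (int, float)):
        return [x]
    if isinstance(x, list):
        return x
    raise TypeError("expected a number or a list of numbers")


def fill(counts, edges, values, weights=None):
    """Add values (with optional weights) to the bin counts in place."""
    values = as_list(values)
    weights = [1.0] * len(values) if weights is None else as_list(weights)
    if len(weights) != len(values):
        raise ValueError("weights must have the same length as values")
    for v, w in zip(values, weights):
        if math.isnan(v) or math.isnan(w):
            raise ValueError("NaN is not a valid value or weight")
        if edges[0] <= v < edges[-1]:  # assumption: other values are skipped
            counts[bisect_right(edges, v) - 1] += w
    return counts


edges = [0, 1, 2, 3]
print(fill([0.0, 0.0, 0.0], edges, [0.5, 1.5, 1.7], [2.0, 1.0, 0.5]))
# [2.0, 1.5, 0.0]
```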
- Histogram.add_histogram()[source]
Add a new histogram to the Histogram class instance.
If new values are added to the histogram afterwards, these are added to the last histogram.
- Histogram.make_density()[source]
Make a probability density from the last histogram. The result represents the probability density function in a given bin. It is normalized such that the integral over the whole histogram yields 1. This behavior is similar to the one of numpy histograms.
- Raises:
- ValueError
if there is no histogram available
- ValueError
if the integral over the histogram is zero
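For equally wide bins, the normalization described above amounts to dividing each count by the total count times the bin width. The sketch below shows this special case in plain Python; `density_uniform` is a hypothetical helper, and how the class treats non-uniform bins is not covered here.

```python
def density_uniform(counts, width):
    """Normalize counts of equally wide bins so that the integral is 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("the integral over the histogram is zero")
    return [c / (total * width) for c in counts]


counts = [1, 3, 4, 2]
width = 0.5
density = density_uniform(counts, width)
print(density)

# The integral is the sum of density times bin width and should be close to 1.
print(sum(d * width for d in density))
```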
- Histogram.average()[source]
Average over all histograms.
When this function is called, the previously generated histograms are averaged with unit weights and overwritten by the averaged histogram. The standard error of the averaged histograms is computed. histogram_raw_counts is summed over all histograms. scaling_ is set to the value of the first histogram.
- Returns:
- Histogram
Returns a Histogram object.
- Histogram.average_weighted(weights)[source]
Weighted average over all histograms.
When this function is called, the previously generated histograms are averaged with the given weights and overwritten by the averaged histogram. The weighted standard error of the histograms is computed. histogram_raw_counts is summed over all histograms. scaling_ is set to the value of the first histogram.
- Parameters:
- weights: numpy.ndarray
Array containing a weight for each histogram.
- Returns:
- Histogram
Returns a Histogram object.
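The averaging step can be sketched bin by bin in plain Python: each bin of the averaged histogram is the weighted mean of that bin over all histograms, while the raw counts are summed. The helper names are hypothetical, the error computation is left out because its exact formula is not given here, and unit weights reproduce the behavior described for average().

```python
def weighted_average(histograms, weights):
    """Weighted mean of each bin over all histograms."""
    total_weight = sum(weights)
    n_bins = len(histograms[0])
    return [
        sum(w * h[i] for h, w in zip(histograms, weights)) / total_weight
        for i in range(n_bins)
    ]


def summed_raw_counts(raw_counts):
    """Sum the raw counts of each bin over all histograms."""
    return [sum(column) for column in zip(*raw_counts)]


hists = [[0.0, 2.0, 4.0], [2.0, 2.0, 0.0]]
print(weighted_average(hists, [1.0, 3.0]))  # [1.5, 2.0, 1.0]
print(weighted_average(hists, [1.0, 1.0]))  # [1.0, 2.0, 2.0] (unit weights)
print(summed_raw_counts(hists))             # [2.0, 4.0, 4.0]
```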
- Histogram.standard_error()[source]
Get the standard deviation over all histogram counts for each bin.
- Returns:
- numpy.ndarray
Array containing the standard deviation for each bin.
- Histogram.statistical_error()[source]
Compute the statistical error of all histogram bins for all histograms. This assumes Poisson distributed counts in each bin and independent draws.
- Returns:
- numpy.ndarray
2D Array containing the statistical error (standard deviation) for each bin and histogram.
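Under the stated Poisson assumption, the standard deviation of a bin with N raw counts is sqrt(N). A minimal sketch for unscaled histograms follows; `poisson_error` is a hypothetical helper, and any scaling the class applies to the error is not modeled.

```python
import math


def poisson_error(raw_counts):
    """Return sqrt(N) for every bin of every histogram (2D list in, 2D list out)."""
    return [[math.sqrt(n) for n in hist] for hist in raw_counts]


raw = [[0, 4, 9], [16, 1, 25]]
print(poisson_error(raw))  # [[0.0, 2.0, 3.0], [4.0, 1.0, 5.0]]
```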
- Histogram.set_systematic_error(own_error)[source]
Sets the systematic histogram error of the last created histogram by hand.
- Parameters:
- own_error: list, numpy.ndarray
Values for the systematic uncertainties of the individual bins.
- Histogram.scale_histogram(value)[source]
Scale the latest histogram by a factor.
Multiplies the latest histogram by a number or a list/numpy array with a scaling factor for each bin.
The standard deviation of the histogram(s) is also rescaled by the same factor.
- Parameters:
- value: int, float, np.number, list, numpy.ndarray
Scaling factor for the histogram.
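The two accepted forms of the scaling factor, a single number or one factor per bin, can be illustrated in plain Python. `scale` is a hypothetical helper operating on lists; it applies the same factor to the error, as described above.

```python
def scale(hist, error, factor):
    """Scale bin contents and errors by a scalar or by one factor per bin."""
    if isinstance(factor, (int, float)):
        factors = [factor] * len(hist)  # same factor for every bin
    else:
        factors = list(factor)
        if len(factors) != len(hist):
            raise ValueError("need exactly one scaling factor per bin")
    scaled_hist = [h * f for h, f in zip(hist, factors)]
    scaled_error = [e * f for e, f in zip(error, factors)]
    return scaled_hist, scaled_error


hist = [1.0, 2.0, 4.0]
error = [1.0, 1.5, 2.0]
print(scale(hist, error, 2))                 # ([2.0, 4.0, 8.0], [2.0, 3.0, 4.0])
print(scale(hist, error, [2.0, 1.0, 0.5]))   # ([2.0, 2.0, 2.0], [2.0, 1.5, 1.0])
```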
- Histogram.set_error(own_error)[source]
Sets the histogram error by hand. This is helpful for weighted histograms where the weight also has an uncertainty.
This function has to be called after averaging, otherwise the error will be overwritten by the standard error.
- Parameters:
- own_error: list, numpy.ndarray
Values for the uncertainties of the individual bins.
- Histogram.write_to_file(filename, hist_labels, comment='', columns=None)[source]
Write multiple histograms to a CSV file along with their headers.
- Parameters:
- filename: str
Name of the output CSV file.
- hist_labels: list of dicts
List of dictionaries with the header labels, one dictionary per histogram. If the list contains only one dictionary, it is used for all histograms.
The dictionaries should have the following keys:
‘bin_center’: Label for the bin center column.
‘bin_low’: Label for the lower boundary of the bins.
‘bin_high’: Label for the upper boundary of the bins.
‘distribution’: Label for the histogram/distribution.
‘stat_err+’: Label for the statistical error (positive).
‘stat_err-’: Label for the statistical error (negative).
‘sys_err+’: Label for the systematic error (positive).
‘sys_err-’: Label for the systematic error (negative).
Other keys are possible if non-default columns are used.
- comment: str, optional
Additional comment to be included at the beginning of the file. It is possible to give a multi-line comment where each line should start with a ‘#’.
- columns: list of str, optional
List of columns to include in the output. If None, all columns are included, which are:
‘bin_center’: Bin center.
‘bin_low’: Lower bin boundary.
‘bin_high’: Upper bin boundary.
‘distribution’: Histogram value.
‘stat_err+’: Statistical error (positive).
‘stat_err-’: Statistical error (negative).
‘sys_err+’: Systematic error (positive).
‘sys_err-’: Systematic error (negative).
- Raises:
- TypeError
If hist_labels is not a list of dictionaries, or if columns is not a list of strings or contains keys which are not present in ‘hist_labels’.
- ValueError
If there is more than one histogram and the number of dictionaries in hist_labels does not match the number of histograms. The exception is a single dictionary, which is then used for all histograms.
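The rule for matching label dictionaries to histograms can be sketched as a small validation function. `resolve_labels` is a hypothetical helper that mirrors the documented behavior in plain Python; it is not part of the class, and the label strings in the example are made up.

```python
def resolve_labels(hist_labels, number_of_histograms):
    """Return one label dictionary per histogram or raise as documented."""
    if not isinstance(hist_labels, list) or not all(
        isinstance(entry, dict) for entry in hist_labels
    ):
        raise TypeError("hist_labels must be a list of dictionaries")
    if len(hist_labels) == 1:
        # A single dictionary is reused for every histogram.
        return hist_labels * number_of_histograms
    if len(hist_labels) != number_of_histograms:
        raise ValueError("number of label dictionaries does not match")
    return hist_labels


# Example labels with the default keys; the label strings are placeholders.
labels = {
    "bin_center": "pT",
    "bin_low": "pT_low",
    "bin_high": "pT_high",
    "distribution": "dN/dpT",
    "stat_err+": "stat_err+",
    "stat_err-": "stat_err-",
    "sys_err+": "sys_err+",
    "sys_err-": "sys_err-",
}
print(len(resolve_labels([labels], 3)))  # 3
```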