correctionlib
The purpose of this library is to provide a well-structured JSON data format for a wide variety of ad-hoc correction factors encountered in a typical HEP analysis and a companion evaluation tool suitable for use in C++ and python programs. Here we restrict our definition of correction factors to a class of functions with scalar inputs that produce a scalar output.
In python, the function signature is:
from typing import Union
def f(*args: Union[str,int,float]) -> float:
    return ...
In C++, the evaluator currently implements this as:
double Correction::evaluate(const std::vector<std::variant<int, double, std::string>>& values) const;
The supported function classes include:
multi-dimensional binned lookups;
binned lookups pointing to multi-argument formulas with a restricted math function set (exp, sqrt, etc.);
categorical (string or integer enumeration) maps;
input transforms (updating one input value in place); and
compositions of the above.
Each function type is represented by a “node” in a call graph and holds all of its parameters in a JSON structure, described by the JSON schema. Possible future extension nodes might include weighted sums (which, when composed with the others, could represent a BDT) and perhaps simple MLPs.
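To make the node idea concrete, here is a minimal sketch of a call graph held in JSON and walked recursively. The field names (`type`, `input`, `edges`, `content`, `root`) and the helpers `evaluate_node` and `make_correction` are hypothetical and chosen only for illustration; they are not the library's schema or API. The sketch composes a categorical map with a one-dimensional binned lookup and exposes the result through the scalar-in, scalar-out signature described above.

```python
import json
from bisect import bisect_right
from typing import Union

Value = Union[str, int, float]

# Hypothetical JSON layout, for illustration only (NOT the library's schema).
GRAPH_JSON = """
{
  "inputs": ["syst", "pt"],
  "root": {
    "type": "category",
    "input": "syst",
    "content": {
      "nominal": {"type": "binning", "input": "pt",
                  "edges": [20.0, 40.0, 60.0], "content": [1.10, 1.05]},
      "up":      {"type": "binning", "input": "pt",
                  "edges": [20.0, 40.0, 60.0], "content": [1.20, 1.15]}
    }
  }
}
"""


def evaluate_node(node, values: dict) -> float:
    """Walk the graph until a plain number (a leaf) is reached."""
    if isinstance(node, (int, float)):
        return float(node)
    kind = node["type"]
    if kind == "category":
        # Categorical map: pick the child node keyed by the input value.
        key = values[node["input"]]
        if key not in node["content"]:
            raise ValueError(f"unknown category {key!r}")
        return evaluate_node(node["content"][key], values)
    if kind == "binning":
        # Binned lookup: bins are half-open intervals [edge_i, edge_i+1).
        x = values[node["input"]]
        edges = node["edges"]
        if not edges[0] <= x < edges[-1]:
            raise ValueError(f"{node['input']}={x} is outside the binning")
        index = bisect_right(edges, x) - 1
        return evaluate_node(node["content"][index], values)
    raise ValueError(f"unknown node type {kind!r}")


def make_correction(graph: dict):
    """Wrap a graph in a function with the f(*args) -> float signature."""
    names = graph["inputs"]

    def f(*args: Value) -> float:
        if len(args) != len(names):
            raise ValueError(f"expected {len(names)} inputs, got {len(args)}")
        return evaluate_node(graph["root"], dict(zip(names, args)))

    return f


sf = make_correction(json.loads(GRAPH_JSON))
print(sf("nominal", 35.0))
print(sf("up", 45.0))
```

Because every node carries its own parameters, new node types can be added as extra branches in the dispatcher without changing the outer function signature. Note that JSON object keys are always strings, so this toy layout would need a different encoding for integer categories.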
The tool should provide:
standardized, versioned JSON schemas;
forward-porting tools (to migrate data written in older schema versions); and
a well-optimized C++ evaluator and python bindings (with numpy vectorization support).
This tool will definitely not provide:
support for TLorentzVector or other object-type inputs (such support belongs in a higher-level tool that depends on this library as a low-level component).
Formula support currently includes a mostly-complete subset of the ROOT library TFormula class, and is implemented in a threadsafe standalone manner. The grammar is formally defined, and expressions are parsed using a header-only PEG parser library. The supported features mirror CMSSW’s reco::formulaEvaluator, and the implementation fully passes the test suite for that utility with the purposeful exception of the TMath:: namespace.
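The idea of a formula node with a restricted function set can be sketched in a few lines of Python. This is not the library's implementation: it swaps the PEG grammar for Python's own `ast` module, so it accepts Python expression syntax rather than TFormula syntax, and the helper name `compile_formula` and the whitelist contents are assumptions made for illustration. What it does show is the general pattern: parse once, then evaluate only whitelisted operators and functions, and reject everything else.

```python
import ast
import math
import operator

# Whitelists (illustrative; the real supported set is defined by the library).
_FUNCS = {"exp": math.exp, "sqrt": math.sqrt, "log": math.log}
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARYOPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _eval(node, env):
    """Evaluate one AST node, refusing anything outside the whitelists."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return float(node.value)
    elif isinstance(node, ast.Name):
        if node.id in env:
            return env[node.id]
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
        return _UNARYOPS[type(node.op)](_eval(node.operand, env))
    elif (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in _FUNCS
        and not node.keywords
    ):
        return _FUNCS[node.func.id](*(_eval(arg, env) for arg in node.args))
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")


def compile_formula(expression: str, variables: list):
    """Parse once; return a function of the named variables."""
    tree = ast.parse(expression, mode="eval")

    def f(*args: float) -> float:
        if len(args) != len(variables):
            raise ValueError(f"expected {len(variables)} arguments, got {len(args)}")
        # The environment is built per call, so calls share no mutable state.
        return float(_eval(tree, dict(zip(variables, args))))

    return f


formula = compile_formula("0.5 + 2.0 * exp(-x / 10.0) + sqrt(y)", ["x", "y"])
print(formula(0.0, 4.0))
```

In this sketch unsupported constructs are rejected at evaluation time rather than at parse time; a production evaluator would more likely validate the whole expression up front.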
The python bindings may be able to call into numexpr,
though, due to the tree-like structure of the corrections, it may prove difficult to exploit vectorization
at levels other than the entrypoint.