data

Module for describing data processing.

Every data structure is described as a nested combination of dict and list whose leaves are ndarray. A data process is a translation from one data structure to another data structure or to a plain ndarray. Data caches can be implemented based on the dynamic features of list and dict. A small usage sketch is given after the structure below.

The full data structure is

{
    "particle": {
        "A": {"p": ..., "m": ...},
        ...
    },
    "decay": [
        {
            "A->R1+B": {
                "R1": {
                    "ang": {
                        "alpha": [...],
                        "beta": [...],
                        "gamma": [...]
                    },
                    "z": [[x1, y1, z1], ...],
                    "x": [[x2, y2, z2], ...]
                },
                "B": {...}
            },
            "R1->C+D": {
                "C": {
                    ...,
                    "aligned_angle": {
                        "alpha": [...],
                        "beta": [...],
                        "gamma": [...]
                    }
                },
                "D": {...}
            }
        },
        {
            "A->R2+C": {...},
            "R2->B+D": {...}
        },
        ...
    ],
    "weight": [...]
}
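As a minimal sketch of translating one data structure into another, using functions documented below (the key names and values are hypothetical and far simpler than the full structure above):

import numpy as np

# a toy structured dataset: nested dicts/lists whose leaves are ndarrays
data = {
    "particle": {"A": {"m": np.array([1.0, 1.1, 0.9])}},
    "weight": np.array([1.0, 0.5, 2.0]),
}

masses = data_index(data, ["particle", "A", "m"])  # pick out one leaf array
scaled = data_map(data, lambda x: 2.0 * x)         # same structure, new values
n_events = data_shape(data)                        # number of events, here 3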
class EvalLazy(f)[source]

Bases: object

class HeavyCall(f)[source]

Bases: object

class LazyCall(f, x, *args, **kwargs)[source]

Bases: object

as_dataset(batch=65000)[source]
batch(batch, axis=0)[source]
copy()[source]
create_new(f, x, *args, **kwargs)[source]
eval()[source]
get(index, value=None)[source]
get_weight()[source]
merge(*other, axis=0)[source]
set_cached_file(cached_file, name)[source]
class LazyFile(x, *args, **kwargs)[source]

Bases: LazyCall

as_dataset(batch=65000)[source]
create_new(f, x, *args, **kwargs)[source]
eval()[source]
class ReadData(var, trans=None)[source]

Bases: object

batch_call(function, data, batch=10000)[source]
batch_call_numpy(function, data, batch=10000)[source]
batch_sum(function, data, batch=10000)[source]
check_nan(data, no_raise=False)[source]

Check whether there are NaN values in the data.

data_cut(data, expr, var_map=None)[source]

Cut data with a boolean expression.

Parameters:
  • data – data to be cut

  • expr – cut expression

  • var_map – variable map between parameters in expr and data, [optional]

Returns:

data after the cut
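A hedged usage sketch, assuming that when var_map is omitted the symbols in expr refer to top-level keys of data (the key names below are hypothetical):

import numpy as np

data = {"x": np.array([0.5, 1.5, 2.5]), "w": np.array([1.0, 1.0, 1.0])}
cut = data_cut(data, "x > 1.0")
# cut keeps only the events passing the expression,
# e.g. {"x": array([1.5, 2.5]), "w": array([1., 1.])}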

data_generator(data, fun=<function _data_split>, args=(), kwargs=None, MAX_ITER=1000)[source]

Data generator: call fun on each data element, yielding the results as a generator. The extra arguments will be passed to fun.

data_index(data, key, no_raise=False)[source]

Index data by a key or a list of keys.
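A small sketch of nested indexing, assuming a list of keys is applied recursively (the structure mirrors the doctest shown under data_split below):

import numpy as np

data = {"b": {"c": np.array([5.0, 6.0])}}
data_index(data, "b")         # -> {"c": array([5., 6.])}
data_index(data, ["b", "c"])  # -> array([5., 6.])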

data_map(data, fun, args=(), kwargs=None)[source]

Apply fun to each element of data. It returns the same structure.
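For illustration, a minimal sketch that doubles every leaf array while keeping the dict/list layout (the key names are hypothetical):

import numpy as np

data = {"a": [np.array([1.0, 2.0])], "b": {"c": np.array([3.0])}}
doubled = data_map(data, lambda x: x * 2)
# doubled -> {"a": [array([2., 4.])], "b": {"c": array([6.])}}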

data_mask(data, select)[source]

This function uses a boolean mask to select data.

Parameters:
  • data – data to select

  • select – 1-d boolean array for selection

Returns:

data after selection
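A minimal usage sketch with a toy structure (the key names are hypothetical):

import numpy as np

data = {"a": np.array([1.0, 2.0, 3.0]), "b": {"c": np.array([4.0, 5.0, 6.0])}}
select = np.array([True, False, True])
selected = data_mask(data, select)
# selected -> {"a": array([1., 3.]), "b": {"c": array([4., 6.])}}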

data_merge(*data, axis=0)[source]

This function merges data with the same structure.
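A small sketch of merging two datasets with the same structure along the event axis (key names are hypothetical):

import numpy as np

d1 = {"a": np.array([1.0, 2.0]), "b": {"c": np.array([3.0, 4.0])}}
d2 = {"a": np.array([5.0]), "b": {"c": np.array([6.0])}}
merged = data_merge(d1, d2)
# merged -> {"a": array([1., 2., 5.]), "b": {"c": array([3., 4., 6.])}}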

data_replace(data, key, value)[source]
data_shape(data, axis=0, all_list=False)[source]

Get data size.

Parameters:
  • data – Data array

  • axis – Integer. ???

  • all_list – Boolean. ???

Returns:

data_split(data, batch_size, axis=0)[source]

Split data into batches of size batch_size along axis.

Parameters:
  • data – structured data

  • batch_size – Integer, data size of each batch

  • axis – Integer, axis for the split, [optional]

Returns:

a generator over the split data

>>> data = {"a": [np.array([1.0, 2.0]), np.array([3.0, 4.0])], "b": {"c": np.array([5.0, 6.0])}, "d": [], "e": {}}
>>> for i, data_i in enumerate(data_split(data, 1)):
...     print(i, data_to_numpy(data_i))
...
0 {'a': [array([1.]), array([3.])], 'b': {'c': array([5.])}, 'd': [], 'e': {}}
1 {'a': [array([2.]), array([4.])], 'b': {'c': array([6.])}, 'd': [], 'e': {}}
data_strip(data, keys)[source]
data_struct(data)[source]

Get the structure of data: its keys and shapes.

data_to_numpy(dat)[source]

Convert Tensor data to numpy.ndarray.

data_to_tensor(dat)[source]

Convert data to tensorflow.Tensor.
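A short sketch of the two conversions, assuming TensorFlow is available:

import numpy as np

data = {"a": np.array([1.0, 2.0]), "b": {"c": np.array([3.0])}}
tensors = data_to_tensor(data)   # leaves become tensorflow.Tensor
arrays = data_to_numpy(tensors)  # leaves converted back to numpy.ndarray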

flatten_dict_data(data, fun=<built-in method format of str object>)[source]

Flatten nested data into a flat dict, with keys generated by fun.
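A hedged sketch of the flattening; that the default fun joins nested keys into a single string key is an assumption, and the exact separator may differ:

import numpy as np

data = {"b": {"c": np.array([5.0, 6.0])}}
flat = flatten_dict_data(data)
# flat is a single-level dict, e.g. {"b/c": array([5., 6.])} (separator assumed)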

load_dat_file(fnames, particles, dtype=None, split=None, order=None, _force_list=False, mmap_mode=None)[source]

Load *.dat file(s) of 4-momenta of the final particles.

Parameters:
  • fnames – String or list of strings. File names.

  • particles – List of Particle. Final particles.

  • dtype – Data type.

  • split – sizes of each split part of the dat files

  • order – transpose order

Returns:

Dictionary of data indexed by Particle.
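A hypothetical usage sketch; the file name, the use of plain particle names in place of Particle objects, and the file layout (one four-momentum per row, final particles interleaved event by event) are all assumptions:

# "toy.dat" is a hypothetical text file of four-momentum rows
p4 = load_dat_file("toy.dat", ["B", "C", "D"])
# p4 is a dictionary of (n_events, 4) arrays, one entry per final particle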

load_data(file_name, **kwargs)[source]

Load a data file saved by save_data. The arguments will be passed to numpy.load().

save_data(file_name, obj, **kwargs)[source]

Save structured data to files. The arguments will be passed to numpy.save().

save_dataz(file_name, obj, **kwargs)[source]

Save compressed structured data to files. The arguments will be passed to numpy.save().
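A round-trip sketch using the three functions above (the file names are placeholders):

import numpy as np

data = {"a": np.array([1.0, 2.0]), "b": {"c": np.array([3.0, 4.0])}}
save_data("toy_data.npy", data)       # plain numpy file
save_dataz("toy_data_z.npz", data)    # compressed variant
restored = load_data("toy_data.npy")  # same nested structure back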

set_random_seed(seed)[source]

Set the random seed for random, numpy and tensorflow.

split_generator(data, batch_size, axis=0)

Split data into batches of size batch_size along axis.

Parameters:
  • data – structured data

  • batch_size – Integer, data size of each batch

  • axis – Integer, axis for the split, [optional]

Returns:

a generator over the split data

>>> data = {"a": [np.array([1.0, 2.0]), np.array([3.0, 4.0])], "b": {"c": np.array([5.0, 6.0])}, "d": [], "e": {}}
>>> for i, data_i in enumerate(data_split(data, 1)):
...     print(i, data_to_numpy(data_i))
...
0 {'a': [array([1.]), array([3.])], 'b': {'c': array([5.])}, 'd': [], 'e': {}}
1 {'a': [array([2.]), array([4.])], 'b': {'c': array([6.])}, 'd': [], 'e': {}}