
longling¶
Overview¶
The project contains several modules for different purposes:
Quick scripts¶
The project provides several CLI scripts to help construct different architectures.
CLI¶
Provides several general tools, consistently invoked via:
longling $subcommand $parameters1 $parameters2
To see the help information:
longling -- --help
longling $subcommand --help
Take a glance at all available CLI commands.
The CLI tools are built on fire. Refer to its documentation for detailed usage.
Demo¶
Split dataset¶
target: split a dataset into train/valid/test
longling train_valid_test $filename1 $filename2 -- --train_ratio 0.8 --valid_ratio 0.1 --test_ratio 0.1
Similar commands:
train_test
longling train_test $filename1 -- --train_ratio 0.8 --test_ratio 0.2
train_valid
longling train_valid $filename1 -- --train_ratio 0.8 --valid_ratio 0.2
Cross validation:
kfold
longling kfold $filename1 $filename2 -- --n_splits 5
Display the tree of content¶
longling toc .
For example:
/
├── __init__.py
├── __pycache__/
│ ├── __init__.cpython-36.pyc
│ └── toc.cpython-36.pyc
└── toc.py
Quickly construct a project¶
longling arch
Alternatively, you can directly copy the template files:
longling arch-cli
Note that you need to check the $VARIABLE placeholders in the template files.
Result Analysis¶
The CLI tools for result analysis are designed for the JSON result format:
longling max $filename $key1 $key2 $key3
longling amax $key1 $key2 $key3 --src $filename
For a composite key like {'prf': {'avg': {'f1': 0.77}}}, the key should be written as prf:avg:f1. Consequently, keys used in the result file must not contain :.
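For instance, suppose a hypothetical result file result.json stores one JSON record per line:
{"Epoch": 0, "prf": {"avg": {"f1": 0.71}}}
{"Epoch": 1, "prf": {"avg": {"f1": 0.77}}}
The record with the best f1 can then be selected with:
longling max result.json prf:avg:f1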
Module Index¶
Command Line Interfaces¶
Use longling -- --help to see all available CLI commands, and use longling $subcommand -- --help to see the help information for a certain command, e.g. longling encode -- --help
Format and Encoding¶
longling.lib.stream.encode (src, ...) |
Convert a file in source encoding to target encoding |
longling.lib.loading.csv2jsonl (src, ...[, ...]) |
Convert a csv file/IO stream into a jsonl file/IO stream |
longling.lib.loading.jsonl2csv (src, ...[, ...]) |
Convert a jsonl file/IO stream into a csv file/IO stream |
Download Data¶
longling.spider.download_data.download_file (url) |
cli alias: download; download data from a specified url |
Architecture¶
longling.toolbox.toc.toc ([root, parent, ...]) |
Print the directory tree |
longling.Architecture.cli.cli ([skip_top, ...]) |
The main function for arch |
longling.Architecture.install_file.nni ([tar_dir]) |
cli alias: arch nni and install nni |
longling.Architecture.install_file.gitignore (...) |
cli alias: arch gitignore |
longling.Architecture.install_file.pytest (...) |
cli alias: arch pytest |
longling.Architecture.install_file.coverage (...) |
cli alias: arch coverage |
longling.Architecture.install_file.pysetup ([...]) |
cli alias: arch pysetup |
longling.Architecture.install_file.sphinx_conf ([...]) |
cli alias: arch sphinx_conf |
longling.Architecture.install_file.makefile ([...]) |
cli alias: arch makefile |
longling.Architecture.install_file.readthedocs ([...]) |
cli alias: arch readthedocs |
longling.Architecture.install_file.travis (...) |
cli alias: arch travis |
longling.Architecture.install_file.dockerfile (atype) |
cli alias: arch dockerfile |
longling.Architecture.install_file.gitlab_ci (...) |
cli alias: arch gitlab_ci |
longling.Architecture.install_file.chart (...) |
cli alias: arch chart |
Model Selection¶
Validation on Datasets¶
Split a dataset into train/valid/test, or apply k-fold splitting.
longling.ML.toolkit.dataset.train_valid_test (...) |
longling.ML.toolkit.dataset.train_test (...) |
longling.ML.toolkit.dataset.kfold (*files[, ...]) |
Select Best Model¶
Select the best models under specified conditions.
longling.ML.toolkit.analyser.cli.select_max (...) |
cli alias: max |
longling.ML.toolkit.analyser.cli.arg_select_max (...) |
cli alias: amax |
longling.ML.toolkit.hyper_search.nni.show_top_k (k) |
Updated in v1.3.17 |
longling.ML.toolkit.hyper_search.nni.show (key) |
Updated in v1.3.17 |
General Library¶
Quick Glance¶
For io:
longling.lib.stream.to_io (stream, ...) |
Convert an object to an io stream; the input could be a path to a file or an io stream. |
longling.lib.stream.as_io (src, ...) |
with wrapper for the to_io function, default mode is "r" |
longling.lib.stream.as_out_io (tar, ...) |
with wrapper for the to_io function, default mode is "w" |
longling.lib.loading.loading (src, ...) |
Buffered line-by-line file loading |
Iterators
longling.lib.iterator.AsyncLoopIter (src[, ...]) |
Asynchronous loop iterator, suitable for loading files |
longling.lib.iterator.CacheAsyncLoopIter (...) |
Asynchronous iterator with a cache pool, suitable for files that need preprocessing |
longling.lib.iterator.iterwrap (itertype, ...) |
Iterator decorator for when an iterator needs to be reused: converts an iterator-generating function into a reusable one. Uses AsyncLoopIter by default. |
Logging
longling.lib.utilog.config_logging ([...]) |
Main logging configuration |
For paths:
longling.lib.path.path_append (path, *addition) |
Path joining function |
longling.lib.path.abs_current_dir (filepath) |
Get the absolute path of the directory containing the file |
longling.lib.path.file_exist (filepath) |
Check whether a file exists |
Syntactic sugar
longling.lib.candylib.as_list (obj) |
A utility function that converts the argument to a list if it is not already. |
Timing and progress
longling.lib.clock.print_time (tips[, logger]) |
Measure and print script running time, in seconds |
longling.lib.clock.Clock (store_dict, ...[, tips]) |
Timer, tracking two kinds of time: wall_time and process_time |
longling.lib.stream.flush_print (*values, ...) |
Flushed print function |
Concurrency
longling.lib.concurrency.concurrent_pool (...) |
Simple API for starting completely independent concurrent programs |
Testing
longling.lib.testing.simulate_stdin (*inputs) |
Simulate standard input in tests |
Structures
longling.lib.structure.AttrDict
longling.lib.structure.nested_update
longling.lib.structure.SortedList
Regex
longling.lib.regex.variable_replace
longling.lib.regex.default_variable_replace
candylib¶
-
longling.lib.candylib.
as_list
(obj) → list[source]¶ A utility function that converts the argument to a list if it is not already.
Parameters: obj (object) -- argument to be converted to a list
Returns: list_obj -- If obj is a list or tuple, return it. Otherwise, return [obj] as a single-element list.
Return type: list
Examples
>>> as_list(1)
[1]
>>> as_list([1])
[1]
>>> as_list((1, 2))
[1, 2]
-
longling.lib.candylib.
dict2pv
(dict_obj: dict, path_to_node: list = None)[source]¶
>>> dict_obj = {"a": {"b": [1, 2], "c": "d"}, "e": 1}
>>> path, value = dict2pv(dict_obj)
>>> path
[['a', 'b'], ['a', 'c'], ['e']]
>>> value
[[1, 2], 'd', 1]
-
longling.lib.candylib.
list2dict
(list_obj, value=None, dict_obj=None)[source]¶
>>> list_obj = ["a", 2, "c"]
>>> list2dict(list_obj, 10)
{'a': {2: {'c': 10}}}
-
longling.lib.candylib.
get_dict_by_path
(dict_obj, path_to_node)[source]¶
>>> dict_obj = {"a": {"b": {"c": 1}}}
>>> get_dict_by_path(dict_obj, ["a", "b", "c"])
1
-
longling.lib.candylib.
format_byte_sizeof
(num, suffix='B')[source]¶ Examples
>>> format_byte_sizeof(1024)
'1.00KB'
-
longling.lib.candylib.
group_by_n
(obj: list, n: int) → list[source]¶ Examples
>>> list_obj = [1, 2, 3, 4, 5, 6]
>>> group_by_n(list_obj, 3)
[[1, 2, 3], [4, 5, 6]]
-
longling.lib.candylib.
as_ordered_dict
(dict_data: (dict, OrderedDict), index: (list, None) = None)[source]¶ Examples
>>> as_ordered_dict({0: 0, 2: 123, 1: 1})
OrderedDict([(0, 0), (2, 123), (1, 1)])
>>> as_ordered_dict({0: 0, 2: 123, 1: 1}, [2, 0, 1])
OrderedDict([(2, 123), (0, 0), (1, 1)])
>>> as_ordered_dict(OrderedDict([(2, 123), (0, 0), (1, 1)]))
OrderedDict([(2, 123), (0, 0), (1, 1)])
clock¶
-
class
longling.lib.clock.
Clock
(store_dict: (dict, None) = None, logger: (logging.Logger, None) = <Logger clock (INFO)>, tips='')[source]¶ Timer, tracking two kinds of time: wall_time and process_time
- wall_time: running time of the program, including waiting time
- process_time: running time of the program, excluding waiting time
Parameters: - store_dict (dict or None) -- stores the running time when used with a closure (the with statement)
- logger (logging.logger) -- logger
- tips (str) -- prefix of the tips message
Examples
with Clock():
    a = 1 + 1

clock = Clock()
clock.start()
# some code
clock.end(wall=True)  # defaults to returning the wall_time; to get process_time, set wall=False
-
process_time
¶ Get the running time of the program (excluding waiting time)
-
wall_time
¶ Get the running time of the program (including waiting time)
-
longling.lib.clock.
print_time
(tips: str = '', logger=<Logger clock (INFO)>)[source]¶ Measure and print the script running time, in seconds
Parameters: - tips (str) --
- logger (logging.Logger or logging) --
Examples
>>> with print_time("tips"):
...     a = 1 + 1  # The code you want to time
-
longling.lib.clock.
Timer
¶
concurrency¶
-
longling.lib.concurrency.
concurrent_pool
(level: str, pool_size: int = None, ret: list = None)[source]¶ Simple API for starting completely independent concurrent programs:
- thread
- process
- coroutine
Examples
def pseudo(idx):
    return idx

ret = []
with concurrent_pool("p", ret=ret) as e:  # or concurrent_pool("t", ret=ret)
    for i in range(4):
        e.submit(pseudo, i)
print(ret)
[0, 1, 2, 3]
formatter¶
-
longling.lib.formatter.
dict_format
(data: dict, digits=6, col: int = None)[source]¶ Examples
>>> print(dict_format({"a": 123, "b": 3, "c": 4, "d": 5}))  # doctest: +NORMALIZE_WHITESPACE
a: 123 b: 3 c: 4 d: 5
>>> print(dict_format({"a": 123, "b": 3, "c": 4, "d": 5}, col=3))  # doctest: +NORMALIZE_WHITESPACE
a: 123 b: 3 c: 4
d: 5
-
longling.lib.formatter.
pandas_format
(data: (dict, list, tuple), columns: list = None, index: (list, str) = None, orient='index', pd_kwargs: dict = None, max_rows=80, max_columns=80, **kwargs)[source]¶ Parameters: - data (dict, list, tuple, pd.DataFrame) --
- columns (list, default None) -- Column labels to use when orient='index'. Raises a ValueError if used with orient='columns'.
- index (list of strings) -- Optional display names matching the labels (same order).
- orient ({'columns', 'index'}, default 'columns') -- The "orientation" of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass 'columns' (default). Otherwise if the keys should be rows, pass 'index'.
- pd_kwargs (dict) --
- max_rows ((int, None), default 80) --
- max_columns ((int, None), default 80) --
Examples
>>> print(pandas_format({"a": {"x": 1, "y": 2}, "b": {"x": 1.0, "y": 3}}, ["x", "y"]))
     x  y
a  1.0  2
b  1.0  3
>>> print(pandas_format([[1.0, 2], [1.0, 3]], ["x", "y"], index=["a", "b"]))
     x  y
a  1.0  2
b  1.0  3
-
longling.lib.formatter.
table_format
(data: (dict, list, tuple), columns: list = None, index: (list, str) = None, orient='index', pd_kwargs: dict = None, max_rows=80, max_columns=80, **kwargs)¶ Parameters: - data (dict, list, tuple, pd.DataFrame) --
- columns (list, default None) -- Column labels to use when orient='index'. Raises a ValueError if used with orient='columns'.
- index (list of strings) -- Optional display names matching the labels (same order).
- orient ({'columns', 'index'}, default 'columns') -- The "orientation" of the data. If the keys of the passed dict should be the columns of the resulting DataFrame, pass 'columns' (default). Otherwise if the keys should be rows, pass 'index'.
- pd_kwargs (dict) --
- max_rows ((int, None), default 80) --
- max_columns ((int, None), default 80) --
Examples
>>> print(pandas_format({"a": {"x": 1, "y": 2}, "b": {"x": 1.0, "y": 3}}, ["x", "y"]))
     x  y
a  1.0  2
b  1.0  3
>>> print(pandas_format([[1.0, 2], [1.0, 3]], ["x", "y"], index=["a", "b"]))
     x  y
a  1.0  2
b  1.0  3
-
longling.lib.formatter.
series_format
(data: dict, digits=6, col: int = None)¶ Examples
>>> print(dict_format({"a": 123, "b": 3, "c": 4, "d": 5}))  # doctest: +NORMALIZE_WHITESPACE
a: 123 b: 3 c: 4 d: 5
>>> print(dict_format({"a": 123, "b": 3, "c": 4, "d": 5}, col=3))  # doctest: +NORMALIZE_WHITESPACE
a: 123 b: 3 c: 4
d: 5
iterator¶
-
class
longling.lib.iterator.
BaseIter
(src, fargs=None, fkwargs=None, length=None, *args, **kwargs)[source]¶ Iterator
Notes
- If src is an iterator instance, its content is exhausted after one round of iteration and cannot be restarted.
- To make the iterator endlessly re-iterable, src should be a function generating an iterator instance, and reset() should be called at the end of each round.
- If src has no __length__, len() cannot be called on a BaseIter instance before the first round of iteration finishes.
Examples
# content is exhausted after a single round of iteration
with open("demo.txt") as f:
    bi = BaseIter(f)
    for line in bi:
        pass

# can be iterated over multiple times
def open_file():
    with open("demo.txt") as f:
        for line in f:
            yield line

bi = BaseIter(open_file)
for _ in range(5):
    for line in bi:
        pass
    bi.reset()

# simplified reusable form
@BaseIter.wrap
def open_file():
    with open("demo.txt") as f:
        for line in f:
            yield line

bi = open_file()
for _ in range(5):
    for line in bi:
        pass
    bi.reset()
-
class
longling.lib.iterator.
MemoryIter
(src, fargs=None, fkwargs=None, length=None, prefetch=False, *args, **kwargs)[source]¶ Memory iterator
Loads the entire content of the iterator into memory
-
class
longling.lib.iterator.
LoopIter
(src, fargs=None, fkwargs=None, length=None, *args, **kwargs)[source]¶ Loop iterator
Automatically calls reset() after each round of iteration
-
class
longling.lib.iterator.
AsyncLoopIter
(src, fargs=None, fkwargs=None, tank_size=8, timeout=None, level='t')[source]¶ Asynchronous loop iterator, suitable for loading files
Data reading and data consumption are asynchronous; data is prefetched after reset()
-
class
longling.lib.iterator.
AsyncIter
(src, fargs=None, fkwargs=None, tank_size=8, timeout=None, level='t')[source]¶ Asynchronous loading iterator
Does not reset() automatically
-
class
longling.lib.iterator.
CacheAsyncLoopIter
(src, cache_file, fargs=None, fkwargs=None, rerun=True, tank_size=8, timeout=None, level='t')[source]¶ Asynchronous iterator with a cache pool, suitable for files that need preprocessing
Automatically resets(). When src is a function involving heavy preprocessing (i.e., the asynchronous loading takes much longer than iterating over the output), the preprocessed data is written to the specified cache file
-
longling.lib.iterator.
iterwrap
(itertype: str = 'AsyncLoopIter', *args, **kwargs)[source]¶ Iterator decorator for when an iterator needs to be reused: converts an iterator-generating function into a reusable one. Uses AsyncLoopIter by default.
Examples
@iterwrap()
def open_file():
    with open("demo.txt") as f:
        for line in f:
            yield line

data = open_file()
for _ in range(5):
    for line in data:
        pass
Warning
As mentioned in [1], on Windows or MacOS, spawn() is the default multiprocessing start method. Using spawn(), another interpreter is launched which runs your main script, followed by the internal worker function that receives parameters through pickle serialization. However, decorators, functools partials, lambdas and local functions do not fit pickle well, as discussed in [2]. Therefore, since version 1.3.36, instead of using multiprocessing, we use multiprocess, which replaces pickle with dill. Nevertheless, users should be aware that level='p' may not work on Windows and Mac platforms if the decorated function does not follow the spawn() behaviour.
Notes
Although fork in multiprocessing is quite easy to use, and iterwrap works well with it, users should still be aware that fork is not safe enough, as mentioned in [3].
We use the default start method when dealing with multiprocessing, i.e., spawn on Windows and MacOS, and fork on Linux. An example of changing the default behaviour is multiprocessing.set_start_method('spawn'), as described in [3].
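For reference, a minimal sketch of changing the start method with the standard library (generic Python, independent of longling):
import multiprocessing

def worker(x):
    # a top-level function is picklable, so it also works with the spawn start method
    return x * x

if __name__ == "__main__":
    multiprocessing.set_start_method('spawn')  # may only be called once per program
    with multiprocessing.Pool(2) as pool:
        print(pool.map(worker, range(4)))  # [0, 1, 4, 9]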
References
[1] https://pytorch.org/docs/stable/data.html#platform-specific-behaviors
[2] https://stackoverflow.com/questions/51867402/cant-pickle-function-stringtongrams-at-0x104144f28-its-not-the-same-object
[3] https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
loading¶
-
longling.lib.loading.
csv2jsonl
(src: PATH_IO_TYPE, tar: PATH_IO_TYPE = None, delimiter=',', **kwargs)[source]¶ Convert a csv file/IO stream into a jsonl file/IO stream
transfer csv file or io stream into json file or io stream
Parameters: - src (PATH_IO_TYPE) -- the path to the source file, or an IO stream.
- tar (PATH_IO_TYPE) -- the path to the target file, or an IO stream.
- delimiter (str) -- the delimiter used in csv; commonly used delimiters are "," and " "
- kwargs (dict) -- options passed to csv.DictWriter
Examples
Assume some records are written in demo.csv; the following call converts them:
csv2jsonl("demo.csv", "demo.jsonl")
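For instance, with illustrative data (not shipped with the library), if demo.csv contains
a,b
1,2
3,4
the resulting demo.jsonl holds one JSON object per line (values are read as strings by the csv reader):
{"a": "1", "b": "2"}
{"a": "3", "b": "4"}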
-
longling.lib.loading.
jsonl2csv
(src: PATH_IO_TYPE, tar: PATH_IO_TYPE = None, delimiter=',', **kwargs)[source]¶ Convert a jsonl file/IO stream into a csv file/IO stream
transfer json file or io stream into csv file or io stream
Parameters: - src (PATH_IO_TYPE) -- the path to the source file, or an IO stream.
- tar (PATH_IO_TYPE) -- the path to the target file, or an IO stream.
- delimiter (str) -- the delimiter used in csv; commonly used delimiters are "," and " "
- kwargs (dict) -- options passed to csv.DictWriter
Examples
Assume some records are written in demo.jsonl; the following call converts them:
jsonl2csv("demo.jsonl", "demo.csv")
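Conversely, with the same illustrative data, if demo.jsonl contains
{"a": "1", "b": "2"}
{"a": "3", "b": "4"}
the resulting demo.csv would be:
a,b
1,2
3,4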
-
longling.lib.loading.
loading
(src: (PATH_IO_TYPE, ...), src_type=None)[source]¶ Buffered line-by-line reading of a file
Supports reading from
- jsonl (applies load_jsonl)
- csv (applies load_csv)
- files in other formats, which are treated as raw text (applies load_file)
- functions, which are invoked, with their return values returned
- other objects, which are returned directly
-
longling.lib.loading.
load_jsonl
(src: PATH_IO_TYPE)[source]¶ Buffered line-by-line reading of a jsonl file
Examples
Assume some records are written in demo.jsonl:
for line in load_jsonl('demo.jsonl'):
    print(line)
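For instance, with a hypothetical demo.jsonl containing
{"a": 1}
{"a": 2}
each iteration yields the parsed object, so the loop prints:
{'a': 1}
{'a': 2}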
-
longling.lib.loading.
load_csv
(src: PATH_IO_TYPE, delimiter=',', **kwargs)[source]¶ read the dict from csv
Examples
Assume some records are written in demo.csv:
for line in load_csv('demo.csv'):
    print(line)
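For instance, with a hypothetical demo.csv containing a header line a,b and a data line 1,2, each iteration yields a dict-like row (e.g. {'a': '1', 'b': '2'}; the exact mapping type depends on the underlying csv reader).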
-
longling.lib.loading.
load_file
(src: PATH_IO_TYPE)[source]¶ Read raw text from source
Examples
Assume some text is written in demo.txt; read it with:
for line in load_file('demo.txt'):
    print(line, end="")
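Since each line is yielded verbatim, including its trailing newline (hence end="" in the print call), a hypothetical demo.txt containing the two lines
hello
world
is printed back unchanged.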
parser¶
A custom configuration-file format and the corresponding parsing toolkit, designed to make file-based parameter configuration and parsing more convenient and fast.
-
longling.lib.parser.
get_class_var
(class_obj, exclude_names: (set, None) = None, get_vars=None) → dict[source]¶ Updated in v1.3.18
Get the variable names and values of all attributes of a class
Examples
>>> class A(object):
...     att1 = 1
...     att2 = 2
>>> get_class_var(A)
{'att1': 1, 'att2': 2}
>>> get_class_var(A, exclude_names={"att1"})
{'att2': 2}
>>> class B(object):
...     att3 = 3
...     att4 = 4
...     @staticmethod
...     def excluded_names():
...         return {"att4"}
>>> get_class_var(B)
{'att3': 3}
Parameters: - class_obj -- a class or a class instance. Note the difference between the two.
- exclude_names -- variable names to exclude; alternatively, define an excluded_names method in the class to specify them.
- get_vars --
Returns: variable names and values of the class attributes
Return type: class_var
-
longling.lib.parser.
get_parsable_var
(class_obj, parse_exclude: set = None, dump_parse_functions=None, get_vars=True)[source]¶ Get all parsable parameters and their values; dump_parse_functions can be used to convert values that cannot be dumped
-
longling.lib.parser.
load_configuration
(fp, file_format='json', load_parse_function=None)[source]¶ Load a configuration file
Updated in version 1.3.16
Parameters: - fp --
- file_format --
- load_parse_function --
-
longling.lib.parser.
var2exp
(var_str, env_wrap=<function <lambda>>)[source]¶ Convert a variable containing the $ marker into an expression
Parameters: - var_str --
- env_wrap --
Examples
>>> root = "dir"
>>> dataset = "d1"
>>> eval(var2exp("$root/data/$dataset"))
'dir/data/d1'
-
longling.lib.parser.
path_append
(path, *addition, to_str=False)[source]¶ Path joining function
Examples
>>> path_append("../", "../data", "../dataset1/", "train", to_str=True)
'../../data/../dataset1/train'
Parameters: - path (str or PurePath) --
- addition (list(str or PurePath)) --
- to_str (bool) -- Convert the new path to str
-
class
longling.lib.parser.
Configuration
(logger=<module 'logging'>, **kwargs)[source]¶ Base class for custom configurations
Examples
>>> c = Configuration(a=1, b="example", c=[0,2], d={"a1": 3})
>>> c.instance_var
{'a': 1, 'b': 'example', 'c': [0, 2], 'd': {'a1': 3}}
>>> c.default_file_format()
'json'
>>> c.get("a")
1
>>> c.get("e") is None
True
>>> c.get("e", 0)
0
>>> c.update(e=2)
>>> c["e"]
2
-
class_var
¶ Get all configured parameters
Returns: parameters -- all variables used as parameters
Return type: dict
-
dump
(cfg_path: str, override=True, file_format=None)[source]¶ Write the configuration parameters to a file
Updated in version 1.3.16
Parameters: - cfg_path (str) --
- override (bool) --
- file_format (str) --
-
parsable_var
¶ Get the parameters that can be set via the command line
Returns: store_vars -- parameters that can be set via the command line
Return type: dict
-
class
longling.lib.parser.
ConfigurationParser
(class_type, excluded_names: (set, None) = None, commands=None, *args, params_help=None, commands_help=None, override_help=False, **kwargs)[source]¶ Updated in v1.3.18
Configuration parsing class, which can be used to build cli tools. The class first reads in all class attributes of the target configuration class class_obj and parses them to generate the command line. An ordinary attribute parameter is read in as "--att_name att_value". An additional flag '--kwargs' is provided to read in optional parameters, in the format
--kwargs key1=value1;key2=value2;...
First, create a parser:
cli_parser = ConfigurationParser(Configuration)
Besides parsing an existing configuration class, functions can also be added to generate sub-parsers:
cli_parser = ConfigurationParser($function)
or
cli_parser = ConfigurationParser([$function1, $function2])
Parameters can be parsed in any of the following three modes:
command-line mode
cli_parser()
string mode
cli_parser('$parameter1 $parameters ...')
list mode
cli_parser(["--a", "int(1)", "--b", "int(2)"])
Notes
Strings containing the following keywords will be type-converted during parsing:
int, float, dict, list, set, tuple, None
Parameters: - class_type -- a class; note that it is a class, not a class instance.
- excluded_names -- the set of variable names in the class that should not be parsed
- commands -- command functions to be parsed
Examples
>>> class TestC(Configuration):
...     a = 1
...     b = 2
>>> def test_f1(k=1):
...     return k
>>> def test_f2(h=1):
...     return h
>>> def test_f3(m):
...     return m
>>> parser = ConfigurationParser(TestC)
>>> parser("--a 1 --b 2")
{'a': '1', 'b': '2'}
>>> ConfigurationParser.get_cli_cfg(TestC)
{'a': 1, 'b': 2}
>>> parser(["--a", "1", "--b", "int(1)"])
{'a': '1', 'b': 1}
>>> parser(["--a", "1", "--b", "int(1)", "--kwargs", "c=int(3);d=None"])
{'a': '1', 'b': 1, 'c': 3, 'd': None}
>>> parser.add_command(test_f1, test_f2, test_f3)
>>> parser(["test_f1"])
{'a': 1, 'b': 2, 'k': 1, 'subcommand': 'test_f1'}
>>> parser(["test_f2"])
{'a': 1, 'b': 2, 'h': 1, 'subcommand': 'test_f2'}
>>> parser(["test_f3", "3"])
{'a': 1, 'b': 2, 'm': '3', 'subcommand': 'test_f3'}
>>> parser = ConfigurationParser(TestC, commands=[test_f1, test_f2])
>>> parser(["test_f1"])
{'a': 1, 'b': 2, 'k': 1, 'subcommand': 'test_f1'}
>>> class TestCC:
...     c = {"_c": 1, "_d": 0.1}
>>> parser = ConfigurationParser(TestCC)
>>> parser("--c _c=int(3);_d=float(0.3)")
{'c': {'_c': 3, '_d': 0.3}}
>>> class TestCls:
...     def a(self, a=1):
...         return a
...     @staticmethod
...     def b(b=2):
...         return b
...     @classmethod
...     def c(cls, c=3):
...         return c
>>> parser = ConfigurationParser(TestCls, commands=[TestCls.b, TestCls.c])
>>> parser("b")
{'b': 2, 'subcommand': 'b'}
>>> parser("c")
{'c': 3, 'subcommand': 'c'}
-
class
longling.lib.parser.
Formatter
(formatter: (str, None) = None)[source]¶ Format strings in a given pattern
Examples
>>> formatter = Formatter()
>>> formatter("hello world")
'hello world'
>>> formatter = Formatter("hello {}")
>>> formatter("world")
'hello world'
>>> formatter = Formatter("hello {} v{:.2f}")
>>> formatter("world", 0.2)
'hello world v0.20'
>>> formatter = Formatter("hello {1} v{0:.2f}")
>>> formatter(0.2, "world")
'hello world v0.20'
>>> Formatter.format(0.2, "world", formatter="hello {1} v{0:.3f}")
'hello world v0.200'
-
class
longling.lib.parser.
ParserGroup
(parsers: dict, prog=None, usage=None, description=None, epilog=None, add_help=True)[source]¶
>>> class TestC(Configuration):
...     a = 1
...     b = 2
>>> def test_f1(k=1):
...     return k
>>> def test_f2(h=1):
...     return h
>>> class TestC2(Configuration):
...     c = 3
>>> parser1 = ConfigurationParser(TestC, commands=[test_f1])
>>> parser2 = ConfigurationParser(TestC, commands=[test_f2])
>>> pg = ParserGroup({"model1": parser1, "model2": parser2})
>>> pg(["model1", "test_f1"])
{'a': 1, 'b': 2, 'k': 1, 'subcommand': 'test_f1'}
>>> pg("model2 test_f2")
{'a': 1, 'b': 2, 'h': 1, 'subcommand': 'test_f2'}
-
longling.lib.parser.
is_classmethod
(method)[source]¶ Parameters: method --
Examples
>>> class A:
...     def a(self):
...         pass
...     @staticmethod
...     def b():
...         pass
...     @classmethod
...     def c(cls):
...         pass
>>> obj = A()
>>> is_classmethod(obj.a)
False
>>> is_classmethod(obj.b)
False
>>> is_classmethod(obj.c)
True
>>> def fun():
...     pass
>>> is_classmethod(fun)
False
path¶
progress¶
A progress monitor that keeps the user informed of the current running progress, mainly adapted to the epoch/batch pattern in machine learning.
Unlike tqdm, which quickly wraps a single iterable, progress aims to modularize the functional parts of a monitor and then assemble them, which makes the description dynamic and offers the user greater flexibility.
- MonitorPlayer defines how progress and other runtime values are displayed (better than tqdm, where only n changes and the description is fixed)
- how to display is defined in the __call__ method
- inherit ProgressMonitor and instantiate it with the necessary arguments
- inherit and override ProgressMonitor's __call__ method, wrapping the iterator with IterableMIcing; this step allows flexible definition of the operations before and after iteration
- a MonitorPlayer instance needs to be passed in at __init__ time
- IterableMIcing assembles the iterator and the monitor
A simple example:
class DemoMonitor(ProgressMonitor):
def __call__(self, iterator):
return IterableMIcing(
iterator,
self.player, self.player.set_length
)
progress_monitor = DemoMonitor(MonitorPlayer())
for _ in range(5):
for _ in progress_monitor(range(10000)):
pass
print()
Cooperating with tqdm:
from tqdm import tqdm
class DemoTqdmMonitor(ProgressMonitor):
def __call__(self, iterator, **kwargs):
return tqdm(iterator, **kwargs)
-
class
longling.lib.progress.
IterableMIcing
(iterator: (Iterable, list, tuple, dict), hook_in_iter=<function pass_function>, hook_after_iter=<function pass_function>, length: (int, None) = None)[source]¶ Wrap an iterator into an iterable class usable by monitors:
* add a counter count, incremented by 1 on each iteration; when iteration finishes, the total data length can be obtained from count
* call call_in_iter on every __iter__
* call call_after_iter when iteration finishes
Parameters: - iterator -- the data to iterate over
- hook_in_iter -- callback invoked inside each iteration (e.g., to print progress), receiving the current count
- hook_after_iter -- callback invoked after each full round of iteration (once all data has been traversed), receiving the current length
- length -- total length of the data (how many items)
Examples
>>> iterator = IterableMIcing(range(100))
>>> for i in iterator:
...     pass
>>> len(iterator)
100
>>> def iter_fn(num):
...     for i in range(num):
...         yield num
>>> iterator = IterableMIcing(iter_fn(50))
>>> for i in iterator:
...     pass
>>> len(iterator)
50
regex¶
-
longling.lib.regex.
variable_replace
(string: str, key_lower: bool = True, quotation: str = '', **variables)[source]¶ Examples
>>> string = "hello $who"
>>> variable_replace(string, who="world")
'hello world'
>>> string = "hello $WHO"
>>> variable_replace(string, key_lower=False, WHO="world")
'hello world'
>>> string = "hello $WHO"
>>> variable_replace(string, who="longling")
'hello longling'
>>> string = "hello $Wh_o"
>>> variable_replace(string, wh_o="longling")
'hello longling'
-
longling.lib.regex.
default_variable_replace
(string: str, default_value: (str, None, dict) = None, key_lower: bool = True, quotation: str = '', **variables) → str[source]¶ Examples
>>> string = "hello $who, I am $author"
>>> default_variable_replace(string, default_value={"author": "groot"}, who="world")
'hello world, I am groot'
>>> string = "hello $who, I am $author"
>>> default_variable_replace(string, default_value={"author": "groot"})
'hello , I am groot'
>>> string = "hello $who, I am $author"
>>> default_variable_replace(string, default_value='', who="world")
'hello world, I am '
>>> string = "hello $who, I am $author"
>>> default_variable_replace(string, default_value=None, who="world")
'hello world, I am $author'
stream¶
This module handles stream processing
-
longling.lib.stream.
to_io
(stream: (TextIOWrapper, TextIO, BinaryIO, str, PurePath, list, None) = None, mode='r', encoding='utf-8', **kwargs)[source]¶ Convert an object to an io stream; the input could be a path to a file or an io stream.
Examples
to_io("demo.txt")        # equal to open("demo.txt")
to_io(open("demo.txt"))  # equal to open("demo.txt")
a = to_io()              # equal to a = sys.stdin
b = to_io(mode="w")      # equal to b = sys.stdout
-
longling.lib.stream.
as_io
(src: (TextIOWrapper, TextIO, BinaryIO, str, PurePath, list, None) = None, mode='r', encoding='utf-8', **kwargs)[source]¶ with wrapper for the to_io function, default mode is "r"
Examples
with as_io("demo.txt") as f:
    for line in f:
        pass

# equal to
with open("demo.txt") as src:
    with as_io(src) as f:
        for line in f:
            pass

# from several files
with as_io(["demo1.txt", "demo2.txt"]) as f:
    for line in f:
        pass

# from sys.stdin
with as_io() as f:
    for line in f:
        pass
-
longling.lib.stream.
as_out_io
(tar: (TextIOWrapper, TextIO, BinaryIO, str, PurePath, list, None) = None, mode='w', encoding='utf-8', **kwargs)[source]¶ with wrapper for the to_io function, default mode is "w"
Examples
with as_out_io("demo.txt") as wf:
    print("hello world", file=wf)

# equal to
with open("demo.txt", "w") as tar:
    with as_out_io(tar) as wf:
        print("hello world", file=wf)

# to sys.stdout
with as_out_io() as wf:
    print("hello world", file=wf)

# to sys.stderr
with as_out_io(mode="stderr") as wf:
    print("hello world", file=wf)
-
longling.lib.stream.
wf_open
(stream_name: (str, PurePath, IO, None) = None, mode='w', encoding='utf-8', **kwargs)[source]¶ Simple wrapper to codecs for writing.
When stream_name is None: mode "w" returns the standard error output stderr; otherwise, the standard output stdout is returned.
When stream_name is not None, a file stream is returned.
Parameters: - stream_name (str, PurePath or None) --
- mode (str) --
- encoding (str) -- the encoding, utf-8 by default
Returns: write_stream -- the opened stream
Return type: StreamReaderWriter
Examples
>>> wf = wf_open(mode="stdout")
>>> print("hello world", file=wf)
hello world
-
longling.lib.stream.
build_dir
(path, mode=509, parse_dir=True)[source]¶ Create a directory: parse the directory path from path and create the directory if it does not exist
Parameters: - path (str) --
- mode (int) --
- parse_dir (bool) --
-
class
longling.lib.stream.
AddPrinter
(fp, values_wrapper=<function AddPrinter.<lambda>>, to_io_params=None, ensure_io=False, **kwargs)[source]¶ A printer that appends content to a file via its add method
Examples
>>> import sys
>>> printer = AddPrinter(sys.stdout, ensure_io=True)
>>> printer.add("hello world")
hello world
-
longling.lib.stream.
check_file
(filepath, size=None)[source]¶ Check whether a file exists; when size is given, also check whether the file size matches
Parameters: - filepath (str) --
- size (int) --
Returns: file exists or not
Return type: bool
structure¶
-
class
longling.lib.structure.
AttrDict
(*args, **kwargs)[source]¶ Example
>>> ad = AttrDict({'first_name': 'Eduardo'}, last_name='Pool', age=24, sports=['Soccer'])
>>> ad
{'first_name': 'Eduardo', 'last_name': 'Pool', 'age': 24, 'sports': ['Soccer']}
>>> ad.first_name
'Eduardo'
>>> ad.age
24
>>> ad.age = 16
>>> ad.age
16
>>> ad["age"] = 20
>>> ad["age"]
20
-
class
longling.lib.structure.
SortedList
(iterable: Iterable[T_co] = (), key=None)[source]¶ A list maintaining the elements in ascending order.
A custom key function can be supplied to customize the sort order.
Examples
>>> sl = SortedList()
>>> sl.adds(*[1, 2, 3, 4, 5])
>>> sl
[1, 2, 3, 4, 5]
>>> sl.add(7)
>>> sl
[1, 2, 3, 4, 5, 7]
>>> sl.add(6)
>>> sl
[1, 2, 3, 4, 5, 6, 7]
>>> sl = SortedList([4])
>>> sl.add(3)
>>> sl.add(2)
>>> sl
[2, 3, 4]
>>> list(reversed(sl))
[4, 3, 2]
>>> sl = SortedList([("harry", 1), ("tom", 0)], key=lambda x: x[1])
>>> sl
[('tom', 0), ('harry', 1)]
>>> sl.add(("jack", -1), key=lambda x: x[1])
>>> sl
[('jack', -1), ('tom', 0), ('harry', 1)]
>>> sl.add(("ada", 2))
>>> sl
[('jack', -1), ('tom', 0), ('harry', 1), ('ada', 2)]
-
longling.lib.structure.
nested_update
(src: dict, update: dict)[source]¶ Examples
>>> nested_update({"a": {"x": 1}}, {"a": {"y": 2}})
{'a': {'x': 1, 'y': 2}}
>>> nested_update({"a": {"x": 1}}, {"a": {"x": 2}})
{'a': {'x': 2}}
>>> nested_update({"a": {"x": 1}}, {"b": {"y": 2}})
{'a': {'x': 1}, 'b': {'y': 2}}
>>> nested_update({"a": {"x": 1}}, {"a": 2})
{'a': 2}
testing¶
time¶
utilog¶
Logging configuration
-
longling.lib.utilog.
config_logging
(filename=None, log_format=None, level=20, logger=None, console_log_level=None, propagate=False, mode='a', file_format=None, encoding: (str, None) = 'utf-8', enable_colored=False, datefmt=None)[source]¶ Main logging configuration
Parameters: - filename (str or None) -- name of the log file; when not empty, a file is created to store the log
- log_format (str) -- default log format: %(name)s, %(levelname)s %(message)s; if datefmt is specified, it becomes %(name)s: %(asctime)s, %(levelname)s %(message)s
- level (str or int) -- default log level
- logger (str or logging.logger) -- logger name; can be empty (use the root logger), a string (create a logger with that name), or a logger
- console_log_level (str, int or None) -- console log level; when not empty, console output is enabled
- propagate (bool) --
- mode (str) --
- file_format (str or None) -- log format for the file; when empty, log_format is used
- encoding --
- enable_colored (bool) --
- datefmt (str) --
-
longling.lib.utilog.
default_timestamp
() → str¶ Examples
> default_timestamp()
'20200327172235'
Spider¶
A simple spider library
Expected to provide:
- downloading the file at a specified url, with optional decompression for compressed files
- [x] progress display
- simple information extraction
- [x] url extraction
- [x] text extraction
- [ ] image extraction
More complex features, such as crawling specific sites into structured data or countering anti-crawling measures, belong in separate libraries
Quick Glance¶
longling.spider.lib.get_html_code (url) |
get encoded html code from specified url |
longling.spider.download_data.download_file (url) |
cli alias: download; download data from a specified url |
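For example, to fetch a file from a hypothetical address, either of the following should work:
longling download https://example.com/data.zip
or, in Python:
from longling.spider.download_data import download_file
download_file("https://example.com/data.zip")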
Architecture Tools for Constructing Projects¶
notice¶
- In the sphinx setting, the default answer in
sphinx-quickstart
is
> Separate source and build directories (y/n) [n]
Thus, the default directory for the built files is _build.
If y is chosen, the directory for the built files is build.
entrance¶
components¶
-
longling.Architecture.install_file.
template_copy
(src: (str, PurePath), tar: (str, PurePath), default_value: (str, dict, None) = '', quotation="'", key_lower=True, **variables)[source]¶ Generate the tar file based on the template file, in which the variables will be replaced. Usually, a variable is specified like $PROJECT in the template file.
Parameters: - src (template file) --
- tar (target location) --
- default_value (the default value) --
- quotation (the quotation to wrap the variable value) --
- variables (the real variable values used to replace the variables in the template file) --
-
longling.Architecture.install_file.
gitignore
(atype: str = '', tar_dir: (str, PurePath) = './')[source]¶ cli alias:
arch gitignore
Parameters: - atype (the gitignore type, currently supporting docs and python) --
- tar_dir (target directory) --
-
longling.Architecture.install_file.
pytest
(tar_dir: (str, PurePath) = './')[source]¶ cli alias:
arch pytest
Parameters: tar_dir --
-
longling.Architecture.install_file.
coverage
(tar_dir: (str, PurePath) = './', **variables)[source]¶ cli alias:
arch coverage
Parameters: - tar_dir --
- variables --
These variables should be provided:
- project
-
longling.Architecture.install_file.
pysetup
(tar_dir='./', **variables)[source]¶ cli alias:
arch pysetup
Parameters: - tar_dir --
- variables --
-
longling.Architecture.install_file.
sphinx_conf
(tar_dir='./', **variables)[source]¶ cli alias:
arch sphinx_conf
Parameters: - tar_dir --
- variables --
-
longling.Architecture.install_file.
makefile
(tar_dir='./', **variables)[source]¶ cli alias:
arch makefile
Parameters: - tar_dir --
- variables --
-
longling.Architecture.install_file.
readthedocs
(tar_dir='./')[source]¶ cli alias:
arch readthedocs
Parameters: tar_dir --
-
longling.Architecture.install_file.
travis
(tar_dir: (str, PurePath) = './')[source]¶ cli alias:
arch travis
Parameters: tar_dir --
-
longling.Architecture.install_file.
nni
(tar_dir='./')[source]¶ cli alias:
arch nni
and install nni
Parameters: tar_dir --
-
longling.Architecture.install_file.
dockerfile
(atype, tar_dir='./', **variables)[source]¶ cli alias:
arch dockerfile
Parameters: - atype --
- tar_dir --
- variables --
-
longling.Architecture.install_file.
gitlab_ci
(private, stages: dict, atype: str = '', tar_dir: (str, PurePath) = './', version_in_path=True)[source]¶ cli alias:
arch gitlab_ci
Parameters: - private --
- stages --
- atype --
- tar_dir --
- version_in_path --
-
longling.Architecture.install_file.
chart
(tar_dir: (str, PurePath) = './')[source]¶ cli alias:
arch chart
Parameters: tar_dir (target directory) --
-
longling.Architecture.utils.
legal_input
(__promt: str, __legal_input: set = None, __illegal_input: set = None, is_legal=None, __default_value: str = None)[source]¶ Make sure the input is legal; if the input is illegal, the user will be asked to retype it.
An input is illegal when it is either not in __legal_input or in __illegal_input.
When the user types nothing, __default_value is used if it is set; otherwise, the empty input is treated as illegal.
Parameters: - __promt (str) -- tips to be displayed
- __legal_input (set) -- the input should be in the __legal_input set
- __illegal_input (set) -- the input should not be in the __illegal_input set
- is_legal (function) -- the function used to judge whether the input is legal; by default, the inner __is_legal function is used
- __default_value (str) -- default value used when the user types nothing
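A minimal usage sketch (the prompt and the choices are illustrative only, not part of the library):
from longling.Architecture.utils import legal_input
# keeps asking until the user types "y" or "n"; empty input falls back to the default "y"
choice = legal_input("Rebuild the docs? (y/n) ", {"y", "n"}, __default_value="y")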
Machine Learning¶
ML Framework¶
ML Framework is designed to help quickly construct a practical ML program, letting the user focus on
developing the algorithm instead of other additional but important engineering components like logging
and the cli.
Currently, two packages are provided, supporting the popular DL frameworks mxnet
and pytorch
. The overall scenery is almost the same, but the details may differ a little.
Note that ML Framework just provides a template; all components are allowed to be modified.
Overview¶
The architecture produced by the ML Framework looks like:
ModelName/
├── __init__.py
├── docs/
├── ModelName/
├── README.md
└── Some other components
The core part is the ModelName module under the package ModelName; its architecture is:
ModelName/
├── __init__.py
├── ModelName.py <-- the main module
└── Module/
├── __init__.py
├── configuration.py <-- define the configurable variables
├── etl.py <-- define how the data will be loaded and preprocessed
├── module.py <-- the wrapper of the network, rarely needs modification
├── run.py <-- human testing script
└── sym/ <-- define the network
├── __init__.py
├── fit_eval.py <-- define how the network will be trained and evaluated
├── net.py <-- network architecture
└── viz.py <-- (optional) how to visualize the network
Configuration¶
In configuration, some variables are predefined, such as data_dir (where the data is stored) and model_dir (where model files like parameters and running logs are stored). The following rules are used to automatically construct the needed paths, and they can be modified as the user wants:
model_name = "automatically be consistent with ModelName"
root = "./"
dataset = "" # option
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S") # option
workspace = "" # option
root_data_dir = "$root/data/$dataset" if dataset else "$root/data"
data_dir = "$root_data_dir/data"
root_model_dir = "$root_data_dir/model/$model_name"
model_dir = "$root_model_dir/$workspace" if workspace else root_model_dir
cfg_path = "$model_dir/configuration.json"
The value of any variable containing $ will be automatically evaluated while the program runs. Thus, it is easy to construct flexible variables via the cli. For example, someone who wants model_dir to contain timestamp information can specify model_dir in the cli like:
--model_dir \$root_model_dir/\$workspace/\$timestamp
Annotation: \ is an escape character in the shell, which lets \$variable finally reach the program as the string $variable; otherwise, the variable would be expanded to the shell environment variable, like $HOME.
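As an illustration of these rules (the values below are assumed for the example), with model_name = "CNN", root = ".", dataset = "d1" and an empty workspace, the variables evaluate to:
root_data_dir = "./data/d1"
data_dir = "./data/d1/data"
root_model_dir = "./data/d1/model/CNN"
model_dir = "./data/d1/model/CNN"
cfg_path = "./data/d1/model/CNN/configuration.json"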
Also, some general variables which may be frequently used in all algorithms are predefined, like optimizer and batch_size:
# training parameter settings
begin_epoch = 0
end_epoch = 100
batch_size = 32
save_epoch = 1
# optimizer settings
optimizer, optimizer_params = get_optimizer_cfg(name="base")
lr_params = {
"learning_rate": optimizer_params["learning_rate"],
"step": 100,
"max_update_steps": get_update_steps(
update_epoch=10,
batches_per_epoch=1000,
),
}
metrics¶
Metrics¶
classification¶
-
longling.ML.metrics.classification.
classification_report
(y_true, y_pred=None, y_score=None, labels=None, metrics=None, sample_weight=None, average_options=None, multiclass_to_multilabel=False, logger=<module 'logging'>, **kwargs)[source]¶ Currently supports binary and multiclass classification.
Parameters: - y_true (list, 1d array-like, or label indicator array / sparse matrix) -- Ground truth (correct) target values.
- y_pred (list or None, 1d array-like, or label indicator array / sparse matrix) -- Estimated targets as returned by a classifier.
- y_score (array or None, shape = [n_samples] or [n_samples, n_classes]) -- Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by "decision_function" on some classifiers). For binary y_true, y_score is supposed to be the score of the class with greater label.
- labels (array, shape = [n_labels]) -- Optional list of label indices to include in the report.
- metrics (list of str,) -- Support: precision, recall, f1, support, accuracy, auc, aupoc.
- sample_weight (array-like of shape = [n_samples], optional) -- Sample weights.
- average_options (str or list) -- default to macro, choices (one or many): "micro", "macro", "samples", "weighted"
- multiclass_to_multilabel (bool) --
- logger --
Examples
>>> import numpy as np >>> # binary classification >>> y_true = np.array([0, 0, 1, 1, 0]) >>> y_pred = np.array([0, 1, 0, 1, 0]) >>> classification_report(y_true, y_pred) precision recall f1 support 0 0.666667 0.666667 0.666667 3 1 0.500000 0.500000 0.500000 2 macro_avg 0.583333 0.583333 0.583333 5 accuracy: 0.600000 >>> y_true = np.array([0, 0, 1, 1]) >>> y_score = np.array([0.1, 0.4, 0.35, 0.8]) >>> classification_report(y_true, y_score=y_score) # doctest: +NORMALIZE_WHITESPACE macro_auc: 0.750000 macro_aupoc: 0.833333 >>> y_true = np.array([0, 0, 1, 1]) >>> y_pred = [0, 0, 0, 1] >>> y_score = np.array([0.1, 0.4, 0.35, 0.8]) >>> classification_report(y_true, y_pred, y_score=y_score) # doctest: +NORMALIZE_WHITESPACE precision recall f1 support 0 0.666667 1.00 0.800000 2 1 1.000000 0.50 0.666667 2 macro_avg 0.833333 0.75 0.733333 4 accuracy: 0.750000 macro_auc: 0.750000 macro_aupoc: 0.833333 >>> # multiclass classification >>> y_true = [0, 1, 2, 2, 2] >>> y_pred = [0, 0, 2, 2, 1] >>> classification_report(y_true, y_pred) precision recall f1 support 0 0.5 1.000000 0.666667 1 1 0.0 0.000000 0.000000 1 2 1.0 0.666667 0.800000 3 macro_avg 0.5 0.555556 0.488889 5 accuracy: 0.600000 >>> # multiclass in multilabel >>> y_true = np.array([0, 0, 1, 1, 2, 1]) >>> y_pred = np.array([2, 1, 0, 2, 1, 0]) >>> y_score = np.array([ ... [0.15, 0.4, 0.45], ... [0.1, 0.9, 0.0], ... [0.33333, 0.333333, 0.333333], ... [0.15, 0.4, 0.45], ... [0.1, 0.9, 0.0], ... [0.33333, 0.333333, 0.333333] ... ]) >>> classification_report( ... y_true, y_pred, y_score, ... multiclass_to_multilabel=True, ... metrics=["aupoc"] ... ) aupoc 0 0.291667 1 0.416667 2 0.166667 macro_avg 0.291667 >>> classification_report( ... y_true, y_pred, y_score, ... multiclass_to_multilabel=True, ... metrics=["auc", "aupoc"] ... ) auc aupoc 0 0.250000 0.291667 1 0.055556 0.416667 2 0.100000 0.166667 macro_avg 0.135185 0.291667 macro_auc: 0.194444 >>> y_true = np.array([0, 1, 1, 1, 2, 1]) >>> y_pred = np.array([2, 1, 0, 2, 1, 0]) >>> y_score = np.array([ ... [0.45, 0.4, 0.15], ... [0.1, 0.9, 0.0], ... [0.33333, 0.333333, 0.333333], ... [0.15, 0.4, 0.45], ... [0.1, 0.9, 0.0], ... [0.33333, 0.333333, 0.333333] ... ]) >>> classification_report( ... y_true, y_pred, ... y_score, ... multiclass_to_multilabel=True, ... ) # doctest: +NORMALIZE_WHITESPACE precision recall f1 auc aupoc support 0 0.000000 0.000000 0.000000 1.00 1.000000 1 1 0.500000 0.250000 0.333333 0.25 0.583333 4 2 0.000000 0.000000 0.000000 0.10 0.166667 1 macro_avg 0.166667 0.083333 0.111111 0.45 0.583333 6 accuracy: 0.166667 macro_auc: 0.437500 >>> classification_report( ... y_true, y_pred, ... y_score, ... labels=[0, 1], ... multiclass_to_multilabel=True, ... ) # doctest: +NORMALIZE_WHITESPACE precision recall f1 auc aupoc support 0 0.00 0.000 0.000000 1.00 1.000000 1 1 0.50 0.250 0.333333 0.25 0.583333 4 macro_avg 0.25 0.125 0.166667 0.45 0.583333 5 accuracy: 0.166667 macro_auc: 0.437500
regression¶
-
longling.ML.metrics.regression.
regression_report
(y_true, y_pred, metrics=None, sample_weight=None, multioutput='uniform_average', average_options=None, key_prefix='', key_suffix='', verbose=True)[源代码]¶ 参数: - y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) -- Ground truth (correct) target values.
- y_pred (array-like of shape (n_samples,) or (n_samples, n_outputs)) -- Estimated target values.
- metrics (list of str,) -- Support: evar(explained_variance), mse, rmse, mae, r2
- sample_weight (array-like of shape (n_samples,), optional) -- Sample weights.
- multioutput (string in ['raw_values', 'uniform_average', 'variance_weighted'], list or array-like of shape (n_outputs)) -- Defines aggregating of multiple output values. Disabled when verbose is True. Array-like values define weights used to average errors.
  'raw_values': Returns a full set of errors in case of multioutput input.
  'uniform_average': Errors of all outputs are averaged with uniform weight. Alias: "macro"
  'variance_weighted': Only supported for evar and r2. Scores of all outputs are averaged, weighted by the variances of each individual output. Alias: "vw"
- average_options (str or list) -- default to macro, choices (one or many): "macro", "vw"
- key_prefix (str) --
- key_suffix (str) --
- verbose (bool) --
Returns: - evar (explained variance)
- mse (mean squared error)
- rmse (root mean squared error)
- mae (mean absolute error)
- r2 (r2 score)
Examples
>>> y_true = [[0.5, 1, 1], [-1, 1, 1], [7, -6, 1]] >>> y_pred = [[0, 2, 1], [-1, 2, 1], [8, -5, 1]] >>> regression_report(y_true, y_pred) # doctest: +NORMALIZE_WHITESPACE evar mse rmse mae r2 0 0.967742 0.416667 0.645497 0.5 0.965438 1 1.000000 1.000000 1.000000 1.0 0.908163 2 1.000000 0.000000 0.000000 0.0 1.000000 uniform_average 0.989247 0.472222 0.548499 0.5 0.957867 variance_weighted 0.983051 0.472222 0.548499 0.5 0.938257 >>> regression_report(y_true, y_pred, verbose=False) # doctest: +NORMALIZE_WHITESPACE evar: 0.989247 mse: 0.472222 rmse: 0.548499 mae: 0.500000 r2: 0.957867 >>> regression_report( ... y_true, y_pred, multioutput="variance_weighted", verbose=False ... ) # doctest: +NORMALIZE_WHITESPACE evar: 0.983051 mse: 0.472222 rmse: 0.548499 mae: 0.500000 r2: 0.938257 >>> regression_report(y_true, y_pred, multioutput=[0.3, 0.6, 0.1], verbose=False) # doctest: +NORMALIZE_WHITESPACE evar: 0.990323 mse: 0.725000 rmse: 0.793649 mae: 0.750000 r2: 0.934529 >>> regression_report(y_true, y_pred, verbose=True) # doctest: +NORMALIZE_WHITESPACE evar mse rmse mae r2 0 0.967742 0.416667 0.645497 0.5 0.965438 1 1.000000 1.000000 1.000000 1.0 0.908163 2 1.000000 0.000000 0.000000 0.0 1.000000 uniform_average 0.989247 0.472222 0.548499 0.5 0.957867 variance_weighted 0.983051 0.472222 0.548499 0.5 0.938257 >>> regression_report( ... y_true, y_pred, verbose=True, average_options=["macro", "vw", [0.3, 0.6, 0.1]] ... ) # doctest: +NORMALIZE_WHITESPACE evar mse rmse mae r2 0 0.967742 0.416667 0.645497 0.50 0.965438 1 1.000000 1.000000 1.000000 1.00 0.908163 2 1.000000 0.000000 0.000000 0.00 1.000000 uniform_average 0.989247 0.472222 0.548499 0.50 0.957867 variance_weighted 0.983051 0.472222 0.548499 0.50 0.938257 weighted 0.990323 0.725000 0.793649 0.75 0.934529
ranking¶
-
longling.ML.metrics.ranking.
ranking_report
(y_true, y_pred, k: (int, list) = None, continuous=False, coerce='ignore', pad_pred=-100, metrics=None, bottom=False, verbose=True) → longling.ML.metrics.utils.POrderedDict[source]¶ Parameters: - y_true --
- y_pred --
- k --
- continuous --
- coerce --
- pad_pred --
- metrics --
- bottom --
- verbose --
Examples
>>> y_true = [[1, 0, 0], [0, 0, 1]] >>> y_pred = [[0.75, 0.5, 1], [1, 0.2, 0.1]] >>> ranking_report(y_true, y_pred) # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k 1 1.000000 0.000000 0.0 0.0 1.0 2 3 0.565465 0.333333 1.0 0.5 3.0 2 5 0.565465 0.333333 1.0 0.5 3.0 2 10 0.565465 0.333333 1.0 0.5 3.0 2 auc: 0.250000 map: 0.416667 mrr: 0.416667 coverage_error: 2.500000 ranking_loss: 0.750000 len: 3.000000 support: 2 >>> ranking_report(y_true, y_pred, k=[1, 3, 5]) # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k 1 1.000000 0.000000 0.0 0.0 1.0 2 3 0.565465 0.333333 1.0 0.5 3.0 2 5 0.565465 0.333333 1.0 0.5 3.0 2 auc: 0.250000 map: 0.416667 mrr: 0.416667 coverage_error: 2.500000 ranking_loss: 0.750000 len: 3.000000 support: 2 >>> ranking_report(y_true, y_pred, bottom=True) # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k ndcg@k(B) \ 1 1.000000 0.000000 0.0 0.0 1.0 2 1.000000 3 0.565465 0.333333 1.0 0.5 3.0 2 0.806574 5 0.565465 0.333333 1.0 0.5 3.0 2 0.806574 10 0.565465 0.333333 1.0 0.5 3.0 2 0.806574 <BLANKLINE> precision@k(B) recall@k(B) f1@k(B) len@k(B) support@k(B) 1 0.500000 0.25 0.333333 1.0 2 3 0.666667 1.00 0.800000 3.0 2 5 0.666667 1.00 0.800000 3.0 2 10 0.666667 1.00 0.800000 3.0 2 auc: 0.250000 map: 0.416667 mrr: 0.416667 coverage_error: 2.500000 ranking_loss: 0.750000 len: 3.000000 support: 2 map(B): 0.708333 mrr(B): 0.750000 >>> ranking_report(y_true, y_pred, bottom=True, metrics=["auc"]) # doctest: +NORMALIZE_WHITESPACE auc: 0.250000 len: 3.000000 support: 2 >>> y_true = [[0.9, 0.7, 0.1], [0, 0.5, 1]] >>> y_pred = [[0.75, 0.5, 1], [1, 0.2, 0.1]] >>> ranking_report(y_true, y_pred, continuous=True) # doctest: +NORMALIZE_WHITESPACE ndcg@k len@k support@k 3 0.675647 3.0 2 5 0.675647 3.0 2 10 0.675647 3.0 2 mrr: 0.750000 len: 3.000000 support: 2 >>> y_true = [[1, 0], [0, 0, 1]] >>> y_pred = [[0.75, 0.5], [1, 0.2, 0.1]] >>> ranking_report(y_true, y_pred) # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k 1 1.00 0.500000 0.5 0.500000 1.0 2 3 0.75 0.416667 1.0 0.583333 2.5 2 5 0.75 0.416667 1.0 0.583333 2.5 2 10 0.75 0.416667 1.0 0.583333 2.5 2 auc: 0.500000 map: 0.666667 mrr: 0.666667 coverage_error: 2.000000 ranking_loss: 0.500000 len: 2.500000 support: 2 >>> ranking_report(y_true, y_pred, coerce="abandon") # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k 1 1.0 0.500000 0.5 0.5 1.0 2 3 0.5 0.333333 1.0 0.5 3.0 1 auc: 0.500000 map: 0.666667 mrr: 0.666667 coverage_error: 2.000000 ranking_loss: 0.500000 len: 2.500000 support: 2 >>> ranking_report(y_true, y_pred, coerce="padding") # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k 1 1.00 0.500000 0.5 0.500000 1.0 2 3 0.75 0.416667 1.0 0.583333 2.5 2 5 0.75 0.416667 1.0 0.583333 2.5 2 10 0.75 0.416667 1.0 0.583333 2.5 2 auc: 0.500000 map: 0.666667 mrr: 0.666667 coverage_error: 2.000000 ranking_loss: 0.500000 len: 2.500000 support: 2 >>> ranking_report(y_true, y_pred, bottom=True) # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k ndcg@k(B) \ 1 1.00 0.500000 0.5 0.500000 1.0 2 1.000000 3 0.75 0.416667 1.0 0.583333 2.5 2 0.846713 5 0.75 0.416667 1.0 0.583333 2.5 2 0.846713 10 0.75 0.416667 1.0 0.583333 2.5 2 0.846713 <BLANKLINE> precision@k(B) recall@k(B) f1@k(B) len@k(B) support@k(B) 1 0.500000 0.5 0.500000 1.0 2 3 0.583333 1.0 0.733333 2.5 2 5 0.583333 1.0 0.733333 2.5 2 10 0.583333 1.0 0.733333 2.5 2 auc: 0.500000 map: 
0.666667 mrr: 0.666667 coverage_error: 2.000000 ranking_loss: 0.500000 len: 2.500000 support: 2 map(B): 0.791667 mrr(B): 0.750000 >>> ranking_report(y_true, y_pred, bottom=True, coerce="abandon") # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k ndcg@k(B) \ 1 1.0 0.500000 0.5 0.5 1.0 2 1.000000 3 0.5 0.333333 1.0 0.5 3.0 1 0.693426 <BLANKLINE> precision@k(B) recall@k(B) f1@k(B) len@k(B) support@k(B) 1 0.500000 0.5 0.5 1.0 2 3 0.666667 1.0 0.8 3.0 1 auc: 0.500000 map: 0.666667 mrr: 0.666667 coverage_error: 2.000000 ranking_loss: 0.500000 len: 2.500000 support: 2 map(B): 0.791667 mrr(B): 0.750000 >>> ranking_report(y_true, y_pred, bottom=True, coerce="padding") # doctest: +NORMALIZE_WHITESPACE ndcg@k precision@k recall@k f1@k len@k support@k ndcg@k(B) \ 1 1.00 0.500000 0.5 0.500000 1.0 2 1.000000 3 0.75 0.416667 1.0 0.583333 2.5 2 0.846713 5 0.75 0.416667 1.0 0.583333 2.5 2 0.846713 10 0.75 0.416667 1.0 0.583333 2.5 2 0.846713 <BLANKLINE> precision@k(B) recall@k(B) f1@k(B) len@k(B) support@k(B) 1 0.50 0.5 0.500000 1.0 2 3 0.50 1.0 0.650000 3.0 2 5 0.30 1.0 0.452381 5.0 2 10 0.15 1.0 0.257576 10.0 2 auc: 0.500000 map: 0.666667 mrr: 0.666667 coverage_error: 2.000000 ranking_loss: 0.500000 len: 2.500000 support: 2 map(B): 0.791667 mrr(B): 0.750000
toolkit¶
Toolkit module
Monitor¶
Used to monitor data loading and the training/testing process (e.g., epoch and batch progress; call it progress).
Data loading could actually be split out.
A monitor group is needed to manage these monitors uniformly; two kinds can be used: a class, or a dictionary.
API reference¶
General Toolkit¶
-
longling.ML.toolkit.analyser.
get_max
(src: ((str, PurePath), list), *keys, with_keys: (str, None) = None, with_all=False, merge=True)[source]¶ Examples
>>> src = [
...     {"Epoch": 0, "macro avg": {"f1": 0.7}, "loss": 0.04, "accuracy": 0.7},
...     {"Epoch": 1, "macro avg": {"f1": 0.88}, "loss": 0.03, "accuracy": 0.8},
...     {"Epoch": 1, "macro avg": {"f1": 0.7}, "loss": 0.02, "accuracy": 0.66}
... ]
>>> result, _ = get_max(src, "accuracy", merge=False)
>>> result
{'accuracy': 0.8}
>>> _, result_appendix = get_max(src, "accuracy", with_all=True, merge=False)
>>> result_appendix
{'accuracy': {'Epoch': 1, 'macro avg': {'f1': 0.88}, 'loss': 0.03, 'accuracy': 0.8}}
>>> result, result_appendix = get_max(src, "accuracy", "macro avg:f1", with_keys="Epoch", merge=False)
>>> result
{'accuracy': 0.8, 'macro avg:f1': 0.88}
>>> result_appendix
{'accuracy': {'Epoch': 1}, 'macro avg:f1': {'Epoch': 1}}
>>> get_max(src, "accuracy", "macro avg:f1", with_keys="Epoch")
{'accuracy': {'Epoch': 1, 'accuracy': 0.8}, 'macro avg:f1': {'Epoch': 1, 'macro avg:f1': 0.88}}
-
longling.ML.toolkit.analyser.
get_min
(src: ((str, PurePath), list), *keys, with_keys: (str, None) = None, with_all=False, merge=True)[source]¶
>>> src = [
...     {"Epoch": 0, "macro avg": {"f1": 0.7}, "loss": 0.04, "accuracy": 0.7},
...     {"Epoch": 1, "macro avg": {"f1": 0.88}, "loss": 0.03, "accuracy": 0.8},
...     {"Epoch": 1, "macro avg": {"f1": 0.7}, "loss": 0.02, "accuracy": 0.66}
... ]
>>> get_min(src, "loss")
{'loss': 0.02}
-
class
longling.ML.toolkit.dataset.
ID2Feature
(feature_df: pandas.core.frame.DataFrame, id_field=None, set_index=False)[source]¶ Examples
>>> import pandas as pd
>>> df = pd.DataFrame({"id": [0, 1, 2, 3, 4], "numeric": [1, 2, 3, 4, 5], "text": ["a", "b", "c", "d", "e"]})
>>> i2f = ID2Feature(df, id_field="id", set_index=True)
>>> i2f[2]
numeric    3
text       c
Name: 2, dtype: object
>>> i2f[[2, 3]]["numeric"]
id
2    3
3    4
Name: numeric, dtype: int64
>>> i2f(2)
[3, 'c']
>>> i2f([2, 3])
[[3, 'c'], [4, 'd']]
-
class
longling.ML.toolkit.dataset.
ItemSpecificSampler
(triplet_df: pandas.core.frame.DataFrame, query_field='item_id', pos_field='pos', neg_field='neg', set_index=False, item_id_range=None, user_id_range=None, random_state=10)[source]¶ Examples
>>> import pandas as pd >>> user_num = 3 >>> item_num = 4 >>> rating_matrix = pd.DataFrame({ ... "user_id": [0, 1, 1, 1, 2], ... "item_id": [1, 3, 0, 2, 1] ... }) >>> triplet_df = ItemSpecificSampler.rating2triplet(rating_matrix) >>> triplet_df # doctest: +NORMALIZE_WHITESPACE pos neg item_id 0 [1] [] 1 [0, 2] [] 2 [1] [] 3 [1] [] >>> triplet_df.index Int64Index([0, 1, 2, 3], dtype='int64', name='item_id') >>> sampler = ItemSpecificSampler(triplet_df) >>> sampler(1) (0, [0]) >>> sampler = ItemSpecificSampler(triplet_df, user_id_range=user_num) >>> sampler(0, implicit=True) (1, [2]) >>> sampler(0, 5, implicit=True) (2, [2, 0, 0, 0, 0]) >>> sampler(0, 5, implicit=True, pad_value=-1) (2, [0, 2, -1, -1, -1]) >>> sampler([0, 1, 2], 5, implicit=True, pad_value=-1) [(2, [0, 2, -1, -1, -1]), (1, [1, -1, -1, -1, -1]), (2, [0, 2, -1, -1, -1])] >>> rating_matrix = pd.DataFrame({ ... "user_id": [0, 1, 1, 1, 2], ... "item_id": [1, 3, 0, 2, 1], ... "score": [1, 0, 1, 1, 0] ... }) >>> triplet_df = ItemSpecificSampler.rating2triplet(rating_matrix=rating_matrix, value_field="score") >>> triplet_df # doctest: +NORMALIZE_WHITESPACE pos neg item_id 0 [1] [] 1 [0] [2] 2 [1] [] 3 [] [1] >>> sampler = UserSpecificPairSampler(triplet_df) >>> sampler([0, 1, 2], 5, pad_value=-1) [(0, [-1, -1, -1, -1, -1]), (1, [2, -1, -1, -1, -1]), (0, [-1, -1, -1, -1, -1])] >>> sampler([0, 1, 2], 5, neg=False, pad_value=-1) [(1, [1, -1, -1, -1, -1]), (1, [0, -1, -1, -1, -1]), (1, [1, -1, -1, -1, -1])] >>> sampler(rating_matrix["item_id"], 2, neg=rating_matrix["score"], ... excluded_key=rating_matrix["user_id"], pad_value=-1) [(1, [2, -1]), (0, [-1, -1]), (0, [-1, -1]), (0, [-1, -1]), (1, [0, -1])] >>> sampler(rating_matrix["item_id"], 2, neg=rating_matrix["score"], ... excluded_key=rating_matrix["user_id"], pad_value=-1, return_column=True) ((1, 0, 0, 0, 1), ([2, -1], [-1, -1], [-1, -1], [-1, -1], [0, -1])) >>> sampler(rating_matrix["item_id"], 2, neg=rating_matrix["score"], ... excluded_key=rating_matrix["user_id"], pad_value=-1, return_column=True, split_sample_to_column=True) ((1, 0, 0, 0, 1), [(2, -1, -1, -1, 0), (-1, -1, -1, -1, -1)])
-
class
longling.ML.toolkit.dataset.
TripletPairSampler
(triplet_df: pandas.core.frame.DataFrame, query_field, pos_field='pos', neg_field='neg', set_index=False, query_range: (int, tuple, list) = None, key_range: (int, tuple, list) = None, random_state=10)[source]¶
Examples
>>> # implicit feedback
>>> import pandas as pd
>>> triplet_df = pd.DataFrame({
...     "query": [0, 1, 2],
...     "pos": [[1], [3, 0, 2], [1]],
...     "neg": [[], [], []]
... })
>>> sampler = TripletPairSampler(triplet_df, "query", set_index=True)
>>> rating_matrix = pd.DataFrame({
...     "query": [0, 1, 1, 1, 2],
...     "key": [1, 3, 0, 2, 1]
... })
>>> triplet_df = TripletPairSampler.rating2triplet(rating_matrix, query_field="query", key_field="key")
>>> triplet_df  # doctest: +NORMALIZE_WHITESPACE
             pos neg
query
0            [1]  []
1      [3, 0, 2]  []
2            [1]  []
>>> sampler = TripletPairSampler(triplet_df, "query")
>>> sampler(0)
(0, [0])
>>> sampler(0, 3)
(0, [0, 0, 0])
>>> sampler(0, 3, padding=False)
(0, [])
>>> sampler = TripletPairSampler(triplet_df, "query", query_range=3, key_range=4)
>>> sampler(0)
(0, [0])
>>> sampler(0, 3)
(0, [0, 0, 0])
>>> sampler(0, 3, padding=False)
(0, [])
>>> sampler(0, 5, padding=False, implicit=True)
(3, [2, 3, 0])
>>> sampler(0, 5, padding=False, implicit=True, excluded_key=[3])
(2, [0, 2])
>>> sampler(0, 5, padding=True, implicit=True, excluded_key=[3])
(2, [2, 0, 0, 0, 0])
>>> sampler(0, 5, implicit=True, pad_value=-1)
(3, [2, 3, 0, -1, -1])
>>> sampler(0, 5, implicit=True, fast_implicit=True, pad_value=-1)
(3, [0, 2, 3, -1, -1])
>>> sampler(0, 5, implicit=True, fast_implicit=True, with_n_implicit=3, pad_value=-1)
(3, [0, 2, 3, -1, -1, -1, -1, -1])
>>> sampler(0, 5, implicit=True, fast_implicit=True, with_n_implicit=3, pad_value=-1, padding_implicit=True)
(3, [0, 2, 3, -1, -1, -1, -1, -1])
>>> rating_matrix = pd.DataFrame({
...     "query": [0, 1, 1, 1, 2],
...     "key": [1, 3, 0, 2, 1],
...     "score": [1, 0, 1, 1, 0]
... })
>>> triplet_df = TripletPairSampler.rating2triplet(
...     rating_matrix,
...     "query", "key",
...     value_field="score"
... )
>>> triplet_df  # doctest: +NORMALIZE_WHITESPACE
          pos  neg
query
0         [1]   []
1      [0, 2]  [3]
2          []  [1]
>>> sampler = TripletPairSampler(triplet_df, "query", query_range=3, key_range=4)
>>> sampler([0, 1, 2], 5, implicit=True, pad_value=-1)
[(3, [2, 3, 0, -1, -1]), (1, [1, -1, -1, -1, -1]), (3, [3, 0, 2, -1, -1])]
>>> sampler([0, 1, 2], 5, pad_value=-1)
[(0, [-1, -1, -1, -1, -1]), (1, [3, -1, -1, -1, -1]), (1, [1, -1, -1, -1, -1])]
>>> sampler([0, 1, 2], 5, neg=False, pad_value=-1)
[(1, [1, -1, -1, -1, -1]), (2, [0, 2, -1, -1, -1]), (0, [-1, -1, -1, -1, -1])]
>>> sampler(rating_matrix["query"], 2, neg=rating_matrix["score"],
...         excluded_key=rating_matrix["key"], pad_value=-1)
[(0, [-1, -1]), (2, [2, 0]), (1, [3, -1]), (1, [3, -1]), (0, [-1, -1])]
>>> sampler(rating_matrix["query"], 2, neg=rating_matrix["score"],
...         excluded_key=rating_matrix["key"], pad_value=-1, return_column=True)
((0, 2, 1, 1, 0), ([-1, -1], [0, 2], [3, -1], [3, -1], [-1, -1]))
>>> sampler(rating_matrix["query"], 2, neg=rating_matrix["score"],
...         excluded_key=rating_matrix["key"], pad_value=-1, return_column=True, split_sample_to_column=True)
((0, 2, 1, 1, 0), [(-1, 0, 3, 3, -1), (-1, 2, -1, -1, -1)])
>>> rating_matrix = pd.DataFrame({
...     "query": [0, 1, 1, 1, 2],
...     "key": [1, 3, 0, 2, 1],
...     "score": [0.8, 0.4, 0.7, 0.5, 0.1]
... })
>>> TripletPairSampler.rating2triplet(
...     rating_matrix,
...     "query", "key",
...     value_field="score",
...     value_threshold=0.5
... )  # doctest: +NORMALIZE_WHITESPACE
          pos  neg
query
0         [1]   []
1      [0, 2]  [3]
2          []  [1]
-
class
longling.ML.toolkit.dataset.
UserSpecificPairSampler
(triplet_df: pandas.core.frame.DataFrame, query_field='user_id', pos_field='pos', neg_field='neg', set_index=False, user_id_range=None, item_id_range=None, random_state=10)[source]¶
Examples
>>> import pandas as pd
>>> user_num = 3
>>> item_num = 4
>>> rating_matrix = pd.DataFrame({
...     "user_id": [0, 1, 1, 1, 2],
...     "item_id": [1, 3, 0, 2, 1]
... })
>>> triplet_df = UserSpecificPairSampler.rating2triplet(rating_matrix)
>>> triplet_df  # doctest: +NORMALIZE_WHITESPACE
               pos neg
user_id
0              [1]  []
1        [3, 0, 2]  []
2              [1]  []
>>> sampler = UserSpecificPairSampler(triplet_df)
>>> sampler(1)
(0, [0])
>>> sampler = UserSpecificPairSampler(triplet_df, item_id_range=item_num)
>>> sampler(0, implicit=True)
(1, [3])
>>> sampler(0, 5, implicit=True)
(3, [3, 2, 0, 0, 0])
>>> sampler(0, 5, implicit=True, pad_value=-1)
(3, [3, 2, 0, -1, -1])
>>> sampler([0, 1, 2], 5, implicit=True, pad_value=-1)
[(3, [2, 3, 0, -1, -1]), (1, [1, -1, -1, -1, -1]), (3, [2, 0, 3, -1, -1])]
>>> rating_matrix = pd.DataFrame({
...     "user_id": [0, 1, 1, 1, 2],
...     "item_id": [1, 3, 0, 2, 1],
...     "score": [1, 0, 1, 1, 0]
... })
>>> triplet_df = UserSpecificPairSampler.rating2triplet(rating_matrix=rating_matrix, value_field="score")
>>> triplet_df  # doctest: +NORMALIZE_WHITESPACE
            pos  neg
user_id
0           [1]   []
1        [0, 2]  [3]
2            []  [1]
>>> sampler = UserSpecificPairSampler(triplet_df)
>>> sampler([0, 1, 2], 5, pad_value=-1)
[(0, [-1, -1, -1, -1, -1]), (1, [3, -1, -1, -1, -1]), (1, [1, -1, -1, -1, -1])]
>>> sampler([0, 1, 2], 5, neg=False, pad_value=-1)
[(1, [1, -1, -1, -1, -1]), (2, [0, 2, -1, -1, -1]), (0, [-1, -1, -1, -1, -1])]
>>> sampler(rating_matrix["user_id"], 2, neg=rating_matrix["score"],
...         excluded_key=rating_matrix["item_id"], pad_value=-1)
[(0, [-1, -1]), (2, [2, 0]), (1, [3, -1]), (1, [3, -1]), (0, [-1, -1])]
>>> sampler(rating_matrix["user_id"], 2, neg=rating_matrix["score"],
...         excluded_key=rating_matrix["item_id"], pad_value=-1, return_column=True)
((0, 2, 1, 1, 0), ([-1, -1], [0, 2], [3, -1], [3, -1], [-1, -1]))
>>> sampler(rating_matrix["user_id"], 2, neg=rating_matrix["score"],
...         excluded_key=rating_matrix["item_id"], pad_value=-1, return_column=True, split_sample_to_column=True)
((0, 2, 1, 1, 0), [(-1, 2, 3, 3, -1), (-1, 0, -1, -1, -1)])
-
longling.ML.toolkit.dataset.
train_test
(*files, train_size: (float, int) = 0.8, test_size: (float, int, None) = None, ratio=None, random_state=None, shuffle=True, target_names=None, suffix: list = None, prefix='', logger=<Logger dataset (INFO)>, **kwargs)[source]¶
Parameters:
- files --
- train_size (float, int, or None, (default=0.8)) -- Represents the proportion of the dataset to include in the train split.
- test_size (float, int, or None) -- Represents the proportion of the dataset to include in the test split.
- random_state (int, RandomState instance or None, optional (default=None)) -- If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- shuffle (boolean, optional (default=True)) -- Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.
- target_names (list of PATH_TYPE) --
- suffix (list) --
- kwargs --
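A minimal usage sketch of train_test (the file name demo.csv is hypothetical, and the names of the output files are controlled by target_names, suffix and prefix):
>>> from longling.ML.toolkit.dataset import train_test
>>> train_test("demo.csv", train_size=0.8, test_size=0.2)  # doctest: +SKIP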
-
longling.ML.toolkit.dataset.
train_valid_test
(*files, train_size: (float, int) = 0.8, valid_size: (float, int) = 0.1, test_size: (float, int, None) = None, ratio=None, random_state=None, shuffle=True, target_names=None, suffix: list = None, logger=<Logger dataset (INFO)>, prefix='', **kwargs)[source]¶
Parameters:
- files --
- train_size (float, int, or None, (default=0.8)) -- Represents the proportion of the dataset to include in the train split.
- valid_size (float, int, or None, (default=0.1)) -- Represents the proportion of the dataset to include in the valid split.
- test_size (float, int, or None) -- Represents the proportion of the dataset to include in the test split.
- random_state (int, RandomState instance or None, optional (default=None)) -- If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- shuffle (boolean, optional (default=True)) -- Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.
- target_names --
- suffix (list) --
- kwargs --
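Similarly, a minimal sketch for the three-way split (demo.csv is again a hypothetical file):
>>> from longling.ML.toolkit.dataset import train_valid_test
>>> train_valid_test("demo.csv", train_size=0.8, valid_size=0.1, test_size=0.1)  # doctest: +SKIP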
-
class
longling.ML.toolkit.formatter.
EpisodeEvalFMT
(logger=<RootLogger root (WARNING)>, dump_file: (str, PurePath, TextIOWrapper, TextIO, BinaryIO, StreamReaderWriter, FileInput, None) = False, col: (int, None) = None, **kwargs)[source]¶
Examples
>>> import numpy as np
>>> from longling.ML.metrics import classification_report
>>> y_true = np.array([0, 0, 1, 1, 2, 1])
>>> y_pred = np.array([2, 1, 0, 1, 1, 0])
>>> y_score = np.array([
...     [0.15, 0.4, 0.45],
...     [0.1, 0.9, 0.0],
...     [0.33333, 0.333333, 0.333333],
...     [0.15, 0.4, 0.45],
...     [0.1, 0.9, 0.0],
...     [0.33333, 0.333333, 0.333333]
... ])
>>> print(EpisodeEvalFMT.format(
...     iteration=30,
...     eval_name_value=classification_report(y_true, y_pred, y_score)
... ))  # doctest: +NORMALIZE_WHITESPACE
Episode [30]
           precision    recall        f1  support
0           0.000000  0.000000  0.000000        2
1           0.333333  0.333333  0.333333        3
2           0.000000  0.000000  0.000000        1
macro_avg   0.111111  0.111111  0.111111        6
accuracy: 0.166667    macro_auc: 0.194444
-
class
longling.ML.toolkit.formatter.
EpochEvalFMT
(logger=<RootLogger root (WARNING)>, dump_file: (str, PurePath, TextIOWrapper, TextIO, BinaryIO, StreamReaderWriter, FileInput, None) = False, col: (int, None) = None, **kwargs)[source]¶
Examples
>>> import numpy as np
>>> from longling.ML.metrics import classification_report
>>> y_true = np.array([0, 0, 1, 1, 2, 1])
>>> y_pred = np.array([2, 1, 0, 1, 1, 0])
>>> y_score = np.array([
...     [0.15, 0.4, 0.45],
...     [0.1, 0.9, 0.0],
...     [0.33333, 0.333333, 0.333333],
...     [0.15, 0.4, 0.45],
...     [0.1, 0.9, 0.0],
...     [0.33333, 0.333333, 0.333333]
... ])
>>> print(EpochEvalFMT.format(
...     iteration=30,
...     eval_name_value=classification_report(y_true, y_pred, y_score)
... ))  # doctest: +NORMALIZE_WHITESPACE
Epoch [30]
           precision    recall        f1  support
0           0.000000  0.000000  0.000000        2
1           0.333333  0.333333  0.333333        3
2           0.000000  0.000000  0.000000        1
macro_avg   0.111111  0.111111  0.111111        6
accuracy: 0.166667    macro_auc: 0.194444
-
class
longling.ML.toolkit.formatter.
EvalFMT
(logger=<RootLogger root (WARNING)>, dump_file: (str, PurePath, TextIOWrapper, TextIO, BinaryIO, StreamReaderWriter, FileInput, None) = False, col: (int, None) = None, **kwargs)[source]¶
Evaluation-metric formatter: quickly formats evaluation metrics in a consistent layout.
Parameters:
- logger -- defaults to the root logger
- dump_file -- when not empty, the results are also written to dump_file
- col (int) -- the number of metrics placed on each line
- kwargs -- extra parameters for compatibility
Examples
>>> import numpy as np
>>> from longling.ML.metrics import classification_report
>>> y_true = np.array([0, 0, 1, 1, 2, 1])
>>> y_pred = np.array([2, 1, 0, 1, 1, 0])
>>> y_score = np.array([
...     [0.15, 0.4, 0.45],
...     [0.1, 0.9, 0.0],
...     [0.33333, 0.333333, 0.333333],
...     [0.15, 0.4, 0.45],
...     [0.1, 0.9, 0.0],
...     [0.33333, 0.333333, 0.333333]
... ])
>>> print(EvalFMT.format(
...     iteration=30,
...     eval_name_value=classification_report(y_true, y_pred, y_score)
... ))  # doctest: +NORMALIZE_WHITESPACE
Iteration [30]
           precision    recall        f1  support
0           0.000000  0.000000  0.000000        2
1           0.333333  0.333333  0.333333        3
2           0.000000  0.000000  0.000000        1
macro_avg   0.111111  0.111111  0.111111        6
accuracy: 0.166667    macro_auc: 0.194444
-
class
longling.ML.toolkit.monitor.
EMAValue
(value_function_names: (list, dict), smoothing_constant=0.1, *args, **kwargs)[source]¶
Exponential moving average: smoothing to give progressively lower weights to older values.
\[losses[name] = (1 - c) \times previous\_value + c \times loss\_value\]
>>> ema = EMAValue(["l2"])
>>> ema["l2"]
nan
>>> ema("l2", 100)
>>> ema("l2", 1)
>>> ema["l2"]
90.1
>>> list(ema.values())
[90.1]
>>> list(ema.keys())
['l2']
>>> list(ema.items())
[('l2', 90.1)]
>>> ema.reset()
>>> ema["l2"]
nan
>>> ema = EMAValue(["l1", "l2"])
>>> ema["l2"], ema["l1"]
(nan, nan)
>>> ema.updates({"l1": 1, "l2": 10})
>>> ema.updates({"l1": 10, "l2": 100})
>>> ema["l1"]
1.9
>>> ema["l2"]
19.0
>>> ema = EMAValue(["l1"], smoothing_constant=0.0)
>>> ema["l1"]
nan
>>> ema.updates({"l1": 1})
>>> ema.updates({"l1": 10})
>>> ema["l1"]
1.0
>>> ema = EMAValue(["l1"], smoothing_constant=1.0)
>>> ema.updates({"l1": 1})
>>> ema.updates({"l1": 10})
>>> ema["l1"]
10.0
>>> @as_tmt_value
... def mse_loss(a):
...     return a ** 2
>>> ema = EMAValue({"mse": mse_loss})
>>> ema["mse"]
nan
>>> mse_loss(1)
1
>>> ema["mse"]
1
>>> mse_loss(10)
100
>>> ema["mse"]
10.9
>>> ema = EMAValue({"mse": mse_loss})
>>> mse_loss(1)
1
>>> ema["mse"]
1
>>> ema.monitor_off("mse")
>>> ema.func
{}
>>> mse_loss(10)
100
>>> "mse" not in ema
True
>>> ema.monitor_on("mse", mse_loss)
>>> mse_loss(10)
100
>>> ema["mse"]
100
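As a worked instance of the formula above with smoothing constant \(c = 0.1\): the first observed value initializes the average to 100, and the second update yields \(0.9 \times 100 + 0.1 \times 1 = 90.1\), matching the doctest.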
-
class
longling.ML.toolkit.monitor.
MovingLoss
(value_function_names: (list, dict), smoothing_constant=0.1, *args, **kwargs)[source]¶
Examples
>>> lm = MovingLoss(["l2"])
>>> lm.losses
{'l2': nan}
>>> lm("l2", 100)
>>> lm("l2", 1)
>>> lm["l2"]
90.1
-
longling.ML.toolkit.monitor.
as_tmt_loss
(loss_obj, loss2value=<function <lambda>>)[source]¶
Parameters:
- loss_obj --
- loss2value --
Examples
>>> @as_tmt_loss
... def mse(v):
...     return v ** 2
>>> mse(2)
4
-
longling.ML.toolkit.monitor.
as_tmt_value
(value_obj, transform=<function <lambda>>)[source]¶
Parameters:
- value_obj --
- transform --
Examples
>>> def loss_f(a):
...     return a
>>> loss_f(10)
10
>>> tmt_loss_f = as_tmt_value(loss_f)
>>> tmt_loss_f(10)
10
>>> @as_tmt_value
... def loss_f2(a):
...     return a
>>> loss_f2(10)
10
-
longling.ML.toolkit.hyper_search.
prepare_hyper_search
(cfg_kwargs: dict, reporthook=None, final_reporthook=None, primary_key=None, max_key=True, reporter_cls=None, with_keys: (list, str, None) = None, final_keys: (list, str, None) = None, dump=False, disable=False)[source]¶
Updated in v1.3.18
Fetch hyper-parameters from the nni package and update the configuration parameters accordingly. When nni is unavailable or this is not an nni search mode, the parameters are left unchanged.
cfg_kwargs, reporthook, final_reporthook, tag = prepare_hyper_search(
    cfg_kwargs, reporthook, final_reporthook, primary_key="macro_avg:f1"
)
_cfg = Configuration(**cfg_kwargs)
model = Model(_cfg)
...
for epoch in range(_cfg.begin_epoch, _cfg.end_epoch):
    for batch_data in dataset:
        train_model(batch_data)
    data = evaluate_model()
    reporthook(data)
final_reporthook()
Parameters:
- cfg_kwargs (dict) -- the parameters to be passed into cfg
- reporthook --
- final_reporthook --
- primary_key -- the primary key used to evaluate the model, i.e. the default of metric in nni.report_intermediate_result and nni.report_final_result
- max_key (bool) -- whether a larger primary key value is better
- reporter_cls --
- with_keys (list or str) -- other metrics to store; by default, the final report uses the metric values at the optimum of primary_key
- final_keys (list or str) -- the keys in with_keys that use the last reported result instead of the values at the optimum of primary_key
- dump (bool) -- when True, the workspace parameter in the configuration is rewritten to workspace/nni.get_experiment_id()/nni.get_trial_id(), so that nni's intermediate results are persisted.
- disable --
Returns:
- cfg_kwargs (dict) -- the configuration parameters with the nni hyper-parameters inserted
- reporthook (function) -- the callback invoked after each iteration to report intermediate results; defaults to nni.report_intermediate_result
- final_reporthook -- the callback invoked after all iterations to report the final result; defaults to nni.report_final_result
- dump (bool) -- the same as the input argument
Examples
class CFG(Configuration):
    hyper_params = {"hidden_num": 100}
    learning_rate = 0.001
    workspace = ""

cfg_kwargs, reporthook, final_reporthook, dump = prepare_hyper_search(
    {"learning_rate": 0.1}, CFG,
    primary_key="macro_avg:f1", with_keys="accuracy"
)
# cfg_kwargs: {'learning_rate': 0.1}
When nni starts (e.g., using nnictl create --config _config.yml), suppose in _config.yml:
and in _search_space.json:
{
    "hidden_num": {"_type": "choice", "_value": [500, 600, 700, 835, 900]},
}
one possible returned cfg_kwargs is
{'hyper_params': {'hidden_num': 50}, 'learning_rate': 0.1}
MxnetHelper¶
It contains four modules in total.
gallery¶
A collection of models and network layers:
- layer: network units, such as various attention and highway layers
- loss: loss functions, such as pairwise loss
- network: networks, such as TextCNN
API reference¶
Helper Functions¶
General Toolkit¶
Here are some frequently used regular expressions for the select argument of collect_params().
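For instance, a minimal sketch against MXNet Gluon's standard collect_params() API (the tiny network below is an arbitrary placeholder):
from mxnet.gluon import nn

net = nn.Dense(10, in_units=4)
net.initialize()
# select only the weight parameters via a regex
weights = net.collect_params(".*weight")
# alternation selects both weights and biases
params = net.collect_params(".*weight|.*bias")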
Glue: Gluon Example¶
Glue (Gluon Example) aims to generate a neural network model template for MXNet-Gluon which can be quickly developed into a mature model. The source code is here
It is automatically installed when you install the longling package. The installation tutorial can be found here.
With glue, it is possible to quickly construct a model. A demo case can be found `here <>`_. The model can be divided into several functional parts:
- ETL (extract-transform-load): generates the data stream for the model;
- Symbol(): the network symbol, which defines how to fit and evaluate the network.
Also, variables like the working directory, the path to data and the hyper-parameters are treated as configuration.
Run the following commands to use glue:
# Create a full project including docs and requirements
glue --model_name ModelName
# Or, only create a network model template
glue --model_name ModelName --skip_top
The template files will be generated in the current directory. To change
the location of the files, use the --directory
option to specify it:
glue --model_name ModelName --directory LOCATION
For more help, run glue --help
Usually, the project template consists of doc files and model files. Assuming the project name is ModelName (the default), the directory of model files has the same name, and the directory tree looks like:
ModelName(Project)
----docs
----ModelName(Model)
In ModelName(Model), there is one template file named ModelName.py and a directory containing four sub-template files.
The directory tree looks like:
ModelName/
├── __init__.py
├── ModelName.py
└── Module/
├── __init__.py
├── configuration.py
├── etl.py
├── module.py
├── run.py
└── sym/
├── __init__.py
├── fit_eval.py
├── net.py
└── viz.py
- The `configuration.py <>`_ defines all the parameters that should be configured, such as where to store the model and configuration parameters, and the hyper-parameters of the neural network.
- The `etl.py <>`_ defines the extract-transform-load process, i.e. the data processing.
- The `module.py <>`_ serves as a high-level wrapper for sym.py, providing well-written interfaces for model persistence, the batch loop, the epoch loop and data pre-processing in distributed computation.
- The `sym.py <>`_ is the minimal model that can be directly used for training and evaluation, and it also supports visualization. For simplicity and modularity, some higher-level operations are not included here; they are defined in module.py.
extract: extract the data from the data source

def extract(data_src):
    # load data from file; the data format looks like:
    # feature, label
    features = []
    labels = []
    with open(data_src) as f:
        for line in f:
            feature, label = line.split()
            features.append(feature)
            labels.append(label)
    return features, labels
transform: convert the extracted data into batch data. Pre-processing such as bucketing can be defined here.

from mxnet import gluon

def transform(raw_data, params):
    # define the data transformation interface
    # raw_data --> batch_data
    batch_size = params.batch_size
    return gluon.data.DataLoader(gluon.data.ArrayDataset(raw_data), batch_size)
etl: combine extract and transform together, as sketched below.
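A minimal sketch of such an etl function, given the extract and transform defined above:

def etl(data_src, params):
    # extract the raw data, then transform it into batch data
    raw_data = extract(data_src)
    return transform(raw_data, params)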
Usually, there are three levels of components that need to be configured:
- bottom: the network symbol and how to fit and evaluate it;
- middle: the higher level that defines the batch and epoch loops, as well as the initialization and persistence of model parameters;
- top: the API of the model.
Find configuration.py and define the configuration variables that you need, for example:
- begin_epoch
- end_epoch
- batch_size
Also, the paths can be configured:
import longling.ML.MxnetHelper.glue.parser as parser
from longling.ML.MxnetHelper.glue.parser import var2exp
import pathlib
import datetime

# directory configuration
class Configuration(parser.Configuration):
    model_name = str(pathlib.Path(__file__).parents[1].name)
    root = pathlib.Path(__file__).parents[2]

    dataset = ""
    timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    workspace = ""

    root_data_dir = "$root/data/$dataset" if dataset else "$root/data"
    data_dir = "$root_data_dir/data"
    root_model_dir = "$root_data_dir/model/$model_name"
    model_dir = "$root_model_dir/$workspace" if workspace else root_model_dir
    cfg_path = "$model_dir/configuration.json"

    def __init__(self, params_json=None, **kwargs):
        # set dataset
        if kwargs.get("dataset"):
            kwargs["root_data_dir"] = "$root/data/$dataset"
        # set workspace
        if kwargs.get("workspace"):
            kwargs["model_dir"] = "$root_model_dir/$workspace"

        # rebuild relevant directory or file path according to the kwargs
        _dirs = [
            "workspace", "root_data_dir", "data_dir", "root_model_dir",
            "model_dir"
        ]
        for _dir in _dirs:
            exp = var2exp(
                kwargs.get(_dir, getattr(self, _dir)),
                env_wrap=lambda x: "self.%s" % x
            )
            setattr(self, _dir, eval(exp))
How the variable paths work is described `here <>`_
Refer to the prototype for illustration, and to the `full documents about Configuration <>`_ for details.
The network symbol file is `sym.py <>`_
The following variables and functions should be rewritten (marked as **todo**).
Two ways can be used to check whether the network works well:
- Visualization: function name: net_viz
- Numerical: