datawork documentation¶
datawork.api.config¶
Basic Option and Config functionality.
-
class
datawork.api.config.
Configurable
[source]¶ A Configurable class contains a ‘Config’ attribute called ‘.config’.
-
class
datawork.api.config.
Option
(desc=None, name=None, required=False, default=None)[source]¶ An Option is a JSON serializable argument to a function.
-
__init__
(desc=None, name=None, required=False, default=None)[source]¶ Construct an ‘Option’ with name, default value, and description.
-
value
¶ Get the set value or raise a ValueError.
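The getter behaviour described above can be sketched with a small stand-in class. SketchOption, its set() helper, and the fallback to the default value are assumptions made for illustration. They are not taken from datawork's source, which may behave differently.

```python
class SketchOption:
    """Hypothetical stand-in for an Option; not datawork's implementation."""

    _UNSET = object()  # sentinel, so that None can still be stored as a value

    def __init__(self, desc=None, name=None, required=False, default=None):
        self.desc = desc
        self.name = name
        self.required = required  # stored only; not enforced in this sketch
        self.default = default
        self._value = self._UNSET

    def set(self, value):
        """Store a value (assumed helper; the real setter may differ)."""
        self._value = value

    @property
    def value(self):
        """Return the set value, else the default, else raise ValueError."""
        if self._value is not self._UNSET:
            return self._value
        if self.default is not None:
            return self.default
        raise ValueError(f"option {self.name!r} has no value set")
```

With this sketch, reading value on an option that has neither a set value nor a default raises ValueError, while an option constructed with default=0.5 returns 0.5 until set() is called.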
-
datawork.api.data¶
Module implementing abstract Data class.
-
class
datawork.api.data.
Data
(desc=None, name=None)[source]¶ Data placeholder class.
This class represents data that has either not yet been computed or is not yet fully specified. Classes inheriting from Data implement placeholders for specific data types, e.g. Pandas dataframes or numpy arrays.
Subclasses of Data are typically instantiated by invocations of Tool. Thus Data and Invocation are connected and form the backbone of the computational graph, with Tool objects connected to Invocation as objects that can be configured.
Note that the provider attribute, itself an Invocation, can be “partial”, in which case the data object itself is callable. When called, arguments are passed to the provider, which will create new invocations, potentially now non-partial ones.
-
__init__
(desc=None, name=None)[source]¶ Construct a placeholder data object.
Parameters: - desc – a plain-text description of this data object
- name – a short-hand name for this data object
-
classmethod
constant
(val, name='constant')[source]¶ Create a constant from appropriately typed variable.
-
data
¶ Getter for data attribute.
-
datawork.api.graph¶
Node class and associated graph traversal functionality.
-
class
datawork.api.graph.
Node
[source]¶ Abstract node class.
-
datawork.api.graph.
compute_dag
(outputs)[source]¶ Compute a directed acyclic graph with the given outputs.
-
datawork.api.graph.
extract_config
(g)[source]¶ Extract dictionary with all configuration options found in graph.
-
datawork.api.graph.
extract_inputs
(g)[source]¶ Extract Data nodes that have no “Provides” in-neighbors.
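The idea behind compute_dag and extract_inputs can be illustrated on a toy graph. The sketch below is not datawork's implementation: it assumes the graph is a plain dict mapping each node to its in-neighbours, ignores edge types such as “Provides”, and uses the hypothetical helper names ancestors and input_nodes.

```python
def ancestors(preds, outputs):
    """Return every node reachable backwards from the outputs, including them."""
    seen = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(preds.get(node, ()))
    return seen


def input_nodes(preds, nodes):
    """Return the nodes in the sub-graph that have no in-neighbours."""
    return {n for n in nodes if not preds.get(n)}


# Toy graph: raw -> clean -> model, plus a node "other" that "model" does not need.
preds = {"clean": ["raw"], "model": ["clean"], "other": ["raw"]}
sub = ancestors(preds, ["model"])
inputs = input_nodes(preds, sub)
```

Here sub contains only the nodes needed to produce "model", and inputs is the single source node "raw".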
datawork.api.invocation¶
Module implementing Tool and Invocation.
-
class
datawork.api.invocation.
Invocation
(tool, args)[source]¶ Called tool connecting input data to output data.
This class represents a Tool with fully or partially specified inputs, ready for computation and caching. It is responsible for providing cache identifiers for all its outputs.
-
invoke
(*args)[source]¶ Handle partial evaluation by invoking with more arguments.
The result of this method is another Invocation. If the same arguments (meaning the same objects, identified by python id) are provided, the same invocation object is returned.
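Partial evaluation with identity-based reuse might look roughly like the following sketch. SketchInvocation, its n_inputs parameter, and the _children cache are assumptions for illustration only; datawork's Invocation takes (tool, args) and its internals are not shown in this reference.

```python
class SketchInvocation:
    """Hypothetical stand-in for an Invocation; not datawork's implementation."""

    def __init__(self, func, n_inputs, args=()):
        self.func = func
        self.n_inputs = n_inputs
        self.args = tuple(args)
        # Maps a tuple of id(arg) values to the child invocation built from them.
        # Each child keeps references to its arguments, so the ids stay valid
        # for as long as the cache entry exists.
        self._children = {}

    @property
    def partial(self):
        """True while fewer arguments than required have been supplied."""
        return len(self.args) < self.n_inputs

    def invoke(self, *args):
        """Return an invocation with the extra arguments appended."""
        key = tuple(id(a) for a in args)
        if key not in self._children:
            self._children[key] = SketchInvocation(
                self.func, self.n_inputs, self.args + args
            )
        return self._children[key]


a, b = [1, 2], [3, 4]
base = SketchInvocation(lambda x, y: x + y, n_inputs=2)
half = base.invoke(a)   # still partial: one of two inputs supplied
full = half.invoke(b)   # no longer partial
```

Calling base.invoke(a) a second time returns the same object as half, whereas passing an equal but distinct list creates a new invocation, because the lookup uses object identity rather than equality.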
-
o
¶ Implement a getter for evaluating outputs on demand.
-
populate
()[source]¶ Compute the outputs by calling a tool’s ‘.run()’ method.
This method organizes all of the input data.Data and config.Option parameters and passes them to the static tool.Tool.run() method.
-
datawork.api.tool¶
Module implementing Tool and Invocation.
-
class
datawork.api.tool.
Tool
[source]¶ A class for composable tools.
This is the base class for configurable functions that transform Data objects.
-
__call__
(*args)[source]¶ Let a tool act on some Data objects.
Calling a tool instance _invokes_ the tool, which results in an Invocation instance if the arguments are of class Data.
-
datawork.instances.config¶
Common instances of Option including most JSON types.
-
class
datawork.instances.config.
BoolOption
(desc=None, name=None, required=False, default=None)[source]¶ A boolean option.
-
value_type
¶ alias of
builtins.bool
-
-
class
datawork.instances.config.
EnumOption
(desc, choices=None, **kwargs)[source]¶ An enum option represents a choice from a finite list.
-
value_type
¶ alias of
builtins.str
-
-
class
datawork.instances.config.
FloatOption
(desc=None, name=None, required=False, default=None)[source]¶ A single float option.
-
value_type
¶ alias of
builtins.float
-
-
class
datawork.instances.config.
IntOption
(desc=None, name=None, required=False, default=None)[source]¶ A single integer option.
-
value_type
¶ alias of
builtins.int
-
-
class
datawork.instances.config.
RandomSeedOption
(desc=None, name=None, required=False, default=None)[source]¶ An IntOption subclass specifically for random seeds.
This class makes it a bit easier to detect random seeds in large pipelines, which should make studying variability due to controllable (RNG) randomness straightforward.
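One way a dedicated subclass makes seeds easy to detect is an isinstance check over a collection of options. The sketch below uses hypothetical stand-in classes (SketchIntOption, SketchRandomSeedOption) and a hypothetical helper find_seeds; it does not use or reproduce datawork's own classes.

```python
class SketchIntOption:
    """Hypothetical stand-in for an integer option."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default


class SketchRandomSeedOption(SketchIntOption):
    """Marker subclass: same behaviour, but identifiable as a random seed."""


def find_seeds(options):
    """Return the names of all options that are random seeds."""
    return [o.name for o in options if isinstance(o, SketchRandomSeedOption)]


options = [
    SketchIntOption("n_layers", 3),
    SketchRandomSeedOption("split_seed", 0),
    SketchRandomSeedOption("init_seed", 1),
]
seed_names = find_seeds(options)
```

In this toy example seed_names holds "split_seed" and "init_seed", so a variability study could rerun the pipeline while changing only those options.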
datawork.instances.data¶
Instances of Data for common data payloads.
-
class
datawork.instances.data.
FileData
(desc=None, name=None)[source]¶ Base class for any disk-native data.
For example, SQLiteData will use this as a base class.
-
class
datawork.instances.data.
JSONData
(desc=None, name=None)[source]¶ A Data class for primitive JSON serializable types.
The so-called “primitive types” in JSON are:
- string
- numeric types
- object (in python this is a dict)
- array
- boolean
- null
In this class, hierarchies of these types are supported.
Note that although other types than these may be serializable in Python (by subclassing json.JSONEncoder), the primitive types can be serialized/deserialized unambiguously. For example, we do not support tuples, although the json module supports serializing them by casting them to lists.
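The ambiguity with tuples can be seen using only the standard json module. This short example does not involve datawork at all; the dictionary contents are made up for illustration.

```python
import json

original = {"shape": (2, 3), "name": "grid"}
restored = json.loads(json.dumps(original))

# The tuple was written as a JSON array, so it comes back as a list.
print(type(original["shape"]).__name__, "->", type(restored["shape"]).__name__)
```

The round trip prints "tuple -> list", and restored no longer compares equal to original, which is why restricting the payload to the listed types keeps serialization unambiguous.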
-
class
datawork.instances.data.
KerasModelData
(desc=None, name=None)[source]¶ A Data class for Keras models.
-
class
datawork.instances.data.
PandasData
(*args, **kwargs)[source]¶ Data type for Pandas DataFrames and Series.