bulwark package

Submodules

bulwark.checks module

Each function in this module should:

take a pd.DataFrame as its first argument, with optional additional arguments,
make an assertion about the pd.DataFrame, and
return the original, unaltered pd.DataFrame.
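
The three-part contract above can be illustrated with a small stand-in check. This is a sketch only and not part of bulwark: `has_no_negatives` is a hypothetical name, and the example assumes nothing beyond pandas.

```python
import pandas as pd


def has_no_negatives(df, columns=None):
    """Hypothetical check following the contract: df in, assertion, same df out."""
    cols = list(df.columns) if columns is None else list(columns)
    # Collect the columns that contain at least one negative value.
    bad_cols = [col for col in cols if (df[col] < 0).any()]
    assert not bad_cols, "Negative values found in columns: {}".format(bad_cols)
    # Return the original object so the check can be chained with df.pipe.
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
checked = df.pipe(has_no_negatives, columns=["a"])
```

Because the check returns the DataFrame it received, it can sit in the middle of a `df.pipe(...)` chain without changing the data that flows through it.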

bulwark.checks.custom_check(check_func, df, *args, **kwargs)

Asserts that check_func(df, *args, **kwargs) is true.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    check_func (function) – A function taking df, *args, and **kwargs. Should raise AssertionError if the check does not pass.
Returns:	Original df.

bulwark.checks.has_columns(df, columns, exact_cols=False, exact_order=False)

Asserts that df has the expected columns.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    columns (list or tuple) – Columns that are expected to be in `df`.
    exact_cols (bool) – Whether or not `columns` need to be the only columns in `df`.
    exact_order (bool) – Whether or not `columns` need to be in the same order as the columns in `df`.
Returns:	Original df.

bulwark.checks.has_dtypes(df, items)

Asserts that df has the expected dtypes.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    items (dict) – Mapping of columns to dtype.
Returns:	Original df.

bulwark.checks.has_no_infs(df, columns=None)

Asserts that there are no np.infs in df.

This is a convenience wrapper for has_no_x.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    columns (list) – A subset of columns to check for np.infs.
Returns:	Original df.

bulwark.checks.has_no_nans(df, columns=None)

Asserts that there are no np.nans in df.

This is a convenience wrapper for has_no_x.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    columns (list) – A subset of columns to check for np.nans.
Returns:	Original df.

bulwark.checks.has_no_neg_infs(df, columns=None)

Asserts that there are no -np.infs in df.

This is a convenience wrapper for has_no_x.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    columns (list) – A subset of columns to check for -np.infs.
Returns:	Original df.

bulwark.checks.has_no_nones(df, columns=None)

Asserts that there are no Nones in df.

This is a convenience wrapper for has_no_x.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    columns (list) – A subset of columns to check for Nones.
Returns:	Original df.

bulwark.checks.has_no_x(df, values=None, columns=None)

Asserts that there are no user-specified values in df's columns.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    values (list) – A list of values to check for in the pd.DataFrame.
    columns (list) – A subset of columns to check for values.
Returns:	Original df.
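
The behaviour described for has_no_x can be approximated with `DataFrame.isin`. The function below is an illustrative stand-in, not bulwark's implementation; the name `has_no_x_sketch` and the error message are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def has_no_x_sketch(df, values=None, columns=None):
    """Illustrative stand-in: fail if any listed value appears in the chosen columns."""
    values = [] if values is None else list(values)
    cols = list(df.columns) if columns is None else list(columns)
    # DataFrame.isin marks every cell whose value is in `values`.
    hits = df[cols].isin(values)
    bad_cols = [col for col in cols if hits[col].any()]
    assert not bad_cols, "Forbidden values found in columns: {}".format(bad_cols)
    return df


df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, np.inf]})
# Passes: only column "a" is inspected, and it holds no infinities.
has_no_x_sketch(df, values=[np.inf], columns=["a"])
```

Under this sketch, a wrapper in the spirit of has_no_infs would amount to calling the function with `values=[np.inf]`; checking all columns of the frame above would then fail because of column "b".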

bulwark.checks.has_set_within_vals(df, items)

Asserts that all given values are found in columns' values.

In other words, the given values in the items dict should all be a subset of the values found in the associated column in df.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    items (dict) – Mapping of columns to values expected to be found within them.
Returns:	Original df.

Examples

The following check will pass, since df['a'] contains each of 1 and 2:

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
>>> ck.has_set_within_vals(df, items={"a": [1, 2]})

The following check will fail, since df['b'] doesn't contain each of "a" and "d":

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
>>> ck.has_set_within_vals(df, items={"a": [1, 2], "b": ["a", "d"]})
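
The subset relation behind these two examples can be spelled out with plain Python sets. This is a sketch under the assumption that the check only compares sets of values; `has_set_within_vals_sketch` is a hypothetical stand-in, not the library function.

```python
import pandas as pd


def has_set_within_vals_sketch(df, items):
    """Illustrative stand-in: every expected value must occur in its column."""
    for col, expected in items.items():
        # Values that were expected but never appear in the column.
        missing = set(expected) - set(df[col].tolist())
        assert not missing, "Column {!r} is missing values: {}".format(
            col, sorted(missing, key=str)
        )
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
has_set_within_vals_sketch(df, items={"a": [1, 2]})  # passes

try:
    has_set_within_vals_sketch(df, items={"a": [1, 2], "b": ["a", "d"]})
except AssertionError as err:
    message = str(err)  # names column 'b' and the missing value "d"
```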

bulwark.checks.has_unique_index(df)

Asserts that df’s index is unique.

Parameters:	df (pd.DataFrame) – Any pd.DataFrame.
Returns:	Original df.

bulwark.checks.is_monotonic(df, items=None, increasing=None, strict=False)

Asserts that df is monotonic.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    items (dict) – Mapping of columns to conditions (increasing, strict).
    increasing (bool, None) – If None, either increasing or decreasing is accepted.
    strict (bool) – Whether the comparison should be strict.
Returns:	Original df.

bulwark.checks.is_same_as(df, df_to_compare, **kwargs)

Asserts that two pd.DataFrames are equal.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    df_to_compare (pd.DataFrame) – A second pd.DataFrame.
    **kwargs (dict) – Keyword arguments passed through to pandas' `assert_frame_equal`.
Returns:	Original df.

bulwark.checks.is_shape(df, shape)

Asserts that df is of a known row x column shape.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    shape (tuple) – Shape of df as (n_rows, n_columns). Use None or -1 if you don't care about a specific dimension.
Returns:	Original df.
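
The wildcard handling for `shape` can be sketched as follows. `is_shape_sketch` is a hypothetical stand-in written for illustration; it treats both None and -1 as "do not check this dimension", as the parameter description states.

```python
import pandas as pd


def is_shape_sketch(df, shape):
    """Illustrative stand-in: compare df.shape to `shape`, skipping wildcard dimensions."""
    expected_rows, expected_cols = shape
    actual_rows, actual_cols = df.shape
    # None and -1 both mean "any size is acceptable for this dimension".
    if expected_rows not in (None, -1):
        assert actual_rows == expected_rows, "Expected {} rows, got {}".format(
            expected_rows, actual_rows
        )
    if expected_cols not in (None, -1):
        assert actual_cols == expected_cols, "Expected {} columns, got {}".format(
            expected_cols, actual_cols
        )
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
is_shape_sketch(df, (3, 2))     # both dimensions checked
is_shape_sketch(df, (None, 2))  # any number of rows, exactly 2 columns
is_shape_sketch(df, (-1, -1))   # nothing is checked
```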

bulwark.checks.multi_check(df, checks, warn=False)

Asserts that all checks pass.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    checks (dict) – Mapping of check functions to parameters for those check functions.
    warn (bool) – Indicates whether an error should be raised or only a warning notification should be displayed. Default is to error.
Returns:	Original df.
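
One way to read the `checks` mapping is as check function to keyword-argument dict. The sketch below follows that reading, which is an assumption, as are the use of `warnings.warn` for the `warn=True` branch and the helper names `has_min_rows` and `has_column`. It is not bulwark's implementation.

```python
import warnings

import pandas as pd


def multi_check_sketch(df, checks, warn=False):
    """Illustrative stand-in: run every check, then raise or warn once with all failures."""
    errors = []
    for check_func, params in checks.items():
        try:
            check_func(df, **params)
        except AssertionError as err:
            errors.append("{}: {}".format(check_func.__name__, err))
    if errors:
        message = "\n".join(errors)
        if warn:
            warnings.warn(message)
        else:
            raise AssertionError(message)
    return df


# Two hypothetical checks that follow the module's contract.
def has_min_rows(df, n):
    assert len(df) >= n, "fewer than {} rows".format(n)
    return df


def has_column(df, name):
    assert name in df.columns, "missing column {!r}".format(name)
    return df


df = pd.DataFrame({"a": [1, 2, 3]})
multi_check_sketch(df, {has_min_rows: {"n": 2}, has_column: {"name": "a"}})
```

Collecting failures before raising means a single run reports every failed check rather than only the first one.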

bulwark.checks.one_to_many(df, unitcol, manycol)

Asserts that a many-to-one relationship is preserved between two columns.

For example, a retail store will have distinct departments, each with several employees. If each employee may only work in a single department, then the relationship of the department to the employees is one to many.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    unitcol (str) – The column that encapsulates the groups in `manycol`.
    manycol (str) – The column that must remain unique in the distinct pairs between `manycol` and `unitcol`.
Returns:	Original df.
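
The department and employee example can be made concrete with a short sketch. `one_to_many_sketch` is a hypothetical stand-in that interprets the description as "no `manycol` value may appear in more than one distinct (`unitcol`, `manycol`) pair"; that interpretation is an assumption.

```python
import pandas as pd


def one_to_many_sketch(df, unitcol, manycol):
    """Illustrative stand-in: each `manycol` value may pair with only one `unitcol` value."""
    # Reduce the frame to its distinct (unitcol, manycol) pairs.
    pairs = df[[unitcol, manycol]].drop_duplicates()
    # A manycol value that appears in two distinct pairs belongs to two units.
    repeated = pairs.loc[pairs[manycol].duplicated(), manycol].tolist()
    assert not repeated, "Values in {!r} paired with several {!r} values: {}".format(
        manycol, unitcol, repeated
    )
    return df


staff = pd.DataFrame({
    "department": ["toys", "toys", "garden"],
    "employee": ["ann", "bob", "cy"],
})
one_to_many_sketch(staff, unitcol="department", manycol="employee")  # passes
```

If "ann" were also listed under "garden", the same call would raise an AssertionError, since one employee would then belong to two departments.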

bulwark.checks.unique(df, columns=None)

Asserts that columns in df only have unique values.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    columns (list) – A subset of columns to check for uniqueness of row values.
Returns:	Original df.

bulwark.checks.within_n_std(df, n=3)

Asserts that every value is within n standard deviations of its column's mean.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    n (int) – Number of standard deviations from the mean.
Returns:	Original df.
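
The condition can be written out as a per-column z-score test. The sketch below is a hypothetical stand-in; it assumes pandas' default sample standard deviation (ddof=1), considers numeric columns only, and treats a value exactly n standard deviations away as acceptable. The library may differ on any of these points.

```python
import pandas as pd


def within_n_std_sketch(df, n=3):
    """Illustrative stand-in: every value must lie within n standard deviations of its column mean."""
    numeric = df.select_dtypes(include="number")
    # Distance of each cell from its column mean, in units of the column's standard deviation.
    z_scores = (numeric - numeric.mean()).abs() / numeric.std()
    bad_cols = [col for col in numeric.columns if (z_scores[col] > n).any()]
    assert not bad_cols, "Values beyond {} standard deviations in columns: {}".format(
        n, bad_cols
    )
    return df


df = pd.DataFrame({"a": [10] * 9 + [100]})
# Passes: the value 100 is roughly 2.85 sample standard deviations from the mean of 19.
within_n_std_sketch(df, n=3)
```

With `n=2` the same frame fails, which shows how sensitive the check is to the choice of n on small samples.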

bulwark.checks.within_range(df, items=None)

Asserts that the values in df's columns are within the given ranges.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    items (dict) – Mapping of columns (col) to a (low, high) tuple (v) that `df[col]` is expected to be between.
Returns:	Original df.
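
A range check of this kind maps naturally onto `Series.between`. The stand-in below is a sketch, not the library's code; it assumes both bounds are inclusive, which is the pandas default for `between` but is not stated in the description above.

```python
import pandas as pd


def within_range_sketch(df, items=None):
    """Illustrative stand-in: each listed column must stay inside its (low, high) range."""
    items = {} if items is None else items
    for col, (low, high) in items.items():
        # Series.between is inclusive on both ends by default.
        inside = df[col].between(low, high)
        assert inside.all(), "Column {!r} has values outside [{}, {}]".format(
            col, low, high
        )
    return df


df = pd.DataFrame({"age": [18, 35, 64], "score": [0.1, 0.5, 0.9]})
within_range_sketch(df, items={"age": (18, 65), "score": (0.0, 1.0)})
```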

bulwark.checks.within_set(df, items=None)

Asserts that the values in df's columns are a subset of items.

Parameters:
    df (pd.DataFrame) – Any pd.DataFrame.
    items (dict) – Mapping of columns (col) to array-like of values (v) that `df[col]` is expected to be a subset of.
Returns:	Original df.
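
Note that the subset direction here is the reverse of has_set_within_vals: every value in the column must come from the allowed set. A minimal sketch, using a hypothetical `within_set_sketch` and an assumed error message:

```python
import pandas as pd


def within_set_sketch(df, items=None):
    """Illustrative stand-in: each listed column may only contain allowed values."""
    items = {} if items is None else items
    for col, allowed in items.items():
        # Distinct column values that are not in the allowed collection.
        unexpected = df.loc[~df[col].isin(list(allowed)), col].unique().tolist()
        assert not unexpected, "Column {!r} has unexpected values: {}".format(
            col, unexpected
        )
    return df


df = pd.DataFrame({"color": ["red", "blue", "red"]})
# Passes: "green" is allowed but unused, which is fine for a subset check.
within_set_sketch(df, items={"color": ["red", "blue", "green"]})
```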

bulwark.decorators module

class bulwark.decorators.BaseDecorator(*args, **kwargs): Bases: object

bulwark.decorators.HasColumns: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasDtypes: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoInfs: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoNans: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoNegInfs: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoNones: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoX: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasSetWithinVals: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasUniqueIndex: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.IsMonotonic: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.IsSameAs: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.IsShape: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.MultiCheck: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.OneToMany: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.Unique: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.WithinNStd: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.WithinRange: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.WithinSet: alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.CustomCheck(check_func, *args, **kwargs): Asserts that check_func(df, *args, **kwargs) is true.

bulwark.decorators.decorator_factory(decorator_name, func): Takes in a function and outputs a class that can be used as a decorator.
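
The idea of turning a check function into a decorator class can be sketched as below. This is an illustration under stated assumptions rather than bulwark's code: the names `decorator_factory_sketch`, `CheckDecorator`, `has_column`, and `load` are hypothetical, and the sketch assumes the check is applied to the DataFrame returned by the decorated function.

```python
import functools

import pandas as pd


def decorator_factory_sketch(decorator_name, func):
    """Illustrative stand-in: build a decorator class around a check function."""

    class CheckDecorator:
        def __init__(self, *args, **kwargs):
            # Arguments forwarded to the check, e.g. name="a".
            self.args = args
            self.kwargs = kwargs

        def __call__(self, wrapped):
            @functools.wraps(wrapped)
            def wrapper(*f_args, **f_kwargs):
                df = wrapped(*f_args, **f_kwargs)
                # Assumption: the check runs on the DataFrame the function returns.
                func(df, *self.args, **self.kwargs)
                return df

            return wrapper

    CheckDecorator.__name__ = decorator_name
    return CheckDecorator


def has_column(df, name):
    """Hypothetical check used only for this example."""
    assert name in df.columns, "missing column {!r}".format(name)
    return df


HasColumn = decorator_factory_sketch("HasColumn", has_column)


@HasColumn(name="a")
def load():
    return pd.DataFrame({"a": [1, 2]})


result = load()
```

Defining the class inside the factory is also one plausible reason the aliases above are displayed as `decorator_factory.<locals>.decorator_name`.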

bulwark.generic module

Module for useful generic functions.

bulwark.generic.bad_locations(df): Indicates bad cells in df.

bulwark.generic.snake_to_camel(snake_str)
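
No description is given for snake_to_camel. Judging by its name and by the pairing of check names such as has_no_nans with decorator names such as HasNoNans, a conversion along the following lines seems plausible. The function below is a guess written for illustration, not the library's implementation.

```python
def snake_to_camel_sketch(snake_str):
    """Illustrative stand-in: convert 'has_no_nans' to 'HasNoNans'."""
    # Capitalize the first letter of each underscore-separated part, then join the parts.
    return "".join(part[:1].upper() + part[1:] for part in snake_str.split("_"))


names = [snake_to_camel_sketch(s) for s in ("has_no_nans", "within_n_std", "has_no_x")]
```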
