bulwark package

Submodules

bulwark.checks module

Each function in this module should:

  • take a pd.DataFrame as its first argument, with optional additional arguments,
  • make an assert about the pd.DataFrame, and
  • return the original, unaltered pd.DataFrame
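
As an illustration of the three rules above, here is a minimal sketch of a check written in the same style. The function name has_no_negatives and its columns parameter are hypothetical and are not part of bulwark; only pandas is assumed.

```python
import pandas as pd


def has_no_negatives(df, columns=None):
    """Hypothetical check (not part of bulwark) that follows the contract above."""
    cols = list(df.columns) if columns is None else columns
    # Make an assertion about the DataFrame...
    assert (df[cols] >= 0).all().all(), "negative values found"
    # ...and return the original, unaltered DataFrame so calls can be chained.
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
result = df.pipe(has_no_negatives, columns=["a"])
```

Because the check returns df itself, it can sit inside a chain of df.pipe(...) calls without changing the data flowing through it.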
bulwark.checks.custom_check(check_func, df, *args, **kwargs)[source]

Assert that check_func(df, *args, **kwargs) is true.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • check_func (function) – A function taking df, *args, and **kwargs. Should raise AssertionError if check not passed.
Returns:

Original df.
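
The sketch below shows the kind of check_func this signature expects: a callable that takes df first and raises AssertionError on failure. custom_check_sketch is a stand-in that only mirrors the documented argument order (check_func before df); it is not bulwark's source, and min_rows is a hypothetical check.

```python
import pandas as pd


def custom_check_sketch(check_func, df, *args, **kwargs):
    """Stand-in mirroring the documented argument order; not bulwark's source."""
    check_func(df, *args, **kwargs)  # expected to raise AssertionError on failure
    return df


def min_rows(df, n):
    """Hypothetical check_func: require at least n rows."""
    assert len(df) >= n, f"expected at least {n} rows, got {len(df)}"


df = pd.DataFrame({"a": [1, 2, 3]})
passed = custom_check_sketch(min_rows, df, 2)

try:
    custom_check_sketch(min_rows, df, n=10)
    failed = False
except AssertionError:
    failed = True
```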

bulwark.checks.has_columns(df, columns, exact_cols=False, exact_order=False)[source]

Asserts that df has the expected columns.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • columns (list or tuple) – Columns that are expected to be in df.
  • exact_cols (bool) – Whether or not columns need to be the only columns in df.
  • exact_order (bool) – Whether or not columns need to be in the same order as the columns in df.
Returns:

Original df.

bulwark.checks.has_dtypes(df, items)[source]

Asserts that df has the expected dtypes.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • items (dict) – Mapping of columns to dtype.
Returns:

Original df.

bulwark.checks.has_no_infs(df, columns=None)[source]

Asserts that there are no np.infs in df.

This is a convenience wrapper for has_no_x.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • columns (list) – A subset of columns to check for np.infs.
Returns:

Original df.

bulwark.checks.has_no_nans(df, columns=None)[source]

Asserts that there are no np.nans in df.

This is a convenience wrapper for has_no_x.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • columns (list) – A subset of columns to check for np.nans.
Returns:

Original df.

bulwark.checks.has_no_neg_infs(df, columns=None)[source]

Asserts that there are no -np.infs in df.

This is a convenience wrapper for has_no_x.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • columns (list) – A subset of columns to check for -np.infs.
Returns:

Original df.

bulwark.checks.has_no_nones(df, columns=None)[source]

Asserts that there are no Nones in df.

This is a convenience wrapper for has_no_x.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • columns (list) – A subset of columns to check for Nones.
Returns:

Original df.

bulwark.checks.has_no_x(df, values=None, columns=None)[source]

Asserts that there are no user-specified values in df’s columns.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • values (list) – A list of values to check for in the pd.DataFrame.
  • columns (list) – A subset of columns to check for values.
Returns:

Original df.
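
A minimal sketch of the idea, assuming the check amounts to a membership test over the selected columns. has_no_x_sketch is a hypothetical stand-in built on pandas' DataFrame.isin, not bulwark's implementation.

```python
import pandas as pd


def has_no_x_sketch(df, values=None, columns=None):
    """Stand-in for the documented behaviour; not bulwark's source."""
    values = [] if values is None else values
    cols = list(df.columns) if columns is None else columns
    # True wherever a cell equals one of the forbidden values.
    found = df[cols].isin(values)
    assert not found.any().any(), f"forbidden values found in columns {cols}"
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "missing", "z"]})

# Passes: only column "a" is inspected.
ok = has_no_x_sketch(df, values=["missing"], columns=["a"])

# Fails: all columns are inspected and "b" contains "missing".
try:
    has_no_x_sketch(df, values=["missing"])
    raised = False
except AssertionError:
    raised = True
```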

bulwark.checks.has_set_within_vals(df, items)[source]

Asserts that all given values are found in columns’ values.

In other words, the given values in the items dict should all be a subset of the values found in the associated column in df.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • items (dict) – Mapping of columns to values expected to be found within them.
Returns:

Original df.

Examples

The following check will pass, since df['a'] contains each of 1 and 2:

>>> import pandas as pd
>>> import bulwark.checks as ck
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
>>> ck.has_set_within_vals(df, items={"a": [1, 2]})

The following check will fail, since df['b'] doesn't contain each of "a" and "d":

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
>>> ck.has_set_within_vals(df, items={"a": [1, 2], "b": ["a", "d"]})
bulwark.checks.has_unique_index(df)[source]

Asserts that df’s index is unique.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
Returns:

Original df.

bulwark.checks.is_monotonic(df, items=None, increasing=None, strict=False)[source]

Asserts that df's columns are monotonic.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • items (dict) – Mapping of columns to conditions (increasing, strict)
  • increasing (bool, None) – Whether the values should be increasing (True) or decreasing (False). None accepts either direction.
  • strict (bool) – Whether the comparison should be strict.
Returns:

Original df.
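
The sketch below illustrates how the (increasing, strict) conditions could be evaluated per column. It is a hypothetical stand-in, not bulwark's source, and it assumes that "strict" means repeated values are not allowed. It relies on pandas' Series.is_monotonic_increasing, Series.is_monotonic_decreasing and Series.is_unique.

```python
import pandas as pd


def is_monotonic_sketch(df, items=None, increasing=None, strict=False):
    """Stand-in for the documented semantics; not bulwark's source."""
    if items is None:
        # Apply the same (increasing, strict) condition to every column.
        items = {col: (increasing, strict) for col in df.columns}
    for col, (col_increasing, col_strict) in items.items():
        s = df[col]
        if col_increasing is None:
            ok = s.is_monotonic_increasing or s.is_monotonic_decreasing
        elif col_increasing:
            ok = s.is_monotonic_increasing
        else:
            ok = s.is_monotonic_decreasing
        if col_strict:
            # A monotonic series without repeated values is strictly monotonic.
            ok = ok and s.is_unique
        assert ok, f"column {col!r} is not monotonic as required"
    return df


df = pd.DataFrame({"t": [1, 2, 2, 3], "v": [9, 7, 4, 1]})

# "t" is non-strictly increasing; "v" is strictly monotonic in some direction.
checked = is_monotonic_sketch(df, items={"t": (True, False), "v": (None, True)})

# Requiring "t" to be strictly increasing fails because 2 is repeated.
try:
    is_monotonic_sketch(df, items={"t": (True, True)})
    strict_failed = False
except AssertionError:
    strict_failed = True
```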

bulwark.checks.is_same_as(df, df_to_compare, **kwargs)[source]

Asserts that two pd.DataFrames are equal.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • df_to_compare (pd.DataFrame) – A second pd.DataFrame.
  • **kwargs (dict) – Keyword arguments passed through to pandas’ assert_frame_equal.
Returns:

Original df.

bulwark.checks.is_shape(df, shape)[source]

Asserts that df is of a known row x column shape.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • shape (tuple) – Shape of df as (n_rows, n_columns). Use None or -1 if you don’t care about a specific dimension.
Returns:

Original df.
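
A minimal sketch of the wildcard behaviour described for shape, where None or -1 skips a dimension. is_shape_sketch is a hypothetical stand-in, not bulwark's source.

```python
import pandas as pd


def is_shape_sketch(df, shape):
    """Stand-in for the documented semantics; not bulwark's source."""
    for actual, expected in zip(df.shape, shape):
        if expected is None or expected == -1:
            continue  # this dimension is not constrained
        assert actual == expected, f"expected shape {shape}, got {df.shape}"
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # 3 rows, 2 columns

exact = is_shape_sketch(df, (3, 2))
any_rows = is_shape_sketch(df, (None, 2))
any_cols = is_shape_sketch(df, (3, -1))

try:
    is_shape_sketch(df, (4, None))
    wrong_rows = False
except AssertionError:
    wrong_rows = True
```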

bulwark.checks.multi_check(df, checks, warn=False)[source]

Asserts that all checks pass.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • checks (dict) – Mapping of check functions to parameters for those check functions.
  • warn (bool) – Indicates whether an error should be raised or only a warning notification should be displayed. Default is to error.
Returns:

Original df.
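
The sketch below shows one way the checks mapping and the warn flag could interact. It assumes each value in checks is a dict of keyword arguments for its check function; multi_check_sketch, has_min_rows and has_column are hypothetical names, not bulwark's source.

```python
import warnings

import pandas as pd


def multi_check_sketch(df, checks, warn=False):
    """Stand-in for the documented semantics; not bulwark's source."""
    for check_func, params in checks.items():
        try:
            check_func(df, **params)
        except AssertionError as exc:
            if not warn:
                raise  # default: a failed check is an error
            warnings.warn(f"{check_func.__name__} failed: {exc}")
    return df


def has_min_rows(df, n):
    assert len(df) >= n, "too few rows"
    return df


def has_column(df, name):
    assert name in df.columns, f"missing column {name!r}"
    return df


df = pd.DataFrame({"a": [1, 2, 3]})
checks = {has_min_rows: {"n": 2}, has_column: {"name": "b"}}

# With warn=True the failing check only emits a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = multi_check_sketch(df, checks, warn=True)

# With the default warn=False the same failure is raised.
try:
    multi_check_sketch(df, checks)
    raised = False
except AssertionError:
    raised = True
```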

bulwark.checks.one_to_many(df, unitcol, manycol)[source]

Asserts that a one-to-many relationship is preserved between two columns.

For example, a retail store will have distinct departments, each with several employees. If each employee may only work in a single department, then the relationship of the department to the employees is one to many.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • unitcol (str) – The column that encapsulates the groups in manycol.
  • manycol (str) – The column that must remain unique in the distinct pairs between manycol and unitcol.
Returns:

Original df.
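
Using the department and employee example above, the condition on manycol can be sketched as follows. one_to_many_sketch is a hypothetical stand-in that follows the parameter descriptions (manycol must be unique among the distinct pairs); it is not bulwark's source.

```python
import pandas as pd


def one_to_many_sketch(df, unitcol, manycol):
    """Stand-in for the documented semantics; not bulwark's source."""
    # Distinct (unitcol, manycol) pairs; repeated rows are not a violation.
    pairs = df[[unitcol, manycol]].drop_duplicates()
    # Each manycol value may be paired with only one unitcol value.
    assert pairs[manycol].is_unique, f"{manycol!r} maps to several {unitcol!r} values"
    return df


good = pd.DataFrame({
    "department": ["toys", "toys", "garden"],
    "employee": ["ann", "bob", "cy"],
})
ok = one_to_many_sketch(good, unitcol="department", manycol="employee")

# "ann" appears in two departments, so the relationship is broken.
bad = pd.DataFrame({
    "department": ["toys", "garden"],
    "employee": ["ann", "ann"],
})
try:
    one_to_many_sketch(bad, unitcol="department", manycol="employee")
    violated = False
except AssertionError:
    violated = True
```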

bulwark.checks.unique(df, columns=None)[source]

Asserts that columns in df only have unique values.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • columns (list) – A subset of columns to check for uniqueness of row values.
Returns:

Original df.

bulwark.checks.within_n_std(df, n=3)[source]

Asserts that every value is within n standard deviations of its column’s mean.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • n (int) – Number of standard deviations from the mean.
Returns:

Original df.
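
A minimal sketch of the per-column test, assuming the sample standard deviation that pandas' DataFrame.std computes by default. within_n_std_sketch is a hypothetical stand-in, not bulwark's source.

```python
import pandas as pd


def within_n_std_sketch(df, n=3):
    """Stand-in for the documented semantics; not bulwark's source."""
    # Absolute distance of every value from its column's mean.
    deviations = (df - df.mean()).abs()
    # Allowed distance per column: n standard deviations.
    limits = n * df.std()
    assert deviations.le(limits, axis="columns").all().all(), (
        f"values found more than {n} standard deviations from the mean"
    )
    return df


df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Mean is 3 and the sample standard deviation is about 1.58,
# so every value lies within 3 standard deviations.
ok = within_n_std_sketch(df, n=3)

# With n=1 the values 1 and 5 are 2 away from the mean, which exceeds 1.58.
try:
    within_n_std_sketch(df, n=1)
    too_far = False
except AssertionError:
    too_far = True
```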

bulwark.checks.within_range(df, items=None)[source]

Asserts that the values in df's columns are within the given ranges.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • items (dict) – Mapping of columns (col) to a (low, high) tuple (v) that df[col] is expected to be between.
Returns:

Original df.
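
The sketch below shows how the items mapping could be applied column by column. It assumes both bounds are inclusive, which is the default of pandas' Series.between; within_range_sketch is a hypothetical stand-in, not bulwark's source.

```python
import pandas as pd


def within_range_sketch(df, items=None):
    """Stand-in for the documented semantics; not bulwark's source."""
    for col, (low, high) in (items or {}).items():
        # Series.between includes both endpoints by default.
        assert df[col].between(low, high).all(), (
            f"column {col!r} has values outside ({low}, {high})"
        )
    return df


df = pd.DataFrame({"age": [18, 35, 64], "name": ["ann", "bob", "cy"]})

ok = within_range_sketch(df, items={"age": (18, 65)})

try:
    within_range_sketch(df, items={"age": (20, 65)})
    out_of_range = False
except AssertionError:
    out_of_range = True
```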

bulwark.checks.within_set(df, items=None)[source]

Asserts that the values in df's columns are a subset of the values given in items.

Parameters:
  • df (pd.DataFrame) – Any pd.DataFrame.
  • items (dict) – Mapping of columns (col) to array-like of values (v) that df[col] is expected to be a subset of.
Returns:

Original df.
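
A minimal sketch of the subset test, built on pandas' Series.isin. within_set_sketch is a hypothetical stand-in, not bulwark's source.

```python
import pandas as pd


def within_set_sketch(df, items=None):
    """Stand-in for the documented semantics; not bulwark's source."""
    for col, allowed in (items or {}).items():
        # Every value in the column must be one of the allowed values.
        assert df[col].isin(allowed).all(), f"unexpected values in column {col!r}"
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})

ok = within_set_sketch(df, items={"b": ["a", "b", "c", "d"]})

try:
    within_set_sketch(df, items={"a": [1, 2]})
    unexpected = False
except AssertionError:
    unexpected = True
```

Note the direction of the subset relation: here the column's values must all appear in the given values, whereas has_set_within_vals requires the given values to all appear in the column.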

bulwark.decorators module

class bulwark.decorators.BaseDecorator(*args, **kwargs)[source]

Bases: object

bulwark.decorators.HasColumns

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasDtypes

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoInfs

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoNans

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoNegInfs

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoNones

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasNoX

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasSetWithinVals

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.HasUniqueIndex

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.IsMonotonic

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.IsSameAs

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.IsShape

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.MultiCheck

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.OneToMany

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.Unique

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.WithinNStd

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.WithinRange

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.WithinSet

alias of bulwark.decorators.decorator_factory.<locals>.decorator_name

bulwark.decorators.CustomCheck(check_func, *args, **kwargs)[source]

Assert that check_func(df, *args, **kwargs) is true.

bulwark.decorators.decorator_factory(decorator_name, func)[source]

Takes in a function and outputs a class that can be used as a decorator.
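
To illustrate the idea of turning a check function into a decorator class, here is a minimal sketch. It assumes the generated decorator runs the check on the DataFrame returned by the decorated function; decorator_factory_sketch, has_min_rows and HasMinRows are hypothetical names, and the code is not bulwark's source.

```python
import functools

import pandas as pd


def decorator_factory_sketch(decorator_name, func):
    """Build a decorator class around a check function (illustrative only)."""

    class CheckDecorator:
        def __init__(self, *args, **kwargs):
            # Arguments for the check, supplied where the decorator is applied.
            self.args = args
            self.kwargs = kwargs

        def __call__(self, wrapped):
            @functools.wraps(wrapped)
            def wrapper(*f_args, **f_kwargs):
                df = wrapped(*f_args, **f_kwargs)
                func(df, *self.args, **self.kwargs)  # raises on failure
                return df

            return wrapper

    CheckDecorator.__name__ = decorator_name
    return CheckDecorator


def has_min_rows(df, n):
    """Hypothetical check: require at least n rows."""
    assert len(df) >= n, "too few rows"
    return df


HasMinRows = decorator_factory_sketch("HasMinRows", has_min_rows)


@HasMinRows(n=2)
def load_data():
    return pd.DataFrame({"a": [1, 2]})


@HasMinRows(n=5)
def load_too_little():
    return pd.DataFrame({"a": [1, 2]})


loaded = load_data()

try:
    load_too_little()
    rejected = False
except AssertionError:
    rejected = True
```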

bulwark.generic module

Module for useful generic functions.

bulwark.generic.bad_locations(df)[source]

Indicates bad cells in df.

bulwark.generic.snake_to_camel(snake_str)[source]

Module contents