`anomed_anonymizer.anonymizer`

Module Contents

Classes

`PersistingTabularDataAnonymizer`
`SupervisedLearningAnonymizer`	A base class for anonymizers (privacy preserving machine learning models) that rely on the supervised learning paradigm.
`TabularDataAnonymizer`	A base class for anonymizing schemes that process leaky data to provide anonymized data.
`TFKerasWrapper`	If you already have a compiled (!) model of type `tf.keras.layers.Model`, use this wrapper to lift it to the `SupervisedLearningAnonymizer` interface.
`WrappedAnonymizer`	If you already have an anonymizer object that offers a `fit(X, y)` method and either a `predict(X)` or a `predict(X, batch_size)` method, use this wrapper to lift it to a `SupervisedLearningAnonymizer`. If your object also features `save`, it will be used. Otherwise, provide a replacement function at initialization.

Functions

`batch_views`	Create batch views of numpy arrays for a given batch size.
`pickle_anonymizer`	A pickling-based serializer to use as a replacement in `WrappedAnonymizer`
`unpickle_anonymizer`	An inverse to `pickle_anonymizer`, to use as `model_loader` argument for `anonymizer_server.supervised_learning_anonymizer_server_factory`

API

anomed_anonymizer.anonymizer.batch_views(array: numpy.ndarray, batch_size: int | None) → list[numpy.ndarray]

Create batch views of numpy arrays for a given batch size.

Parameters:

array (np.ndarray) – The array to create batches of.
batch_size (int | None) – The requested size of the individual batches. The final batch may be smaller if fewer than batch_size elements remain.

Returns:

A list of batch views (see np.split for details).

Return type:

list[np.ndarray]
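
The following is a minimal sketch of how such batch views could be produced with np.split; it is not the library's implementation. The function name batch_views_sketch is hypothetical, and treating batch_size=None as "one batch containing the whole array" is an assumption that the text above does not confirm.

```python
from __future__ import annotations

import numpy as np


def batch_views_sketch(array: np.ndarray, batch_size: int | None) -> list[np.ndarray]:
    """Hypothetical re-implementation, for illustration only."""
    if batch_size is None:
        # Assumption: None means "do not split at all".
        return [array]
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    # Split points at batch_size, 2 * batch_size, ...; np.split returns views.
    split_points = np.arange(batch_size, len(array), batch_size)
    return np.split(array, split_points)


X = np.arange(10).reshape(5, 2)  # 5 rows, 2 features
batches = batch_views_sketch(X, batch_size=2)
batch_shapes = [b.shape for b in batches]  # the last batch holds the single leftover row
```

Because the batches are views, writing to a batch also changes the original array; copy a batch first if it needs to be modified independently.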

class anomed_anonymizer.anonymizer.PersistingTabularDataAnonymizer(tabular_data_anonymizer: anomed_anonymizer.anonymizer.TabularDataAnonymizer, output_dir: str | pathlib.Path)

Initialization

anonymize(leaky_data: pandas.DataFrame) → None

get_anon_data() → pandas.DataFrame

get_anon_scheme() → anomed_challenge.AnonymizationScheme

anomed_anonymizer.anonymizer.pickle_anonymizer(anonymizer: Any, filepath: str | pathlib.Path) → None

A pickling-based serializer to use as a replacement in WrappedAnonymizer.

class anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer

Bases: abc.ABC

A base class for anonymizers (privacy preserving machine learning models) that rely on the supervised learning paradigm.

Subclasses need to define a way to …

fit/train the model they represent using only a feature array and a target array (i.e. without explicitly given hyperparameters)
use the (trained) model for inference
save the (trained) model to disk
validate model input arrays

abstract fit(X: numpy.ndarray, y: numpy.ndarray) → None

Perform a full training cycle (all epochs, not just one) using the given features and targets.

Parameters:

X (np.ndarray) – The feature array.
y (np.ndarray) – The target array.

abstract predict(X: numpy.ndarray, batch_size: int | None = None) → numpy.ndarray

Infer the target values of a feature array.

Parameters:

X (np.ndarray) – The features to infer the target values for.
batch_size (int | None, optional) – The batch size to use while inferring (to limit compute resource consumption). By default None, which results in processing the whole array X at once.

Returns:

The target values.

Return type:

np.ndarray

abstract save(filepath: str | pathlib.Path) → None

Save the instance to disk, maintaining the current training progress.

Parameters:

filepath (str | Path) – Where to save the instance.

abstract validate_input(feature_array: numpy.ndarray) → None

Check whether the input array is a valid argument for fit and for predict (parameter X).

If so, do nothing. Otherwise, raise a ValueError.

Parameters:

feature_array (np.ndarray) – The input feature array to validate.

Raises:

ValueError – If feature_array is incompatible with this anonymizer.
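
To illustrate the four methods a subclass has to provide, here is a minimal sketch. So that it runs without the package installed, it defines a local stand-in base class (SupervisedLearningAnonymizerStandIn) that only mirrors the documented abstract methods; in real code you would subclass SupervisedLearningAnonymizer instead. LeastSquaresModel is a hypothetical toy model with no privacy mechanism at all, and saving only its weights with np.save is just one possible choice.

```python
from __future__ import annotations

import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

import numpy as np


class SupervisedLearningAnonymizerStandIn(ABC):
    """Local stand-in that only mirrors the four documented abstract methods."""

    @abstractmethod
    def fit(self, X: np.ndarray, y: np.ndarray) -> None: ...

    @abstractmethod
    def predict(self, X: np.ndarray, batch_size: int | None = None) -> np.ndarray: ...

    @abstractmethod
    def save(self, filepath: str | Path) -> None: ...

    @abstractmethod
    def validate_input(self, feature_array: np.ndarray) -> None: ...


class LeastSquaresModel(SupervisedLearningAnonymizerStandIn):
    """Hypothetical toy model: plain least squares, no privacy mechanism."""

    def __init__(self, n_features: int) -> None:
        self._n_features = n_features
        self._weights = np.zeros(n_features)

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        self.validate_input(X)
        # The closed-form solution is the whole "training cycle" for this model.
        self._weights, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(self, X: np.ndarray, batch_size: int | None = None) -> np.ndarray:
        self.validate_input(X)
        if batch_size is None or len(X) == 0:
            return X @ self._weights
        # Process the rows in chunks of at most batch_size.
        parts = [X[i:i + batch_size] @ self._weights
                 for i in range(0, len(X), batch_size)]
        return np.concatenate(parts)

    def save(self, filepath: str | Path) -> None:
        with open(filepath, "wb") as f:
            np.save(f, self._weights)

    def validate_input(self, feature_array: np.ndarray) -> None:
        if feature_array.ndim != 2 or feature_array.shape[1] != self._n_features:
            raise ValueError(f"expected shape (n, {self._n_features}), "
                             f"got {feature_array.shape}")


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 3.0, 5.0])

model = LeastSquaresModel(n_features=2)
model.fit(X, y)
predictions = model.predict(X, batch_size=2)

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = Path(tmp_dir) / "weights.npy"
    model.save(weights_path)
    restored_weights = np.load(weights_path)
```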

class anomed_anonymizer.anonymizer.TabularDataAnonymizer

Bases: abc.ABC

A base class for anonymizing schemes that process leaky data to provide anonymized data.

This class is intended to be used to contribute to challenges of type TabularDataReconstructionChallenge. That implies the anonymized data has to respect one of the schemes denoted by AnonymizationScheme.

Subclasses need to define a way to anonymize the leaky data, respecting one of the predefined anonymizing schemes.

abstract anonymize(leaky_data: pandas.DataFrame) → tuple[pandas.DataFrame, anomed_challenge.AnonymizationScheme]

Anonymize leaky tabular data.

Parameters:

leaky_data (pd.DataFrame) – The tabular data to anonymize.

Returns:

(anon_data, scheme) – The anonymized data and the used anonymization scheme.

Return type:

tuple[pd.DataFrame, anomed_challenge.AnonymizationScheme]

class anomed_anonymizer.anonymizer.TFKerasWrapper(tfkeras_model: Any, feature_array_validator: Callable[[numpy.ndarray], None], **kwargs)

Bases: anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer

If you already have a compiled (!) model of type tf.keras.layers.Model, use this wrapper to lift it to the SupervisedLearningAnonymizer interface.

Initialization

Parameters:

tfkeras_model (tf.keras.layers.Model) – A compiled (!) model created using tf.keras.
feature_array_validator (Callable[[np.ndarray], None]) – The function to use when invoking SupervisedLearningAnonymizer.validate_input (see the abstract class’ docs for more information).
**kwargs (dict[str, Any]) – Further arguments that will be passed to tfkeras_model.fit. Avoid setting the parameters x and y, as they are already in use by this wrapper.
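
A feature_array_validator is any callable that accepts a feature array and either returns None or raises a ValueError. The sketch below shows one possible way to build such a callable. The factory name make_shape_validator and the specific checks (2-D shape, column count, floating-point dtype) are assumptions chosen for illustration, not requirements of the wrapper.

```python
from __future__ import annotations

from typing import Callable

import numpy as np


def make_shape_validator(n_features: int) -> Callable[[np.ndarray], None]:
    """Hypothetical helper: build a validator for (n_samples, n_features) float arrays."""

    def validate(feature_array: np.ndarray) -> None:
        if not isinstance(feature_array, np.ndarray):
            raise ValueError("feature_array must be a numpy array")
        if feature_array.ndim != 2 or feature_array.shape[1] != n_features:
            raise ValueError(
                f"expected shape (n_samples, {n_features}), got {feature_array.shape}"
            )
        if not np.issubdtype(feature_array.dtype, np.floating):
            raise ValueError(f"expected a floating-point dtype, got {feature_array.dtype}")

    return validate


validator = make_shape_validator(n_features=4)
validator(np.zeros((10, 4)))  # valid input: returns None, raises nothing

try:
    validator(np.zeros((10, 3)))  # wrong number of columns
    rejected = False
except ValueError:
    rejected = True
```

The resulting validator has the signature expected for the feature_array_validator parameter, so it could be passed at initialization.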

fit(X: numpy.ndarray, y: numpy.ndarray) → None

predict(X: numpy.ndarray, batch_size: int | None = None) → numpy.ndarray

save(filepath: str | pathlib.Path) → None

validate_input(feature_array: numpy.ndarray) → None

anomed_anonymizer.anonymizer.unpickle_anonymizer(filepath: str | pathlib.Path) → Any

An inverse to pickle_anonymizer, to use as model_loader argument for anonymizer_server.supervised_learning_anonymizer_server_factory.
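
To show what a pickling-based serializer and its inverse can look like, here is a minimal sketch built on the standard library's pickle module. The names pickle_to_file and unpickle_from_file are hypothetical stand-ins with the same argument order as the functions documented here; the library's own implementation may differ.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path
from typing import Any


def pickle_to_file(obj: Any, filepath: str | Path) -> None:
    """Hypothetical serializer: write obj to filepath using pickle."""
    with open(filepath, "wb") as f:
        pickle.dump(obj, f)


def unpickle_from_file(filepath: str | Path) -> Any:
    """Hypothetical loader: read back an object written by pickle_to_file."""
    with open(filepath, "rb") as f:
        return pickle.load(f)


# Round trip with a plain dictionary standing in for a trained model.
original = {"weights": [0.5, -1.25], "epochs_trained": 3}
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "model.pkl"
    pickle_to_file(original, target)
    restored = unpickle_from_file(target)
```

Unpickling can execute arbitrary code, so a loader like this should only be pointed at files from a trusted source.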

class anomed_anonymizer.anonymizer.WrappedAnonymizer(anonymizer, serializer: Callable[[Any, str | pathlib.Path], None] | None = None, feature_array_validator: Callable[[numpy.ndarray], None] | None = None)

Bases: anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer

If you already have an anonymizer object that offers a fit(X, y) method and either a predict(X) or a predict(X, batch_size) method, use this wrapper to lift it to a SupervisedLearningAnonymizer. If your object also features save, it will be used. Otherwise, provide a replacement function at initialization.

Initialization

Parameters:

anonymizer – The object to be wrapped as a SupervisedLearningAnonymizer. It should implement a fit(X: np.ndarray, y: np.ndarray) and either a predict(X: np.ndarray) or a predict(X: np.ndarray, batch_size: int | None).
serializer (Callable[[Any, str | Path], None] | None, optional) – The serializer (pickler) to use, if anonymizer does not provide a save method. The first argument of the serializer is anonymizer and the second the filepath. By default None, which means invoking anonymizer.save(...).
feature_array_validator (Callable[[np.ndarray], None] | None, optional) – The feature array validator to use, if anonymizer does not provide a validate_input method. By default None, which means invoking anonymizer.validate_input(...).

Raises:

NotImplementedError – If anonymizer does not provide a fit or predict method.

fit(X: numpy.ndarray, y: numpy.ndarray)

predict(X: numpy.ndarray, batch_size: int | None = None) → numpy.ndarray

Use the anonymizer’s predict method to predict the target values for X. If that method accepts a batch_size parameter, it is passed through. Otherwise, this method takes care of batching itself.

Parameters:

X (np.ndarray) – The feature array.
batch_size (int | None, optional) – The batch size to use for prediction. By default None, which means use the whole array X at once.

Returns:

The inferred/predicted target values.

Return type:

np.ndarray
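
The dispatch described above can be sketched as follows. This is an illustration under assumptions, not the wrapper's actual code: the helper predict_with_optional_batching and the toy class RowSumDoubler are hypothetical, and detecting batch_size support via inspect.signature is only one way to do it.

```python
from __future__ import annotations

import inspect

import numpy as np


def predict_with_optional_batching(anonymizer, X: np.ndarray,
                                   batch_size: int | None = None) -> np.ndarray:
    """Hypothetical helper: forward batch_size if supported, else batch manually."""
    parameters = inspect.signature(anonymizer.predict).parameters
    if "batch_size" in parameters:
        return anonymizer.predict(X, batch_size=batch_size)
    if batch_size is None or len(X) == 0:
        return anonymizer.predict(X)
    # The wrapped object cannot batch on its own, so feed it one chunk at a time.
    parts = [anonymizer.predict(X[i:i + batch_size])
             for i in range(0, len(X), batch_size)]
    return np.concatenate(parts)


class RowSumDoubler:
    """Toy object with fit(X, y) and predict(X) only (no batch_size parameter)."""

    def __init__(self) -> None:
        self.predict_calls = 0

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        pass  # nothing to learn in this toy example

    def predict(self, X: np.ndarray) -> np.ndarray:
        self.predict_calls += 1
        return X.sum(axis=1) * 2


toy = RowSumDoubler()
X = np.ones((5, 3))
result = predict_with_optional_batching(toy, X, batch_size=2)
```

With five rows and a batch size of two, the toy object's predict is called three times, and the concatenated result has one prediction per input row.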

save(filepath: str | pathlib.Path)

validate_input(feature_array: numpy.ndarray) → None
