anomed_anonymizer.anonymizer

Module Contents

Classes

PersistingTabularDataAnonymizer

SupervisedLearningAnonymizer

A base class for anonymizers (privacy preserving machine learning models) that rely on the supervised learning paradigm.

TabularDataAnonymizer

A base class for anonymizing schemes that process leaky data to provide anonymized data.

TFKerasWrapper

If you already have a compiled (!) model of type tf.keras.layers.Model, use this wrapper to lift it to the SupervisedLearningAnonymizer interface.

WrappedAnonymizer

If you already have an anonymizer object that offers a fit(X, y) method and either a predict(X) or a predict(X, batch_size) method, use this wrapper to lift it to a SupervisedLearningAnonymizer. If your object also features a save method, it will be used. Otherwise, provide a replacement function at initialization.

Functions

batch_views

Create batch views of numpy arrays for a given batch size.

pickle_anonymizer

A pickling-based serializer to use as the serializer argument of WrappedAnonymizer.

unpickle_anonymizer

The inverse of pickle_anonymizer, to use as the model_loader argument of anonymizer_server.supervised_learning_anonymizer_server_factory.

API

anomed_anonymizer.anonymizer.batch_views(array: numpy.ndarray, batch_size: int | None) list[numpy.ndarray]

Create batch views of numpy arrays for a given batch size.

Parameters:
  • array (np.ndarray) – The array to create batches of.

  • batch_size (int | None) – The requested size of the individual batches. The final batch may be smaller if fewer than batch_size elements are left to batch.

Returns:

A list of batch views (see np.split for details).

Return type:

list[np.ndarray]
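
The batching behaviour described above can be illustrated with a short sketch. The function batch_views_sketch below is a hypothetical stand-in written for this example, not the library's implementation. It assumes that batches are taken along the first axis and that batch_size=None yields a single batch holding the whole array. It relies on np.split, which returns views rather than copies.

```python
from __future__ import annotations

import numpy as np


def batch_views_sketch(array: np.ndarray, batch_size: int | None) -> list[np.ndarray]:
    """Illustrative stand-in for batch_views (assumed behaviour, see lead-in)."""
    if batch_size is None:
        # Assumption: no batch size means "one batch with everything".
        return [array]
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    # Split points at batch_size, 2 * batch_size, ...; the last batch keeps the rest.
    split_points = list(range(batch_size, len(array), batch_size))
    return np.split(array, split_points)


features = np.arange(20).reshape(10, 2)  # 10 rows, 2 columns
batches = batch_views_sketch(features, batch_size=4)
batch_lengths = [len(batch) for batch in batches]  # the last batch is shorter
```

Because the batches are views, writing into one of them also changes the original array; copy a batch first if that is not intended.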

class anomed_anonymizer.anonymizer.PersistingTabularDataAnonymizer(tabular_data_anonymizer: anomed_anonymizer.anonymizer.TabularDataAnonymizer, output_dir: str | pathlib.Path)

Initialization

anonymize(leaky_data: pandas.DataFrame) None
get_anon_data() pandas.DataFrame
get_anon_scheme() anomed_challenge.AnonymizationScheme

anomed_anonymizer.anonymizer.pickle_anonymizer(anonymizer: Any, filepath: str | pathlib.Path) None

A pickling-based serializer to use as the serializer argument of WrappedAnonymizer.

class anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer

Bases: abc.ABC

A base class for anonymizers (privacy preserving machine learning models) that rely on the supervised learning paradigm.

Subclasses need to define a way to …

  • fit/train the model they represent using only a feature array and a target array (i.e. without explicitly given hyperparameters)

  • use the (trained) model for inference

  • save the (trained) model to disk

  • validate model input arrays
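
The four requirements above can be sketched with a toy subclass. To keep the example self-contained, it defines a local stand-in base class, SupervisedLearningAnonymizerSketch, that only mirrors the abstract methods documented here; real code would subclass anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer instead. MajorityLabelModel is a hypothetical name, and the model is deliberately trivial: it makes no claim about privacy guarantees.

```python
from __future__ import annotations

import pickle
from abc import ABC, abstractmethod
from pathlib import Path

import numpy as np


class SupervisedLearningAnonymizerSketch(ABC):
    """Local stand-in mirroring the documented abstract methods."""

    @abstractmethod
    def fit(self, X: np.ndarray, y: np.ndarray) -> None: ...

    @abstractmethod
    def predict(self, X: np.ndarray, batch_size: int | None = None) -> np.ndarray: ...

    @abstractmethod
    def save(self, filepath: str | Path) -> None: ...

    @abstractmethod
    def validate_input(self, feature_array: np.ndarray) -> None: ...


class MajorityLabelModel(SupervisedLearningAnonymizerSketch):
    """Toy model: always predicts the most frequent training label."""

    def __init__(self) -> None:
        self._majority_label = None

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        # No hyperparameters are passed in; everything needed lives on the instance.
        self.validate_input(X)
        labels, counts = np.unique(y, return_counts=True)
        self._majority_label = labels[np.argmax(counts)]

    def predict(self, X: np.ndarray, batch_size: int | None = None) -> np.ndarray:
        self.validate_input(X)
        if self._majority_label is None:
            raise RuntimeError("call fit before predict")
        # A constant prediction needs no batching, so batch_size is ignored here.
        return np.full(len(X), self._majority_label)

    def save(self, filepath: str | Path) -> None:
        with open(filepath, "wb") as file:
            pickle.dump(self, file)

    def validate_input(self, feature_array: np.ndarray) -> None:
        if feature_array.ndim != 2:
            raise ValueError("expected a 2-dimensional feature array")


model = MajorityLabelModel()
model.fit(np.zeros((5, 3)), np.array([1, 0, 1, 1, 0]))
predictions = model.predict(np.zeros((2, 3)))
```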

abstract fit(X: numpy.ndarray, y: numpy.ndarray) None

Perform a full training cycle (all epochs, not just one) using the given features and targets.

Parameters:
  • X (np.ndarray) – The feature array.

  • y (np.ndarray) – The target array.

abstract predict(X: numpy.ndarray, batch_size: int | None = None) numpy.ndarray

Infer the target values of a feature array.

Parameters:
  • X (np.ndarray) – The features to infer the target values for.

  • batch_size (int | None, optional) – The batch size to use while inferring (to limit compute resource consumption). By default None, which results in processing the whole array X at once.

Returns:

The target values.

Return type:

np.ndarray

abstract save(filepath: str | pathlib.Path) None

Save the instance to disk, maintaining the current training progress.

Parameters:

filepath (str | Path) – Where to save the instance.

abstract validate_input(feature_array: numpy.ndarray) None

Check whether the input array is a valid argument for fit and for predict (parameter X).

If so, do nothing. Otherwise, raise a ValueError.

Parameters:

feature_array (np.ndarray) – The input feature array to validate.

Raises:

ValueError – If feature_array is incompatible with this anonymizer.

class anomed_anonymizer.anonymizer.TabularDataAnonymizer

Bases: abc.ABC

A base class for anonymizing schemes that process leaky data to provide anonymized data.

This class is intended to be used to contribute to challenges of type TabularDataReconstructionChallenge. That implies the anonymized data has to respect one of the schemes denoted by AnonymizationScheme.

Subclasses need to define a way to anonymize the leaky data, respecting one of the predefined anonymizing schemes.

abstract anonymize(leaky_data: pandas.DataFrame) tuple[pandas.DataFrame, anomed_challenge.AnonymizationScheme]

Anonymize leaky tabular data.

Parameters:

leaky_data (pd.DataFrame) – The tabular data to anonymize.

Returns:

(anon_data, scheme) – The anonymized data and the used anonymization scheme.

Return type:

tuple[pd.DataFrame, anomed_challenge.AnonymizationScheme]

class anomed_anonymizer.anonymizer.TFKerasWrapper(tfkeras_model: Any, feature_array_validator: Callable[[numpy.ndarray], None], **kwargs)

Bases: anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer

If you already have a compiled (!) model of type tf.keras.layers.Model, use this wrapper to lift it to the SupervisedLearningAnonymizer interface.

Initialization

Parameters:
  • tfkeras_model (tf.keras.layers.Model) – A compiled (!) model created using tf.keras.

  • feature_array_validator (Callable[[np.ndarray], None]) – The function to call when SupervisedLearningAnonymizer.validate_input is invoked (see the abstract class’ docs for more info).

  • **kwargs (dict[str, Any]) – Further arguments that will be passed to tfkeras_model.fit. Avoid setting the parameters x and y, as they are already in use by this wrapper.
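
A feature_array_validator is any callable that accepts a feature array, returns None for valid input and raises ValueError otherwise. The following is a minimal sketch of such a callable; make_feature_validator and the chosen checks (two dimensions, a fixed column count, a float dtype) are assumptions made for illustration and are not part of the library.

```python
from typing import Callable

import numpy as np


def make_feature_validator(n_features: int) -> Callable[[np.ndarray], None]:
    """Build a validator for 2-D float arrays with a fixed number of columns."""

    def validate(feature_array: np.ndarray) -> None:
        if not isinstance(feature_array, np.ndarray):
            raise ValueError("expected a numpy array")
        if feature_array.ndim != 2 or feature_array.shape[1] != n_features:
            raise ValueError(
                f"expected shape (n_samples, {n_features}), got {feature_array.shape}"
            )
        if not np.issubdtype(feature_array.dtype, np.floating):
            raise ValueError(f"expected a float dtype, got {feature_array.dtype}")

    return validate


validator = make_feature_validator(n_features=4)
validator(np.zeros((8, 4)))  # valid input: returns None without raising

try:
    validator(np.zeros((8, 3)))  # wrong number of columns
except ValueError as error:
    rejection_message = str(error)
```

A callable built this way matches the Callable[[np.ndarray], None] type expected for feature_array_validator.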

fit(X: numpy.ndarray, y: numpy.ndarray) None
predict(X: numpy.ndarray, batch_size: int | None = None) numpy.ndarray
save(filepath: str | pathlib.Path) None
validate_input(feature_array: numpy.ndarray) None

anomed_anonymizer.anonymizer.unpickle_anonymizer(filepath: str | pathlib.Path) Any

The inverse of pickle_anonymizer, to use as the model_loader argument of anonymizer_server.supervised_learning_anonymizer_server_factory.
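
The relationship between a pickling-based serializer and its inverse can be sketched with the standard library alone. The functions pickle_to_file and unpickle_from_file are hypothetical stand-ins that share the documented signatures of pickle_anonymizer and unpickle_anonymizer; how the library actually implements them is not shown in this reference. The dictionary is just a placeholder for a trained model.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path
from typing import Any


def pickle_to_file(anonymizer: Any, filepath: str | Path) -> None:
    """Serializer with the (object, filepath) -> None shape."""
    with open(filepath, "wb") as file:
        pickle.dump(anonymizer, file)


def unpickle_from_file(filepath: str | Path) -> Any:
    """Loader with the (filepath) -> object shape."""
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with open(filepath, "rb") as file:
        return pickle.load(file)


trained_state = {"weights": [0.5, -1.25], "epochs_done": 3}  # placeholder model

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "anonymizer.pkl"
    pickle_to_file(trained_state, target)
    restored_state = unpickle_from_file(target)  # equal to trained_state
```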

class anomed_anonymizer.anonymizer.WrappedAnonymizer(anonymizer, serializer: Callable[[Any, str | pathlib.Path], None] | None = None, feature_array_validator: Callable[[numpy.ndarray], None] | None = None)

Bases: anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer

If you already have an anonymizer object that offers a fit(X, y) method and either a predict(X) or a predict(X, batch_size) method, use this wrapper to lift it to a SupervisedLearningAnonymizer. If your object also features a save method, it will be used. Otherwise, provide a replacement function at initialization.

Initialization

Parameters:
  • anonymizer – The object to be wrapped as a SupervisedLearningAnonymizer. It should implement a fit(X: np.ndarray, y: np.ndarray) method and either a predict(X: np.ndarray) or a predict(X: np.ndarray, batch_size: int | None) method.

  • serializer (Callable[[Any, str | Path], None] | None, optional) – The serializer (pickler) to use, if anonymizer does not provide a save method. The first argument of the serializer is anonymizer and the second the filepath. By default None, which means invoking anonymizer.save(...).

  • feature_array_validator (Callable[[np.ndarray], None] | None, optional) – The feature array validator to use, if anonymizer does not provide a validate_input method. By default None, which means invoking anonymizer.validate_input(...).

Raises:

NotImplementedError – If anonymizer does not provide a fit or predict method.

fit(X: numpy.ndarray, y: numpy.ndarray)
predict(X: numpy.ndarray, batch_size: int | None = None) numpy.ndarray

Uses the anonymizer’s predict method to predict the target values for X. If that method accepts a batch_size parameter, it is passed along. Otherwise, this method takes care of batching itself.

Parameters:
  • X (np.ndarray) – The feature array.

  • batch_size (int | None, optional) – The batch size to use for prediction. By default None, which means use the whole array X at once.

Returns:

The inferred/predicted target values.

Return type:

np.ndarray
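
The dispatch described above (pass batch_size through if the wrapped predict accepts it, otherwise batch manually) could look roughly like the sketch below. This is an illustration of the documented behaviour under assumptions, not the wrapper's actual code: predict_with_optional_batching and DoublingModel are hypothetical names, and detecting the parameter via inspect.signature is one possible approach.

```python
from __future__ import annotations

import inspect

import numpy as np


class DoublingModel:
    """Hypothetical model whose predict has no batch_size parameter."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        pass  # nothing to learn in this toy example

    def predict(self, X: np.ndarray) -> np.ndarray:
        return X.sum(axis=1) * 2


def predict_with_optional_batching(
    model, X: np.ndarray, batch_size: int | None = None
) -> np.ndarray:
    accepts_batch_size = "batch_size" in inspect.signature(model.predict).parameters
    if accepts_batch_size:
        # The wrapped object knows how to batch; delegate to it.
        return model.predict(X, batch_size=batch_size)
    if batch_size is None:
        # No batching requested: process the whole array at once.
        return model.predict(X)
    # Batch manually along the first axis and stitch the results back together.
    split_points = list(range(batch_size, len(X), batch_size))
    batches = np.split(X, split_points)
    return np.concatenate([model.predict(batch) for batch in batches])


X = np.arange(14.0).reshape(7, 2)
model = DoublingModel()
batched = predict_with_optional_batching(model, X, batch_size=3)
unbatched = predict_with_optional_batching(model, X)
```

For a model whose predictions do not depend on the other rows in the same batch, the batched and unbatched results are identical; batching only limits how much data is processed at once.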

save(filepath: str | pathlib.Path)
validate_input(feature_array: numpy.ndarray) None