anomed_anonymizer.anonymizer
Module Contents
Classes
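SupervisedLearningAnonymizer | A base class for anonymizers (privacy preserving machine learning models) that rely on the supervised learning paradigm.
TabularDataAnonymizer | A base class for anonymizing schemes that process leaky data to provide anonymized data.
TFKerasWrapper | If you already have a compiled (!) model of type tf.keras.layers.Model, use this wrapper to lift it to the SupervisedLearningAnonymizer interface.
WrappedAnonymizer | If you already have an anonymizer object that offers a fit(X, y) method and either a predict(X) or predict(X, batch_size) too, use this wrapper to lift it to a SupervisedLearningAnonymizer.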
Functions
batch_views | Create batch views of numpy arrays for a given batch size.
pickle_anonymizer | A pickling-based serializer to use as a replacement in WrappedAnonymizer.
unpickle_anonymizer | An inverse to pickle_anonymizer.
API
- anomed_anonymizer.anonymizer.batch_views(array: numpy.ndarray, batch_size: int | None) → list[numpy.ndarray]
Create batch views of numpy arrays for a given batch size.
- Parameters:
array (np.ndarray) – The array to create batches of.
batch_size (int | None) – The requested size of the individual batches. The final batch may be smaller if fewer than batch_size elements are left to batch.
- Returns:
A list of batch views (see np.split for details).
- Return type:
list[np.ndarray]
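To illustrate the batching described above, the following sketch splits an array into consecutive views with np.split. The name batch_views_sketch and its treatment of batch_size=None (a single batch holding the whole array) are assumptions made for this example; it is not the library's implementation.

```python
from __future__ import annotations

import numpy as np


def batch_views_sketch(array: np.ndarray, batch_size: int | None) -> list[np.ndarray]:
    """Hypothetical stand-in for batch_views, for illustration only."""
    # Assumption: None means "do not batch", i.e. one batch with everything.
    if batch_size is None:
        return [array]
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer.")
    # Split before every multiple of batch_size; the last batch keeps the rest.
    split_points = list(range(batch_size, len(array), batch_size))
    return np.split(array, split_points)


data = np.arange(10)
batches = batch_views_sketch(data, batch_size=4)
print([len(b) for b in batches])  # [4, 4, 2]

# The batches are views, so writing to one changes the original array.
batches[0][0] = -1
print(data[0])  # -1
```

Because the batches share memory with the input, no data is copied, but in-place changes to a batch also show up in the source array.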
- class anomed_anonymizer.anonymizer.PersistingTabularDataAnonymizer(tabular_data_anonymizer: anomed_anonymizer.anonymizer.TabularDataAnonymizer, output_dir: str | pathlib.Path)
Initialization
- anonymize(leaky_data: pandas.DataFrame) → None
- get_anon_data() → pandas.DataFrame
- get_anon_scheme() → anomed_challenge.AnonymizationScheme
- anomed_anonymizer.anonymizer.pickle_anonymizer(anonymizer: Any, filepath: str | pathlib.Path) → None
A pickling-based serializer to use as the replacement serializer in WrappedAnonymizer when the wrapped object does not provide a save method.
- class anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer
Bases: abc.ABC
A base class for anonymizers (privacy preserving machine learning models) that rely on the supervised learning paradigm.
Subclasses need to define a way to …
fit/train the model they represent using only a feature array and a target array (i.e. without explicitly given hyperparameters)
use the (trained) model for inference
save the (trained) model to disk
validate model input arrays
- abstract fit(X: numpy.ndarray, y: numpy.ndarray) → None
Perform a full training cycle (all epochs, not just one) using the given features and targets.
- Parameters:
X (np.ndarray) – The feature array.
y (np.ndarray) – The target array.
- abstract predict(X: numpy.ndarray, batch_size: int | None = None) → numpy.ndarray
Infer the target values of a feature array.
- Parameters:
X (np.ndarray) – The features to infer the target values for.
batch_size (int | None, optional) – The batch size to use while inferring (to limit compute resource consumption). By default None, which results in processing the whole array X at once.
- Returns:
The target values.
- Return type:
np.ndarray
- abstract save(filepath: str | pathlib.Path) → None
Save the instance to disk, maintaining the current training progress.
- Parameters:
filepath (str | Path) – Where to save the instance.
- abstract validate_input(feature_array: numpy.ndarray) → None
Check whether the input array is a valid argument for fit and for predict (parameter X). If so, do nothing. Otherwise, raise a ValueError.
- Parameters:
feature_array (np.ndarray) – The input feature array to validate.
- Raises:
ValueError – If feature_array is incompatible with this anonymizer.
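The four abstract methods above can be sketched with a toy subclass. Since this example should run without the package installed, it defines a stand-in base class (SupervisedLearningAnonymizerStandIn) that only mirrors the documented method signatures. The MajorityClassAnonymizer model is hypothetical and offers no privacy guarantees; it merely shows where each method fits.

```python
import abc
import pickle

import numpy as np


class SupervisedLearningAnonymizerStandIn(abc.ABC):
    """Stand-in mirroring the documented interface (not the real class)."""

    @abc.abstractmethod
    def fit(self, X, y): ...

    @abc.abstractmethod
    def predict(self, X, batch_size=None): ...

    @abc.abstractmethod
    def save(self, filepath): ...

    @abc.abstractmethod
    def validate_input(self, feature_array): ...


class MajorityClassAnonymizer(SupervisedLearningAnonymizerStandIn):
    """Hypothetical toy model: always predicts the most frequent target."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.majority_ = None

    def fit(self, X, y):
        self.validate_input(X)
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]

    def predict(self, X, batch_size=None):
        self.validate_input(X)
        if self.majority_ is None:
            raise RuntimeError("Call fit before predict.")
        # The prediction is constant, so batch_size has no effect here.
        return np.full(len(X), self.majority_)

    def save(self, filepath):
        # Pickling the whole instance keeps the training progress.
        with open(filepath, "wb") as f:
            pickle.dump(self, f)

    def validate_input(self, feature_array):
        if feature_array.ndim != 2 or feature_array.shape[1] != self.n_features:
            raise ValueError(f"Expected shape (n, {self.n_features}).")


model = MajorityClassAnonymizer(n_features=3)
model.fit(np.zeros((5, 3)), np.array([0, 1, 1, 1, 0]))
print(model.predict(np.zeros((2, 3))).tolist())  # [1, 1]
```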
- class anomed_anonymizer.anonymizer.TabularDataAnonymizer
Bases: abc.ABC
A base class for anonymizing schemes that process leaky data to provide anonymized data.
This class is intended to be used to contribute to challenges of type TabularDataReconstructionChallenge. That implies the anonymized data has to respect one of the schemes denoted by AnonymizationScheme.
Subclasses need to define a way to anonymize the leaky data, respecting one of the predefined anonymizing schemes.
- abstract anonymize(leaky_data: pandas.DataFrame) → tuple[pandas.DataFrame, anomed_challenge.AnonymizationScheme]
Anonymize leaky tabular data.
- Parameters:
leaky_data (pd.DataFrame) – The tabular data to anonymize.
- Returns:
(anon_data, scheme) – The anonymized data and the anonymization scheme that was used.
- Return type:
tuple[pd.DataFrame, anomed_challenge.AnonymizationScheme]
- class anomed_anonymizer.anonymizer.TFKerasWrapper(tfkeras_model: Any, feature_array_validator: Callable[[numpy.ndarray], None], **kwargs)
Bases: anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer
If you already have a compiled (!) model of type tf.keras.layers.Model, use this wrapper to lift it to the SupervisedLearningAnonymizer interface.
Initialization
- Parameters:
tfkeras_model (tf.keras.layers.Model) – A compiled (!) model created using tf.keras.
feature_array_validator (Callable[[np.ndarray], None]) – The function to use when invoking SupervisedLearningAnonymizer.validate_input (see the abstract class’ docs for more info).
**kwargs (dict[str, Any]) – Further arguments that will be passed to tfkeras_model.fit. Avoid setting the parameters x and y, as they are already in use by this wrapper.
- fit(X: numpy.ndarray, y: numpy.ndarray) → None
- predict(X: numpy.ndarray, batch_size: int | None = None) → numpy.ndarray
- save(filepath: str | pathlib.Path) → None
- validate_input(feature_array: numpy.ndarray) → None
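A feature_array_validator is any callable that accepts an array, returns nothing for valid input, and raises ValueError otherwise. The sketch below builds one with a small factory. The name make_shape_validator and the specific checks (two dimensions, a fixed feature count, a numeric dtype) are assumptions chosen for illustration.

```python
from typing import Callable

import numpy as np


def make_shape_validator(n_features: int) -> Callable[[np.ndarray], None]:
    """Hypothetical helper: build a validator for (n_samples, n_features) input."""

    def validate(feature_array: np.ndarray) -> None:
        if feature_array.ndim != 2 or feature_array.shape[1] != n_features:
            raise ValueError(
                f"Expected shape (n_samples, {n_features}), got {feature_array.shape}."
            )
        if not np.issubdtype(feature_array.dtype, np.number):
            raise ValueError(f"Expected a numeric dtype, got {feature_array.dtype}.")

    return validate


validator = make_shape_validator(n_features=4)
validator(np.zeros((8, 4)))  # valid input: returns None, raises nothing

try:
    validator(np.zeros((8, 3)))
except ValueError as error:
    print(f"rejected: {error}")
```

A callable built this way matches the documented type Callable[[np.ndarray], None] and could be passed as feature_array_validator.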
- anomed_anonymizer.anonymizer.unpickle_anonymizer(filepath: str | pathlib.Path) → Any
An inverse to pickle_anonymizer, to use as the model_loader argument for anonymizer_server.supervised_learning_anonymizer_server_factory.
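The relationship between a pickling serializer and its inverse can be sketched with the standard library alone. The functions pickle_to_file and unpickle_from_file are hypothetical stand-ins with the same argument order as documented above; the dictionary is a placeholder for a trained anonymizer.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def pickle_to_file(obj: Any, filepath: Union[str, Path]) -> None:
    """Hypothetical serializer: object first, filepath second."""
    with open(filepath, "wb") as f:
        pickle.dump(obj, f)


def unpickle_from_file(filepath: Union[str, Path]) -> Any:
    """Hypothetical loader: the inverse of pickle_to_file."""
    with open(filepath, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "anonymizer.pkl"
    original = {"weights": [0.1, 0.2], "epochs_done": 3}  # placeholder object
    pickle_to_file(original, path)
    restored = unpickle_from_file(path)

print(restored == original)  # True
```

Unpickling can execute arbitrary code, so a loader like this should only be pointed at files from a trusted source.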
- class anomed_anonymizer.anonymizer.WrappedAnonymizer(anonymizer, serializer: Callable[[Any, str | pathlib.Path], None] | None = None, feature_array_validator: Callable[[numpy.ndarray], None] | None = None)
Bases: anomed_anonymizer.anonymizer.SupervisedLearningAnonymizer
If you already have an anonymizer object that offers a fit(X, y) method and either a predict(X) or predict(X, batch_size) too, use this wrapper to lift it to a SupervisedLearningAnonymizer. If your object also features save, it will be used. Otherwise, provide a replacement function at initialization.
Initialization
- Parameters:
anonymizer – The object to be wrapped as a SupervisedLearningAnonymizer. It should implement a fit(X: np.ndarray, y: np.ndarray) and either a predict(X: np.ndarray) or a predict(X: np.ndarray, batch_size: int | None).
serializer (Callable[[Any, str | Path], None] | None, optional) – The serializer (pickler) to use, if anonymizer does not provide a save method. The first argument of the serializer is anonymizer and the second the filepath. By default None, which means invoking anonymizer.save(...).
feature_array_validator (Callable[[np.ndarray], None] | None, optional) – The feature array validator to use, if anonymizer does not provide a validate_input method. By default None, which means invoking anonymizer.validate_input(...).
- Raises:
NotImplementedError – If anonymizer does not provide a fit or predict method.
- fit(X: numpy.ndarray, y: numpy.ndarray)
- predict(X: numpy.ndarray, batch_size: int | None = None) → numpy.ndarray
Uses the anonymizer’s predict method to predict the target values for X. If that method accepts a batch_size parameter, this method makes use of it. Otherwise, this method takes care of batching.
- Parameters:
X (np.ndarray) – The feature array.
batch_size (int | None, optional) – The batch size to use for prediction. By default None, which means use the whole array X at once.
- Returns:
The inferred/predicted target values.
- Return type:
np.ndarray
- save(filepath: str | pathlib.Path)
- validate_input(feature_array: numpy.ndarray) → None
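The predict behaviour described for WrappedAnonymizer (forward batch_size if the wrapped object accepts it, otherwise batch on its behalf) could be approximated as follows. This is a sketch under assumptions: the helper predict_with_optional_batching and the DoublingModel class are hypothetical, and detecting the parameter via inspect.signature is one possible approach, not necessarily the one the library takes.

```python
import inspect

import numpy as np


def predict_with_optional_batching(anonymizer, X, batch_size=None):
    """Hypothetical helper illustrating the documented fallback logic."""
    parameters = inspect.signature(anonymizer.predict).parameters
    if "batch_size" in parameters:
        # The wrapped object batches on its own.
        return anonymizer.predict(X, batch_size=batch_size)
    if batch_size is None or len(X) == 0:
        # No batching requested (or nothing to batch): one call suffices.
        return anonymizer.predict(X)
    # Batch manually and stitch the partial results back together.
    batches = [X[i:i + batch_size] for i in range(0, len(X), batch_size)]
    return np.concatenate([anonymizer.predict(batch) for batch in batches])


class DoublingModel:
    """Toy object whose predict has no batch_size parameter."""

    def __init__(self):
        self.call_sizes = []

    def predict(self, X):
        self.call_sizes.append(len(X))
        return X * 2


toy = DoublingModel()
result = predict_with_optional_batching(toy, np.arange(5), batch_size=2)
print(result.tolist())   # [0, 2, 4, 6, 8]
print(toy.call_sizes)    # [2, 2, 1]
```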