Welcome to CiphermodeClient’s documentation!#

create_client(frontend_address, auth_config='~/.ciphercore/auth_config', token_path='~/.ciphercore/token', custom_root_ca=None, tls_domain='localhost', private_key=None, certificate_chain=None, *args, **kwargs)[source]#

Create a CiphermodeApi instance and initialize it.

Parameters:
  • frontend_address (str) – The address of the server.

  • auth_config (str, optional) – Path to auth config.

  • token_path (str, optional) – Path to file containing OpenIDConnect token.

  • custom_root_ca (str, optional) – Path to a TLS certificate file.

  • tls_domain (str, optional) – The domain protected by the TLS certificate.

  • private_key (str, optional) – Path to the client’s private key.

  • certificate_chain (str, optional) – Path to the client’s certificate chain.

  • *args – Arguments for the PandasConverter.

  • **kwargs – Kwargs for the PandasConverter.

Returns:

An instance of the CiphermodeApi.

Return type:

CiphermodeApi
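
Example (a minimal sketch; the import path, server address, and file paths below are placeholders, not values defined by this API):

>>> from ciphermode_client import create_client  # import path is an assumption; adjust to your installation
>>> api = create_client(
...     'frontend.example.com:443',
...     auth_config='~/.ciphercore/auth_config',
...     token_path='~/.ciphercore/token',
... )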

class CiphermodeApi(address, auth_handler, cert=None, tls_domain=None, private_key=None, certificate_chain=None, *args, **kwargs)[source]#

Bases: object

add_user_role(user_id, role)[source]#

Add a role to a user. Valid roles are:

  • ‘data_owner’: [SMPC] The user can upload datasets and approve computations using them.

  • ‘analyst’: [SMPC] The user can create computations.

  • ‘admin’: [SMPC] The user can perform all actions of ‘data_owner’ and ‘analyst’, and can modify ACLs and user roles.

Parameters:
  • user_id (str) – The ID of the user.

  • role (str) – The role to be added to the user.

Returns:

A pandas DataFrame containing the updated user information.

Return type:

DataFrame

approve_data_request(id, comment='')[source]#

Approves a data request.

Parameters:
  • id (str) – The ID of the data request to approve.

  • comment (str, optional) – A comment to attach to the data request.

Returns:

A pandas DataFrame containing the approved data request.

Return type:

DataFrame

build_info()[source]#

Get the build information.

Returns:

An object containing the build information.

Return type:

Object

cancel_computation_session(id)[source]#

Cancel a specific computation session.

Parameters:

id (str) – The ID of the computation session to cancel.

Returns:

A pandas DataFrame containing the cancelled computation session information.

Return type:

DataFrame

comment_data_request(id, comment='')[source]#

Comments on a data request.

Parameters:
  • id (str) – The ID of the data request to comment on.

  • comment (str) – The comment to attach to the data request.

Returns:

A pandas DataFrame containing the commented data request.

Return type:

DataFrame

create_computation(orchestrator, graphs_config, name, description, config=None)[source]#

Create a computation.

A computation object specifies what computation to execute, independently of the data; the same computation can be used multiple times with different datasets. Note that there are easier-to-use functions for specific computations (PSI, SQL, NN training, etc.). We strongly recommend using those functions when possible.

Parameters:
  • orchestrator (str) – The orchestrator type for the computation.

  • graphs_config (dict) – The “graph name -> graph ID” mapping.

  • name (str) – The name of the computation.

  • description (str) – The description of the computation.

  • config (dict, optional) – Additional orchestrator-specific configuration for the computation.

Returns:

A pandas DataFrame containing the created computation information. The ‘id’ field of this DataFrame is the ID of the created computation, which is used to refer to this computation in other operations.

Return type:

DataFrame
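
Example (a minimal sketch; the orchestrator name and graph ID are hypothetical placeholders, and `api` is a CiphermodeApi instance obtained from create_client):

>>> computation = api.create_computation(
...     orchestrator='single_graph',             # hypothetical orchestrator type
...     graphs_config={'main': '<graph-id>'},    # placeholder graph ID
...     name='example computation',
...     description='adds two numbers with SMPC',
... )
>>> computation_id = computation['id']           # the 'id' field identifies the computation (see Returns above)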

create_computation_session(computation_id, data_config, name='', description='')[source]#

Create a computation session. A computation session is an instantiation of a computation on specific datasets, specified by the data_config argument.

Parameters:
  • computation_id (str) – The ID of the computation.

  • data_config (dict) – The mapping (name -> dataset ID). Names are orchestrator-specific (see orchestrator-specific functions for details, e.g. create_psi).

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas Series containing the created computation session information. The ‘id’ in this Series is the ID of the created computation session, which is used to refer to this session in other operations.
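
Example (a minimal sketch; the computation ID and dataset ID are placeholders, and the data_config key depends on the orchestrator, as noted above):

>>> session = api.create_computation_session(
...     computation_id='<computation-id>',
...     data_config={'input': '<dataset-id>'},  # hypothetical orchestrator-specific name
...     name='my session',
... )
>>> session_id = session['id']                  # see the Returns note above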

create_explore_dataset_intersection(dataset_id1, dataset_id2, column_names1, column_names2, use_approx_match_rate=False)[source]#

Creates an exploration of the intersection between two datasets.

Parameters:
  • dataset_id1 (str) – The ID of the first dataset.

  • dataset_id2 (str) – The ID of the second dataset.

  • column_names1 (list(str)) – Names of the columns in the first dataset to compare.

  • column_names2 (list(str)) – Names of the columns in the second dataset to compare.

  • use_approx_match_rate (bool, optional) – Whether to use approximate match rate. Default is False.

Returns:

The ID of the created computation session.

Return type:

str

create_knn(key_dataset_id, query_dataset_id, num_neighbors, value_dataset_id=None, name='', description='')[source]#

Create a KNN (k-Nearest-Neighbors) computation session.

Parameters:
  • key_dataset_id (str) – The ID of the rowwise dataset with lookup keys (vectors).

  • query_dataset_id (str) – The ID of the rowwise dataset with lookup queries (vectors).

  • num_neighbors (int) – The number of neighbors to consider in the KNN computation.

  • value_dataset_id (str, optional) – The ID of the dataset with labels. Default is None.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
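
Example (a minimal sketch; all dataset IDs are placeholders):

>>> session = api.create_knn(
...     key_dataset_id='<key-dataset-id>',
...     query_dataset_id='<query-dataset-id>',
...     num_neighbors=5,
...     value_dataset_id='<label-dataset-id>',  # optional labels
...     name='knn lookup',
... )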

create_llm_inference(inference_dataset_id, model_dataset_id, max_len=128, num_layers=8, embedding_dim=512, num_heads=16, temperature=0.85, top_p=0.85, name='', description='')[source]#

Create an LLM inference computation session.

Parameters:
  • inference_dataset_id (str) – The ID of the inference dataset.

  • model_dataset_id (str) – The ID of the model dataset.

  • max_len (int, optional) – The maximum length of the generated sequence.

  • num_layers (int, optional) – The number of layers in the model.

  • embedding_dim (int, optional) – The embedding dimension of the model.

  • num_heads (int, optional) – The number of attention heads in the model.

  • temperature (float, optional) – The temperature for sampling.

  • top_p (float, optional) – The top-p heuristic value for sampling.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
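
Example (a minimal sketch; the dataset IDs are placeholders and the keyword values simply repeat the documented defaults):

>>> session = api.create_llm_inference(
...     inference_dataset_id='<prompt-dataset-id>',
...     model_dataset_id='<model-dataset-id>',
...     max_len=128,
...     temperature=0.85,
...     top_p=0.85,
...     name='llm inference',
... )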

create_mlp(train_datasets, validation_datasets, test_datasets, model_dataset, layers=[100, 1], batch_size=64, optimizer='adam', learning_rate=0.0003, loss='log_loss', epochs=3, precision=15, name='', description='')[source]#

Create an MLP (Multi-Layer Perceptron) training computation session.

Parameters:
  • train_datasets (list) – The list of training dataset IDs.

  • validation_datasets (list) – The list of validation dataset IDs.

  • test_datasets (list) – The list of testing dataset IDs.

  • layers (list, optional) – List of hidden layer sizes in the MLP (in most cases, the last one should be 1). Default is [100, 1].

  • batch_size (int, optional) – Batch size for training. Default is 64.

  • optimizer (str, optional) – Optimizer to use for training. Default is ‘adam’, supported optimizers are ‘adam’, ‘adagrad’, ‘sgd’.

  • learning_rate (float, optional) – Learning rate for training. Default is 3e-4.

  • loss (str, optional) – Loss function to use for training. Default is ‘log_loss’, supported losses are ‘log_loss’ and ‘mse’.

  • epochs (int, optional) – Number of epochs for training. Default is 3.

  • precision (int, optional) – Precision for training. Default is 15. Training is performed in fixed-point arithmetic with denominator 2**precision.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
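
Example (a minimal sketch; the dataset IDs are placeholders, and treating model_dataset as a dataset ID is an assumption, since that parameter is not documented above):

>>> session = api.create_mlp(
...     train_datasets=['<train-dataset-id>'],
...     validation_datasets=['<validation-dataset-id>'],
...     test_datasets=['<test-dataset-id>'],
...     model_dataset='<model-dataset-id>',  # assumption: a dataset ID for the model
...     layers=[100, 1],
...     optimizer='adam',
...     loss='log_loss',
...     epochs=3,
... )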

create_nn_inference(inference_dataset_id, model_dataset_id, batch_size=64, precision=15, name='', description='')[source]#

Create a neural network inference computation session.

Parameters:
  • inference_dataset_id (str) – The ID of the inference dataset.

  • model_dataset_id (str) – The ID of the model dataset.

  • batch_size (int, optional) – The batch size for inference. Default is 64, should be the same as for training.

  • precision (int, optional) – The precision for inference. Default is 15, should be the same as for training.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
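
Example (a minimal sketch; the dataset IDs are placeholders; batch_size and precision should match the values used for training):

>>> session = api.create_nn_inference(
...     inference_dataset_id='<inference-dataset-id>',
...     model_dataset_id='<model-dataset-id>',
...     batch_size=64,
...     precision=15,
... )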

create_psi(first_dataset_id, second_dataset_id, first_dataset_columns, second_dataset_columns, name='', description='', sharded=True)[source]#

Create a PSI (Private Set Intersection) computation session.

Parameters:
  • first_dataset_id (str) – The ID of the first dataset.

  • second_dataset_id (str) – The ID of the second dataset.

  • first_dataset_columns (list[str]) – The columns from the first dataset to join on.

  • second_dataset_columns (list[str]) – The columns from the second dataset to join on.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

  • sharded (bool, optional) – Whether to shard the computation. Default is True.

Returns:

A pandas Series containing the created computation session information.
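
Example (a minimal sketch; the dataset IDs and column names are placeholders):

>>> session = api.create_psi(
...     first_dataset_id='<dataset-id-1>',
...     second_dataset_id='<dataset-id-2>',
...     first_dataset_columns=['email'],
...     second_dataset_columns=['email_address'],
...     name='psi join',
... )
>>> session_id = session['id']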

create_single_graph_computation(serialized_graph, name='', description='')[source]#

Create a single graph computation. These computations are usually used for testing or basic examples, e.g. computing the sum of two numbers with SMPC.

Parameters:
  • serialized_graph (str) – The serialized Ciphercore graph to create a computation for.

  • name (str, optional) – The name of the computation.

  • description (str, optional) – The description of the computation.

Returns:

A pandas DataFrame containing the created computation information.

Return type:

DataFrame

create_sql(query, data_config, name='', description='')[source]#

Create an SQL computation session.

Parameters:
  • query (str) – The SQL query to execute. The query should refer to columns with lowercase names.

  • data_config (dict) – The mapping (table name -> dataset ID). SQL queries refer to datasets by the table names in this mapping.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
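
Example (a minimal sketch; the dataset IDs are placeholders; the table names in the query match the keys of data_config, and column names are lowercase as noted above):

>>> session = api.create_sql(
...     query='SELECT a.user_id, b.score FROM left_table a '
...           'JOIN right_table b ON a.user_id = b.user_id',
...     data_config={
...         'left_table': '<dataset-id-1>',
...         'right_table': '<dataset-id-2>',
...     },
...     name='sql join',
... )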

delete_dataset(dataset_id)[source]#

Delete dataset with the specified ID.

Parameters:

dataset_id (str) – The ID of the dataset.

Returns:

True if dataset was successfully deleted.

Return type:

bool

download_computation_session_result(id, onnx=False)[source]#

Download the result of a specific computation session.

Parameters:
  • id (str) – The ID of the computation session to download.

  • onnx (bool, optional) – Whether to convert the result to ONNX protobuf. Default is False.

Returns:

A pandas DataFrame containing the downloaded computation session result.

Return type:

DataFrame

Raises:

CiphermodeException – if more than one of csv, onnx and float_array is set.
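
Example (a minimal sketch; the session ID is a placeholder):

>>> result = api.download_computation_session_result('<session-id>')
>>> result.head()  # the result is a pandas DataFrame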

download_graph(id)[source]#

Download a graph with the specified ID.

Parameters:

id (str) – The ID of the graph.

Returns:

The serialized Ciphercore graph.

Return type:

str

get_cloud_upload(id)[source]#

Get cloud upload with the specified ID. Used to track the progress of uploading a particular computation session result to cloud storage.

Parameters:

id (str) – The ID of the cloud upload.

Returns:

A pandas DataFrame containing the cloud upload information.

Return type:

DataFrame

get_dataset(dataset_id)[source]#

Get information about the dataset with the specified ID.

Parameters:

dataset_id (str) – The ID of the dataset.

Returns:

Metadata about the dataset with the specified ID, e.g. name, description, visibility, permissions. Does not return metadata about the data itself.

Return type:

Dataset

get_knn_computation(num_neighbors, has_labels=False)[source]#

Create a KNN (k-nearest-neighbors) computation.

Parameters:
  • num_neighbors (int) – The number of neighbors to consider in the KNN computation.

  • has_labels (bool, optional) – Whether the input data has labels. Default is False.

Returns:

The ID of the created computation.

Return type:

str

get_llm_inference_computation(max_len, num_layers, embedding_dim, num_heads, temperature, top_p)[source]#

Create an LLM inference computation.

Parameters:
  • max_len (int) – The maximum length of the generated text.

  • num_layers (int) – The number of layers in the transformer.

  • embedding_dim (int) – The embedding dimension of the transformer.

  • num_heads (int) – The number of heads in the transformer.

  • temperature (float) – The temperature for the sampling.

  • top_p (float) – The top p for the sampling.

Returns:

The ID of the created computation.

Return type:

str

get_mlp_computation(layers, batch_size, optimizer, learning_rate, loss, epochs, precision)[source]#

Create an MLP (Multi-Layer Perceptron) computation.

Parameters:
  • layers (list) – The list with the sizes of hidden layers in the MLP (note that the last one should be 1 in most cases).

  • batch_size (int) – The batch size for training.

  • optimizer (str) – The optimizer to use for training (we currently support ‘adam’, ‘adagrad’ and ‘sgd’).

  • learning_rate (float) – The learning rate for training.

  • loss (str) – The loss function to use for training (we currently support ‘log_loss’ and ‘mse’).

  • epochs (int) – The number of epochs for training.

  • precision (int) – The precision for training. Training is performed in fixed-point arithmetic with denominator 2**precision.

Returns:

The ID of the created computation.

Return type:

str

get_nn_inference_computation(batch_size, precision)[source]#

Create a neural network inference computation.

Parameters:
  • batch_size (int) – The batch size for inference, should be the same as for training.

  • precision (int) – The precision for inference, should be the same as for training.

Returns:

The ID of the created computation.

Return type:

str

get_psi_computation(first_dataset_columns, second_dataset_columns, sharded=True)[source]#

Create a PSI (Private Set Intersection) computation.

Parameters:
  • first_dataset_columns (list[str]) – The list of columns from the first dataset to join on.

  • second_dataset_columns (list[str]) – The list of columns from the second dataset to join on.

  • sharded (bool, optional) – Whether to shard the computation. This is useful for reducing the memory usage of a computation on large datasets. Default is True.

Returns:

The ID of the created computation.

Return type:

str

get_report(dataset_id)[source]#

Get the report of the specified dataset.

Parameters:

dataset_id (str) – The ID of the dataset.

Returns:

The report of the specified dataset.

Return type:

Report (str)

get_sql_computation(query)[source]#

Create an SQL computation.

Parameters:

query (str) – The SQL query to execute. It can refer to tables by name; these names must be specified in the corresponding computation session.

Returns:

The ID of the created computation.

Return type:

str

hash_dataset_columns(dataset_id, hash_column_names, new_dataset_name, async_init=False)[source]#

Hashes the entries of the dataset in the given columns to create a succinct representation of the input dataset.

Succinct representations output by this method can be matched with create_psi to get hash values they have in common.

Parameters:
  • dataset_id (str) – The dataset ID.

  • hash_column_names (list[str]) – Columns from the dataset to hash.

  • new_dataset_name (str) – New dataset name.

  • async_init (bool, optional) – If True, the function returns immediately after creating the new dataset object and populates it with hashes asynchronously.

Returns:

A pandas Series containing the dataset ID for the succinct representation.

This dataset contains a single column of (de-duplicated) hash values, each value corresponding to some set of rows in the input dataset where entries indexed by columns in hash_column_names had the same hash.

If async_init was True, the user should wait for the dataset to be finalized before using it. The status of the dataset can be checked by calling get_dataset() with the returned dataset ID.
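
Example (a minimal sketch; the dataset ID and column names are placeholders, and reading the new dataset ID from the ‘id’ entry of the returned Series is an assumption, by analogy with other methods above):

>>> hashed = api.hash_dataset_columns(
...     dataset_id='<dataset-id>',
...     hash_column_names=['email', 'phone'],
...     new_dataset_name='customers_hashed',
... )
>>> hashed_dataset_id = hashed['id']  # assumption: ID exposed under 'id'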

list_cloud_uploads()[source]#

List all cloud uploads.

Returns:

A pandas DataFrame containing the list of cloud uploads.

Return type:

DataFrame

list_computation_sessions(filter_computation_session_ids=None, show_tags=False)[source]#

List computation sessions.

Parameters:
  • filter_computation_session_ids (list[str], optional) – List of specific computation session IDs to return. If None, all computation sessions are returned. Default is None.

  • show_tags (bool, optional) – Whether to include the tags column.

Returns:

A pandas DataFrame containing the list of computation sessions.

Return type:

DataFrame

list_computation_sessions_ids()[source]#

List computation session IDs.

Returns:

A list of computation session IDs.

Return type:

list[str]

list_computations()[source]#

List all computations.

Returns:

A pandas DataFrame containing the list of computations.

Return type:

DataFrame

list_computations_ids()[source]#

List the IDs of all computations.

Returns:

A list of computation IDs.

Return type:

list[str]

list_data_requests(filter_computation_session_id=None, filter_can_approve=False)[source]#

Lists data requests.

Parameters:
  • filter_computation_session_id (str, optional) – If provided, only data requests for this computation session ID will be returned.

  • filter_can_approve (bool) – If true, only data requests that the user can approve will be returned.

Returns:

A pandas DataFrame containing the list of data requests.

Return type:

DataFrame

list_data_requests_ids(filter_computation_session_id=None, filter_can_approve=False)[source]#

Lists the IDs of data requests.

Parameters:
  • filter_computation_session_id (str, optional) – If provided, only data requests for this computation session ID will be returned.

  • filter_can_approve (bool) – If true, only data requests that the user can approve will be returned.

Returns:

A list of data request IDs.

Return type:

list[str]

list_datasets()[source]#

List all datasets. For each dataset, lists metadata about the dataset object (e.g. the name, description, visibility, and permissions), but not about the data itself.

Returns:

A pandas DataFrame containing the list of datasets.

Return type:

DataFrame

list_datasets_ids()[source]#

List the IDs of all datasets.

Returns:

A list of dataset IDs.

Return type:

list[str]

list_graphs()[source]#

List all Ciphercore graphs currently uploaded.

Returns:

A pandas DataFrame containing the list of graphs.

Return type:

DataFrame

list_graphs_ids()[source]#

List the IDs of all graphs.

Returns:

A list of graph IDs.

Return type:

list[str]

list_groups()[source]#

List all groups.

Returns:

A pandas DataFrame containing the list of groups.

Return type:

DataFrame

list_groups_ids()[source]#

List the IDs of all groups.

Returns:

A list of group IDs.

Return type:

list[str]

list_node_events(timestamp_ms, num_events)[source]#

Lists node audit events up to a given timestamp. Admin only.

Parameters:
  • timestamp_ms (int) – Timestamp, in milliseconds.

  • num_events (int) – Number of events to fetch.

Returns:

A pandas DataFrame containing node audit events.

Return type:

DataFrame

list_user_events(timestamp_ms, num_events, user='')[source]#

Lists user audit events up to a given timestamp. Admin only.

Parameters:
  • timestamp_ms (int) – Timestamp, in milliseconds.

  • num_events (int) – Number of events to fetch.

  • user (str, optional) – Email address to filter events on.

Returns:

A pandas DataFrame containing user audit events.

Return type:

DataFrame

list_users()[source]#

List all users.

Returns:

A pandas DataFrame containing the list of users.

Return type:

DataFrame

list_users_ids()[source]#

List the IDs of all users.

Returns:

A list of user IDs.

Return type:

list[str]

local_node_connections()[source]#

Get local node connections.

Returns:

Local node connections.

Return type:

list

node_connections()[source]#

Get node connections.

Returns:

A pandas DataFrame containing the node connections.

Return type:

DataFrame

poll_explore_dataset_intersection(session_id)[source]#

Polls the exploration of a dataset intersection.

Parameters:

session_id (str) – The session ID associated with the dataset intersection exploration.

Returns:

Object containing explore computation details.

Return type:

ExploreDatasetIntersectionResponse

publish_dataset(id)[source]#

Make the dataset visible for all organizations.

Parameters:

id (str) – The ID of the dataset.

Returns:

A pandas DataFrame containing the published dataset.

Return type:

DataFrame

reject_data_request(id, comment='')[source]#

Rejects a data request.

Parameters:
  • id (str) – The ID of the data request to reject.

  • comment (str, optional) – A comment to attach to the data request.

Returns:

A pandas DataFrame containing the rejected data request.

Return type:

DataFrame

remove_user_role(user_id, role)[source]#

Remove a role from a user.

Parameters:
  • user_id (str) – The ID of the user.

  • role (str) – The role to be removed from the user.

Returns:

A pandas DataFrame containing the updated user information.

Return type:

DataFrame

run_gc()[source]#

Run garbage collection.

Returns:

The number of collected values.

Return type:

int

save_computation_session_result(id, name='', description='', as_csv=False, include_summary=False, sql_permissions=None, publish=False)[source]#

Saves the result of a computation session to a new dataset.

Parameters:
  • id (str) – The ID of the computation session.

  • name (str, optional) – The name to assign to the dataset.

  • description (str, optional) – The description to assign to the dataset.

  • as_csv (bool, optional) – Whether to treat the computation result as a CSV-like table (results in a columnwise dataset).

  • include_summary (bool, optional) – Whether to include a dataset summary for the newly created dataset.

  • sql_permissions (str, optional) – The SQL permissions to assign to the dataset.

  • publish (bool, optional) – Whether to make dataset visible for all organizations.

Returns:

A pandas DataFrame containing the new dataset.

Return type:

DataFrame
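
Example (a minimal sketch; the session ID is a placeholder):

>>> dataset = api.save_computation_session_result(
...     id='<session-id>',
...     name='psi result',
...     description='saved output of the PSI session',
...     as_csv=True,
... )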

show_dataset(dataset_id)[source]#

Display metadata about the dataset with the specified ID. Returns metadata about the data stored in this object, e.g. the number of rows and columns, the type of data in the table, permissions, and the number of shards (if applicable).

Parameters:

dataset_id (str) – The ID of the dataset.

Returns:

A pandas DataFrame containing the dataset information.

Return type:

DataFrame

start_computation_session(id)[source]#

Start a specific computation session.

Parameters:

id (str) – The ID of the computation session to start.

Returns:

A pandas DataFrame containing the started computation session information.

Return type:

DataFrame
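
Example (a minimal sketch; the session ID is a placeholder; checking progress via list_computation_sessions is one possible follow-up):

>>> api.start_computation_session('<session-id>')
>>> api.list_computation_sessions(
...     filter_computation_session_ids=['<session-id>']
... )  # inspect the session's current state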

tag_computation_session(id, key, value=None)[source]#

Tag computation session.

Parameters:
  • id (str) – The ID of the computation session to tag.

  • key (str) – Tag key.

  • value (str, optional) – Tag value. If None, the tag with a given key is removed instead.

upload_and_publish_dataset(*args, **kwargs)[source]#

Upload a dataset and then make it visible for all organizations.

See upload_dataset for arguments.

Returns:

A pandas DataFrame containing the uploaded and published dataset.

Return type:

DataFrame

upload_computation_session_result(id, endpoint, credentials)[source]#

Uploads the result of a computation session to a specified endpoint. The endpoint should be a valid filename within a cloud storage bucket for a supported cloud provider (AWS, GCP, or Azure).

Credentials is a dictionary that has the user’s cloud storage credentials. Possible keys are ‘aws_access_key_id’, ‘aws_secret_access_key’, and ‘aws_session_token’ for AWS credentials; ‘gcp_access_key_id’, ‘gcp_secret_access_key’, and ‘gcp_session_token’ for GCP credentials; and ‘secret_key’ for Azure credentials. Credentials for multiple providers can be passed in at once.

We will parse the cloud provider from the endpoint, then look in the credentials dictionary to find the relevant credentials for this cloud provider. Credentials can be left empty if the endpoint is public.

Parameters:
  • id (str) – The ID of the computation session.

  • endpoint (str) – The endpoint to which the computation session result will be uploaded.

  • credentials (dict) – The credentials to access the endpoint.

Returns:

A pandas DataFrame containing the new dataset.

Return type:

DataFrame
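
Example (a minimal sketch; the session ID, bucket path, and credential values are placeholders):

>>> upload = api.upload_computation_session_result(
...     id='<session-id>',
...     endpoint='s3://my-bucket/results/output.csv',
...     credentials={
...         'aws_access_key_id': '<key-id>',
...         'aws_secret_access_key': '<secret>',
...     },
... )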

upload_dataset(name='', description='', type='columnwise', endpoint='', credentials={}, data=None, column_permissions='everything', sql_permissions='', include_report=True, publish=False, async_init=False, allow_secure_test=False)[source]#

Upload a dataset.

Parameters:
  • name (str, optional) – The name of the dataset.

  • description (str, optional) – A description of the dataset.

  • type (str, optional) – The type of the dataset. Default is ‘columnwise’, available options are {‘typed_value’, ‘columnwise’, ‘rowwise’, ‘model’}.

  • endpoint (str, optional) – In case of non-local datasets (cloud storage, remote SQL server), the address of the dataset object.

  • credentials (dict, optional) – The credentials to access the dataset in cloud storage. See details in the docs for upload_computation_session_result().

  • data (list, optional) – In case of local datasets, the data to upload (CSV files for columnwise/rowwise types, binary data of an ONNX model, or TypedValue JSON otherwise).

  • column_permissions (str, optional) – The column permissions of the dataset. Default is ‘everything’. Available options are {‘everything’, ‘everything_local’, None}.

  • sql_permissions (str, optional) – The SQL permissions of the dataset.

  • include_report (bool, optional) – Whether to include a report in the upload.

  • publish (bool, optional) – Whether to make dataset visible for all organizations.

  • async_init (bool, optional) – Whether to download the dataset from the endpoint asynchronously.

  • allow_secure_test (bool, optional) – Whether to allow the dataset to be used in SecureTest computations.

Returns:

A pandas Series containing the uploaded dataset.

The ‘id’ field in the return value is the ID of the uploaded dataset, which is used to refer to this dataset in computations and other operations.

If async_init was True, the user should wait for the dataset to be finalized before using it. The status of the dataset can be checked by calling get_dataset() with the returned dataset ID.

Raises:

CiphermodeException – If both endpoint and data are specified, or if permissions are given for a non-columnwise dataset.
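
Example (a minimal sketch; passing the CSV file contents as the data list is an assumption based on the description above, and the file name is a placeholder):

>>> with open('customers.csv', 'rb') as f:
...     csv_bytes = f.read()
>>> uploaded = api.upload_dataset(
...     name='customers',
...     description='customer table',
...     type='columnwise',
...     data=[csv_bytes],  # assumption: a list of local CSV file contents
... )
>>> dataset_id = uploaded['id']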

upload_graph(serialized_graph)[source]#

Upload a serialized graph.

Parameters:

serialized_graph (str) – The serialized Ciphercore graph to upload.

Returns:

A pandas DataFrame containing the uploaded graph information.

Return type:

DataFrame

waterfall_gather(original_dataset_id, stage_session_ids, endpoint, credentials)[source]#

Post-processes the results of multiple PSI computations on hashed datasets output by hash_dataset_columns to obtain the indices of rows in the original dataset that matched, along with the index of the first computation they matched in.

Used to implement a multi-stage “waterfall” join by providing ordered session IDs for each stage. Can also be called with a single stage to obtain the row indices that matched for a single PSI computation.

Parameters:
  • original_dataset_id (str) – The original dataset ID.

  • stage_session_ids (list[str]) – Waterfall session IDs. Each should correspond to a PSI computation (made by create_psi) on hashed datasets (made with hash_dataset_columns). Should be non-empty.

  • endpoint (str) – The endpoint to which the computation session result will be uploaded.

  • credentials (dict) – The credentials to use for the cloud upload. See details in docs for upload_computation_session_result().

Returns:

If endpoint is empty, returns the result directly, encoded as bytes. Otherwise, returns a string that can be input to self.get_cloud_upload() to check the progress of uploading the result to the cloud.
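
Example (a minimal sketch; the dataset and session IDs are placeholders; an empty endpoint returns the result directly as bytes, as described above):

>>> matches = api.waterfall_gather(
...     original_dataset_id='<original-dataset-id>',
...     stage_session_ids=['<psi-session-id-1>', '<psi-session-id-2>'],
...     endpoint='',
...     credentials={},
... )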
