Welcome to CiphermodeClient’s documentation!#
- create_client(frontend_address, auth_config='~/.ciphercore/auth_config', token_path='~/.ciphercore/token', custom_root_ca=None, tls_domain='localhost', private_key=None, certificate_chain=None, *args, **kwargs)[source]#
Create a CiphermodeApi instance and initialize it.
- Parameters:
frontend_address (str) – The address of the server.
auth_config (str, optional) – Path to auth config.
token_path (str, optional) – Path to the token file.
custom_root_ca (str, optional) – Path to a TLS certificate file.
tls_domain (str, optional) – The domain protected by the TLS certificate.
private_key (str, optional) – Path to the client’s private key.
certificate_chain (str, optional) – Path to the client’s certificate chain.
*args – Arguments for the PandasConverter.
**kwargs – Kwargs for the PandasConverter.
- Returns:
An instance of the CiphermodeApi.
- Return type:
CiphermodeApi
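Example (a sketch; the import path ciphermode_client and the server address are assumptions, not part of this reference):

    # Assumed import path; adjust to your installed package name.
    from ciphermode_client import create_client

    # Only the frontend address is required; auth config and token
    # paths default to ~/.ciphercore/auth_config and ~/.ciphercore/token.
    api = create_client("frontend.example.com:443")
    print(api.build_info())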
- class CiphermodeApi(address, auth_handler, cert=None, tls_domain=None, private_key=None, certificate_chain=None, *args, **kwargs)[source]#
Bases:
object
- add_user_role(user_id, role)[source]#
Add a role to a user.
- Parameters:
user_id (str) – The ID of the user.
role (str) – The role to be added to the user.
- Returns:
A pandas DataFrame containing the updated user information.
- Return type:
DataFrame
- approve_data_request(id, comment='')[source]#
Approves a data request.
- Parameters:
id (str) – The ID of the data request to approve.
comment (str, optional) – A comment to attach to the data request.
- Returns:
A pandas DataFrame containing the approved data request.
- Return type:
DataFrame
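Example (a sketch of the review workflow, using list_data_requests documented below; the request ID is a placeholder):

    # List pending data requests to find the one to act on.
    requests = api.list_data_requests()
    print(requests)
    api.approve_data_request("request-id", comment="Approved for Q3 analysis")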
- build_info()[source]#
Get the build information.
- Returns:
An object containing the build information.
- Return type:
Object
- cancel_computation_session(id)[source]#
Cancel a specific computation session.
- Parameters:
id (str) – The ID of the computation session to cancel.
- Returns:
A pandas DataFrame containing the cancelled computation session information.
- Return type:
DataFrame
- comment_data_request(id, comment='')[source]#
Comments on a data request.
- Parameters:
id (str) – The ID of the data request to comment on.
comment (str) – The comment to attach to the data request.
- Returns:
A pandas DataFrame containing the commented data request.
- Return type:
DataFrame
- create_computation(orchestrator, graphs_config, name, description, config=None)[source]#
Create a computation.
A Computation object specifies what computation to execute, independent of the data: the same computation can be reused multiple times with different datasets. Note that there are easier-to-use functions for specific computations (PSI, SQL, NN training, etc.).
- Parameters:
orchestrator (str) – The orchestrator type for the computation.
graphs_config (dict) – The “graph name -> graph ID” mapping.
name (str) – The name of the computation.
description (str) – The description of the computation.
config (dict, optional) – Additional orchestrator-specific configuration for the computation.
- Returns:
A pandas DataFrame containing the created computation information.
- Return type:
DataFrame
- create_computation_session(computation_id, data_config, name='', description='')[source]#
Create a computation session.
- Parameters:
computation_id (str) – The ID of the computation.
data_config (dict) – The mapping (name -> value ID). Names are orchestrator-specific (see orchestrator-specific functions for details, e.g. create_psi).
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas Series containing the created computation session information.
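Example (a sketch of the computation/session split, using get_sql_computation documented below; the value ID is a placeholder):

    # One computation can back many sessions with different data.
    comp_id = api.get_sql_computation("SELECT COUNT(*) FROM purchases")
    session = api.create_computation_session(
        comp_id,
        data_config={"purchases": "value-id-123"},  # name -> value ID
        name="purchase count",
    )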
- create_explore_dataset_intersection(dataset_id1, dataset_id2, column_names1, column_names2, use_approx_match_rate=False)[source]#
Creates an exploration of the intersection between two datasets.
- Parameters:
dataset_id1 (str) – The ID of the first dataset.
dataset_id2 (str) – The ID of the second dataset.
column_names1 (list(str)) – Names of the columns in the first dataset to compare.
column_names2 (list(str)) – Names of the columns in the second dataset to compare.
use_approx_match_rate (bool, optional) – Whether to use approximate match rate. Default is False.
- Returns:
The ID of the created computation session.
- Return type:
str
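Example (a sketch; dataset IDs and column names are placeholders):

    # Start the exploration, then poll until details are available.
    session_id = api.create_explore_dataset_intersection(
        "dataset-id-a", "dataset-id-b",
        column_names1=["email"], column_names2=["email_address"],
    )
    details = api.poll_explore_dataset_intersection(session_id)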
- create_knn(key_dataset_id, query_dataset_id, num_neighbors, value_dataset_id=None, name='', description='')[source]#
Create a KNN (k-Nearest-Neighbors) computation session.
- Parameters:
key_dataset_id (str) – The ID of the rowwise dataset with lookup keys (vectors).
query_dataset_id (str) – The ID of the rowwise dataset with lookup queries (vectors).
num_neighbors (int) – The number of neighbors to consider in the KNN computation.
value_dataset_id (str, optional) – The ID of the dataset with labels. Default is None.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
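Example (a sketch; dataset IDs are placeholders):

    # Look up the 5 nearest key vectors for each query vector.
    session = api.create_knn(
        key_dataset_id="keys-dataset-id",
        query_dataset_id="queries-dataset-id",
        num_neighbors=5,
        value_dataset_id="labels-dataset-id",  # optional labels
    )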
- create_llm_inference(inference_dataset_id, model_dataset_id, max_len=128, num_layers=8, embedding_dim=512, num_heads=16, temperature=0.85, top_p=0.85, name='', description='')[source]#
Create an LLM inference computation session.
- Parameters:
inference_dataset_id (str) – The ID of the inference dataset.
model_dataset_id (str) – The ID of the model dataset.
max_len (int, optional) – The maximum length of the generated sequence.
num_layers (int, optional) – The number of layers in the model.
embedding_dim (int, optional) – The embedding dimension of the model.
num_heads (int, optional) – The number of attention heads in the model.
temperature (float, optional) – The temperature for sampling.
top_p (float, optional) – The top-p heuristic value for sampling.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
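Example (a sketch; dataset IDs are placeholders, and the remaining arguments keep their documented defaults):

    session = api.create_llm_inference(
        inference_dataset_id="prompts-dataset-id",
        model_dataset_id="llm-model-dataset-id",
        max_len=128,
        temperature=0.85,
        top_p=0.85,
    )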
- create_mlp(train_datasets, validation_datasets, test_datasets, model_dataset, layers=[100, 1], batch_size=64, optimizer='adam', learning_rate=0.0003, loss='log_loss', epochs=3, precision=15, name='', description='')[source]#
Create an MLP (Multi-Layer Perceptron) training computation session.
- Parameters:
train_datasets (list) – The list of training dataset IDs.
validation_datasets (list) – The list of validation dataset IDs.
test_datasets (list) – The list of testing dataset IDs.
model_dataset (str) – The ID of the model dataset.
layers (list, optional) – List of hidden layer sizes in the MLP (in most cases, the last one should be 1). Default is [100, 1].
batch_size (int, optional) – Batch size for training. Default is 64.
optimizer (str, optional) – Optimizer to use for training. Default is ‘adam’, supported optimizers are ‘adam’, ‘adagrad’, ‘sgd’.
learning_rate (float, optional) – Learning rate for training. Default is 3e-4.
loss (str, optional) – Loss function to use for training. Default is ‘log_loss’, supported losses are ‘log_loss’ and ‘mse’.
epochs (int, optional) – Number of epochs for training. Default is 3.
precision (int, optional) – Precision for training. Default is 15. Training is performed in fixed-point arithmetic with denominator 2**precision.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
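Example (a sketch; dataset IDs are placeholders):

    # Train a binary classifier: one hidden layer of 100 units, then a
    # single output unit, optimized with Adam on log loss.
    session = api.create_mlp(
        train_datasets=["train-dataset-id"],
        validation_datasets=["validation-dataset-id"],
        test_datasets=["test-dataset-id"],
        model_dataset="model-dataset-id",
        layers=[100, 1],
        optimizer="adam",
        loss="log_loss",
        epochs=3,
    )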
- create_nn_inference(inference_dataset_id, model_dataset_id, batch_size=64, precision=15, name='', description='')[source]#
Create a neural network inference computation session.
- Parameters:
inference_dataset_id (str) – The ID of the inference dataset.
model_dataset_id (str) – The ID of the model dataset.
batch_size (int, optional) – The batch size for inference. Default is 64, should be the same as for training.
precision (int, optional) – The precision for inference. Default is 15, should be the same as for training.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
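Example (a sketch; dataset IDs are placeholders):

    # batch_size and precision must match the values used in training.
    session = api.create_nn_inference(
        inference_dataset_id="inference-dataset-id",
        model_dataset_id="model-dataset-id",
        batch_size=64,
        precision=15,
    )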
- create_psi(first_dataset_id, second_dataset_id, first_dataset_columns, second_dataset_columns, name='', description='', sharded=True)[source]#
Create a PSI (Private Set Intersection) computation session.
- Parameters:
first_dataset_id (str) – The ID of the first dataset.
second_dataset_id (str) – The ID of the second dataset.
first_dataset_columns (list[str]) – The columns from the first dataset to join.
second_dataset_columns (list[str]) – The columns from the second dataset to join.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
sharded (bool, optional) – Whether to shard the computation. Default is True.
- Returns:
A pandas Series containing the created computation session information.
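Example (a sketch of a full PSI run; dataset IDs are placeholders, and the 'id' field name on the returned Series is an assumption):

    session = api.create_psi(
        "dataset-id-a", "dataset-id-b",
        first_dataset_columns=["email"],
        second_dataset_columns=["email"],
        name="customer overlap",
    )
    api.start_computation_session(session["id"])  # field name assumed
    # Once the session has finished:
    overlap = api.download_computation_session_result(session["id"])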
- create_single_graph_computation(serialized_graph, name='', description='')[source]#
Create a single graph computation.
- Parameters:
serialized_graph (str) – The serialized Ciphercore graph to create a computation for.
name (str, optional) – The name of the computation.
description (str, optional) – The description of the computation.
- Returns:
A pandas DataFrame containing the created computation information.
- Return type:
DataFrame
- create_sql(query, data_config, name='', description='')[source]#
Create an SQL computation session.
- Parameters:
query (str) – The SQL query to execute.
data_config (dict) – The configuration of data for the computation.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
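Example (a sketch; the value ID is a placeholder):

    # The query refers to tables by name; data_config binds each
    # name to a value ID.
    session = api.create_sql(
        "SELECT region, COUNT(*) FROM customers GROUP BY region",
        data_config={"customers": "value-id-123"},
    )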
- delete_dataset(dataset_id)[source]#
Delete dataset with the specified ID.
- Parameters:
dataset_id (str) – The ID of the dataset.
- Returns:
True if dataset was successfully deleted.
- Return type:
bool
- download_computation_session_result(id, onnx=False)[source]#
Download the result of a specific computation session.
- Parameters:
id (str) – The ID of the computation session to download.
onnx (bool, optional) – Whether to convert the result to ONNX protobuf. Default is False.
- Returns:
A pandas DataFrame containing the downloaded computation session result.
- Return type:
DataFrame
- Raises:
CiphermodeException – If more than one of csv, onnx and float_array is set.
- download_graph(id)[source]#
Download a graph with the specified ID.
- Parameters:
id (str) – The ID of the graph.
- Returns:
The serialized Ciphercore graph.
- Return type:
str
- get_cloud_upload(id)[source]#
Get cloud upload with the specified ID.
- Parameters:
id (str) – The ID of the cloud upload.
- Returns:
A pandas DataFrame containing the cloud upload information.
- Return type:
DataFrame
- get_dataset(dataset_id)[source]#
Get the dataset with the specified ID.
- Parameters:
dataset_id (str) – The ID of the dataset.
- Returns:
The dataset with the specified ID.
- Return type:
Dataset
- get_knn_computation(num_neighbors, has_labels=False)[source]#
Create a KNN (k-nearest-neighbors) computation.
- Parameters:
num_neighbors (int) – The number of neighbors to consider in the KNN computation.
has_labels (bool, optional) – Whether the input data has labels. Default is False.
- Returns:
The ID of the created computation.
- Return type:
str
- get_llm_inference_computation(max_len, num_layers, embedding_dim, num_heads, temperature, top_p)[source]#
Create an LLM inference computation.
- Parameters:
max_len (int) – The maximum length of the generated text.
num_layers (int) – The number of layers in the transformer.
embedding_dim (int) – The embedding dimension of the transformer.
num_heads (int) – The number of heads in the transformer.
temperature (float) – The temperature for the sampling.
top_p (float) – The top p for the sampling.
- Returns:
The ID of the created computation.
- Return type:
str
- get_mlp_computation(layers, batch_size, optimizer, learning_rate, loss, epochs, precision)[source]#
Create an MLP (Multi-Layer Perceptron) computation.
- Parameters:
layers (list) – The list with the sizes of hidden layers in the MLP (note that the last one should be 1 in most cases).
batch_size (int) – The batch size for training.
optimizer (str) – The optimizer to use for training (we currently support ‘adam’, ‘adagrad’ and ‘sgd’).
learning_rate (float) – The learning rate for training.
loss (str) – The loss function to use for training (we currently support ‘log_loss’ and ‘mse’).
epochs (int) – The number of epochs for training.
precision (int) – The precision for training (it is conducted with fixed precision numbers, with 2**precision as denominator).
- Returns:
The ID of the created computation.
- Return type:
str
- get_nn_inference_computation(batch_size, precision)[source]#
Create a neural network inference computation.
- Parameters:
batch_size (int) – The batch size for inference, should be the same as for training.
precision (int) – The precision for inference, should be the same as for training.
- Returns:
The ID of the created computation.
- Return type:
str
- get_psi_computation(first_dataset_columns, second_dataset_columns, sharded=True)[source]#
Create a PSI (Private Set Intersection) computation.
- Parameters:
first_dataset_columns (list[str]) – The list of columns from the first dataset to join.
second_dataset_columns (list[str]) – The list of columns from the second dataset to join.
sharded (bool, optional) – Whether to shard the computation. Default is True.
- Returns:
The ID of the created computation.
- Return type:
str
- get_report(dataset_id)[source]#
Get the report of the specified dataset.
- Parameters:
dataset_id (str) – The ID of the dataset.
- Returns:
The report of the specified dataset.
- Return type:
Report (str)
- get_sql_computation(query)[source]#
Create an SQL computation.
- Parameters:
query (str) – The SQL query to execute. It can refer to tables by name; these names must be specified in the corresponding computation session.
- Returns:
The ID of the created computation.
- Return type:
str
- hash_dataset_columns(dataset_id, hash_column_names, new_dataset_name, async_init=False)[source]#
Hashes the entries in the given columns of a dataset to create a succinct representation of the input dataset.
Succinct representations output by this method can be matched with create_psi to find the hash values they have in common.
- Parameters:
dataset_id (str) – The dataset ID.
hash_column_names (list[str]) – Columns from the dataset to hash.
new_dataset_name (str) – New dataset name.
async_init (bool, optional) – Whether to download the dataset from the endpoint asynchronously.
- Returns:
A pandas Series containing the dataset ID for the succinct representation.
This dataset contains a single column of (de-duplicated) hash values, each value corresponding to some set of rows in the input dataset where entries indexed by columns in hash_column_names had the same hash.
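Example (a sketch; the dataset ID and column names are placeholders):

    # Build a de-duplicated hashed representation of two join columns,
    # suitable for matching with create_psi.
    hashed = api.hash_dataset_columns(
        "dataset-id", ["email", "phone"], "customers-hashed",
    )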
- list_cloud_uploads()[source]#
List all cloud uploads.
- Returns:
A pandas DataFrame containing the list of cloud uploads.
- Return type:
DataFrame
- list_computation_sessions(filter_computation_session_ids=None, show_tags=False)[source]#
List computation sessions.
- Parameters:
filter_computation_session_ids (list[str], optional) – List of specific computation session IDs to return. If None, all computation sessions are returned. Default is None.
show_tags (bool, optional) – Whether to include the tags column.
- Returns:
A pandas DataFrame containing the list of computation sessions.
- Return type:
DataFrame
- list_computation_sessions_ids()[source]#
List computation session IDs.
- Returns:
A list of computation session IDs.
- Return type:
list[str]
- list_computations()[source]#
List all computations.
- Returns:
A pandas DataFrame containing the list of computations.
- Return type:
DataFrame
- list_computations_ids()[source]#
List the IDs of all computations.
- Returns:
A list of computation IDs.
- Return type:
list[str]
- list_data_requests(filter_computation_session_id=None)[source]#
Lists data requests.
- Parameters:
filter_computation_session_id (str, optional) – If provided, only data requests for this computation session ID will be returned.
- Returns:
A pandas DataFrame containing the list of data requests.
- Return type:
DataFrame
- list_data_requests_ids()[source]#
Lists the IDs of data requests.
- Returns:
A list of data request IDs.
- Return type:
list[str]
- list_datasets()[source]#
List all datasets.
- Returns:
A pandas DataFrame containing the list of datasets.
- Return type:
DataFrame
- list_datasets_ids()[source]#
List the IDs of all datasets.
- Returns:
A list of dataset IDs.
- Return type:
list[str]
- list_graphs()[source]#
List all graphs.
- Returns:
A pandas DataFrame containing the list of graphs.
- Return type:
DataFrame
- list_graphs_ids()[source]#
List the IDs of all graphs.
- Returns:
A list of graph IDs.
- Return type:
list[str]
- list_groups()[source]#
List all groups.
- Returns:
A pandas DataFrame containing the list of groups.
- Return type:
DataFrame
- list_groups_ids()[source]#
List the IDs of all groups.
- Returns:
A list of group IDs.
- Return type:
list[str]
- list_node_events(timestamp_ms, num_events)[source]#
Lists node audit events up to a given timestamp. Admin only.
- Parameters:
timestamp_ms (int) – Timestamp, in milliseconds.
num_events (int) – Number of events to fetch.
- Returns:
A pandas DataFrame containing node audit events.
- Return type:
DataFrame
- list_user_events(timestamp_ms, num_events, user='')[source]#
Lists user audit events up to a given timestamp. Admin only.
- Parameters:
timestamp_ms (int) – Timestamp, in milliseconds.
num_events (int) – Number of events to fetch.
user (str, optional) – Email address to filter events on.
- Returns:
A pandas DataFrame containing user audit events.
- Return type:
DataFrame
- list_users()[source]#
List all users.
- Returns:
A pandas DataFrame containing the list of users.
- Return type:
DataFrame
- list_users_ids()[source]#
List the IDs of all users.
- Returns:
A list of user IDs.
- Return type:
list[str]
- local_node_connections()[source]#
Get local node connections.
- Returns:
Local node connections.
- Return type:
list
- node_connections()[source]#
Get node connections.
- Returns:
A pandas DataFrame containing the node connections.
- Return type:
DataFrame
- poll_explore_dataset_intersection(session_id)[source]#
Polls the exploration of a dataset intersection.
- Parameters:
session_id (str) – The session id associated with the dataset intersection exploration.
- Returns:
Object containing explore computation details.
- Return type:
ExploreDatasetIntersectionResponse
- publish_dataset(id)[source]#
Make the dataset visible to all organizations.
- Parameters:
id (str) – The ID of the dataset.
- Returns:
A pandas DataFrame containing the published dataset.
- Return type:
DataFrame
- reject_data_request(id, comment='')[source]#
Rejects a data request.
- Parameters:
id (str) – The ID of the data request to reject.
comment (str, optional) – A comment to attach to the data request.
- Returns:
A pandas DataFrame containing the rejected data request.
- Return type:
DataFrame
- remove_user_role(user_id, role)[source]#
Remove a role from a user.
- Parameters:
user_id (str) – The ID of the user.
role (str) – The role to be removed from the user.
- Returns:
A pandas DataFrame containing the updated user information.
- Return type:
DataFrame
- save_computation_session_result(id, name='', description='', as_csv=False, include_summary=False, sql_permissions=None, publish=False)[source]#
Saves the result of a computation session to a new dataset.
- Parameters:
id (str) – The ID of the computation session.
name (str, optional) – The name to assign to the dataset.
description (str, optional) – The description to assign to the dataset.
as_csv (bool, optional) – Whether to treat the computation result as a CSV-like table (results in a columnwise dataset).
include_summary (bool, optional) – Whether to include a dataset summary for the newly created dataset.
sql_permissions (str, optional) – The SQL permissions to assign to the dataset.
publish (bool, optional) – Whether to make the dataset visible to all organizations.
- Returns:
A pandas DataFrame containing the new dataset.
- Return type:
DataFrame
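Example (a sketch; the session ID is a placeholder):

    # Persist a finished session's output as a new columnwise dataset.
    dataset = api.save_computation_session_result(
        "session-id",
        name="psi-overlap-2024",
        as_csv=True,
        include_summary=True,
    )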
- show_dataset(dataset_id)[source]#
Display the metadata about the dataset with the specified ID.
- Parameters:
dataset_id (str) – The ID of the dataset.
- Returns:
A pandas DataFrame containing the dataset information.
- Return type:
DataFrame
- start_computation_session(id)[source]#
Start a specific computation session.
- Parameters:
id (str) – The ID of the computation session to start.
- Returns:
A pandas DataFrame containing the started computation session information.
- Return type:
DataFrame
- tag_computation_session(id, key, value=None)[source]#
Tag computation session.
- Parameters:
id (str) – The ID of the computation session to tag.
key (str) – Tag key.
value (str, optional) – Tag value. If None, the tag with a given key is removed instead.
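Example (a sketch; the session ID is a placeholder):

    api.tag_computation_session("session-id", "team", "analytics")  # set tag
    api.tag_computation_session("session-id", "team")               # remove tag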
- upload_and_publish_dataset(*args, **kwargs)[source]#
Upload a dataset and then make it visible to all organizations.
See upload_dataset for arguments.
- Returns:
A pandas DataFrame containing the uploaded and published dataset.
- Return type:
DataFrame
- upload_computation_session_result(id, endpoint)[source]#
Uploads the result of a computation session to a specified endpoint.
- Parameters:
id (str) – The ID of the computation session.
endpoint (str) – The endpoint to which the computation session result will be uploaded.
- Returns:
A pandas DataFrame containing the new dataset.
- Return type:
DataFrame
- upload_dataset(name='', description='', type='columnwise', endpoint='', data=None, column_permissions='everything', sql_permissions='', include_report=True, publish=False, async_init=False, allow_secure_test=False)[source]#
Upload a dataset.
- Parameters:
name (str, optional) – The name of the dataset.
description (str, optional) – A description of the dataset.
type (str, optional) – The type of the dataset. Default is ‘columnwise’, available options are {‘typed_value’, ‘columnwise’, ‘rowwise’, ‘model’}.
endpoint (str, optional) – In case of non-local datasets (cloud storage, remote SQL server), the address of the dataset object.
data (list, optional) – In case of local datasets, the data to upload (CSV files for columnwise/rowwise types, binary data of an ONNX model, or TypedValue JSON otherwise).
column_permissions (str, optional) – The column permissions of the dataset. Default is ‘everything’. Available options are {‘everything’, ‘everything_local’, None}.
sql_permissions (str, optional) – The SQL permissions of the dataset.
include_report (bool, optional) – Whether to include a report in the upload.
publish (bool, optional) – Whether to make the dataset visible to all organizations.
async_init (bool, optional) – Whether to download the dataset from the endpoint asynchronously.
allow_secure_test (bool, optional) – Whether to allow the dataset to be used in SecureTest computations.
- Returns:
A pandas Series containing the uploaded dataset.
- Raises:
CiphermodeException – If both endpoint and data are specified, or if permissions are given for a non-columnwise dataset.
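Example (a sketch; the file name is a placeholder, and passing raw bytes in data is an assumption about the expected element type):

    # Upload a local CSV as a columnwise dataset; endpoint and data
    # are mutually exclusive.
    with open("customers.csv", "rb") as f:
        dataset = api.upload_dataset(
            name="customers",
            type="columnwise",
            data=[f.read()],  # element type assumed to be raw bytes
            include_report=True,
        )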
- upload_graph(serialized_graph)[source]#
Upload a serialized graph.
- Parameters:
serialized_graph (str) – The serialized Ciphercore graph to upload.
- Returns:
A pandas DataFrame containing the uploaded graph information.
- Return type:
DataFrame
- waterfall_gather(original_dataset_id, stage_session_ids, endpoint)[source]#
Post-processes the results of multiple PSI computations on hashed datasets (as output by hash_dataset_columns) to obtain the indices of rows in the original dataset that matched, along with the index of the first computation they matched in.
Can be used to implement a multi-stage “waterfall” join by providing ordered session IDs for each stage, or to convert a dataset of hashes into a dataset of indices in the original dataset corresponding to these hashes.
- Parameters:
original_dataset_id (str) – The original dataset ID.
stage_session_ids (list[str]) – Waterfall session IDs. Each should correspond to a PSI computation (made by create_psi) on hashed datasets (made with hash_dataset_columns).
endpoint (str) – The endpoint to which the computation session result will be uploaded.
- Returns:
A pandas Series containing the result of running a multi-stage waterfall match on stage_session_ids.
Note that the result will be empty if called with a non-empty endpoint; in that case, the result is written directly to cloud storage.
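Example (a sketch of a two-stage waterfall match; all IDs are placeholders, and each stage session is assumed to be a create_psi run over datasets produced by hash_dataset_columns):

    # Stage sessions are ordered by match priority (e.g. email first,
    # then phone). An empty endpoint returns the result inline.
    result = api.waterfall_gather(
        original_dataset_id="dataset-id",
        stage_session_ids=["psi-session-email", "psi-session-phone"],
        endpoint="",
    )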