Welcome to CiphermodeClient’s documentation!#

create_client(frontend_address, auth_config='~/.ciphercore/auth_config', token_path='~/.ciphercore/token', custom_root_ca=None, tls_domain='localhost', private_key=None, certificate_chain=None, *args, **kwargs)[source]#

Create a CiphermodeApi instance and initialize it.

Parameters:
  • frontend_address (str) – The address of the server.

  • auth_config (str, optional) – Path to auth config.

  • token_path (str, optional) – Path to the token file.

  • custom_root_ca (str, optional) – Path to a TLS certificate file.

  • tls_domain (str, optional) – The domain protected by the TLS certificate.

  • private_key (str, optional) – Path to the client’s private key.

  • certificate_chain (str, optional) – Path to the client’s certificate chain.

  • *args – Arguments for the PandasConverter.

  • **kwargs – Kwargs for the PandasConverter.

Returns:

An instance of the CiphermodeApi.

Return type:

CiphermodeApi
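
For illustration, a minimal connection sketch; the module name, server address and domain below are assumptions to adjust to your deployment:

    # Minimal sketch; the import path and address are placeholder assumptions.
    from ciphermode_client import create_client  # hypothetical module name

    api = create_client(
        frontend_address='frontend.example.com:443',  # placeholder address
        tls_domain='frontend.example.com',
    )
    print(api.build_info())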

class CiphermodeApi(address, auth_handler, cert=None, tls_domain=None, private_key=None, certificate_chain=None, *args, **kwargs)[source]#

Bases: object

add_user_role(user_id, role)[source]#

Add a role to a user.

Parameters:
  • user_id (str) – The ID of the user.

  • role (str) – The role to be added to the user.

Returns:

A pandas DataFrame containing the updated user information.

Return type:

DataFrame

approve_data_request(id, comment='')[source]#

Approves a data request.

Parameters:
  • id (str) – The ID of the data request to approve.

  • comment (str, optional) – A comment to attach to the data request.

Returns:

A pandas DataFrame containing the approved data request.

Return type:

DataFrame
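
For illustration, a hedged sketch of a review flow combining this method with list_data_requests_ids (assumes an api client created by create_client):

    # Fetch the requests the current user can approve, then approve the first.
    pending_ids = api.list_data_requests_ids(filter_can_approve=True)
    if pending_ids:
        api.approve_data_request(pending_ids[0], comment='Approved for this session.')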

build_info()[source]#

Get the build information.

Returns:

An object containing the build information.

Return type:

Object

cancel_computation_session(id)[source]#

Cancel a specific computation session.

Parameters:

id (str) – The ID of the computation session to cancel.

Returns:

A pandas DataFrame containing the cancelled computation session information.

Return type:

DataFrame

comment_data_request(id, comment='')[source]#

Comments on a data request.

Parameters:
  • id (str) – The ID of the data request to comment on.

  • comment (str) – The comment to attach to the data request.

Returns:

A pandas DataFrame containing the commented data request.

Return type:

DataFrame

create_computation(orchestrator, graphs_config, name, description, config=None)[source]#

Create a computation.

A computation object specifies what computation to execute, independently of the data: the same computation can be reused multiple times with different datasets. Note that there are easier-to-use functions for specific computations (PSI, SQL, NN training, etc.).

Parameters:
  • orchestrator (str) – The orchestrator type for the computation.

  • graphs_config (dict) – The “graph name -> graph ID” mapping.

  • name (str) – The name of the computation.

  • description (str) – The description of the computation.

  • config (dict, optional) – Additional orchestrator-specific configuration for the computation.

Returns:

A pandas DataFrame containing the created computation information.

Return type:

DataFrame

create_computation_session(computation_id, data_config, name='', description='')[source]#

Create a computation session.

Parameters:
  • computation_id (str) – The ID of the computation.

  • data_config (dict) – The mapping (name -> value ID). Names are orchestrator-specific (see orchestrator-specific functions for details, e.g. create_psi).

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas Series containing the created computation session information.
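
For illustration, a hedged sketch chaining create_computation and create_computation_session; the orchestrator string, the IDs and the DataFrame column holding the new computation's ID are all assumptions:

    computation = api.create_computation(
        orchestrator='sql',                    # placeholder orchestrator type
        graphs_config={'main': 'graph-123'},   # "graph name -> graph ID" mapping
        name='demo computation',
        description='generic computation example',
    )
    computation_id = computation['id'].iloc[0]  # column name is an assumption
    session = api.create_computation_session(
        computation_id,
        data_config={'input': 'value-456'},    # name -> value ID (orchestrator-specific)
        name='demo session',
    )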

create_explore_dataset_intersection(dataset_id1, dataset_id2, column_names1, column_names2, use_approx_match_rate=False)[source]#

Creates an exploration of the intersection between two datasets.

Parameters:
  • dataset_id1 (str) – The ID of the first dataset.

  • dataset_id2 (str) – The ID of the second dataset.

  • column_names1 (list(str)) – Names of the columns in the first dataset to compare.

  • column_names2 (list(str)) – Names of the columns in the second dataset to compare.

  • use_approx_match_rate (bool, optional) – Whether to use approximate match rate. Default is False.

Returns:

The computation session ID.

Return type:

str
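
For illustration, a hedged sketch pairing this method with poll_explore_dataset_intersection; the dataset IDs, column names and the response attribute checked are assumptions:

    import time

    session_id = api.create_explore_dataset_intersection(
        'ds-first', 'ds-second',                       # placeholder dataset IDs
        column_names1=['email'], column_names2=['email_address'],
        use_approx_match_rate=True,
    )
    response = api.poll_explore_dataset_intersection(session_id)
    while not getattr(response, 'done', True):         # attribute name is an assumption
        time.sleep(5)
        response = api.poll_explore_dataset_intersection(session_id)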

create_knn(key_dataset_id, query_dataset_id, num_neighbors, value_dataset_id=None, name='', description='')[source]#

Create a KNN (k-Nearest-Neighbors) computation session.

Parameters:
  • key_dataset_id (str) – The ID of the rowwise dataset with lookup keys (vectors).

  • query_dataset_id (str) – The ID of the rowwise dataset with lookup queries (vectors).

  • num_neighbors (int) – The number of neighbors to consider in the KNN computation.

  • value_dataset_id (str, optional) – The ID of the dataset with labels. Default is None.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
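
An illustrative call; the dataset IDs are placeholders, and the key and query datasets must be rowwise datasets of vectors as documented above:

    session = api.create_knn(
        key_dataset_id='ds-keys',       # placeholder IDs
        query_dataset_id='ds-queries',
        num_neighbors=5,
        value_dataset_id='ds-labels',   # optional labels dataset
        name='knn demo',
    )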

create_llm_inference(inference_dataset_id, model_dataset_id, max_len=128, num_layers=8, embedding_dim=512, num_heads=16, temperature=0.85, top_p=0.85, name='', description='')[source]#

Create an LLM inference computation session.

Parameters:
  • inference_dataset_id (str) – The ID of the inference dataset.

  • model_dataset_id (str) – The ID of the model dataset.

  • max_len (int, optional) – The maximum length of the generated sequence.

  • num_layers (int, optional) – The number of layers in the model.

  • embedding_dim (int, optional) – The embedding dimension of the model.

  • num_heads (int, optional) – The number of attention heads in the model.

  • temperature (float, optional) – The temperature for sampling.

  • top_p (float, optional) – The top-p heuristic value for sampling.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
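
An illustrative call; the dataset IDs are placeholders, and the architecture parameters presumably need to match the uploaded model:

    session = api.create_llm_inference(
        inference_dataset_id='ds-prompts',  # placeholder IDs
        model_dataset_id='ds-llm-model',
        max_len=256,
        temperature=0.7,
        top_p=0.9,
        name='llm demo',
    )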

create_mlp(train_datasets, validation_datasets, test_datasets, model_dataset, layers=[100, 1], batch_size=64, optimizer='adam', learning_rate=0.0003, loss='log_loss', epochs=3, precision=15, name='', description='')[source]#

Create an MLP (Multi-Layer Perceptron) training computation session.

Parameters:
  • train_datasets (list) – The list of training dataset IDs.

  • validation_datasets (list) – The list of validation dataset IDs.

  • test_datasets (list) – The list of testing dataset IDs.

  • layers (list, optional) – List of hidden layer sizes in the MLP (in most cases, the last one should be 1). Default is [100, 1].

  • batch_size (int, optional) – Batch size for training. Default is 64.

  • optimizer (str, optional) – Optimizer to use for training. Default is ‘adam’, supported optimizers are ‘adam’, ‘adagrad’, ‘sgd’.

  • learning_rate (float, optional) – Learning rate for training. Default is 3e-4.

  • loss (str, optional) – Loss function to use for training. Default is ‘log_loss’, supported losses are ‘log_loss’ and ‘mse’.

  • epochs (int, optional) – Number of epochs for training. Default is 3.

  • precision (int, optional) – Precision for training. Default is 15. Training is performed in fixed-point arithmetic with denominator 2**precision.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
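
An illustrative call; the dataset IDs are placeholders. Note that model_dataset appears in the signature but not in the parameter list above; it is assumed here to be a dataset ID as well:

    session = api.create_mlp(
        train_datasets=['ds-train'],     # placeholder dataset IDs
        validation_datasets=['ds-val'],
        test_datasets=['ds-test'],
        model_dataset='ds-model',        # assumed to be a dataset ID
        layers=[64, 1],                  # hidden layers; last size 1 for a single output
        optimizer='sgd',                 # one of 'adam', 'adagrad', 'sgd'
        loss='log_loss',                 # or 'mse'
        epochs=5,
    )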

create_nn_inference(inference_dataset_id, model_dataset_id, batch_size=64, precision=15, name='', description='')[source]#

Create a neural network inference computation session.

Parameters:
  • inference_dataset_id (str) – The ID of the inference dataset.

  • model_dataset_id (str) – The ID of the model dataset.

  • batch_size (int, optional) – The batch size for inference. Default is 64; should match the value used for training.

  • precision (int, optional) – The precision for inference. Default is 15; should match the value used for training.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
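
An illustrative call; the dataset IDs are placeholders, and batch_size and precision repeat the training-time values as documented above:

    session = api.create_nn_inference(
        inference_dataset_id='ds-infer',  # placeholder IDs
        model_dataset_id='ds-model',
        batch_size=64,                    # same as used for training
        precision=15,                     # same as used for training
    )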

create_psi(first_dataset_id, second_dataset_id, first_dataset_columns, second_dataset_columns, name='', description='', sharded=True)[source]#

Create a PSI (Private Set Intersection) computation session.

Parameters:
  • first_dataset_id (str) – The ID of the first dataset.

  • second_dataset_id (str) – The ID of the second dataset.

  • first_dataset_columns (list[str]) – The columns from the first dataset to join on.

  • second_dataset_columns (list[str]) – The columns from the second dataset to join on.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

  • sharded (bool, optional) – Whether to shard the computation. Default is True.

Returns:

A pandas Series containing the created computation session information.
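
For illustration, a hedged end-to-end sketch: create a PSI session, start it, and download the result once it completes. The dataset IDs and the Series field holding the session ID are assumptions:

    session = api.create_psi(
        first_dataset_id='ds-a',          # placeholder IDs
        second_dataset_id='ds-b',
        first_dataset_columns=['email'],
        second_dataset_columns=['email'],
        name='psi demo',
    )
    session_id = session['id']            # field name is an assumption
    api.start_computation_session(session_id)
    # ...after the session finishes (and any required data requests are approved):
    result = api.download_computation_session_result(session_id)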

create_single_graph_computation(serialized_graph, name='', description='')[source]#

Create a single graph computation.

Parameters:
  • serialized_graph (str) – The serialized Ciphercore graph to create a computation for.

  • name (str, optional) – The name of the computation.

  • description (str, optional) – The description of the computation.

Returns:

A pandas DataFrame containing the created computation information.

Return type:

DataFrame

create_sql(query, data_config, name='', description='')[source]#

Create an SQL computation session.

Parameters:
  • query (str) – The SQL query to execute.

  • data_config (dict) – The configuration of data for the computation.

  • name (str, optional) – The name of the session.

  • description (str, optional) – The description of the session.

Returns:

A pandas DataFrame containing the created computation session information.

Return type:

DataFrame
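
An illustrative call. Based on get_sql_computation, the query refers to tables by name; the assumption here is that data_config maps those names to dataset IDs ('ds-a' and 'ds-b' are placeholders):

    session = api.create_sql(
        query='SELECT a.id, b.score FROM a JOIN b ON a.id = b.id',
        data_config={'a': 'ds-a', 'b': 'ds-b'},  # table name -> dataset ID (assumed)
        name='sql demo',
    )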

delete_dataset(dataset_id)[source]#

Delete dataset with the specified ID.

Parameters:

dataset_id (str) – The ID of the dataset.

Returns:

True if dataset was successfully deleted.

Return type:

bool

download_computation_session_result(id, onnx=False)[source]#

Download the result of a specific computation session.

Parameters:
  • id (str) – The ID of the computation session to download.

  • onnx (bool, optional) – Whether to convert the result to ONNX protobuf. Default is False.

Returns:

A pandas DataFrame containing the downloaded computation session result.

Return type:

DataFrame

Raises:

CiphermodeException – if more than one of csv, onnx and float_array is set.

download_graph(id)[source]#

Download a graph with the specified ID.

Parameters:

id (str) – The ID of the graph.

Returns:

The serialized Ciphercore graph.

Return type:

str

get_cloud_upload(id)[source]#

Get the cloud upload with the specified ID.

Parameters:

id (str) – The ID of the cloud upload.

Returns:

A pandas DataFrame containing the cloud upload information.

Return type:

DataFrame

get_dataset(dataset_id)[source]#

Get the dataset with the specified ID.

Parameters:

dataset_id (str) – The ID of the dataset.

Returns:

The dataset with the specified ID.

Return type:

Dataset

get_knn_computation(num_neighbors, has_labels=False)[source]#

Create a KNN (k-Nearest-Neighbors) computation.

Parameters:
  • num_neighbors (int) – The number of neighbors to consider in the KNN computation.

  • has_labels (bool, optional) – Whether the input data has labels. Default is False.

Returns:

The ID of the created computation.

Return type:

str

get_llm_inference_computation(max_len, num_layers, embedding_dim, num_heads, temperature, top_p)[source]#

Create an LLM inference computation.

Parameters:
  • max_len (int) – The maximum length of the generated text.

  • num_layers (int) – The number of layers in the transformer.

  • embedding_dim (int) – The embedding dimension of the transformer.

  • num_heads (int) – The number of heads in the transformer.

  • temperature (float) – The temperature for the sampling.

  • top_p (float) – The top p for the sampling.

Returns:

The ID of the created computation.

Return type:

str

get_mlp_computation(layers, batch_size, optimizer, learning_rate, loss, epochs, precision)[source]#

Create an MLP (Multi-Layer Perceptron) computation.

Parameters:
  • layers (list) – The list with the sizes of hidden layers in the MLP (note that the last one should be 1 in most cases).

  • batch_size (int) – The batch size for training.

  • optimizer (str) – The optimizer to use for training (we currently support ‘adam’, ‘adagrad’ and ‘sgd’).

  • learning_rate (float) – The learning rate for training.

  • loss (str) – The loss function to use for training (we currently support ‘log_loss’ and ‘mse’).

  • epochs (int) – The number of epochs for training.

  • precision (int) – The precision for training (training is performed in fixed-point arithmetic with denominator 2**precision).

Returns:

The ID of the created computation.

Return type:

str

get_nn_inference_computation(batch_size, precision)[source]#

Create a neural network inference computation.

Parameters:
  • batch_size (int) – The batch size for inference, should be the same as for training.

  • precision (int) – The precision for inference, should be the same as for training.

Returns:

The ID of the created computation.

Return type:

str

get_psi_computation(first_dataset_columns, second_dataset_columns, sharded=True)[source]#

Create a PSI (Private Set Intersection) computation.

Parameters:
  • first_dataset_columns (list[str]) – The list of columns from the first dataset to join.

  • second_dataset_columns (list[str]) – The list of columns from the second dataset to join.

  • sharded (bool, optional) – Whether to shard the computation. Default is True.

Returns:

The ID of the created computation.

Return type:

str

get_report(dataset_id)[source]#

Get the report of the specified dataset.

Parameters:

dataset_id (str) – The ID of the dataset.

Returns:

The report of the specified dataset.

Return type:

str

get_sql_computation(query)[source]#

Create an SQL computation.

Parameters:

query (str) – The SQL query to execute. It can refer to tables by names, these names need to be specified in the corresponding computation session.

Returns:

The ID of the created computation.

Return type:

str

hash_dataset_columns(dataset_id, hash_column_names, new_dataset_name, async_init=False)[source]#

Hashes the entries of a dataset in the given columns to create a succinct representation of the input dataset.

Succinct representations produced by this method can be matched with create_psi to find the hash values two datasets have in common.

Parameters:
  • dataset_id (str) – The dataset ID.

  • hash_column_names (list[str]) – Columns from the dataset to hash.

  • new_dataset_name (str) – New dataset name.

  • async_init (bool, optional) – Whether to download the dataset from the endpoint asynchronously.

Returns:

A pandas Series containing the dataset ID for the succinct representation.

This dataset contains a single column of de-duplicated hash values; each value corresponds to the set of rows in the input dataset whose entries in the hash_column_names columns produced the same hash.
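
An illustrative call; the dataset ID and column names are placeholders:

    hashed = api.hash_dataset_columns(
        dataset_id='ds-a',                     # placeholder ID
        hash_column_names=['email', 'phone'],
        new_dataset_name='ds-a-hashed',
    )
    # The resulting dataset can be fed into create_psi; see the waterfall_gather
    # sketch at the end of this reference for post-processing the matches.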

list_cloud_uploads()[source]#

List all cloud uploads.

Returns:

A pandas DataFrame containing the list of cloud uploads.

Return type:

DataFrame

list_computation_sessions(filter_computation_session_ids=None, show_tags=False)[source]#

List computation sessions.

Parameters:
  • filter_computation_session_ids (list[str], optional) – List of specific computation session IDs to return. If None, all computation sessions are returned. Default is None.

  • show_tags (bool, optional) – Whether to include the tags column.

Returns:

A pandas DataFrame containing the list of computation sessions.

Return type:

DataFrame

list_computation_sessions_ids()[source]#

List computation session IDs.

Returns:

A list of computation session IDs.

Return type:

list[str]

list_computations()[source]#

List all computations.

Returns:

A pandas DataFrame containing the list of computations.

Return type:

DataFrame

list_computations_ids()[source]#

List the IDs of all computations.

Returns:

A list of computation IDs.

Return type:

list[str]

list_data_requests(filter_computation_session_id=None, filter_can_approve=False)[source]#

Lists data requests.

Parameters:
  • filter_computation_session_id (str, optional) – If provided, only data requests for this computation session ID will be returned.

  • filter_can_approve (bool) – If true, only data requests that the user can approve will be returned.

Returns:

A pandas DataFrame containing the list of data requests.

Return type:

DataFrame

list_data_requests_ids(filter_computation_session_id=None, filter_can_approve=False)[source]#

Lists the IDs of data requests.

Parameters:
  • filter_computation_session_id (str, optional) – If provided, only data requests for this computation session ID will be returned.

  • filter_can_approve (bool) – If true, only data requests that the user can approve will be returned.

Returns:

A list of data request IDs.

Return type:

list[str]

list_datasets()[source]#

List all datasets.

Returns:

A pandas DataFrame containing the list of datasets.

Return type:

DataFrame

list_datasets_ids()[source]#

List the IDs of all datasets.

Returns:

A list of dataset IDs.

Return type:

list[str]

list_graphs()[source]#

List all graphs.

Returns:

A pandas DataFrame containing the list of graphs.

Return type:

DataFrame

list_graphs_ids()[source]#

List the IDs of all graphs.

Returns:

A list of graph IDs.

Return type:

list[str]

list_groups()[source]#

List all groups.

Returns:

A pandas DataFrame containing the list of groups.

Return type:

DataFrame

list_groups_ids()[source]#

List the IDs of all groups.

Returns:

A list of group IDs.

Return type:

list[str]

list_node_events(timestamp_ms, num_events)[source]#

Lists node audit events up to a given timestamp. Admin only.

Parameters:
  • timestamp_ms (int) – Timestamp, in milliseconds.

  • num_events (int) – Number of events to fetch.

Returns:

A pandas DataFrame containing node audit events.

Return type:

DataFrame

list_user_events(timestamp_ms, num_events, user='')[source]#

Lists user audit events up to a given timestamp. Admin only.

Parameters:
  • timestamp_ms (int) – Timestamp, in milliseconds.

  • num_events (int) – Number of events to fetch.

  • user (str, optional) – Email address to filter events on.

Returns:

A pandas DataFrame containing user audit events.

Return type:

DataFrame
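
For illustration, a sketch fetching the 100 most recent node and user audit events up to the current time (the email address is a placeholder):

    import time

    now_ms = int(time.time() * 1000)   # current timestamp in milliseconds
    node_events = api.list_node_events(now_ms, 100)
    user_events = api.list_user_events(now_ms, 100, user='alice@example.com')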

list_users()[source]#

List all users.

Returns:

A pandas DataFrame containing the list of users.

Return type:

DataFrame

list_users_ids()[source]#

List the IDs of all users.

Returns:

A list of user IDs.

Return type:

list[str]

local_node_connections()[source]#

Get local node connections.

Returns:

Local node connections.

Return type:

list

node_connections()[source]#

Get node connections.

Returns:

A pandas DataFrame containing the node connections.

Return type:

DataFrame

poll_explore_dataset_intersection(session_id)[source]#

Polls the exploration of a dataset intersection.

Parameters:

session_id (str) – The session ID associated with the dataset intersection exploration.

Returns:

Object containing explore computation details.

Return type:

ExploreDatasetIntersectionResponse

publish_dataset(id)[source]#

Make the dataset visible to all organizations.

Parameters:

id (str) – The ID of the dataset.

Returns:

A pandas DataFrame containing the published dataset.

Return type:

DataFrame

reject_data_request(id, comment='')[source]#

Rejects a data request.

Parameters:
  • id (str) – The ID of the data request to reject.

  • comment (str, optional) – A comment to attach to the data request.

Returns:

A pandas DataFrame containing the rejected data request.

Return type:

DataFrame

remove_user_role(user_id, role)[source]#

Remove a role from a user.

Parameters:
  • user_id (str) – The ID of the user.

  • role (str) – The role to be removed from the user.

Returns:

A pandas DataFrame containing the updated user information.

Return type:

DataFrame

run_gc()[source]#

Run garbage collection.

Returns:

The number of collected values.

Return type:

int

save_computation_session_result(id, name='', description='', as_csv=False, include_summary=False, sql_permissions=None, publish=False)[source]#

Saves the result of a computation session to a new dataset.

Parameters:
  • id (str) – The ID of the computation session.

  • name (str, optional) – The name to assign to the dataset.

  • description (str, optional) – The description to assign to the dataset.

  • as_csv (bool, optional) – Whether to treat the computation result as a CSV-like table (results in a columnwise dataset).

  • include_summary (bool, optional) – Whether to include a dataset summary for the newly created dataset.

  • sql_permissions (str, optional) – The SQL permissions to assign to the dataset.

  • publish (bool, optional) – Whether to make the dataset visible to all organizations.

Returns:

A pandas DataFrame containing the new dataset.

Return type:

DataFrame
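
An illustrative call; assumes session_id refers to a completed computation session:

    dataset = api.save_computation_session_result(
        session_id,              # ID of a completed session (assumed available)
        name='psi result',
        description='saved from a completed PSI session',
        as_csv=True,             # store as a columnwise (CSV-like) dataset
        include_summary=True,
    )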

show_dataset(dataset_id)[source]#

Display the metadata about the dataset with the specified ID.

Parameters:

dataset_id (str) – The ID of the dataset.

Returns:

A pandas DataFrame containing the dataset information.

Return type:

DataFrame

start_computation_session(id)[source]#

Start a specific computation session.

Parameters:

id (str) – The ID of the computation session to start.

Returns:

A pandas DataFrame containing the started computation session information.

Return type:

DataFrame

tag_computation_session(id, key, value=None)[source]#

Tag computation session.

Parameters:
  • id (str) – The ID of the computation session to tag.

  • key (str) – Tag key.

  • value (str, optional) – Tag value. If None, the tag with a given key is removed instead.

upload_and_publish_dataset(*args, **kwargs)[source]#

Upload a dataset and then make it visible to all organizations.

See upload_dataset for arguments.

Returns:

A pandas DataFrame containing the uploaded and published dataset.

Return type:

DataFrame

upload_computation_session_result(id, endpoint)[source]#

Uploads the result of a computation session to a specified endpoint.

Parameters:
  • id (str) – The ID of the computation session.

  • endpoint (str) – The endpoint to which the computation session result will be uploaded.

Returns:

A pandas DataFrame containing the new dataset.

Return type:

DataFrame

upload_dataset(name='', description='', type='columnwise', endpoint='', data=None, column_permissions='everything', sql_permissions='', include_report=True, publish=False, async_init=False, allow_secure_test=False)[source]#

Upload a dataset.

Parameters:
  • name (str, optional) – The name of the dataset.

  • description (str, optional) – A description of the dataset.

  • type (str, optional) – The type of the dataset. Default is ‘columnwise’, available options are {‘typed_value’, ‘columnwise’, ‘rowwise’, ‘model’}.

  • endpoint (str, optional) – In case of non-local datasets (cloud storage, remote SQL server), the address of the dataset object.

  • data (list, optional) – In case of local datasets, the data to upload (CSV files for columnwise/rowwise types, binary data of an ONNX model, or TypedValue JSON otherwise).

  • column_permissions (str, optional) – The column permissions of the dataset. Default is ‘everything’. Available options are {‘everything’, ‘everything_local’, None}.

  • sql_permissions (str, optional) – The SQL permissions of the dataset.

  • include_report (bool, optional) – Whether to include a report in the upload.

  • publish (bool, optional) – Whether to make the dataset visible to all organizations.

  • async_init (bool, optional) – Whether to download the dataset from the endpoint asynchronously.

  • allow_secure_test (bool, optional) – Whether to allow the dataset to be used in SecureTest computations.

Returns:

A pandas Series containing the uploaded dataset.

Raises:

CiphermodeException – If both endpoint and data are specified, or if permissions are given for a non-columnwise dataset.
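
For illustration, a hedged sketch of a local CSV upload; the file name is a placeholder, and passing the raw CSV bytes as a one-element list is an assumption about the expected data format:

    with open('customers.csv', 'rb') as f:     # placeholder local CSV
        dataset = api.upload_dataset(
            name='customers',
            type='columnwise',
            data=[f.read()],                   # local data; leave endpoint empty
            column_permissions='everything',
        )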

upload_graph(serialized_graph)[source]#

Upload a serialized graph.

Parameters:

serialized_graph (str) – The serialized Ciphercore graph to upload.

Returns:

A pandas DataFrame containing the uploaded graph information.

Return type:

DataFrame

waterfall_gather(original_dataset_id, stage_session_ids, endpoint)[source]#

Post-processes the results of multiple PSI computations on hashed datasets (produced by hash_dataset_columns) to obtain the indices of the rows in the original dataset that matched, along with the index of the first stage in which they matched.

Can be used to implement a multi-stage “waterfall” join by providing ordered session IDs for each stage, or to convert a dataset of hashes into a dataset of indices in the original dataset corresponding to these hashes.

Parameters:
  • original_dataset_id (str) – The original dataset ID.

  • stage_session_ids (list[str]) – Waterfall session IDs. Each should correspond to a PSI computation (made by create_psi) on hashed datasets (made with hash_dataset_columns).

  • endpoint (str) – The endpoint to which the computation session result will be uploaded.

Returns:

A pandas Series containing the result of running a multi-stage waterfall match on stage_session_ids.

Note that the result will be empty if called with a non-empty endpoint; in that case the result is written directly to cloud storage.
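
For illustration, a hedged sketch of a two-stage waterfall join, where each stage session is a PSI computation over hashed datasets (see hash_dataset_columns); the IDs are placeholders:

    result = api.waterfall_gather(
        original_dataset_id='ds-a',                 # placeholder IDs
        stage_session_ids=['psi-session-email', 'psi-session-phone'],
        endpoint='',                                # empty: return the result inline
    )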
