Welcome to CiphermodeClient’s documentation!#
- create_client(frontend_address, auth_config='~/.ciphercore/auth_config', token_path='~/.ciphercore/token', custom_root_ca=None, tls_domain='localhost', private_key=None, certificate_chain=None, *args, **kwargs)[source]#
Create a CiphermodeApi instance and initialize it.
- Parameters:
frontend_address (str) – The address of the server.
auth_config (str, optional) – Path to auth config.
token_path (str, optional) – Path to the token file.
custom_root_ca (str, optional) – Path to a TLS certificate file.
tls_domain (str, optional) – The domain protected by the TLS certificate.
private_key (str, optional) – Path to the client’s private key.
certificate_chain (str, optional) – Path to the client’s certificate chain.
*args – Arguments for the PandasConverter.
**kwargs – Kwargs for the PandasConverter.
- Returns:
An instance of the CiphermodeApi.
- Return type:
CiphermodeApi
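Example (a sketch; the import path ciphermode_client and the server address are assumptions, not part of this reference):

    # Assumed import path; adjust to your installed package name.
    from ciphermode_client import create_client

    # Only the frontend address is required; auth config and token
    # paths default to ~/.ciphercore/auth_config and ~/.ciphercore/token.
    api = create_client("frontend.example.com:443")
    print(api.build_info())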
- class CiphermodeApi(address, auth_handler, cert=None, tls_domain=None, private_key=None, certificate_chain=None, *args, **kwargs)[source]#
Bases:
object
- add_user_role(user_id, role)[source]#
Add a role to a user.
- Parameters:
user_id (str) – The ID of the user.
role (str) – The role to be added to the user.
- Returns:
A pandas DataFrame containing the updated user information.
- Return type:
DataFrame
- approve_data_request(id, comment='')[source]#
Approves a data request.
- Parameters:
id (str) – The ID of the data request to approve.
comment (str, optional) – A comment to attach to the data request.
- Returns:
A pandas DataFrame containing the approved data request.
- Return type:
DataFrame
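Example (a sketch of the review workflow, using list_data_requests documented below; the request ID is a placeholder):

    # List pending data requests to find the one to act on.
    requests = api.list_data_requests()
    print(requests)
    api.approve_data_request("request-id", comment="Approved for Q3 analysis")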
- build_info()[source]#
Get the build information.
- Returns:
An object containing the build information.
- Return type:
Object
- cancel_computation_session(id)[source]#
Cancel a specific computation session.
- Parameters:
id (str) – The ID of the computation session to cancel.
- Returns:
A pandas DataFrame containing the cancelled computation session information.
- Return type:
DataFrame
- comment_data_request(id, comment='')[source]#
Comments on a data request.
- Parameters:
id (str) – The ID of the data request to comment on.
comment (str) – The comment to attach to the data request.
- Returns:
A pandas DataFrame containing the commented data request.
- Return type:
DataFrame
- create_computation(orchestrator, graphs_config, name, description, config=None)[source]#
Create a computation.
A Computation object specifies what computation to execute, independent of the data: the same computation can be reused multiple times with different datasets. Note that there are easier-to-use functions for specific computations (PSI, SQL, NN training, etc.).
- Parameters:
orchestrator (str) – The orchestrator type for the computation.
graphs_config (dict) – The “graph name -> graph ID” mapping.
name (str) – The name of the computation.
description (str) – The description of the computation.
config (dict, optional) – Additional orchestrator-specific configuration for the computation.
- Returns:
A pandas DataFrame containing the created computation information.
- Return type:
DataFrame
- create_computation_session(computation_id, data_config, name='', description='')[source]#
Create a computation session.
- Parameters:
computation_id (str) – The ID of the computation.
data_config (dict) – The mapping (name -> value ID). Names are orchestrator-specific (see orchestrator-specific functions for details, e.g. create_psi).
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas Series containing the created computation session information.
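Example (a sketch of the computation/session split, using get_sql_computation documented below; the value ID is a placeholder):

    # One computation can back many sessions with different data.
    comp_id = api.get_sql_computation("SELECT COUNT(*) FROM purchases")
    session = api.create_computation_session(
        comp_id,
        data_config={"purchases": "value-id-123"},  # name -> value ID
        name="purchase count",
    )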
- create_explore_dataset_intersection(dataset_id1, dataset_id2, column_names1, column_names2, use_approx_match_rate=False)[source]#
Creates an exploration of the intersection between two datasets.
- Parameters:
dataset_id1 (str) – The ID of the first dataset.
dataset_id2 (str) – The ID of the second dataset.
column_names1 (list(str)) – Names of the columns in the first dataset to compare.
column_names2 (list(str)) – Names of the columns in the second dataset to compare.
use_approx_match_rate (bool, optional) – Whether to use approximate match rate. Default is False.
- Returns:
The ID of the created computation session.
- Return type:
str
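Example (a sketch; dataset IDs and column names are placeholders):

    # Start the exploration, then poll until details are available.
    session_id = api.create_explore_dataset_intersection(
        "dataset-id-a", "dataset-id-b",
        column_names1=["email"], column_names2=["email_address"],
    )
    details = api.poll_explore_dataset_intersection(session_id)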
- create_knn(key_dataset_id, query_dataset_id, num_neighbors, value_dataset_id=None, name='', description='')[source]#
Create a KNN (k-Nearest-Neighbors) computation session.
- Parameters:
key_dataset_id (str) – The ID of the rowwise dataset with lookup keys (vectors).
query_dataset_id (str) – The ID of the rowwise dataset with lookup queries (vectors).
num_neighbors (int) – The number of neighbors to consider in the KNN computation.
value_dataset_id (str, optional) – The ID of the dataset with labels. Default is None.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
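Example (a sketch; dataset IDs are placeholders):

    # Look up the 5 nearest key vectors for each query vector.
    session = api.create_knn(
        key_dataset_id="keys-dataset-id",
        query_dataset_id="queries-dataset-id",
        num_neighbors=5,
        value_dataset_id="labels-dataset-id",  # optional labels
    )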
- create_llm_inference(inference_dataset_id, model_dataset_id, max_len=128, num_layers=8, embedding_dim=512, num_heads=16, temperature=0.85, top_p=0.85, name='', description='')[source]#
Create an LLM inference computation session.
- Parameters:
inference_dataset_id (str) – The ID of the inference dataset.
model_dataset_id (str) – The ID of the model dataset.
max_len (int, optional) – The maximum length of the generated sequence.
num_layers (int, optional) – The number of layers in the model.
embedding_dim (int, optional) – The embedding dimension of the model.
num_heads (int, optional) – The number of attention heads in the model.
temperature (float, optional) – The temperature for sampling.
top_p (float, optional) – The top-p heuristic value for sampling.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
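Example (a sketch; dataset IDs are placeholders, and the remaining arguments keep their documented defaults):

    session = api.create_llm_inference(
        inference_dataset_id="prompts-dataset-id",
        model_dataset_id="llm-model-dataset-id",
        max_len=128,
        temperature=0.85,
        top_p=0.85,
    )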
- create_mlp(train_datasets, validation_datasets, test_datasets, model_dataset, layers=[100, 1], batch_size=64, optimizer='adam', learning_rate=0.0003, loss='log_loss', epochs=3, precision=15, name='', description='')[source]#
Create an MLP (Multi-Layer Perceptron) training computation session.
- Parameters:
train_datasets (list) – The list of training dataset IDs.
validation_datasets (list) – The list of validation dataset IDs.
test_datasets (list) – The list of testing dataset IDs.
model_dataset (str) – The ID of the model dataset.
layers (list, optional) – List of hidden layer sizes in the MLP (in most cases, the last one should be 1). Default is [100, 1].
batch_size (int, optional) – Batch size for training. Default is 64.
optimizer (str, optional) – Optimizer to use for training. Default is ‘adam’, supported optimizers are ‘adam’, ‘adagrad’, ‘sgd’.
learning_rate (float, optional) – Learning rate for training. Default is 3e-4.
loss (str, optional) – Loss function to use for training. Default is ‘log_loss’, supported losses are ‘log_loss’ and ‘mse’.
epochs (int, optional) – Number of epochs for training. Default is 3.
precision (int, optional) – Precision for training. Default is 15. Training is performed in fixed-point arithmetic with denominator 2**precision.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
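Example (a sketch; dataset IDs are placeholders):

    # Train a binary classifier: one hidden layer of 100 units, then a
    # single output unit, optimized with Adam on log loss.
    session = api.create_mlp(
        train_datasets=["train-dataset-id"],
        validation_datasets=["validation-dataset-id"],
        test_datasets=["test-dataset-id"],
        model_dataset="model-dataset-id",
        layers=[100, 1],
        optimizer="adam",
        loss="log_loss",
        epochs=3,
    )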
- create_nn_inference(inference_dataset_id, model_dataset_id, batch_size=64, precision=15, name='', description='')[source]#
Create a neural network inference computation session.
- Parameters:
inference_dataset_id (str) – The ID of the inference dataset.
model_dataset_id (str) – The ID of the model dataset.
batch_size (int, optional) – The batch size for inference. Default is 64, should be the same as for training.
precision (int, optional) – The precision for inference. Default is 15, should be the same as for training.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
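Example (a sketch; dataset IDs are placeholders):

    # batch_size and precision must match the values used in training.
    session = api.create_nn_inference(
        inference_dataset_id="inference-dataset-id",
        model_dataset_id="model-dataset-id",
        batch_size=64,
        precision=15,
    )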
- create_psi(first_dataset_id, second_dataset_id, first_dataset_columns, second_dataset_columns, name='', description='', sharded=True)[source]#
Create a PSI (Private Set Intersection) computation session.
- Parameters:
first_dataset_id (str) – The ID of the first dataset.
second_dataset_id (str) – The ID of the second dataset.
first_dataset_columns (list[str]) – The columns from the first dataset to join.
second_dataset_columns (list[str]) – The columns from the second dataset to join.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
sharded (bool, optional) – Whether to shard the computation. Default is True.
- Returns:
A pandas Series containing the created computation session information.
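Example (a sketch of a full PSI run; dataset IDs are placeholders, and the 'id' field name on the returned Series is an assumption):

    session = api.create_psi(
        "dataset-id-a", "dataset-id-b",
        first_dataset_columns=["email"],
        second_dataset_columns=["email"],
        name="customer overlap",
    )
    api.start_computation_session(session["id"])  # field name assumed
    # Once the session has finished:
    overlap = api.download_computation_session_result(session["id"])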
- create_single_graph_computation(serialized_graph, name='', description='')[source]#
Create a single graph computation.
- Parameters:
serialized_graph (str) – The serialized Ciphercore graph to create a computation for.
name (str, optional) – The name of the computation.
description (str, optional) – The description of the computation.
- Returns:
A pandas DataFrame containing the created computation information.
- Return type:
DataFrame
- create_sql(query, data_config, name='', description='')[source]#
Create an SQL computation session.
- Parameters:
query (str) – The SQL query to execute.
data_config (dict) – The configuration of data for the computation.
name (str, optional) – The name of the session.
description (str, optional) – The description of the session.
- Returns:
A pandas DataFrame containing the created computation session information.
- Return type:
DataFrame
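Example (a sketch; the value ID is a placeholder):

    # The query refers to tables by name; data_config binds each
    # name to a value ID.
    session = api.create_sql(
        "SELECT region, COUNT(*) FROM customers GROUP BY region",
        data_config={"customers": "value-id-123"},
    )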
- delete_dataset(dataset_id)[source]#
Delete dataset with the specified ID.
- Parameters:
dataset_id (str) – The ID of the dataset.
- Returns:
True if dataset was successfully deleted.
- Return type:
bool
- download_computation_session_result(id, onnx=False)[source]#
Download the result of a specific computation session.
- Parameters:
id (str) – The ID of the computation session to download.
onnx (bool, optional) – Whether to convert the result to ONNX protobuf. Default is False.
- Returns:
A pandas DataFrame containing the downloaded computation session result.
- Return type:
DataFrame
- Raises:
CiphermodeException – If more than one of csv, onnx and float_array is set.
- download_graph(id)[source]#
Download a graph with the specified ID.
- Parameters:
id (str) – The ID of the graph.
- Returns:
The serialized Ciphercore graph.
- Return type:
str
- get_cloud_upload(id)[source]#
Get cloud upload with the specified ID.
- Parameters:
id (str) – The ID of the cloud upload.
- Returns:
A pandas DataFrame containing the cloud upload information.
- Return type:
DataFrame
- get_dataset(dataset_id)[source]#
Get the dataset with the specified ID.
- Parameters:
dataset_id (str) – The ID of the dataset.
- Returns:
The dataset with the specified ID.
- Return type:
Dataset
- get_knn_computation(num_neighbors, has_labels=False)[source]#
Create a KNN (k-nearest-neighbors) computation.
- Parameters:
num_neighbors (int) – The number of neighbors to consider in the KNN computation.
has_labels (bool, optional) – Whether the input data has labels. Default is False.
- Returns:
The ID of the created computation.
- Return type:
str
- get_llm_inference_computation(max_len, num_layers, embedding_dim, num_heads, temperature, top_p)[source]#
Create an LLM inference computation.
- Parameters:
max_len (int) – The maximum length of the generated text.
num_layers (int) – The number of layers in the transformer.
embedding_dim (int) – The embedding dimension of the transformer.
num_heads (int) – The number of heads in the transformer.
temperature (float) – The temperature for the sampling.
top_p (float) – The top p for the sampling.
- Returns:
The ID of the created computation.
- Return type:
str
- get_mlp_computation(layers, batch_size, optimizer, learning_rate, loss, epochs, precision)[source]#
Create an MLP (Multi-Layer Perceptron) computation.
- Parameters:
layers (list) – The list with the sizes of hidden layers in the MLP (note that the last one should be 1 in most cases).
batch_size (int) – The batch size for training.
optimizer (str) – The optimizer to use for training (we currently support ‘adam’, ‘adagrad’ and ‘sgd’).
learning_rate (float) – The learning rate for training.
loss (str) – The loss function to use for training (we currently support ‘log_loss’ and ‘mse’).
epochs (int) – The number of epochs for training.
precision (int) – The precision for training (it is conducted with fixed precision numbers, with 2**precision as denominator).
- Returns:
The ID of the created computation.
- Return type:
str
- get_nn_inference_computation(batch_size, precision)[source]#
Create a neural network inference computation.
- Parameters:
batch_size (int) – The batch size for inference, should be the same as for training.
precision (int) – The precision for inference, should be the same as for training.
- Returns:
The ID of the created computation.
- Return type:
str
- get_psi_computation(first_dataset_columns, second_dataset_columns, sharded=True)[source]#
Create a PSI (Private Set Intersection) computation.
- Parameters:
first_dataset_columns (list[str]) – The list of columns from the first dataset to join.
second_dataset_columns (list[str]) – The list of columns from the second dataset to join.
sharded (bool, optional) – Whether to shard the computation. Default is True.
- Returns:
The ID of the created computation.
- Return type:
str
- get_report(dataset_id)[source]#
Get the report of the specified dataset.
- Parameters:
dataset_id (str) – The ID of the dataset.
- Returns:
The report of the specified dataset.
- Return type:
Report (str)
- get_sql_computation(query)[source]#
Create an SQL computation.
- Parameters:
query (str) – The SQL query to execute. It can refer to tables by name; these names must be specified in the corresponding computation session.
- Returns:
The ID of the created computation.
- Return type:
str
- hash_dataset_columns(dataset_id, hash_column_names, new_dataset_name, async_init=False)[source]#
Hashes the entries in the given columns of a dataset to create a succinct representation of the input dataset.
Succinct representations output by this method can be matched with create_psi to find the hash values they have in common.
- Parameters:
dataset_id (str) – The dataset ID.
hash_column_names (list[str]) – Columns from the dataset to hash.
new_dataset_name (str) – New dataset name.
async_init (bool, optional) – Whether to download the dataset from the endpoint asynchronously.
- Returns:
A pandas Series containing the dataset ID for the succinct representation.
This dataset contains a single column of (de-duplicated) hash values, each value corresponding to some set of rows in the input dataset where entries indexed by columns in hash_column_names had the same hash.
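Example (a sketch; the dataset ID and column names are placeholders):

    # Build a de-duplicated hashed representation of two join columns,
    # suitable for matching with create_psi.
    hashed = api.hash_dataset_columns(
        "dataset-id", ["email", "phone"], "customers-hashed",
    )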
- list_cloud_uploads()[source]#
List all cloud uploads.
- Returns:
A pandas DataFrame containing the list of cloud uploads.
- Return type:
DataFrame
- list_computation_sessions(filter_computation_session_ids=None, show_tags=False)[source]#
List computation sessions.
- Parameters:
filter_computation_session_ids (list[str], optional) – List of specific computation session IDs to return. If None, all computation sessions are returned. Default is None.
show_tags (bool, optional) – Whether to include the tags column.
- Returns:
A pandas DataFrame containing the list of computation sessions.
- Return type:
DataFrame
- list_computation_sessions_ids()[source]#
List computation session IDs.
- Returns:
A list of computation session IDs.
- Return type:
list[str]
- list_computations()[source]#
List all computations.
- Returns:
A pandas DataFrame containing the list of computations.
- Return type:
DataFrame
- list_computations_ids()[source]#
List the IDs of all computations.
- Returns:
A list of computation IDs.
- Return type:
list[str]
- list_data_requests(filter_computation_session_id=None)[source]#
Lists data requests.
- Parameters:
filter_computation_session_id (str, optional) – If provided, only data requests for this computation session ID will be returned.
- Returns:
A pandas DataFrame containing the list of data requests.
- Return type:
DataFrame
- list_data_requests_ids()[source]#
Lists the IDs of data requests.
- Returns:
A list of data request IDs.
- Return type:
list[str]
- list_datasets()[source]#
List all datasets.
- Returns:
A pandas DataFrame containing the list of datasets.
- Return type:
DataFrame
- list_datasets_ids()[source]#
List the IDs of all datasets.
- Returns:
A list of dataset IDs.
- Return type:
list[str]
- list_graphs()[source]#
List all graphs.
- Returns:
A pandas DataFrame containing the list of graphs.
- Return type:
DataFrame
- list_graphs_ids()[source]#
List the IDs of all graphs.
- Returns:
A list of graph IDs.
- Return type:
list[str]
- list_groups()[source]#
List all groups.
- Returns:
A pandas DataFrame containing the list of groups.
- Return type:
DataFrame
- list_groups_ids()[source]#
List the IDs of all groups.
- Returns:
A list of group IDs.
- Return type:
list[str]
- list_node_events(timestamp_ms, num_events)[source]#
Lists node audit events up to a given timestamp. Admin only.
- Parameters:
timestamp_ms (int) – Timestamp, in milliseconds.
num_events (int) – Number of events to fetch.
- Returns:
A pandas DataFrame containing node audit events.
- Return type:
DataFrame
- list_user_events(timestamp_ms, num_events, user='')[source]#
Lists user audit events up to a given timestamp. Admin only.
- Parameters:
timestamp_ms (int) – Timestamp, in milliseconds.
num_events (int) – Number of events to fetch.
user (str, optional) – Email address to filter events on.
- Returns:
A pandas DataFrame containing user audit events.
- Return type:
DataFrame
- list_users()[source]#
List all users.
- Returns:
A pandas DataFrame containing the list of users.
- Return type:
DataFrame
- list_users_ids()[source]#
List the IDs of all users.
- Returns:
A list of user IDs.
- Return type:
list[str]
- local_node_connections()[source]#
Get local node connections.
- Returns:
Local node connections.
- Return type:
list
- node_connections()[source]#
Get node connections.
- Returns:
A pandas DataFrame containing the node connections.
- Return type:
DataFrame
- poll_explore_dataset_intersection(session_id)[source]#
Polls the exploration of a dataset intersection.
- Parameters:
session_id (str) – The session id associated with the dataset intersection exploration.
- Returns:
Object containing explore computation details.
- Return type:
ExploreDatasetIntersectionResponse
- publish_dataset(id)[source]#
Make the dataset visible to all organizations.
- Parameters:
id (str) – The ID of the dataset.
- Returns:
A pandas DataFrame containing the published dataset.
- Return type:
DataFrame
- reject_data_request(id, comment='')[source]#
Rejects a data request.
- Parameters:
id (str) – The ID of the data request to reject.
comment (str, optional) – A comment to attach to the data request.
- Returns:
A pandas DataFrame containing the rejected data request.
- Return type:
DataFrame
- remove_user_role(user_id, role)[source]#
Remove a role from a user.
- Parameters:
user_id (str) – The ID of the user.
role (str) – The role to be removed from the user.
- Returns:
A pandas DataFrame containing the updated user information.
- Return type:
DataFrame
- save_computation_session_result(id, name='', description='', as_csv=False, include_summary=False, sql_permissions=None, publish=False)[source]#
Saves the result of a computation session to a new dataset.
- Parameters:
id (str) – The ID of the computation session.
name (str, optional) – The name to assign to the dataset.
description (str, optional) – The description to assign to the dataset.
as_csv (bool, optional) – Whether to treat the computation result as a CSV-like table (results in a columnwise dataset).
include_summary (bool, optional) – Whether to include a dataset summary for the newly created dataset.
sql_permissions (str, optional) – The SQL permissions to assign to the dataset.
publish (bool, optional) – Whether to make the dataset visible to all organizations.
- Returns:
A pandas DataFrame containing the new dataset.
- Return type:
DataFrame
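Example (a sketch; the session ID is a placeholder):

    # Persist a finished session's output as a new columnwise dataset.
    dataset = api.save_computation_session_result(
        "session-id",
        name="psi-overlap-2024",
        as_csv=True,
        include_summary=True,
    )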
- show_dataset(dataset_id)[source]#
Display the metadata about the dataset with the specified ID.
- Parameters:
dataset_id (str) – The ID of the dataset.
- Returns:
A pandas DataFrame containing the dataset information.
- Return type:
DataFrame
- start_computation_session(id)[source]#
Start a specific computation session.
- Parameters:
id (str) – The ID of the computation session to start.
- Returns:
A pandas DataFrame containing the started computation session information.
- Return type:
DataFrame
- tag_computation_session(id, key, value=None)[source]#
Tag computation session.
- Parameters:
id (str) – The ID of the computation session to tag.
key (str) – Tag key.
value (str, optional) – Tag value. If None, the tag with a given key is removed instead.
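Example (a sketch; the session ID is a placeholder):

    api.tag_computation_session("session-id", "team", "analytics")  # set tag
    api.tag_computation_session("session-id", "team")               # remove tag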
- upload_and_publish_dataset(*args, **kwargs)[source]#
Upload a dataset and then make it visible to all organizations.
See upload_dataset for arguments.
- Returns:
A pandas DataFrame containing the uploaded and published dataset.
- Return type:
DataFrame
- upload_computation_session_result(id, endpoint)[source]#
Uploads the result of a computation session to a specified endpoint.
- Parameters:
id (str) – The ID of the computation session.
endpoint (str) – The endpoint to which the computation session result will be uploaded.
- Returns:
A pandas DataFrame containing the new dataset.
- Return type:
DataFrame
- upload_dataset(name='', description='', type='columnwise', endpoint='', data=None, column_permissions='everything', sql_permissions='', include_report=True, publish=False, async_init=False, allow_secure_test=False)[source]#
Upload a dataset.
- Parameters:
name (str, optional) – The name of the dataset.
description (str, optional) – A description of the dataset.
type (str, optional) – The type of the dataset. Default is ‘columnwise’, available options are {‘typed_value’, ‘columnwise’, ‘rowwise’, ‘model’}.
endpoint (str, optional) – In case of non-local datasets (cloud storage, remote SQL server), the address of the dataset object.
data (list, optional) – In case of local datasets, the data to upload (CSV files for columnwise/rowwise types, binary data of an ONNX model, or TypedValue JSON otherwise).
column_permissions (str, optional) – The column permissions of the dataset. Default is ‘everything’. Available options are {‘everything’, ‘everything_local’, None}.
sql_permissions (str, optional) – The SQL permissions of the dataset.
include_report (bool, optional) – Whether to include a report in the upload.
publish (bool, optional) – Whether to make the dataset visible to all organizations.
async_init (bool, optional) – Whether to download the dataset from the endpoint asynchronously.
allow_secure_test (bool, optional) – Whether to allow the dataset to be used in SecureTest computations.
- Returns:
A pandas Series containing the uploaded dataset.
- Raises:
CiphermodeException – If both endpoint and data are specified, or if permissions are given for a non-columnwise dataset.
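Example (a sketch; the file name is a placeholder, and passing raw bytes in data is an assumption about the expected element type):

    # Upload a local CSV as a columnwise dataset; endpoint and data
    # are mutually exclusive.
    with open("customers.csv", "rb") as f:
        dataset = api.upload_dataset(
            name="customers",
            type="columnwise",
            data=[f.read()],  # element type assumed to be raw bytes
            include_report=True,
        )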
- upload_graph(serialized_graph)[source]#
Upload a serialized graph.
- Parameters:
serialized_graph (str) – The serialized Ciphercore graph to upload.
- Returns:
A pandas DataFrame containing the uploaded graph information.
- Return type:
DataFrame
- waterfall_gather(original_dataset_id, stage_session_ids, endpoint)[source]#
Post-processes the results of multiple PSI computations on hashed datasets (as output by hash_dataset_columns) to obtain the indices of rows in the original dataset that matched, along with the index of the first computation they matched in.
Can be used to implement a multi-stage “waterfall” join by providing ordered session IDs for each stage, or to convert a dataset of hashes into a dataset of indices in the original dataset corresponding to these hashes.
- Parameters:
original_dataset_id (str) – The original dataset ID.
stage_session_ids (list[str]) – Waterfall session IDs. Each should correspond to a PSI computation (made by create_psi) on hashed datasets (made with hash_dataset_columns).
endpoint (str) – The endpoint to which the computation session result will be uploaded.
- Returns:
A pandas Series containing the result of running a multi-stage waterfall match on stage_session_ids.
Note that the result will be empty if called with a non-empty endpoint; in that case, the result is written directly to cloud storage.
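Example (a sketch of a two-stage waterfall match; all IDs are placeholders, and each stage session is assumed to be a create_psi run over datasets produced by hash_dataset_columns):

    # Stage sessions are ordered by match priority (e.g. email first,
    # then phone). An empty endpoint returns the result inline.
    result = api.waterfall_gather(
        original_dataset_id="dataset-id",
        stage_session_ids=["psi-session-email", "psi-session-phone"],
        endpoint="",
    )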