diff --git a/README.md b/README.md
index cc2a78a..00d58e4 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,10 @@
 # Skyflow Python SDK
 
+[![PyPI version](https://img.shields.io/pypi/v/skyflow.svg)](https://pypi.org/project/skyflow/)
+[![Python versions](https://img.shields.io/pypi/pyversions/skyflow.svg)](https://pypi.org/project/skyflow/)
+[![CI Checks](https://github.com/skyflowapi/skyflow-python/actions/workflows/ci.yml/badge.svg)](https://github.com/skyflowapi/skyflow-python/actions/workflows/ci.yml)
+[![License](https://img.shields.io/github/license/skyflowapi/skyflow-python.svg)](https://github.com/skyflowapi/skyflow-python/blob/main/LICENSE)
+
 > **This is the current, recommended version of the Skyflow SDK.** V2.1.0 brings flexible auth, multi-vault support, native data types, and rich error diagnostics.
 >
 > Migrating from v1? See the **[Migration Guide](https://github.com/skyflowapi/skyflow-python/blob/main/docs/migrate_to_v2.md)** for step-by-step instructions. V1 is in maintenance mode and will reach End of Life on October 31, 2026.
@@ -69,6 +74,9 @@ The Skyflow Python SDK is designed to help with integrating Skyflow into a Pytho
 
 The Skyflow SDK enables you to connect to your Skyflow Vault(s) to securely handle sensitive data at rest, in-transit, and in-use.
 
+> [!TIP]
+> Looking for the full list of request parameters, response object attributes, enums, client-management methods, and Detect helper classes? See the **[API Reference](docs/api_reference.md)**.
+
 > [!IMPORTANT]
 > This readme documents SDK version 2.
 > For version 1 see the [v1.16.0 README](https://github.com/skyflowapi/skyflow-python/tree/v1).
@@ -78,7 +86,7 @@ The Skyflow SDK enables you to connect to your Skyflow Vault(s) to securely hand ### Require -- Python 3.8.0 and above (tested with Python 3.8.0) +- Python 3.9 and above (tested with Python 3.9) ### Configuration @@ -92,6 +100,19 @@ pip install skyflow Get started quickly with the essential steps: authenticate, initialize the client, and perform a basic vault operation. This section shows you a minimal working example. +### Before you begin + +To run the examples below, you need a Skyflow account and a few values from the Skyflow Studio console. If you don't have an account yet, [request a demo](https://www.skyflow.com/get-demo). + +| Value | Where to find it | +|-------|------------------| +| `vault_id` | Your vault's details page in Skyflow Studio. | +| `cluster_id` | The first segment of your vault URL: `https://{cluster_id}.vault.skyflowapis.com`. | +| `env` | The environment your vault runs in — `Env.PROD`, `Env.SANDBOX`, `Env.DEV`, or `Env.STAGE` (defaults to `PROD`). | +| Credentials | Create a **service account** in Studio. Choose **API key** during creation for the simplest setup, or download the service-account `credentials.json` for token-based auth. See [Authentication & authorization](#authentication--authorization). | + +The quickstart below assumes a table named `table1` with `card_number` and `cardholder_name` columns. Create a matching table (or adjust the table/column names to your schema) in your vault before running it. See the [Skyflow docs](https://docs.skyflow.com/) for creating vaults, tables, and service accounts. + ### Authenticate You can use an API key or a personal bearer token to directly authenticate and authorize requests with the SDK. Use API keys for long-term service authentication. Use bearer tokens for optimal security. @@ -146,7 +167,7 @@ See [docs/advanced_initialization.md](docs/advanced_initialization.md) for advan Insert data into your vault using the `insert` method. 
Set `return_tokens=True` in the request to ensure values are tokenized in the response. -Create an insert request with the `InsertRequest` class, which includes the values to be inserted as a list of records. +Create an insert request with the [`InsertRequest`](docs/api_reference.md#insertrequest) class, which includes the values to be inserted as a list of records. Below is a simple example to get started. See the [Insert and tokenize data](#insert-and-tokenize-data-insertrequest) section for advanced options. @@ -168,6 +189,12 @@ insert_response = skyflow_client.vault('').insert(insert_request) print('Insert response:', insert_response) ``` +Returns an [`InsertResponse`](docs/api_reference.md#insertresponse) (`inserted_fields`, `errors`). With `return_tokens=True`, each entry includes the `skyflow_id` and a token per column: + +```text +Insert response: InsertResponse(inserted_fields=[{'skyflow_id': 'a8f0c2e1-7b3d-4f9a-8c21-1d2e3f4a5b6c', 'card_number': '5391-4629-3722-7102', 'cardholder_name': '0f6b8a2c-90ab-4cde-9def-567890abcdef'}], errors=None) +``` + ## Upgrade from v1 to v2 Upgrade from `skyflow-python` v1 using the dedicated guide in [docs/migrate_to_v2.md](docs/migrate_to_v2.md). @@ -202,6 +229,14 @@ response = skyflow_client.vault('').insert(insert_request) print('Insert response:', response) ``` +Returns an [`InsertResponse`](docs/api_reference.md#insertresponse): + +```text +Insert response: InsertResponse(inserted_fields=[{'skyflow_id': 'a8f0c2e1-7b3d-4f9a-8c21-1d2e3f4a5b6c', '': '', '': ''}], errors=None) +``` + +> With `continue_on_error=True`, each entry also carries a `request_index`, and `errors` is a list of `{request_index, request_id, error, http_code}` for the rows that failed. + #### Insert example with `continue_on_error` option Set the `continue_on_error` flag to `True` to allow insert operations to proceed despite encountering partial errors. 
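As an illustration of the `continue_on_error` note above, here is a minimal sketch of pairing each input record with its outcome. It does not call the SDK: `FakeInsertResponse` is a hypothetical stand-in that only mirrors the documented `inserted_fields` / `errors` shape, `split_results` is an illustrative helper rather than an SDK function, and treating `request_index` as an integer position in the input list is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeInsertResponse:
    """Hypothetical stand-in mirroring the documented InsertResponse shape."""
    inserted_fields: list
    errors: Optional[list] = None


def split_results(records, response):
    """Pair each input record with its outcome using request_index.

    Assumes request_index is the integer position of the record in `records`.
    """
    succeeded = [
        (records[entry["request_index"]], entry["skyflow_id"])
        for entry in response.inserted_fields
    ]
    failed = [
        (records[err["request_index"]], err["http_code"], err["error"])
        for err in (response.errors or [])  # errors is None when nothing failed
    ]
    return succeeded, failed


# Made-up sample data: one row inserted, one row rejected.
records = [
    {"card_number": "4111111111111111"},
    {"card_number": "not-a-card"},
]
response = FakeInsertResponse(
    inserted_fields=[
        {"request_index": 0, "skyflow_id": "id-0", "card_number": "tok-0"},
    ],
    errors=[
        {"request_index": 1, "request_id": "req-1",
         "error": "invalid value", "http_code": 400},
    ],
)

succeeded, failed = split_results(records, response)
print("succeeded:", succeeded)
print("failed:", failed)
```

Keeping the original record next to each error makes it straightforward to log or retry only the rows that failed.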
@@ -227,7 +262,7 @@ insert_request = InsertRequest(
 
 Convert tokens back into plaintext values (or masked values) using the `.detokenize()` method. Detokenization accepts tokens and returns values.
 
-Create a detokenization request with the `DetokenizeRequest` class, which requires a list of tokens and column groups as input.
+Create a detokenization request with the [`DetokenizeRequest`](docs/api_reference.md#detokenizerequest) class, which requires a list of tokens and column groups as input.
 
 Provide optional parameters such as the redaction type and the option to continue on error.
 
@@ -249,12 +284,18 @@ response = skyflow_client.vault('').detokenize(detokenize_request)
 print('Detokenization response:', response)
 ```
 
+Returns a [`DetokenizeResponse`](docs/api_reference.md#detokenizeresponse) (`detokenized_fields`, `errors`); each field has `token`, `value`, and `type`:
+
+```text
+Detokenization response: DetokenizeResponse(detokenized_fields=[{'token': 'token1', 'value': '4111111111111111', 'type': 'STRING'}, {'token': 'token2', 'value': 'John Doe', 'type': 'STRING'}], errors=None)
+```
+
 > [!TIP]
 > See the full example in the samples directory: [detokenize_records.py](samples/vault_api/detokenize_records.py)
 
 ### Get Record(s): `.get(request)`
 
-Retrieve data using Skyflow IDs or unique column values with the `get` method. Create a get request with the `GetRequest` class, specifying parameters such as the table name, redaction type, Skyflow IDs, column names, and column values.
+Retrieve data using Skyflow IDs or unique column values with the `get` method. Create a get request with the [`GetRequest`](docs/api_reference.md#getrequest) class, specifying parameters such as the table name, redaction type, Skyflow IDs, column names, and column values.
 
 > [!NOTE]
 > You can't use both Skyflow IDs and column name/value pairs in the same request.
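The rule in the note above can be checked before a request is built. The following is a minimal sketch, not SDK code: `build_get_kwargs` is a hypothetical helper, and passing its result on as `GetRequest(**kwargs)` is an assumption based on the `table`, `ids`, `column_name`, and `column_values` parameter names listed in the API reference.

```python
def build_get_kwargs(table, ids=None, column_name=None, column_values=None):
    """Return keyword arguments for a get request, rejecting mixed lookups.

    Hypothetical helper: it only enforces the documented rule that Skyflow IDs
    and column name/value pairs cannot be combined in one request.
    """
    by_column = column_name is not None or column_values is not None
    if ids is not None and by_column:
        raise ValueError("Use either ids or column_name/column_values, not both.")

    kwargs = {"table": table}
    if ids is not None:
        kwargs["ids"] = list(ids)
    if by_column:
        kwargs["column_name"] = column_name
        kwargs["column_values"] = list(column_values or [])
    return kwargs


# Lookup by Skyflow IDs.
print(build_get_kwargs("table1", ids=["id-1", "id-2"]))

# Lookup by a unique column.
print(build_get_kwargs("table1", column_name="card_number",
                       column_values=["4111111111111111"]))

# Mixing both styles is rejected before any request is sent.
try:
    build_get_kwargs("table1", ids=["id-1"],
                     column_name="card_number", column_values=["x"])
except ValueError as exc:
    print("rejected:", exc)
```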
@@ -276,6 +317,12 @@ response = skyflow_client.vault('').get(get_request) print('Get response:', response) ``` +Returns a [`GetResponse`](docs/api_reference.md#getresponse) (`data`, `errors`), where `data` is a list of record dicts: + +```text +Get response: GetResponse(data=[{'skyflow_id': 'a8f0c2e1-7b3d-4f9a-8c21-1d2e3f4a5b6c', 'card_number': '4111111111111111', 'cardholder_name': 'John Doe'}], errors=None) +``` + #### Get by Skyflow IDs Retrieve specific records using Skyflow IDs. Use this method when you know the exact record IDs. @@ -295,6 +342,10 @@ response = skyflow_client.vault('').get(get_request) print('Data retrieval successful:', response) ``` +```text +Data retrieval successful: GetResponse(data=[{'skyflow_id': '', 'card_number': '4111111111111111', 'cardholder_name': 'John Doe'}], errors=None) +``` + #### Get tokens for records Return tokens for records to securely process sensitive data while maintaining data privacy. @@ -344,7 +395,7 @@ Use redaction types to control how sensitive data displays when retrieved from t ### Update Records -Update data in your vault using the `update` method. Create an update request with the `UpdateRequest` class, specifying parameters such as the table name and data (as a dictionary). +Update data in your vault using the `update` method. Create an update request with the [`UpdateRequest`](docs/api_reference.md#updaterequest) class, specifying parameters such as the table name and data (as a dictionary). You can pass options like `return_tokens` directly to the request. When `True`, Skyflow returns tokens for the updated records. When `False`, it returns IDs. @@ -366,12 +417,18 @@ response = skyflow_client.vault('').update(update_request) print('Update response:', response) ``` +Returns an [`UpdateResponse`](docs/api_reference.md#updateresponse) (`updated_field`, `errors`). 
With the default `return_tokens=False`, only the `skyflow_id` is returned; with `return_tokens=True`, tokens for the updated columns are included: + +```text +Update response: UpdateResponse(updated_field={'skyflow_id': ''}, errors=None) +``` + > [!TIP] > See the full example in the samples directory: [update_record.py](samples/vault_api/update_record.py) ### Delete Records -Delete records using Skyflow IDs with the `delete` method. Create a delete request with the `DeleteRequest` class, which accepts a list of Skyflow IDs: +Delete records using Skyflow IDs with the `delete` method. Create a delete request with the [`DeleteRequest`](docs/api_reference.md#deleterequest) class, which accepts a list of Skyflow IDs: ```python from skyflow.vault.data import DeleteRequest @@ -385,12 +442,18 @@ response = skyflow_client.vault('').delete(delete_request) print('Delete response:', response) ``` +Returns a [`DeleteResponse`](docs/api_reference.md#deleteresponse) (`deleted_ids`, `errors`): + +```text +Delete response: DeleteResponse(deleted_ids=['', '', ''], errors=None) +``` + > [!TIP] > See the full example in the samples directory: [delete_records.py](samples/vault_api/delete_records.py) ### Query -Retrieve data with SQL queries using the `query` method. Create a query request with the `QueryRequest` class, which takes the `query` parameter as follows: +Retrieve data with SQL queries using the `query` method. 
Create a query request with the [`QueryRequest`](docs/api_reference.md#queryrequest) class, which takes the `query` parameter as follows: ```python from skyflow.vault.data import QueryRequest @@ -403,6 +466,12 @@ response = skyflow_client.vault('').query(query_request) print('Query response:', response) ``` +Returns a [`QueryResponse`](docs/api_reference.md#queryresponse) (`fields`, `errors`), where `fields` is a list of matching record dicts (each also includes a `tokenized_data` map): + +```text +Query response: QueryResponse(fields=[{'card_number': '4111111111111111', 'cardholder_name': 'John Doe', 'tokenized_data': {}}], errors=None) +``` + > [!TIP] > See the full example in the samples directory: [query_records.py](samples/vault_api/query_records.py) @@ -410,7 +479,7 @@ Refer to [Query your data](https://docs.skyflow.com/query-data/) and [Execute Qu ### Upload File -Upload files to a Skyflow vault using the `upload_file` method. Create a file upload request with the `FileUploadRequest` class. +Upload files to a Skyflow vault using the `upload_file` method. Create a file upload request with the [`FileUploadRequest`](docs/api_reference.md#fileuploadrequest) class. **Upload a file to an existing record:** @@ -444,12 +513,18 @@ with open('path/to/file.pdf', 'rb') as file_obj: print('File upload:', response) ``` +Both forms return a [`FileUploadResponse`](docs/api_reference.md#fileuploadresponse) (`skyflow_id`, `errors`) with the ID of the record the file was attached to (or the newly created record): + +```text +File upload: FileUploadResponse(skyflow_id='a8f0c2e1-7b3d-4f9a-8c21-1d2e3f4a5b6c', errors=None) +``` + > [!TIP] > See the full example in the samples directory: [upload_file.py](samples/vault_api/upload_file.py) ### Retrieve Existing Tokens: `.tokenize(request)` -Retrieve tokens for values that already exist in the vault using the `.tokenize()` method. This method returns existing tokens only and does not generate new tokens. 
+Retrieve tokens for values that already exist in the vault using the `.tokenize()` method. This method returns existing tokens only and does not generate new tokens. Build the request with the [`TokenizeRequest`](docs/api_reference.md#tokenizerequest) class. #### Construct a `.tokenize()` request @@ -467,6 +542,12 @@ response = skyflow_client.vault('').tokenize(tokenize_request) print('Tokenization result:', response) ``` +Returns a [`TokenizeResponse`](docs/api_reference.md#tokenizeresponse) (`tokenized_fields`, `errors`); each field carries its `token`: + +```text +Tokenization result: TokenizeResponse(tokenized_fields=[{'token': 'a1b2c3d4-...'}, {'token': 'e5f6g7h8-...'}], errors=None) +``` + > [!TIP] > See the full example in the samples directory: [tokenize_records.py](samples/vault_api/tokenize_records.py) @@ -478,7 +559,7 @@ De-identify and reidentify sensitive data in text and files using Skyflow Detect De-identify or anonymize text using the `deidentify_text` method. -Create a de-identify text request with the `DeidentifyTextRequest` class. +Create a de-identify text request with the [`DeidentifyTextRequest`](docs/api_reference.md#deidentifytextrequest) class. ```python from skyflow.vault.detect import DeidentifyTextRequest, TokenFormat, Transformations, DateTransformation @@ -501,12 +582,18 @@ response = skyflow_client.detect('').deidentify_text(request) print('De-identify Text Response:', response) ``` +Returns a [`DeidentifyTextResponse`](docs/api_reference.md#deidentifytextresponse) (`processed_text`, `entities`, `word_count`, `char_count`, `errors`). 
`entities` is a list of [`EntityInfo`](docs/api_reference.md#entityinfo) describing each detected entity: + +```text +De-identify Text Response: DeidentifyTextResponse(processed_text='My SSN is [SSN_1].', entities=[...], word_count=4, char_count=18, errors=None) +``` + > [!TIP] > See the full example in the samples directory: [deidentify_text.py](samples/detect_api/deidentify_text.py) ### Re-identify Text: `.reidentify_text(request)` -Re-identify text using the `reidentify_text` method. Create a reidentify text request with the `ReidentifyTextRequest` class, which includes the redacted or de-identified text to be re-identified. +Re-identify text using the `reidentify_text` method. Create a reidentify text request with the [`ReidentifyTextRequest`](docs/api_reference.md#reidentifytextrequest) class, which includes the redacted or de-identified text to be re-identified. ```python from skyflow.vault.detect import ReidentifyTextRequest @@ -523,12 +610,18 @@ response = skyflow_client.detect().reidentify_text(request) print('Re-identify Text Response:', response) ``` +Returns a [`ReidentifyTextResponse`](docs/api_reference.md#reidentifytextresponse) (`processed_text`, `errors`): + +```text +Re-identify Text Response: ReidentifyTextResponse(processed_text='John lives in NYC', errors=None) +``` + > [!TIP] > See the full example in the samples directory: [reidentify_text.py](samples/detect_api/reidentify_text.py) ### De-identify File: `.deidentify_file(request)` -De-identify files using the `deidentify_file` method. Create a request with the `DeidentifyFileRequest` class, which includes the file to be deidentified. Provide optional parameters to control how entities are detected and deidentified. +De-identify files using the `deidentify_file` method. Create a request with the [`DeidentifyFileRequest`](docs/api_reference.md#deidentifyfilerequest) class, which includes the file to be deidentified. 
Provide optional parameters to control how entities are detected and deidentified. ```python from skyflow.vault.detect import DeidentifyFileRequest, TokenFormat, FileInput @@ -548,6 +641,12 @@ with open('path/to/file.pdf', 'rb') as file_obj: print('De-identify File Response:', response) ``` +Returns a [`DeidentifyFileResponse`](docs/api_reference.md#deidentifyfileresponse) with the processed file plus metadata (`file`, `type`, `extension`, `word_count`, `char_count`, `size_in_kb`, `entities`, `run_id`, `status`, `errors`, and more — see the [API Reference](docs/api_reference.md#response-objects)). If processing exceeds `wait_time`, only `run_id` and `status` are returned (poll with `get_detect_run`): + +```text +De-identify File Response: DeidentifyFileResponse(file_base64=None, file=, type='application/pdf', extension='pdf', ..., run_id='r-9c1f2a3b', status='SUCCESS', errors=None) +``` + **Supported file types:** - Documents: `doc`, `docx`, `pdf` @@ -569,7 +668,7 @@ with open('path/to/file.pdf', 'rb') as file_obj: ### Get Run: `.get_detect_run(request)` -Retrieve the results of a previously started file de-identification operation using the `get_detect_run` method. Initialize the request with the `run_id` returned from a prior .`deidentify_file` call. +Retrieve the results of a previously started file de-identification operation using the `get_detect_run` method. Build the request with the [`GetDetectRunRequest`](docs/api_reference.md#getdetectrunrequest) class, initialized with the `run_id` returned from a prior `deidentify_file` call. 
 
 ```python
 from skyflow.vault.detect import GetDetectRunRequest
@@ -582,6 +681,12 @@ response = skyflow_client.detect().get_detect_run(request)
 print('Get Detect Run Response:', response)
 ```
 
+Returns a [`DeidentifyFileResponse`](docs/api_reference.md#deidentifyfileresponse) with the current `status` for the run (and the processed file once `status` is complete):
+
+```text
+Get Detect Run Response: DeidentifyFileResponse(file_base64=None, file=None, ..., run_id='r-9c1f2a3b', status='IN_PROGRESS', errors=None)
+```
+
 > [!TIP]
 > See the full example in the samples directory: [get_detect_run.py](samples/detect_api/get_detect_run.py)
 
@@ -594,7 +699,7 @@ Securely send and receive data between your systems and first- or third-party se
 
 ### Invoke a connection
 
-To invoke a connection, use the `invoke` method of the Skyflow client.
+To invoke a connection, use the `invoke` method of the Skyflow client. Build the request with the [`InvokeConnectionRequest`](docs/api_reference.md#invokeconnectionrequest) class.
 
 #### Construct an invoke connection request
 
@@ -614,12 +719,17 @@ response = skyflow_client.connection().invoke(invoke_request)
 print('Connection response:', response)
 ```
 
-`method` supports the following methods:
+Returns an [`InvokeConnectionResponse`](docs/api_reference.md#invokeconnectionresponse) (`data`, `metadata`, `errors`), where `data` is the connection's response body:
+
+```text
+Connection response: InvokeConnectionResponse(data={'message': 'success'}, metadata={'request_id': 'b7d3...'}, errors=None)
+```
+
+`method` supports the following methods (see [`RequestMethod`](docs/api_reference.md#requestmethod)):
 
 - `GET`
 - `POST`
 - `PUT`
-- `PATCH`
 - `DELETE`
 
 **path_params, query_params, header, body** are the JSON objects represented as dictionaries that will be sent through the connection integration url.
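Because `PATCH` is no longer in the supported list, callers that take the HTTP method from configuration may want to validate it early. The sketch below is only an illustration: `SketchRequestMethod` is a hypothetical stand-in listing the four methods named above (it is not the SDK's `RequestMethod`), and `parse_method` is an illustrative helper.

```python
from enum import Enum


class SketchRequestMethod(Enum):
    """Hypothetical stand-in listing the four supported methods."""
    GET = "GET"
    POST = "POST"
    PUT = "PUT"
    DELETE = "DELETE"


def parse_method(name):
    """Map a string such as 'post' to a supported method, or raise ValueError."""
    try:
        # Enum lookup by member name raises KeyError for unknown names.
        return SketchRequestMethod[name.upper()]
    except KeyError:
        supported = ", ".join(m.name for m in SketchRequestMethod)
        raise ValueError(
            f"Unsupported method {name!r}; use one of: {supported}"
        ) from None


print(parse_method("post"))

try:
    parse_method("PATCH")
except ValueError as exc:
    print("rejected:", exc)
```

Failing with a clear `ValueError` at configuration time is usually easier to diagnose than an attribute lookup error raised deep inside request construction.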
@@ -823,6 +933,28 @@ skyflow_client = ( ) ``` +## Using the client in production + +**Build the client once and reuse it.** `Skyflow.builder()...build()` returns a long-lived client that lazily creates and caches an HTTP client and bearer token per vault. Construct it once at startup (for example, as a module-level singleton or a dependency-injected instance) and reuse it across requests. Rebuilding the client on every request discards these caches and forces unnecessary token regeneration. + +```python +# At application startup +skyflow_client = ( + Skyflow.builder() + .add_vault_config(vault_config) + .set_log_level(LogLevel.ERROR) + .build() +) + +# Reuse `skyflow_client` for the lifetime of the process +``` + +**Bearer token refresh is automatic.** When you authenticate with a service-account credentials file/string (or API key), the SDK caches the generated bearer token and regenerates it automatically once it expires. You don't need to manage token lifecycle yourself for the common case. (For the rare expire-mid-request case, see [Bearer token expiration edge cases](#bearer-token-expiration-edge-cases).) + +**Configuration mutation is not concurrency-safe.** Methods that change client configuration at runtime — `add_vault_config`, `update_vault_config`, `remove_vault_config`, the `*_connection_config` methods, and `update_skyflow_credentials` — mutate shared client state without locking. Perform configuration changes during setup, not concurrently with in-flight requests from other threads. Once configured, reusing the built client to issue operations is the intended usage pattern. + +**Timeouts and retries.** The SDK does not currently expose request timeout or automatic-retry configuration. If you need strict timeout or retry guarantees, wrap your SDK calls with your own timeout/retry logic at the application layer. + ## Error handling ### Catching `SkyflowError` instances @@ -861,6 +993,25 @@ If you encounter this kind of error, retry the request. 
During the retry the SDK > See the full example in the samples directory: [bearer_token_expiry_example.py](samples/service_account/bearer_token_expiry_example.py) > See [docs.skyflow.com](https://docs.skyflow.com) for more details on authentication, access control, and governance for Skyflow. +## Troubleshooting + +Most first-run problems come from configuration mismatches. Every error raised by the SDK is a `SkyflowError` exposing `http_code`, `message`, and `details` — inspect these first (see [Error handling](#error-handling)). + +| Symptom | Likely cause | Fix | +|---------|--------------|-----| +| `pip install skyflow` fails / `RuntimeError: skyflow requires Python 3.9+` | Python older than 3.9 | Use Python 3.9 or above. | +| Connection/DNS failures, or 404 on every call | Wrong `cluster_id` | `cluster_id` is the first segment of your vault URL: `https://{cluster_id}.vault.skyflowapis.com`. | +| Requests hit the wrong host / unexpected auth failures | Wrong `env` | Match `env` to where your vault runs (`Env.PROD`, `Env.SANDBOX`, `Env.DEV`, `Env.STAGE`). | +| `401 Unauthorized` | Invalid or expired credentials | Verify your API key / service-account credentials. Regenerate if needed. | +| `403 Forbidden` | Service account lacks permission for the operation | Grant the service account a role with the required permissions, or use a [scoped token](#generate-bearer-tokens-scoped-to-certain-roles) with the right role. | +| `404` referencing a table or column | Table/column doesn't exist or name mismatch | Confirm the table and column names match your vault schema exactly (case-sensitive). | +| Vault not found / 404 with a valid `cluster_id` | Wrong `vault_id` | Copy `vault_id` from the vault's details page in Skyflow Studio. | +| `Authentication failed. Bearer token is expired.` | Token expired between verification and the API call | Retry the request; the SDK regenerates the token. See [Bearer token expiration edge cases](#bearer-token-expiration-edge-cases). 
| +| Unexpected credential is used | Multiple credentials provided | Only one credential type is used at a time; the last one added takes precedence. Provide exactly one. | +| `RequestMethod.PATCH` raises `AttributeError` | `PATCH` is not a supported connection method | Use `GET`, `POST`, `PUT`, or `DELETE` (see [`RequestMethod`](docs/api_reference.md#requestmethod)). | + +If you're stuck, set `set_log_level(LogLevel.DEBUG)` during development for detailed SDK logs (see [Logging](#logging)). + ## Security ### Reporting a Vulnerability diff --git a/docs/api_reference.md b/docs/api_reference.md new file mode 100644 index 0000000..cc5fa61 --- /dev/null +++ b/docs/api_reference.md @@ -0,0 +1,508 @@ +# API Reference + +A reference for the public Skyflow Python SDK surface: client-management methods, request and response objects, enums, Detect helper classes, and service-account functions. For task-oriented usage and examples, see the [README](../README.md). + +All attributes, parameters, and enum values below are taken directly from the SDK source. + +## Table of Contents + +- [Client management methods](#client-management-methods) +- [Request objects](#request-objects) +- [Response objects](#response-objects) +- [Enums](#enums) +- [Detect helper classes](#detect-helper-classes) +- [Service account functions](#service-account-functions) + +--- + +## Client management methods + +In addition to the builder methods (`add_vault_config`, `add_connection_config`, `add_skyflow_credentials`, `set_log_level`, `build`) and the operation accessors (`vault()`, `connection()`, `detect()`), a built `Skyflow` client exposes methods to mutate its configuration and logging at runtime. + +| Method | Purpose | +|--------|---------| +| `add_vault_config(config)` | Add a vault configuration after build. | +| `remove_vault_config(vault_id)` | Remove a vault configuration. | +| `update_vault_config(config)` | Update an existing vault configuration. 
| +| `get_vault_config(vault_id)` | Retrieve a vault configuration. | +| `add_connection_config(config)` | Add a connection configuration. | +| `remove_connection_config(connection_id)` | Remove a connection configuration. | +| `update_connection_config(config)` | Update a connection configuration. | +| `get_connection_config(connection_id)` | Retrieve a connection configuration. | +| `add_skyflow_credentials(credentials)` | Add common Skyflow credentials applied across configs. | +| `update_skyflow_credentials(credentials)` | Update the common Skyflow credentials. | +| `set_log_level(log_level)` | Set the log level (builder + client). | +| `update_log_level(log_level)` | Change the log level after initialization. | +| `get_log_level()` | Return the current log level. | +| `vault(vault_id=None)` | Get a vault controller for the given (or default) vault. | +| `connection(connection_id=None)` | Get a connection controller. | +| `detect(vault_id=None)` | Get a Detect controller. | + +```python +# Example: manage configuration after the client is built +skyflow_client.add_vault_config(another_vault_config) +skyflow_client.update_log_level(LogLevel.DEBUG) +current_level = skyflow_client.get_log_level() +``` + +--- + +## Request objects + +Parameters are listed with their defaults as defined in the constructors. + +### `InsertRequest` + +`skyflow.vault.data` — passed to `vault().insert()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `table` | _(required)_ | Target table name. | +| `values` | _(required)_ | List of record dicts to insert. | +| `tokens` | `None` | Bring-your-own-token values, aligned with `values` (used with `token_mode`). | +| `upsert` | `None` | Column name to use as the upsert index (must have a `unique` constraint). | +| `homogeneous` | `False` | Treat the batch as homogeneous (all records share the same columns). | +| `token_mode` | `TokenMode.DISABLE` | BYOT mode. See [`TokenMode`](#tokenmode). 
| +| `return_tokens` | `True` | Return tokens for inserted values. | +| `continue_on_error` | `False` | Continue the batch despite partial errors. | + +### `UpdateRequest` + +`skyflow.vault.data` — passed to `vault().update()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `table` | _(required)_ | Target table name. | +| `data` | _(required)_ | Dict containing `skyflow_id` and the columns to update. | +| `tokens` | `None` | BYOT values for the updated columns. | +| `return_tokens` | `False` | Return tokens (vs. IDs) for updated records. | +| `token_mode` | `TokenMode.DISABLE` | BYOT mode. See [`TokenMode`](#tokenmode). | + +### `GetRequest` + +`skyflow.vault.data` — passed to `vault().get()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `table` | _(required)_ | Target table name. | +| `ids` | `None` | Skyflow IDs to retrieve. Mutually exclusive with `column_name`/`column_values`. | +| `redaction_type` | `None` | See [`RedactionType`](#redactiontype). | +| `return_tokens` | `False` | Return tokens instead of values. | +| `fields` | `None` | Specific fields/columns to return. | +| `offset` | `None` | Pagination offset. | +| `limit` | `None` | Pagination limit. | +| `download_url` | `None` | Return file download URLs for file columns. | +| `column_name` | `None` | Unique column to look up by. Mutually exclusive with `ids`. | +| `column_values` | `None` | Values for `column_name`. | + +### `FileUploadRequest` + +`skyflow.vault.data` — passed to `vault().upload_file()`. Provide exactly one file source: `file_object`, `file_path`, or `base64`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `table` | _(required)_ | Target table name. | +| `column_name` | `None` | File column name. | +| `skyflow_id` | `None` | Existing record ID. Omit to create a new record. | +| `file_path` | `None` | Path to a file to upload. | +| `base64` | `None` | Base64-encoded file content. 
| +| `file_object` | `None` | An open binary file object. | +| `file_name` | `None` | Override the file name. | + +### `FileInput` + +`skyflow.vault.detect` — wrapper for a file passed to `DeidentifyFileRequest`. Provide one of: + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `file` | `None` | An open binary file (`BufferedReader`). | +| `file_path` | `None` | Path to a file. | + +### `DeidentifyTextRequest` + +`skyflow.vault.detect` — passed to `detect().deidentify_text()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `text` | _(required)_ | Text to de-identify. | +| `entities` | `None` | Entity types to detect. See `DetectEntities`. | +| `allow_regex_list` | `None` | Regex patterns to always treat as detectable. | +| `restrict_regex_list` | `None` | Regex patterns to exclude from detection. | +| `token_format` | `None` | `TokenFormat` controlling token types per entity. | +| `transformations` | `None` | `Transformations` (e.g. date shifting). | + +### `DeidentifyFileRequest` + +`skyflow.vault.detect` — passed to `detect().deidentify_file()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `file` | `None` | A `FileInput`. | +| `entities` | `None` | Entity types to detect. | +| `allow_regex_list` | `None` | Regex patterns to always treat as detectable. | +| `restrict_regex_list` | `None` | Regex patterns to exclude. | +| `token_format` | `None` | `TokenFormat` per entity. | +| `transformations` | `None` | `Transformations` (not supported for Documents/Images/PDFs). | +| `output_processed_image` | `None` | Include the processed image in output. | +| `output_ocr_text` | `None` | Include OCR text in the response. | +| `masking_method` | `None` | See [`MaskingMethod`](#maskingmethod). | +| `pixel_density` | `None` | Pixel density for PDF processing. | +| `max_resolution` | `None` | Max resolution for PDF processing. 
| +| `output_processed_audio` | `None` | Include processed audio. | +| `output_transcription` | `None` | See [`DetectOutputTranscriptions`](#detectoutputtranscriptions). | +| `bleep` | `None` | Audio bleep config. See [`Bleep`](#bleep). | +| `output_directory` | `None` | Directory to write the processed file. | +| `wait_time` | `None` | Max seconds to wait (≤ 64). | + +### `DetokenizeRequest` + +`skyflow.vault.tokens` — passed to `vault().detokenize()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `data` | _(required)_ | List of `{token, redaction_type}` dicts to detokenize. See [`RedactionType`](#redactiontype). | +| `continue_on_error` | `False` | Continue despite per-token errors. | + +### `TokenizeRequest` + +`skyflow.vault.tokens` — passed to `vault().tokenize()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `values` | _(required)_ | List of `{value, column_group}` dicts to tokenize. | + +### `DeleteRequest` + +`skyflow.vault.data` — passed to `vault().delete()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `table` | _(required)_ | Target table name. | +| `ids` | _(required)_ | List of Skyflow IDs to delete. | + +### `QueryRequest` + +`skyflow.vault.data` — passed to `vault().query()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `query` | _(required)_ | The SQL query string to execute. | + +### `ReidentifyTextRequest` + +`skyflow.vault.detect` — passed to `detect().reidentify_text()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `text` | _(required)_ | The redacted/de-identified text to re-identify. | +| `redacted_entities` | `None` | Entity types to keep redacted. See `DetectEntities`. | +| `masked_entities` | `None` | Entity types to mask. | +| `plain_text_entities` | `None` | Entity types to reveal as plain text. 
| + +### `GetDetectRunRequest` + +`skyflow.vault.detect` — passed to `detect().get_detect_run()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `run_id` | _(required)_ | The `run_id` returned by a prior `deidentify_file` call. | + +### `InvokeConnectionRequest` + +`skyflow.vault.connection` — passed to `connection().invoke()`. + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `method` | _(required)_ | HTTP method. See [`RequestMethod`](#requestmethod). | +| `body` | `None` | Request body (dict). | +| `path_params` | `None` | Path parameters (dict). | +| `query_params` | `None` | Query parameters (dict). | +| `headers` | `None` | Request headers (dict). | + +--- + +## Response objects + +Every vault, token, connection, and Detect operation returns a typed response object. Each attribute below lists its type and meaning. Types use `| None` to mark attributes that may be absent. + +> **The `errors` attribute** is common to most responses. It is `list[dict] | None` and is populated only on partial failure (for example when `continue_on_error=True`); it is `None` when there are no errors. Each error dict contains `request_index`, `request_id`, `error`, and `http_code`. The per-class tables below describe only the operation-specific attributes and refer back to this note for `errors`. + +```python +response = skyflow_client.vault('').insert(insert_request) +print(response.inserted_fields) # list of inserted records (with tokens if return_tokens=True) +print(response.errors) # None unless there was a partial failure +``` + +### `InsertResponse` + +`skyflow.vault.data` — returned by `vault().insert()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `inserted_fields` | `list[dict]` | One entry per inserted record. Each has `skyflow_id`; with `return_tokens=True`, also a token per column; with `continue_on_error=True`, also a `request_index`. 
| +| `errors` | `list[dict] \| None` | See the note above. | + +### `GetResponse` + +`skyflow.vault.data` — returned by `vault().get()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `data` | `list[dict]` | Retrieved records as `field → value` dicts (tokens instead of values when `return_tokens=True`). Defaults to `[]`. | +| `errors` | `list[dict] \| None` | See the note above. | + +### `DeleteResponse` + +`skyflow.vault.data` — returned by `vault().delete()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `deleted_ids` | `list[str] \| None` | Skyflow IDs of the deleted records. | +| `errors` | `list[dict] \| None` | See the note above. | + +### `UpdateResponse` + +`skyflow.vault.data` — returned by `vault().update()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `updated_field` | `dict` | The updated record: `skyflow_id`, plus a token per updated column when `return_tokens=True`. | +| `errors` | `list[dict] \| None` | See the note above. | + +### `QueryResponse` + +`skyflow.vault.data` — returned by `vault().query()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `fields` | `list[dict]` | Matching records. Each record dict also includes a `tokenized_data` map. | +| `errors` | `list[dict] \| None` | See the note above. | + +### `FileUploadResponse` + +`skyflow.vault.data` — returned by `vault().upload_file()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `skyflow_id` | `str` | ID of the record the file was attached to (or of the newly created record). | +| `errors` | `list[dict] \| None` | See the note above. | + +### `DetokenizeResponse` + +`skyflow.vault.tokens` — returned by `vault().detokenize()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `detokenized_fields` | `list[dict]` | One entry per token, each with `token`, `value` (plaintext or masked), and `type` (the value type). 
| +| `errors` | `list[dict] \| None` | See the note above. | + +### `TokenizeResponse` + +`skyflow.vault.tokens` — returned by `vault().tokenize()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `tokenized_fields` | `list[dict]` | One entry per value, each with its `token`. | +| `errors` | `list[dict] \| None` | See the note above. | + +### `InvokeConnectionResponse` + +`skyflow.vault.connection` — returned by `connection().invoke()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `data` | `dict` | The connection's response body. | +| `metadata` | `dict` | Response metadata (for example `request_id`). Defaults to `{}`. | +| `errors` | `list[dict] \| None` | See the note above. | + +### `DeidentifyTextResponse` + +`skyflow.vault.detect` — returned by `detect().deidentify_text()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `processed_text` | `str` | The de-identified text. | +| `entities` | `list[EntityInfo]` | Detected entities. See [`EntityInfo`](#entityinfo). | +| `word_count` | `int` | Word count of the input text. | +| `char_count` | `int` | Character count of the input text. | +| `errors` | `list \| None` | See the note above. | + +### `ReidentifyTextResponse` + +`skyflow.vault.detect` — returned by `detect().reidentify_text()`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `processed_text` | `str` | The re-identified text. | +| `errors` | `list \| None` | See the note above. | + +### `DeidentifyFileResponse` + +`skyflow.vault.detect` — returned by `detect().deidentify_file()` and `detect().get_detect_run()`. All non-error attributes are optional (default `None`) and are populated based on the file type and processing status. If processing exceeds `wait_time`, only `run_id` and `status` are set; poll with `get_detect_run`. 
+ +| Attribute | Type | Description | +|-----------|------|-------------| +| `file_base64` | `str \| None` | The processed file as a base64 string. | +| `file` | `File \| None` | The processed file wrapper. See [`File`](#file). | +| `type` | `str \| None` | MIME type of the processed file. | +| `extension` | `str \| None` | File extension of the processed file. | +| `word_count` | `int \| None` | Word count (text-bearing files). | +| `char_count` | `int \| None` | Character count (text-bearing files). | +| `size_in_kb` | `float \| None` | Size of the processed file in KB. | +| `duration_in_seconds` | `float \| None` | Duration in seconds (audio files). | +| `page_count` | `int \| None` | Page count (PDF/document files). | +| `slide_count` | `int \| None` | Slide count (presentation files). | +| `entities` | `list[EntityInfo]` | Detected entities. Defaults to `[]`. See [`EntityInfo`](#entityinfo). | +| `run_id` | `str \| None` | Run identifier; pass to `get_detect_run` to poll for results. | +| `status` | `str \| None` | Processing status of the run. | +| `errors` | `list \| None` | See the note above. | + +--- + +## Enums + +All enums are importable from `skyflow.utils.enums`. + +### `Env` + +Deployment environment. Values: `DEV`, `SANDBOX`, `PROD`, `STAGE`. + +### `EnvUrls` + +Vault hostnames per environment (used internally; exported for reference). + +| Member | Host | +|--------|------| +| `PROD` | `vault.skyflowapis.com` | +| `SANDBOX` | `vault.skyflowapis-preview.com` | +| `DEV` | `vault.skyflowapis.dev` | +| `STAGE` | `vault.skyflowapis.tech` | + +### `LogLevel` + +`DEBUG`, `INFO`, `WARN`, `ERROR`, `OFF`. See [Logging](../README.md#logging). + +### `RedactionType` + +How retrieved data is displayed. Values: `PLAIN_TEXT`, `MASKED`, `DEFAULT`, `REDACTED`. See [Redaction Types](../README.md#redaction-types). + +### `TokenMode` + +Bring-your-own-token mode for `InsertRequest`/`UpdateRequest`. 
+ +| Member | Meaning | +|--------|---------| +| `DISABLE` | Do not accept caller-supplied tokens (default). | +| `ENABLE` | Accept caller-supplied tokens. | +| `ENABLE_STRICT` | Accept caller-supplied tokens with strict validation. | + +### `TokenType` + +Token format for Detect. Values: `VAULT_TOKEN` (`vault_token`), `ENTITY_UNIQUE_COUNTER` (`entity_unq_counter`), `ENTITY_ONLY` (`entity_only`). + +### `ContentType` + +Content type for connection requests. Values: `JSON`, `PLAINTEXT`, `XML`, `URLENCODED`, `FORMDATA`, `HTML`. + +### `RequestMethod` + +HTTP method for connections. Values: `GET`, `POST`, `PUT`, `DELETE`, `NONE`. + +> Note: `PATCH` is **not** a member of this enum. + +### `MaskingMethod` + +Image masking method for Detect file de-identification. Values: `BLACKBOX` (`blackbox`), `BLUR` (`blur`). + +### `DetectOutputTranscriptions` + +Audio transcription output type for Detect. Values: `DIARIZED_TRANSCRIPTION`, `MEDICAL_DIARIZED_TRANSCRIPTION`, `MEDICAL_TRANSCRIPTION`, `TRANSCRIPTION`, `PLAINTEXT_TRANSCRIPTION`. + +### `DetectEntities` + +Entity types Detect can identify (e.g. `SSN`, `CREDIT_CARD`, `NAME`, `DOB`). Import from `skyflow.utils.enums`. + +--- + +## Detect helper classes + +Importable from `skyflow.vault.detect`. + +### `EntityInfo` + +A detected entity, returned inside `DeidentifyTextResponse.entities` and `DeidentifyFileResponse.entities`. + +| Attribute | Type | Description | +|-----------|------|-------------| +| `token` | `str` | The token replacing the entity. | +| `value` | `str` | The original entity value. | +| `text_index` | `TextIndex` | Position in the input text. | +| `processed_index` | `TextIndex` | Position in the processed text. | +| `entity` | `str` | Entity type. | +| `scores` | `dict[str, float]` | Confidence scores. | + +### `TextIndex` + +| Attribute | Type | Description | +|-----------|------|-------------| +| `start` | `int` | Start offset. | +| `end` | `int` | End offset. | + +### `Bleep` + +Audio bleep configuration for `DeidentifyFileRequest`.
+ +| Attribute | Type | Description | +|-----------|------|-------------| +| `gain` | `float` | Loudness in dB. | +| `frequency` | `float` | Pitch in Hz. | +| `start_padding` | `float` | Padding at start (seconds). | +| `stop_padding` | `float` | Padding at end (seconds). | + +### `File` + +Wrapper around the processed file returned in `DeidentifyFileResponse.file`. + +| Member | Kind | Description | +|--------|------|-------------| +| `name` | property | File name. | +| `size` | property | Size in bytes. | +| `type` | property | MIME/type string. | +| `last_modified` | property | Last-modified timestamp. | +| `seek(offset, whence=0)` | method | Seek within the file. | +| `read(size=-1)` | method | Read file content. | + +--- + +## Service account functions + +Importable from `skyflow.service_account`. See [Authentication & authorization](../README.md#authentication--authorization) for `generate_bearer_token`, `generate_bearer_token_from_creds`, and `generate_signed_data_tokens`. + +### `is_expired(token, logger=None)` + +Returns `True` if the given bearer token is expired (or `None`). Useful for caching a token and regenerating it only when needed. + +```python +from skyflow.service_account import generate_bearer_token, is_expired + +cached_token = None # no token cached yet + +# Regenerate only when no token is cached or the cached one has expired. +if cached_token is None or is_expired(cached_token): + cached_token, _ = generate_bearer_token('path/to/credentials.json') +``` + +### `generate_signed_data_tokens_from_creds(credentials, options)` + +The credentials-string counterpart to `generate_signed_data_tokens(filepath, options)`. Accepts a JSON credentials string instead of a file path; `options` is the same (`data_tokens`, `time_to_live`, `ctx`).
+ +```python +import os +from skyflow.service_account import generate_signed_data_tokens_from_creds + +signed_tokens = generate_signed_data_tokens_from_creds( + os.getenv('SKYFLOW_CREDENTIALS'), + { + 'data_tokens': ['dataToken1', 'dataToken2'], + 'time_to_live': 90, + 'ctx': 'user_12345', + }, +) +``` diff --git a/requirements.txt b/requirements.txt index bc927eb..d8c5fea 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,5 +1,5 @@ python_dateutil >= 2.5.3 -setuptools >= 21.0.0 +setuptools >= 75.3.3 urllib3 >= 1.25.3, < 3 pydantic >= 2 typing-extensions >= 4.7.1 diff --git a/samples/README.md b/samples/README.md new file mode 100644 index 0000000..e3d20c4 --- /dev/null +++ b/samples/README.md @@ -0,0 +1,65 @@ +# Skyflow Python SDK — Samples + +Runnable examples for the Skyflow Python SDK, grouped by area. Start with the [README](../README.md) and [API Reference](../docs/api_reference.md) for full documentation. + +## Prerequisites + +- Python 3.9 or above +- The SDK installed: `pip install skyflow` +- A Skyflow account, a vault, and a service account (see [Before you begin](../README.md#before-you-begin)) +- Your `vault_id`, `cluster_id`, `env`, and one credential (API key, bearer token, or service-account credentials) + +## Configure + +The samples ship with inline `` strings (for example ``). Replace the placeholders in the sample you want to run with your own values before running it. + +> Never commit real credentials. + +## Run a sample + +```bash +python samples/vault_api/insert_records.py +``` + +## What's here + +### `vault_api/` +Core vault data operations. 
+ +| Sample | Demonstrates | +|--------|--------------| +| `client_operations.py` | Building and managing the Skyflow client | +| `credentials_options.py` | The different credential types | +| `insert_records.py` | Inserting and tokenizing records (`continue_on_error`) | +| `insert_byot.py` | Bring-your-own-token inserts | +| `get_records.py` | Getting records by Skyflow ID | +| `get_column_values.py` | Getting records by column name/values | +| `update_record.py` | Updating a record | +| `delete_records.py` | Deleting records | +| `query_records.py` | SQL queries | +| `detokenize_records.py` | Detokenizing tokens | +| `tokenize_records.py` | Retrieving existing tokens | +| `upload_file.py` | Uploading a file to a record | +| `invoke_connection.py` | Invoking a Skyflow Connection | + +### `detect_api/` +Skyflow Detect (de-identification / re-identification). + +| Sample | Demonstrates | +|--------|--------------| +| `deidentify_text.py` | De-identifying text | +| `reidentify_text.py` | Re-identifying text | +| `deidentify_file.py` | De-identifying a file | +| `deidentify_file_async.py` | Running a file de-identification on a background thread (thread-based concurrency, not asyncio) | +| `get_detect_run.py` | Polling a file de-identification run by `run_id` | + +### `service_account/` +Bearer-token and signed-data-token generation. 
+ +| Sample | Demonstrates | +|--------|--------------| +| `token_generation_example.py` | Generating a bearer token | +| `scoped_token_generation_example.py` | Tokens scoped to specific roles | +| `token_generation_with_context_example.py` | Tokens with context (`ctx`) | +| `signed_token_generation_example.py` | Signed data tokens | +| `bearer_token_expiry_example.py` | Handling token expiry / regeneration | diff --git a/samples/detect_api/deidentify_file_async.py b/samples/detect_api/deidentify_file_async.py index 23d2f40..d3144f3 100644 --- a/samples/detect_api/deidentify_file_async.py +++ b/samples/detect_api/deidentify_file_async.py @@ -12,15 +12,18 @@ from concurrent.futures import ThreadPoolExecutor """ - * Skyflow Deidentify File Example - * - * This sample demonstrates how to use all available options for deidentifying files - * using an asynchronous approach. - * Supported file types: images (jpg, png, etc.), pdf, audio (mp3, wav), documents, + * Skyflow Deidentify File Example (concurrent) + * + * This sample demonstrates how to use all available options for deidentifying files. + * The SDK is synchronous; this example runs the (blocking) deidentify_file call on a + * background thread using concurrent.futures.ThreadPoolExecutor so the main thread can + * continue working. This is thread-based concurrency, not asyncio — the SDK does not + * expose async/await coroutines. + * Supported file types: images (jpg, png, etc.), pdf, audio (mp3, wav), documents, * spreadsheets, presentations, structured text. """ -def perform_file_deidentification_async(): +def perform_file_deidentification_concurrent(): try: # Step 1: Configure Credentials credentials = { diff --git a/setup.py b/setup.py index 5e833f8..83c5b49 100644 --- a/setup.py +++ b/setup.py @@ -18,6 +18,11 @@ author='Skyflow', author_email='service-ops@skyflow.com', packages=find_packages(where='.', exclude=['test*']), + # Ship PEP 561 markers so type checkers (mypy/pyright) see the SDK's types. 
+ package_data={ + 'skyflow': ['py.typed'], + 'skyflow.generated.rest': ['py.typed'], + }, url='https://github.com/skyflowapi/skyflow-python/', license='LICENSE', description='Skyflow SDK for the Python programming language', diff --git a/skyflow/py.typed b/skyflow/py.typed new file mode 100644 index 0000000..e69de29