Developer Interface

Authentication

class tamr_unify_client.auth.UsernamePasswordAuth(username, password)[source]

Provides username/password authentication for Tamr. Specifically, sets the Authorization HTTP header with Tamr’s custom BasicCreds format.

Parameters
  • username (str) –

  • password (str) –

Usage:
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> import tamr_unify_client as api
>>> unify = api.Client(auth)
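
To make the header-setting step more concrete, here is a minimal sketch of how such an Authorization value could be built. It assumes that BasicCreds uses the same base64 `username:password` encoding as standard HTTP Basic authentication, with only the scheme name changed; that format is an assumption, and the helper name `basic_creds_header` is hypothetical rather than part of tamr_unify_client.

```python
import base64


def basic_creds_header(username, password):
    """Build an Authorization header value (assumed format, see lead-in)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    # Assumption: the scheme name is "BasicCreds" followed by the token.
    return f"BasicCreds {token}"


header = basic_creds_header("my username", "my password")
```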

Client

class tamr_unify_client.Client(auth, host='localhost', protocol='http', port=9100, base_path='/api/versioned/v1/', session=None)[source]

Python Client for Tamr API.

Each client is tied to a single origin (protocol, host, port).

Parameters
  • auth (AuthBase) –

    Tamr-compatible Authentication provider.

    Recommended: use one of the classes described in Authentication

  • host (str) – Host address of remote Tamr instance (e.g. '10.0.10.0')

  • protocol (str) – Either 'http' or 'https'

  • port (int) – Tamr instance main port

  • base_path (str) – Base API path. Requests made by this client will be relative to this path.

  • session (Optional[Session]) – Session to use for API calls. If none is provided, will use a new requests.Session.

Example

>>> from tamr_unify_client import Client
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> tamr_local = Client(auth) # on http://localhost:9100
>>> tamr_remote = Client(auth, protocol='https', host='10.0.10.0') # on https://10.0.10.0:9100
property origin

HTTP origin i.e. <protocol>://<host>[:<port>].

For additional information, see the MDN web docs.

Return type

str

request(method, endpoint, **kwargs)[source]

Sends a request to Tamr.

The URL for the request will be <origin>/<base_path>/<endpoint>. The request is authenticated via Client.auth.

Parameters
  • method (str) – The HTTP method to use (e.g. ‘GET’ or ‘POST’)

  • endpoint (str) – API endpoint to call (relative to the Base API path for this client).

Return type

Response

Returns

HTTP response from the Tamr server
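
The way the request URL is assembled from origin, base path, and endpoint can be sketched with the standard library alone. The helpers `build_origin` and `build_url` are hypothetical names used for illustration, and the use of `urljoin` is an assumption about how the pieces might be combined, not a statement about the client's internals.

```python
from urllib.parse import urljoin


def build_origin(protocol="http", host="localhost", port=9100):
    """Compose <protocol>://<host>:<port> using the documented Client defaults."""
    return f"{protocol}://{host}:{port}"


def build_url(origin, base_path, endpoint):
    """Resolve an endpoint relative to the base API path."""
    return urljoin(origin + base_path, endpoint)


local = build_url(build_origin(), "/api/versioned/v1/", "datasets/1")
remote = build_url(build_origin("https", "10.0.10.0"), "/api/versioned/v1/", "projects")
```

With `urljoin`, an endpoint that starts with a slash replaces the base path instead of extending it, which is one reason endpoints are best written relative to the base API path.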

get(endpoint, **kwargs)[source]

Calls request() with the "GET" method.

post(endpoint, **kwargs)[source]

Calls request() with the "POST" method.

put(endpoint, **kwargs)[source]

Calls request() with the "PUT" method.

delete(endpoint, **kwargs)[source]

Calls request() with the "DELETE" method.

property projects

Collection of all projects on this Tamr instance.

Return type

ProjectCollection

Returns

Collection of all projects.

property datasets

Collection of all datasets on this Tamr instance.

Return type

DatasetCollection

Returns

Collection of all datasets.

Attributes

Attribute

class tamr_unify_client.attribute.resource.Attribute(client, data, alias=None)[source]

A Tamr Attribute.

See https://docs.tamr.com/reference#attribute-types

property relative_id
Type

str

property name
Type

str

property description
Type

str

property type
Type

AttributeType

property is_nullable
Type

bool

spec()[source]

Returns a spec representation of this attribute.

Returns

The attribute spec.

Return type

AttributeSpec

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property resource_id
Type

str

Attribute Spec

class tamr_unify_client.attribute.resource.AttributeSpec(client, data, api_path)[source]

A representation of the server view of an attribute

static of(resource)[source]

Creates an attribute spec from an attribute.

Parameters

resource (Attribute) – The existing attribute.

Returns

The corresponding attribute spec.

Return type

AttributeSpec

static new()[source]

Creates a blank spec that could be used to construct a new attribute.

Returns

The empty spec.

Return type

AttributeSpec

from_data(data)[source]

Creates a spec with the same client and API path as this one, but new data.

Parameters

data (dict) – The data for the new spec.

Returns

The new spec.

Return type

AttributeSpec

to_dict()[source]

Returns a version of this spec that conforms to the API representation.

Returns

The spec’s dict.

Return type

dict

with_name(new_name)[source]

Creates a new spec with the same properties, updating name.

Parameters

new_name (str) – The new name.

Returns

The new spec.

Return type

AttributeSpec

with_description(new_description)[source]

Creates a new spec with the same properties, updating description.

Parameters

new_description (str) – The new description.

Returns

The new spec.

Return type

AttributeSpec

with_type(new_type)[source]

Creates a new spec with the same properties, updating type.

Parameters

new_type (AttributeTypeSpec) – The spec of the new type.

Returns

The new spec.

Return type

AttributeSpec

with_is_nullable(new_is_nullable)[source]

Creates a new spec with the same properties, updating is nullable.

Parameters

new_is_nullable (bool) – The new is nullable.

Returns

The new spec.

Return type

AttributeSpec

put()[source]

Commits the changes and updates the attribute in Tamr.

Returns

The updated attribute.

Return type

Attribute
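
The with_* methods above all follow the same pattern: each returns a new spec and leaves the original untouched, so calls can be chained before a final put(). The sketch below illustrates that pattern with a hypothetical stand-in class, `SpecSketch`; it is not the library's AttributeSpec and it makes no API calls.

```python
import copy


class SpecSketch:
    """Hypothetical stand-in for a spec; not the library's AttributeSpec."""

    def __init__(self, data):
        self._data = data

    def from_data(self, data):
        # New spec of the same kind, carrying new data.
        return SpecSketch(data)

    def with_name(self, new_name):
        # Copy the data, change one field, and wrap it in a new spec.
        data = copy.deepcopy(self._data)
        data["name"] = new_name
        return self.from_data(data)

    def with_description(self, new_description):
        data = copy.deepcopy(self._data)
        data["description"] = new_description
        return self.from_data(data)

    def to_dict(self):
        return copy.deepcopy(self._data)


original = SpecSketch({"name": "city", "description": ""})
updated = original.with_name("town").with_description("Town name")
```

Because every step produces a fresh object, `original` can still be inspected or reused after `updated` has been built.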

Attribute Collection

class tamr_unify_client.attribute.collection.AttributeCollection(client, api_path)[source]

Collection of Attributes.

Parameters
  • client (Client) – Client for API call delegation.

  • api_path (str) – API path used to access this collection. E.g. "datasets/1/attributes".

by_resource_id(resource_id)[source]

Retrieve an attribute by resource ID.

Parameters

resource_id (str) – The resource ID. E.g. "AttributeName"

Returns

The specified attribute.

Return type

Attribute

by_relative_id(relative_id)[source]

Retrieve an attribute by relative ID.

Parameters

relative_id (str) – The relative ID. E.g. "datasets/1/attributes/AttributeName"

Returns

The specified attribute.

Return type

Attribute

by_external_id(external_id)[source]

Retrieve an attribute by external ID.

Since attributes do not have external IDs, this method is not supported and will raise a NotImplementedError.

Parameters

external_id (str) – The external ID.

Returns

The specified attribute, if found.

Return type

Attribute

Raises
  • KeyError – If no attribute with the specified external_id is found

  • LookupError – If multiple attributes with the specified external_id are found

stream()[source]

Stream attributes in this collection. Implicitly called when iterating over this collection.

Returns

Stream of attributes.

Return type

Python generator yielding Attribute

Usage:
>>> for attribute in collection.stream(): # explicit
...     do_stuff(attribute)
>>> for attribute in collection: # implicit
...     do_stuff(attribute)
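
The equivalence of the explicit and implicit forms can be illustrated with a small generator-based class. `CollectionSketch` is a hypothetical stand-in that holds its items in memory; it only shows how iteration can delegate to stream() and does not reflect how the library fetches data.

```python
class CollectionSketch:
    """Hypothetical stand-in for a resource collection."""

    def __init__(self, items):
        self._items = items

    def stream(self):
        # A generator: items are produced one at a time, on demand.
        for item in self._items:
            yield item

    def __iter__(self):
        # Iterating over the collection delegates to stream().
        return self.stream()


collection = CollectionSketch(["id", "name", "address"])
explicit = [item for item in collection.stream()]
implicit = [item for item in collection]
```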
by_name(attribute_name)[source]

Lookup a specific attribute in this collection by exact-match on name.

Parameters

attribute_name (str) – Name of the desired attribute.

Returns

Attribute with matching name in this collection.

Return type

Attribute

create(creation_spec)[source]

Create an Attribute in this collection

Parameters

creation_spec (dict[str, str]) – Attribute creation specification, formatted as specified in the Public Docs for adding an Attribute.

Returns

The created Attribute

Return type

Attribute

delete_by_resource_id(resource_id)

Deletes a resource from this collection by resource ID.

Parameters

resource_id (str) – The resource ID of the resource that will be deleted.

Returns

HTTP response from the server.

Return type

requests.Response

Attribute Type

class tamr_unify_client.attribute.type.AttributeType(data)[source]

The type of an Attribute or SubAttribute.

See https://docs.tamr.com/reference#attribute-types

Parameters

data (dict) – JSON data representing this type

property base_type
Type

str

property inner_type
Type

AttributeType

property attributes
Type

list[SubAttribute]

spec()[source]

Returns a spec representation of this attribute type.

Returns

The attribute type spec.

Return type

AttributeTypeSpec
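
Since inner_type is itself an AttributeType and attributes is a list of SubAttributes, a type forms a small tree. The sketch below walks such a tree in plain dictionaries. The JSON keys used here (`baseType`, `innerType`, `attributes`, and the sub-attribute keys `name` and `type`) are assumptions about the data layout, and `describe_type` is a hypothetical helper.

```python
def describe_type(type_json):
    """Render a nested type dictionary as a compact string."""
    base = type_json["baseType"]
    if "innerType" in type_json:
        # Container types wrap a single inner type, e.g. ARRAY[STRING].
        return f"{base}[{describe_type(type_json['innerType'])}]"
    if type_json.get("attributes"):
        # Record-like types list named sub-attributes, each with its own type.
        fields = ", ".join(
            f"{sub['name']}: {describe_type(sub['type'])}"
            for sub in type_json["attributes"]
        )
        return f"{base}<{fields}>"
    return base


array_of_strings = {"baseType": "ARRAY", "innerType": {"baseType": "STRING"}}
point_record = {
    "baseType": "RECORD",
    "attributes": [
        {
            "name": "point",
            "type": {"baseType": "ARRAY", "innerType": {"baseType": "DOUBLE"}},
        },
    ],
}
```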

Attribute Type Spec

class tamr_unify_client.attribute.type.AttributeTypeSpec(data)[source]
static of(resource)[source]

Creates an attribute type spec from an attribute type.

Parameters

resource (AttributeType) – The existing attribute type.

Returns

The corresponding attribute type spec.

Return type

AttributeTypeSpec

static new()[source]

Creates a blank spec that could be used to construct a new attribute type.

Returns

The empty spec.

Return type

AttributeTypeSpec

to_dict()[source]

Returns a version of this spec that conforms to the API representation.

Returns

The spec’s dict.

Return type

dict

with_base_type(new_base_type)[source]

Creates a new spec with the same properties, updating the base type.

Parameters

new_base_type (str) – The new base type.

Returns

The new spec.

Return type

AttributeTypeSpec

with_inner_type(new_inner_type)[source]

Creates a new spec with the same properties, updating the inner type.

Parameters

new_inner_type (AttributeTypeSpec) – The spec of the new inner type.

Returns

The new spec.

Return type

AttributeTypeSpec

with_attributes(new_attributes)[source]

Creates a new spec with the same properties, updating attributes.

Parameters

new_attributes (list[AttributeSpec]) – The specs of the new attributes.

Returns

The new spec.

Return type

AttributeTypeSpec

SubAttribute

class tamr_unify_client.attribute.subattribute.SubAttribute(name, type, is_nullable, _json, description=None)[source]

An attribute which is itself a property of another attribute.

See https://docs.tamr.com/reference#attribute-types

static from_json(data)[source]

Create a SubAttribute from JSON data.

Parameters

data (Dict[str, Any]) – JSON data received from Tamr server.

Return type

SubAttribute

Categorization

Categorization Project

class tamr_unify_client.categorization.project.CategorizationProject(client, data, alias=None)[source]

A Categorization project in Tamr.

model()[source]

Machine learning model for this Categorization project. Learns from verified labels and predicts categorization labels for unlabeled records.

Returns

The machine learning model for categorization.

Return type

MachineLearningModel

create_taxonomy(creation_spec)[source]

Creates a Taxonomy for this project.

A taxonomy cannot already be associated with this project.

Parameters

creation_spec (dict) – The creation specification for the taxonomy, which can include name.

Returns

The new Taxonomy

Return type

Taxonomy

taxonomy()[source]

Retrieves the Taxonomy associated with this project. If a taxonomy is not already associated with this project, call create_taxonomy() first.

Returns

The project’s Taxonomy

Return type

Taxonomy

add_input_dataset(dataset)

Associate a dataset with a project in Tamr.

By default, datasets are not associated with any projects. They must be added as input to a project before they can be used as part of that project.

Parameters

dataset (Dataset) – The dataset to associate with the project.

Returns

HTTP response from the server

Return type

requests.Response

as_categorization()

Convert this project to a CategorizationProject

Returns

This project.

Return type

CategorizationProject

Raises

TypeError – If the type of this project is not "CATEGORIZATION"

as_mastering()

Convert this project to a MasteringProject

Returns

This project.

Return type

MasteringProject

Raises

TypeError – If the type of this project is not "DEDUP"
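
Because as_categorization() and as_mastering() raise TypeError for the wrong project type, it can help to check the type property first. The helper `specialize` below is a hypothetical sketch that relies only on the documented type values "CATEGORIZATION" and "DEDUP"; `FakeProject` is a test double so that the example runs without a Tamr instance.

```python
def specialize(project):
    """Return the most specific view of a project, based on its type."""
    if project.type == "CATEGORIZATION":
        return project.as_categorization()
    if project.type == "DEDUP":
        return project.as_mastering()
    return project  # other project types are returned unchanged


class FakeProject:
    """Test double that mimics the documented TypeError behaviour."""

    def __init__(self, type_):
        self.type = type_

    def as_categorization(self):
        if self.type != "CATEGORIZATION":
            raise TypeError("not a categorization project")
        return self

    def as_mastering(self):
        if self.type != "DEDUP":
            raise TypeError("not a mastering project")
        return self


project = FakeProject("DEDUP")
result = specialize(project)
```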

attribute_configurations()

The attribute configurations of this project.

Returns

The configurations of the attributes of a project.

Return type

AttributeConfigurationCollection

attribute_mappings()

The attribute mappings of this project.

Returns

The attribute mappings of a project.

Return type

AttributeMappingCollection

property attributes

Attributes of this project.

Returns

Attributes of this project.

Return type

AttributeCollection

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property description
Type

str

property external_id
Type

str

input_datasets()

Retrieve a collection of this project’s input datasets.

Returns

The project’s input datasets.

Return type

DatasetCollection

property name
Type

str

property relative_id
Type

str

remove_input_dataset(dataset)

Remove a dataset from a project.

Parameters

dataset (Dataset) – The dataset to be removed from this project.

Returns

HTTP response from the server

Return type

requests.Response

property resource_id
Type

str

spec()

Returns this project’s spec.

Returns

The spec for the project.

Return type

ProjectSpec

property type

A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.

Type

str

unified_dataset()

Unified dataset for this project.

Returns

Unified dataset for this project.

Return type

Dataset

Categories

Category

class tamr_unify_client.categorization.category.resource.Category(client, data, alias=None)[source]

A category of a taxonomy

property name
Type

str

property description
Type

str

property path
Type

list[str]

parent()[source]

Gets the parent Category of this one, or None if this is a tier 1 category.

Returns

The parent Category or None

Return type

Category
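
The relationship between path and parent() can be sketched on plain lists. This assumes that path lists category names from tier 1 down to the category itself, which the reference does not state explicitly; `parent_path` is a hypothetical helper and performs no API calls.

```python
def parent_path(path):
    """Return the parent's path, or None for a tier 1 category."""
    if len(path) <= 1:
        return None
    return path[:-1]


tier3 = ["Hardware", "Fasteners", "Bolts"]
tier1 = ["Hardware"]
```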

spec()[source]

Returns this category’s spec.

Returns

The spec for the category.

Return type

CategorySpec

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Category Spec

class tamr_unify_client.categorization.category.resource.CategorySpec(client, data, api_path)[source]

A representation of the server view of a category.

static of(resource)[source]

Creates a category spec from a category.

Parameters

resource (Category) – The existing category.

Returns

The corresponding category spec.

Return type

CategorySpec

static new()[source]

Creates a blank spec that could be used to construct a new category.

Returns

The empty spec.

Return type

CategorySpec

from_data(data)[source]

Creates a spec with the same client and API path as this one, but new data.

Parameters

data (dict) – The data for the new spec.

Returns

The new spec.

Return type

CategorySpec

to_dict()[source]

Returns a version of this spec that conforms to the API representation.

Returns

The spec’s dict.

Return type

dict

with_name(new_name)[source]

Creates a new spec with the same properties, updating name.

Parameters

new_name (str) – The new name.

Returns

The new spec.

Return type

CategorySpec

with_description(new_description)[source]

Creates a new spec with the same properties, updating description.

Parameters

new_description (str) – The new description.

Returns

The new spec.

Return type

CategorySpec

with_path(new_path)[source]

Creates a new spec with the same properties, updating path.

Parameters

new_path (list[str]) – The new path.

Returns

The new spec.

Return type

CategorySpec

Category Collection

class tamr_unify_client.categorization.category.collection.CategoryCollection(client, api_path)[source]

Collection of Categories.

Parameters
  • client (Client) – Client for API call delegation.

  • api_path (str) – API path used to access this collection. E.g. "projects/1/taxonomy/categories".

by_resource_id(resource_id)[source]

Retrieve a category by resource ID.

Parameters

resource_id (str) – The resource ID. E.g. "1"

Returns

The specified category.

Return type

Category

by_relative_id(relative_id)[source]

Retrieve a category by relative ID.

Parameters

relative_id (str) – The relative ID. E.g. "projects/1/categories/1"

Returns

The specified category.

Return type

Category

by_external_id(external_id)[source]

Retrieve a category by external ID.

Since categories do not have external IDs, this method is not supported and will raise a NotImplementedError.

Parameters

external_id (str) – The external ID.

Returns

The specified category, if found.

Return type

Category

Raises
  • KeyError – If no category with the specified external_id is found

  • LookupError – If multiple categories with the specified external_id are found

stream()[source]

Stream categories in this collection. Implicitly called when iterating over this collection.

Returns

Stream of categories.

Return type

Python generator yielding Category

Usage:
>>> for category in collection.stream(): # explicit
...     do_stuff(category)
>>> for category in collection: # implicit
...     do_stuff(category)
create(creation_spec)[source]

Creates a new category.

Parameters

creation_spec (dict) – Category creation specification, formatted as specified in the Public Docs for Creating a Category.

Returns

The newly created category.

Return type

Category

bulk_create(creation_specs)[source]

Creates new categories in bulk.

Parameters

creation_specs (iterable[dict]) – A collection of creation specifications, as detailed for create.

Returns

JSON response from the server

Return type

dict

delete_by_resource_id(resource_id)

Deletes a resource from this collection by resource ID.

Parameters

resource_id (str) – The resource ID of the resource that will be deleted.

Returns

HTTP response from the server.

Return type

requests.Response

Taxonomy

class tamr_unify_client.categorization.taxonomy.Taxonomy(client, data, alias=None)[source]

A project’s taxonomy

property name
Type

str

categories()[source]

Retrieves the categories of this taxonomy.

Returns

A collection of the taxonomy categories.

Return type

CategoryCollection

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Datasets

Dataset

class tamr_unify_client.dataset.resource.Dataset(client, data, alias=None)[source]

A Tamr dataset.

property name
Type

str

property external_id
Type

str

property description
Type

str

property version
Type

str

property tags
Type

list[str]

property key_attribute_names
Type

list[str]

property attributes

Attributes of this dataset.

Returns

Attributes of this dataset.

Return type

AttributeCollection

upsert_records(records, primary_key_name, **json_args)[source]

Creates or updates the specified records.

Parameters
  • records (iterable[dict]) – The records to update, as dictionaries.

  • primary_key_name (str) – The name of the primary key for these records, which must be a key in each record dictionary.

  • **json_args – Keyword arguments passed through to the JSON dumps function used to serialize the records. Some of these, such as indent, may not work with Tamr.

Returns

JSON response body from the server.

Return type

dict
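
To show why primary_key_name must be a key in every record, the sketch below serializes records into one update command per line. The command layout (`action`, `recordId`, `record`) and the newline-delimited body are assumptions about the wire format, and `upsert_commands` is a hypothetical helper that uses the standard json module rather than the client itself.

```python
import json


def upsert_commands(records, primary_key_name, **json_args):
    """Yield one serialized update command per record (assumed format)."""
    for record in records:
        command = {
            "action": "CREATE",
            # Raises KeyError if a record lacks the primary key.
            "recordId": record[primary_key_name],
            "record": record,
        }
        # Extra keyword arguments are forwarded to json.dumps.
        yield json.dumps(command, **json_args)


records = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}]
body = "\n".join(upsert_commands(records, "id", sort_keys=True))
```

An option such as indent would spread each command over several lines and break a line-per-command body, which is consistent with the warning that some JSON arguments may not work.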

delete_records(records, primary_key_name)[source]

Deletes the specified records.

Parameters
  • records (iterable[dict]) – The records to delete, as dictionaries.

  • primary_key_name (str) – The name of the primary key for these records, which must be a key in each record dictionary.

Returns

JSON response body from the server.

Return type

dict

delete_records_by_id(record_ids)[source]

Deletes the specified records.

Parameters

record_ids (iterable) – The IDs of the records to delete.

Returns

JSON response body from the server.

Return type

dict

delete_all_records()[source]

Removes all records from the dataset.

Returns

HTTP response from the server

Return type

requests.Response

refresh(**options)[source]

Brings dataset up-to-date if needed, taking whatever actions are required.

Parameters

**options – Options passed to the underlying Operation. See apply_options().

Returns

The refresh operation.

Return type

Operation

profile()[source]

Returns profile information for a dataset.

If profile information has not been generated, call create_profile() first. If the returned profile information is out-of-date, you can call refresh() on the returned object to bring it up-to-date.

Returns

Dataset Profile information.

Return type

DatasetProfile
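
The check-then-refresh flow described above can be written as a small helper. The sketch uses only documented names (profile(), is_up_to_date, refresh()) and assumes the refresh has finished by the time refresh() returns; `fresh_profile` and the two fake classes are hypothetical and exist only so the example runs without a server.

```python
def fresh_profile(dataset):
    """Return an up-to-date profile for a Dataset-like object."""
    profile = dataset.profile()
    if not profile.is_up_to_date:
        profile.refresh()            # updates the profile on the server
        profile = dataset.profile()  # fetch the updated profile again
    return profile


class FakeProfile:
    """Test double for a dataset profile."""

    def __init__(self, dataset, is_up_to_date):
        self._dataset = dataset
        self.is_up_to_date = is_up_to_date

    def refresh(self):
        self._dataset.stale = False


class FakeDataset:
    """Test double whose profile starts out stale."""

    def __init__(self):
        self.stale = True

    def profile(self):
        return FakeProfile(self, not self.stale)


profile = fresh_profile(FakeDataset())
```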

create_profile(**options)[source]

Create a profile for this dataset.

If a profile already exists, the existing profile will be brought up to date.

Parameters

**options – Options passed to the underlying Operation. See apply_options().

Returns

The operation to create the profile.

Return type

Operation

records()[source]

Stream this dataset’s records as Python dictionaries.

Returns

Stream of records.

Return type

Python generator yielding dict

status()[source]

Retrieve this dataset’s streamability status.

Returns

Dataset streamability status.

Return type

DatasetStatus

usage()[source]

Retrieve this dataset’s usage by recipes and downstream datasets.

Returns

The dataset’s usage.

Return type

DatasetUsage

from_geo_features(features, geo_attr=None)[source]

Upsert this dataset from a geospatial FeatureCollection or iterable of Features.

features can be:

  • An object that implements __geo_interface__ as a FeatureCollection (see https://gist.github.com/sgillies/2217756)

  • An iterable of features, where each element is a feature dictionary or an object that implements the __geo_interface__ as a Feature

  • A map where the “features” key contains an iterable of features

See: geopandas.GeoDataFrame.from_features()

If geo_attr is provided, then the named Tamr attribute will be used for the geometry. If geo_attr is not provided, then the first attribute on the dataset with geometry type will be used for the geometry.

Parameters
  • features – geospatial features

  • geo_attr (str) – (optional) name of the Tamr attribute to use for the feature’s geometry
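
The three accepted shapes of features can be reduced to a single stream of feature dictionaries. The following is a minimal sketch of that normalization, not the library's implementation; `iter_features` and the `Point` class are hypothetical names introduced for the example.

```python
def iter_features(features):
    """Yield plain feature dictionaries from any of the accepted inputs."""
    if hasattr(features, "__geo_interface__"):
        features = features.__geo_interface__  # FeatureCollection object
    if isinstance(features, dict):
        features = features["features"]        # map with a "features" key
    for feature in features:
        # Each element is a feature dict or an object exposing one.
        yield getattr(feature, "__geo_interface__", feature)


class Point:
    """Minimal object implementing __geo_interface__ as a Feature."""

    def __init__(self, fid, x, y):
        self.fid, self.x, self.y = fid, x, y

    @property
    def __geo_interface__(self):
        return {
            "type": "Feature",
            "id": self.fid,
            "geometry": {"type": "Point", "coordinates": [self.x, self.y]},
            "properties": {},
        }


from_map = list(iter_features({"features": [Point("a", 0.0, 1.0)]}))
from_list = list(iter_features([Point("a", 0.0, 1.0)]))
```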

upstream_datasets()[source]

The Dataset’s upstream datasets.

The API returns the URIs of the upstream datasets, so the result is a list of DatasetURIs, not actual Datasets.

Returns

A list of the Dataset’s upstream datasets.

Return type

list[DatasetURI]
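
Since the result holds DatasetURIs rather than Datasets, each entry has to be resolved with its dataset() method before dataset properties can be read. The helper `resolve_upstream` is a hypothetical sketch using only those two documented calls; the fake classes stand in for real resources, and a plain dictionary stands in for a Dataset.

```python
def resolve_upstream(dataset):
    """Fetch the full dataset for each upstream DatasetURI."""
    return [uri.dataset() for uri in dataset.upstream_datasets()]


class FakeURI:
    """Test double for a DatasetURI."""

    def __init__(self, name):
        self.name = name

    def dataset(self):
        return {"name": self.name}  # stand-in for a Dataset


class FakeDataset:
    """Test double with two upstream datasets."""

    def upstream_datasets(self):
        return [FakeURI("raw_customers"), FakeURI("raw_orders")]


upstream = resolve_upstream(FakeDataset())
```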

spec()[source]

Returns this dataset’s spec.

Returns

The spec of this dataset.

Return type

DatasetSpec

delete(cascade=False)[source]

Deletes this dataset, optionally deleting all derived datasets as well.

Parameters

cascade (bool) – Whether to delete all datasets derived from this one. Optional, default is False. Do not use this option unless you are certain you need it, as it can have unintended consequences.

Returns

HTTP response from the server

Return type

requests.Response

itergeofeatures(geo_attr=None)[source]

Returns an iterator that yields feature dictionaries that comply with __geo_interface__

See https://gist.github.com/sgillies/2217756

Parameters

geo_attr (str) – (optional) name of the Tamr attribute to use for the feature’s geometry

Returns

stream of features

Return type

Python generator yielding dict[str, object]

property relative_id
Type

str

property resource_id
Type

str

Dataset Spec

class tamr_unify_client.dataset.resource.DatasetSpec(client, data, api_path)[source]

A representation of the server view of a dataset.

static of(resource)[source]

Creates a dataset spec from a dataset.

Parameters

resource (Dataset) – The existing dataset.

Returns

The corresponding dataset spec.

Return type

DatasetSpec

static new()[source]

Creates a blank spec that could be used to construct a new dataset.

Returns

The empty spec.

Return type

DatasetSpec

from_data(data)[source]

Creates a spec with the same client and API path as this one, but new data.

Parameters

data (dict) – The data for the new spec.

Returns

The new spec.

Return type

DatasetSpec

to_dict()[source]

Returns a version of this spec that conforms to the API representation.

Returns

The spec’s dict.

Return type

dict

with_name(new_name)[source]

Creates a new spec with the same properties, updating name.

Parameters

new_name (str) – The new name.

Returns

A new spec.

Return type

DatasetSpec

with_external_id(new_external_id)[source]

Creates a new spec with the same properties, updating external ID.

Parameters

new_external_id (str) – The new external ID.

Returns

A new spec.

Return type

DatasetSpec

with_description(new_description)[source]

Creates a new spec with the same properties, updating description.

Parameters

new_description (str) – The new description.

Returns

A new spec.

Return type

DatasetSpec

with_key_attribute_names(new_key_attribute_names)[source]

Creates a new spec with the same properties, updating key attribute names.

Parameters

new_key_attribute_names (list[str]) – The new key attribute names.

Returns

A new spec.

Return type

DatasetSpec

with_tags(new_tags)[source]

Creates a new spec with the same properties, updating tags.

Parameters

new_tags (list[str]) – The new tags.

Returns

A new spec.

Return type

DatasetSpec

put()[source]

Updates the dataset on the server.

Returns

The modified dataset.

Return type

Dataset

Dataset Collection

class tamr_unify_client.dataset.collection.DatasetCollection(client, api_path='datasets')[source]

Collection of Datasets.

Parameters
  • client (Client) – Client for API call delegation.

  • api_path (str) – API path used to access this collection. E.g. "projects/1/inputDatasets". Default: "datasets".

by_resource_id(resource_id)[source]

Retrieve a dataset by resource ID.

Parameters

resource_id (str) – The resource ID. E.g. "1"

Returns

The specified dataset.

Return type

Dataset

by_relative_id(relative_id)[source]

Retrieve a dataset by relative ID.

Parameters

relative_id (str) – The relative ID. E.g. "datasets/1"

Returns

The specified dataset.

Return type

Dataset

by_external_id(external_id)[source]

Retrieve a dataset by external ID.

Parameters

external_id (str) – The external ID.

Returns

The specified dataset, if found.

Return type

Dataset

Raises
  • KeyError – If no dataset with the specified external_id is found

  • LookupError – If multiple datasets with the specified external_id are found
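
The two error cases can be illustrated with a lookup over plain dictionaries. This is a sketch of the documented semantics only; the `externalId` key and the helper `find_by_external_id` are assumptions made for the example.

```python
def find_by_external_id(resources, external_id):
    """Return the single resource whose external ID matches."""
    matches = [r for r in resources if r.get("externalId") == external_id]
    if not matches:
        raise KeyError(f"No resource found with external ID {external_id!r}")
    if len(matches) > 1:
        raise LookupError(f"Multiple resources found with external ID {external_id!r}")
    return matches[0]


datasets = [
    {"externalId": "a", "name": "customers"},
    {"externalId": "dup", "name": "orders"},
    {"externalId": "dup", "name": "orders_copy"},
]

found = find_by_external_id(datasets, "a")

outcomes = {}
for external_id in ("missing", "dup"):
    try:
        find_by_external_id(datasets, external_id)
    except KeyError:
        outcomes[external_id] = "not found"
    except LookupError:
        outcomes[external_id] = "ambiguous"
```

KeyError is a subclass of LookupError in Python, so a handler for KeyError has to come before a handler for LookupError if the two cases are to be told apart.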

stream()[source]

Stream datasets in this collection. Implicitly called when iterating over this collection.

Returns

Stream of datasets.

Return type

Python generator yielding Dataset

Usage:
>>> for dataset in collection.stream(): # explicit
...     do_stuff(dataset)
>>> for dataset in collection: # implicit
...     do_stuff(dataset)
by_name(dataset_name)[source]

Lookup a specific dataset in this collection by exact-match on name.

Parameters

dataset_name (str) – Name of the desired dataset.

Returns

Dataset with matching name in this collection.

Return type

Dataset

Raises

KeyError – If no dataset with specified name was found.

delete_by_resource_id(resource_id, cascade=False)[source]

Deletes a dataset from this collection by resource_id. Optionally deletes all derived datasets as well.

Parameters
  • resource_id (str) – The resource id of the dataset in this collection to delete.

  • cascade (bool) – Whether to delete all datasets derived from the deleted one. Optional, default is False. Do not use this option unless you are certain you need it, as it can have unintended consequences.

Returns

HTTP response from the server.

Return type

requests.Response

create(creation_spec)[source]

Create a Dataset in Tamr

Parameters

creation_spec (dict[str, str]) – Dataset creation specification, formatted as specified in the Public Docs for Creating a Dataset.

Returns

The created Dataset

Return type

Dataset

create_from_dataframe(df, primary_key_name, dataset_name, ignore_nan=True)[source]

Creates a dataset in this collection with the given name, creates an attribute for each column in the df (with primary_key_name as the key attribute), and upserts a record for each row of df.

Each attribute has the default type ARRAY[STRING], except the key attribute, which has type STRING.

This function attempts to ensure atomicity, but atomicity is not guaranteed. If an error occurs while creating attributes or records, an attempt is made to delete the dataset that was created. If that delete request also fails, it is not retried.

Parameters
  • df (pandas.DataFrame) – The data to create the dataset with.

  • primary_key_name (str) – The name of the primary key of the dataset. Must be a column of df.

  • dataset_name (str) – What to name the dataset in Tamr. There cannot already be a dataset with this name.

  • ignore_nan (bool) – Whether to convert NaN values to null before upserting records to Tamr. If False and NaN is in df, this function will fail. Optional, default is True.

Returns

The newly created dataset.

Return type

Dataset

Raises
  • KeyError – If primary_key_name is not a column in df.

  • CreationError – If a step in creating the dataset fails.
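
The best-effort cleanup described above follows a common pattern: create, populate, and on failure make a single attempt to undo the creation before reporting the error. The sketch below shows that pattern with hypothetical callables; the `CreationError` class defined here is a local stand-in that shares the library exception's name, and none of this is the library's code.

```python
class CreationError(Exception):
    """Local stand-in with the same name as the library's exception."""


def create_with_cleanup(create, populate, delete):
    """Create a dataset, populate it, and clean up once if populating fails."""
    dataset = create()
    try:
        populate(dataset)
    except Exception as error:
        try:
            delete(dataset)  # one cleanup attempt only
        except Exception:
            pass             # cleanup failed; the dataset may remain
        raise CreationError(f"Dataset creation failed: {error}") from error
    return dataset


deleted = []


def failing_populate(dataset):
    raise ValueError("bad record")


try:
    create_with_cleanup(lambda: "ds-1", failing_populate, deleted.append)
except CreationError as error:
    message = str(error)
```

If the cleanup attempt itself fails, a partially created dataset can be left behind, which is why atomicity is attempted but not guaranteed.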

class tamr_unify_client.dataset.collection.CreationError(error_message)[source]

An error from create_from_dataframe()

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

Dataset Profile

class tamr_unify_client.dataset.profile.DatasetProfile(client, data, alias=None)[source]

Profile info of a Tamr dataset.

property dataset_name

The name of the associated dataset.

Type

str

Return type

str

property relative_dataset_id

The relative dataset ID of the associated dataset.

Type

str

Return type

str

property is_up_to_date

Whether the associated dataset is up to date.

Type

bool

Return type

bool

property profiled_data_version

The profiled data version.

Type

str

Return type

str

property profiled_at

Info about when profile info was generated.

Type

dict

Return type

dict

property simple_metrics

Simple metrics for profiled dataset.

Type

list

Return type

list

property attribute_profiles

Profiles of the individual attributes in the profiled dataset.

Type

list

Return type

list

refresh(**options)[source]

Updates the dataset profile if needed.

The dataset profile is updated on the server; you will need to call profile() to retrieve the updated profile.

Parameters

**options – Options passed to the underlying Operation. See apply_options().

Returns

The refresh operation.

Return type

Operation

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Dataset Status

class tamr_unify_client.dataset.status.DatasetStatus(client, data, alias=None)[source]

Streamability status of a Tamr dataset.

property dataset_name

The name of the associated dataset.

Type

str

Return type

str

property relative_dataset_id

The relative dataset ID of the associated dataset.

Type

str

Return type

str

property is_streamable

Whether the associated dataset is available to be streamed.

Type

bool

Return type

bool

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Dataset URI

class tamr_unify_client.dataset.uri.DatasetURI(client, uri)[source]

Identifier of a dataset.

Parameters
  • client (Client) – Queried dataset’s client.

  • uri (str) – Queried dataset’s dataset ID.

property resource_id
Type

str

property relative_id
Type

str

property uri
Type

str

dataset()[source]

Fetch the dataset that this identifier points to.

Returns

A Tamr dataset.

Return type

Dataset

Dataset Usage

class tamr_unify_client.dataset.usage.DatasetUsage(client, data, alias=None)[source]

The usage of a dataset and its downstream dependencies.

See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage

property relative_id
Type

str

property usage
Type

DatasetUse

property dependencies
Type

list[DatasetUse]

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property resource_id
Type

str

Dataset Use

class tamr_unify_client.dataset.use.DatasetUse(client, data)[source]

The use of a dataset in project steps. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.

See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage

Parameters
  • client (Client) – Delegate underlying API calls to this client.

  • data (dict) – The JSON body containing usage information.

property dataset_id
Type

str

property dataset_name
Type

str

property input_to_project_steps
Type

list[ProjectStep]

property output_from_project_steps
Type

list[ProjectStep]

dataset()[source]

Retrieves the Dataset this use represents.

Returns

The dataset being used.

Return type

Dataset

Machine Learning Model

class tamr_unify_client.base_model.MachineLearningModel(client, data, alias=None)[source]

A Tamr Machine Learning model.

train(**options)[source]

Learn from verified labels.

Parameters

**options – Options passed to underlying Operation . See apply_options() .

Returns

The resultant operation.

Return type

Operation

predict(**options)[source]

Suggest labels for unverified records.

Parameters

**options – Options passed to underlying Operation . See apply_options() .

Returns

The resultant operation.

Return type

Operation

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Mastering

Binning Model

class tamr_unify_client.mastering.binning_model.BinningModel(client, data, alias=None)[source]

A binning model object.

records()[source]

Stream this object’s records as Python dictionaries.

Returns

Stream of records.

Return type

Python generator yielding dict

update_records(records)[source]

Send a batch of record creations/updates/deletions to this dataset.

Parameters

records (iterable[dict]) – Each record should be formatted as specified in the Public Docs for Dataset updates.

Returns

JSON response body from server.

Return type

dict

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Estimated Pair Counts

class tamr_unify_client.mastering.estimated_pair_counts.EstimatedPairCounts(client, data, alias=None)[source]

Estimated Pair Counts info for Mastering Project

property is_up_to_date

Whether an estimate pairs job has been run since the last edit to the binning model.

Return type

bool

property total_estimate

The total number of estimated candidate pairs and generated pairs for the model across all clauses.

Returns

A dictionary containing the candidate pair and generated pair keys mapped to their corresponding estimated counts. For example:

{
    "candidatePairCount": "54321",
    "generatedPairCount": "12345"
}

Return type

dict[str, str]

property clause_estimates

The estimated candidate pair count and generated pair count for each clause in the model.

Returns

A dictionary containing each clause name mapped to a dictionary containing the corresponding estimated candidate and generated pair counts. For example:

{
    "Clause1": {
        "candidatePairCount": "321",
        "generatedPairCount": "123"
    },
    "Clause2": {
        "candidatePairCount": "654",
        "generatedPairCount": "456"
    }
}

Return type

dict[str, dict[str, str]]
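Because the counts are returned as strings, arithmetic on them needs an explicit conversion. The following is a minimal sketch that works on a plain dictionary shaped like the example above; the helper names to_int_counts and sum_counts are hypothetical and not part of the client. This reference does not say whether per-clause sums match total_estimate, so treat the totals as illustrative only.

```python
# Hypothetical sample shaped like the documented clause_estimates value.
clause_estimates = {
    "Clause1": {"candidatePairCount": "321", "generatedPairCount": "123"},
    "Clause2": {"candidatePairCount": "654", "generatedPairCount": "456"},
}


def to_int_counts(estimates):
    """Convert every string count in a clause-estimates mapping to int."""
    return {
        clause: {key: int(value) for key, value in counts.items()}
        for clause, counts in estimates.items()
    }


def sum_counts(estimates, key):
    """Sum one count (e.g. 'generatedPairCount') across all clauses."""
    return sum(counts[key] for counts in to_int_counts(estimates).values())


generated_total = sum_counts(clause_estimates, "generatedPairCount")
candidate_total = sum_counts(clause_estimates, "candidatePairCount")
```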

refresh(**options)[source]

Updates the estimated pair counts if needed.

The pair count estimates are updated on the server; you will need to call estimate_pairs() to retrieve the updated estimate.

Parameters

**options – Options passed to underlying Operation . See apply_options() .

Returns

The refresh operation.

Return type

Operation

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Mastering Project

class tamr_unify_client.mastering.project.MasteringProject(client, data, alias=None)[source]

A Mastering project in Tamr.

pairs()[source]

Record pairs generated by Tamr’s binning model. Pairs are displayed on the “Pairs” page in the Tamr UI.

Call refresh() from this dataset to regenerate pairs according to the latest binning model.

Returns

The record pairs represented as a dataset.

Return type

Dataset

pair_matching_model()[source]

Machine learning model for pair-matching for this Mastering project. Learns from verified labels and predicts categorization labels for unlabeled pairs.

Calling predict() from this model will produce new (unpublished) clusters. These clusters are displayed on the “Clusters” page in the Tamr UI.

Returns

The machine learning model for pair-matching.

Return type

MachineLearningModel

high_impact_pairs()[source]

High-impact pairs as a dataset. Tamr labels pairs as “high-impact” if labeling these pairs would help it learn most quickly (i.e. “Active learning”).

High-impact pairs are displayed with a ⚡ lightning bolt icon on the “Pairs” page in the Tamr UI.

Call refresh() from this dataset to produce new high-impact pairs according to the latest pair-matching model.

Returns

The high-impact pairs represented as a dataset.

Return type

Dataset

record_clusters()[source]

Record clusters as a dataset. Tamr clusters labeled pairs using the pair-matching model. These clusters populate the cluster review page and get transient cluster IDs, rather than published cluster IDs (i.e., “Permanent Ids”).

Call refresh() from this dataset to generate clusters based on the latest pair-matching model.

Returns

The record clusters represented as a dataset.

Return type

Dataset

published_clusters()[source]

Published record clusters generated by Tamr’s pair-matching model.

Returns

The published clusters represented as a dataset.

Return type

Dataset

published_clusters_configuration()[source]

Retrieves published clusters configuration for this project.

Returns

The published clusters configuration

Return type

PublishedClustersConfiguration

published_cluster_ids()[source]

Retrieves published cluster IDs for this project.

Returns

The published cluster ID dataset.

Return type

Dataset

published_cluster_stats()[source]

Retrieves published cluster stats for this project.

Returns

The published cluster stats dataset.

Return type

Dataset

published_cluster_versions(cluster_ids)[source]

Retrieves version information for the specified published clusters. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.

Parameters

cluster_ids (iterable[str]) – The persistent IDs of the clusters to get version information for.

Returns

A stream of the published clusters.

Return type

Python generator yielding PublishedCluster

record_published_cluster_versions(record_ids)[source]

Retrieves version information for the published clusters of the given records. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.

Parameters

record_ids (iterable[str]) – The Tamr IDs of the records to get cluster version information for.

Returns

A stream of the relevant published clusters.

Return type

Python generator yielding RecordPublishedCluster

estimate_pairs()[source]

Returns pair estimate information for this mastering project.

Returns

Pairs Estimate information.

Return type

EstimatedPairCounts

record_clusters_with_data()[source]

Project’s unified dataset with associated clusters.

Returns

The record clusters with data represented as a dataset

Return type

Dataset

published_clusters_with_data()[source]

Project’s unified dataset with associated clusters.

Returns

The published clusters with data represented as a dataset

Return type

Dataset

binning_model()[source]

Binning model for this project.

Returns

Binning model for this project.

Return type

BinningModel

add_input_dataset(dataset)

Associate a dataset with a project in Tamr.

By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.

Parameters

dataset (Dataset) – The dataset to associate with the project.

Returns

HTTP response from the server

Return type

requests.Response

as_categorization()

Convert this project to a CategorizationProject

Returns

This project.

Return type

CategorizationProject

Raises

TypeError – If the type of this project is not "CATEGORIZATION"

as_mastering()

Convert this project to a MasteringProject

Returns

This project.

Return type

MasteringProject

Raises

TypeError – If the type of this project is not "DEDUP"

attribute_configurations()

This project’s attribute configurations.

Returns

The configurations of the attributes of a project.

Return type

AttributeConfigurationCollection

attribute_mappings()

This project’s attribute mappings.

Returns

The attribute mappings of a project.

Return type

AttributeMappingCollection

property attributes

Attributes of this project.

Returns

Attributes of this project.

Return type

AttributeCollection

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property description
Type

str

property external_id
Type

str

input_datasets()

Retrieve a collection of this project’s input datasets.

Returns

The project’s input datasets.

Return type

DatasetCollection

property name
Type

str

property relative_id
Type

str

remove_input_dataset(dataset)

Remove a dataset from a project.

Parameters

dataset (Dataset) – The dataset to be removed from this project.

Returns

HTTP response from the server

Return type

requests.Response

property resource_id
Type

str

spec()

Returns this project’s spec.

Returns

The spec for the project.

Return type

ProjectSpec

property type

A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.

Type

str

unified_dataset()

Unified dataset for this project.

Returns

Unified dataset for this project.

Return type

Dataset

Published Clusters

Metric

class tamr_unify_client.mastering.published_cluster.metric.Metric(data)[source]

A metric for a published cluster.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this metric.

property name
Type

str

property value
Type

str

Published Cluster

class tamr_unify_client.mastering.published_cluster.resource.PublishedCluster(data)[source]

A representation of a published cluster in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this PublishedCluster.

property id
Type

str

property versions
Type

list[PublishedClusterVersion]

Published Cluster Configuration

class tamr_unify_client.mastering.published_cluster.configuration.PublishedClustersConfiguration(client, data, alias=None)[source]

The configuration of published clusters in a project.

See https://docs.tamr.com/reference#the-published-clusters-configuration-object

property relative_id
Type

str

property versions_time_to_live
Type

str

spec()[source]

Returns a spec representation of this published cluster configuration.

Returns

The published cluster configuration spec.

Return type

PublishedClustersConfigurationSpec

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property resource_id
Type

str

Published Cluster Version

class tamr_unify_client.mastering.published_cluster.version.PublishedClusterVersion(data)[source]

A version of a published cluster in a mastering project.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this version.

property version
Type

str

property timestamp
Type

str

property name
Type

str

property metrics
Type

list[Metric]

property record_ids
Type

list[dict[str, str]]
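The properties above can be combined to inspect the most recent version of a published cluster. The sketch below uses SimpleNamespace stand-ins that mirror the documented property names rather than the real classes. The metric name "recordCount", the "entityId" key inside record_ids, and the idea that version strings are numeric are all assumptions made for illustration; check them against actual server responses.

```python
from types import SimpleNamespace


def make_version(version, record_count):
    """Build a stand-in for PublishedClusterVersion (NOT the real class)."""
    return SimpleNamespace(
        version=version,
        name="cluster-a",
        # Hypothetical metric name; real metric names come from the server.
        metrics=[SimpleNamespace(name="recordCount", value=str(record_count))],
        # Hypothetical key; the real keys of each record ID dict are not listed here.
        record_ids=[{"entityId": str(i)} for i in range(record_count)],
    )


# Stand-in for PublishedCluster with two versions.
cluster = SimpleNamespace(
    id="cluster-a",
    versions=[make_version("9", 2), make_version("10", 3)],
)


def latest_version(published_cluster):
    """Pick the highest version, comparing numerically (assumes numeric strings)."""
    return max(published_cluster.versions, key=lambda v: int(v.version))


def metrics_by_name(cluster_version):
    """Index a version's metrics by name for easier lookup."""
    return {metric.name: metric.value for metric in cluster_version.metrics}


newest = latest_version(cluster)
```

Comparing with int() matters here: a plain string comparison would rank "9" above "10".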

Record Published Cluster

class tamr_unify_client.mastering.published_cluster.record.RecordPublishedCluster(data)[source]

A representation of a published cluster of a record in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this RecordPublishedCluster.

property entity_id
Type

str

property source_id
Type

str

property origin_entity_id
Type

str

property origin_source_id
Type

str

property versions
Type

list[RecordPublishedClusterVersion]

Record Published Cluster Version

class tamr_unify_client.mastering.published_cluster.record_version.RecordPublishedClusterVersion(data)[source]

A version of a published cluster in a mastering project.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this version.

property version
Type

str

property timestamp
Type

str

property cluster_id
Type

str

Operation

class tamr_unify_client.operation.Operation(client, data, alias=None)[source]

A long-running operation performed by Tamr. Operations appear on the “Jobs” page of the Tamr UI.

By design, client-side operations represent server-side operations at a particular point in time (namely, when the operation was fetched from the server). In other words: Operations will not pick up on server-side changes automatically. To get an up-to-date representation, refetch the operation e.g. op = op.poll().

classmethod from_response(client, response)[source]

Handle idiosyncrasies in constructing Operations from Tamr responses. When a Tamr API call would start an operation, but all results that would be produced by that operation are already up-to-date, Tamr returns HTTP 204 No Content.

To make it easy for client code to handle these API responses without checking the response code, this method will either construct an Operation, or a dummy NoOp operation representing the 204 Success response.

Parameters
  • client (Client) – Delegate underlying API calls to this client.

  • response (requests.Response) – HTTP Response from the request that started the operation.

Returns

Operation

Return type

Operation
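The 204 handling described for from_response() can be pictured with the following minimal sketch. It is not the library's implementation: FakeResponse and SketchOperation are hypothetical stand-ins, and the JSON layout (an "id", a "type", and a "status" object holding "state") is an assumption chosen for illustration, as is the choice to mark the no-op as 'SUCCEEDED'.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for requests.Response."""

    status_code: int
    body: Optional[dict] = None

    def json(self):
        return self.body


@dataclass
class SketchOperation:
    """Hypothetical stand-in for Operation."""

    data: dict

    @classmethod
    def from_response(cls, response):
        if response.status_code == 204:
            # Nothing had to run, so hand back an already-resolved dummy operation.
            return cls({"id": "-1", "type": "NOOP", "status": {"state": "SUCCEEDED"}})
        # Otherwise the body describes the operation that was started.
        return cls(response.json())

    @property
    def state(self):
        return self.data["status"]["state"]


noop = SketchOperation.from_response(FakeResponse(204))
started = SketchOperation.from_response(
    FakeResponse(200, {"id": "7", "type": "SPARK", "status": {"state": "PENDING"}})
)
```

With this shape, calling code can inspect the state of either result without branching on the HTTP status code.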

apply_options(asynchronous=False, **options)[source]

Applies operation options to this operation.

NOTE: This function should not be called directly. Rather, options should be passed in through a higher-level function e.g. refresh() .

Synchronous mode:

Automatically waits for operation to resolve before returning the operation.

Asynchronous mode:

Immediately return the 'PENDING' operation. It is up to the user to coordinate this operation with their code via wait() and/or poll() .

Parameters
  • asynchronous (bool) – Whether or not to run in asynchronous mode. Default: False.

  • **options – When running in synchronous mode, these options are passed to the underlying wait() call.

Returns

Operation with options applied.

Return type

Operation
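The two modes can be sketched as a simple dispatch: return the operation as-is when asynchronous, otherwise forward the remaining options to wait(). The class below is a hypothetical stand-in whose wait() resolves immediately; it only illustrates how the options flow and is not the library's code.

```python
class SketchOperation:
    """Hypothetical stand-in for Operation; wait() resolves instantly."""

    def __init__(self, state):
        self.state = state
        self.wait_kwargs = None  # records what wait() received, for illustration

    def wait(self, **kwargs):
        self.wait_kwargs = kwargs
        # A real wait would poll the server; here we return a resolved snapshot.
        return SketchOperation("SUCCEEDED")

    def apply_options(self, asynchronous=False, **options):
        if asynchronous:
            # Asynchronous mode: hand back the unresolved operation immediately.
            return self
        # Synchronous mode: block until resolved, passing options through to wait().
        return self.wait(**options)


op = SketchOperation("PENDING")
async_result = op.apply_options(asynchronous=True)
sync_result = op.apply_options(poll_interval_seconds=1)
```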

property type
Type

str

property description
Type

str

property state

Server-side state of this operation.

Operation state can be unresolved (i.e. state is one of: 'PENDING', 'RUNNING'), or resolved (i.e. state is one of: 'CANCELED', 'SUCCEEDED', 'FAILED'). Unless opting into asynchronous mode, all exposed operations should be resolved.

Note: you only need to manually pick up server-side changes when opting into asynchronous mode when kicking off this operation.

Usage:
>>> op.state # operation is currently 'PENDING'
'PENDING'
>>> op.wait() # continually polls until operation resolves
>>> op.state # incorrect usage; operation object state never changes.
'PENDING'
>>> op = op.poll() # correct usage; use value returned by Operation.poll or Operation.wait
>>> op.state
'SUCCEEDED'
poll()[source]

Poll this operation for server-side updates.

Does not update the calling Operation object. Instead, returns a new Operation.

Returns

Updated representation of this operation.

Return type

Operation

wait(poll_interval_seconds=3, timeout_seconds=None)[source]

Continuously polls for this operation’s server-side state.

Parameters
  • poll_interval_seconds (int) – Time interval (in seconds) between subsequent polls.

  • timeout_seconds (int) – Time (in seconds) to wait for operation to resolve.

Raises

TimeoutError – If operation takes longer than timeout_seconds to resolve.

Returns

Resolved operation.

Return type

Operation
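A minimal sketch of the polling loop follows, assuming the resolved states listed under the state property. SketchOperation is a hypothetical stand-in that replays a scripted list of states instead of contacting a server. As documented for poll(), each poll returns a new object and leaves the caller unchanged.

```python
import time

RESOLVED_STATES = {"CANCELED", "SUCCEEDED", "FAILED"}


class SketchOperation:
    """Hypothetical immutable snapshot of a server-side operation."""

    def __init__(self, server_states, index=0):
        self._server_states = server_states  # scripted states standing in for the server
        self._index = index

    @property
    def state(self):
        return self._server_states[self._index]

    def poll(self):
        # Return a NEW snapshot; this object is left unchanged.
        next_index = min(self._index + 1, len(self._server_states) - 1)
        return SketchOperation(self._server_states, next_index)

    def wait(self, poll_interval_seconds=3, timeout_seconds=None):
        started = time.monotonic()
        op = self
        while op.state not in RESOLVED_STATES:
            elapsed = time.monotonic() - started
            if timeout_seconds is not None and elapsed > timeout_seconds:
                raise TimeoutError(f"not resolved after {timeout_seconds} seconds")
            time.sleep(poll_interval_seconds)
            op = op.poll()
        return op


pending = SketchOperation(["PENDING", "RUNNING", "SUCCEEDED"])
resolved = pending.wait(poll_interval_seconds=0)
```

Note that pending still reports 'PENDING' afterwards; only the returned object reflects the resolved state.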

succeeded()[source]

Convenience method for checking if operation was successful.

Returns

True if operation’s state is 'SUCCEEDED', False otherwise.

Return type

bool

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Projects

Attribute Configurations

Attribute Configuration

class tamr_unify_client.project.attribute_configuration.resource.AttributeConfiguration(client, data, alias=None)[source]

The configurations of Tamr Attributes.

See https://docs.tamr.com/reference#the-attribute-configuration-object

property relative_id
Type

str

property id
Type

str

property relative_attribute_id
Type

str

property attribute_role
Type

str

property similarity_function
Type

str

property enabled_for_ml
Type

bool

property tokenizer
Type

str

property numeric_field_resolution
Type

list

property attribute_name
Type

str

spec()[source]

Returns this attribute configuration’s spec.

Returns

The spec of this attribute configuration.

Return type

AttributeConfigurationSpec

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property resource_id
Type

str

Attribute Configuration Spec

class tamr_unify_client.project.attribute_configuration.resource.AttributeConfigurationSpec(client, data, api_path)[source]

A representation of the server view of an attribute configuration.

static of(resource)[source]

Creates an attribute configuration spec from an attribute configuration.

Parameters

resource (AttributeConfiguration) – The existing attribute configuration.

Returns

The corresponding attribute configuration spec.

Return type

AttributeConfigurationSpec

static new()[source]

Creates a blank spec that could be used to construct a new attribute configuration.

Returns

The empty spec.

Return type

AttributeConfigurationSpec

from_data(data)[source]

Creates a spec with the same client and API path as this one, but new data.

Parameters

data (dict) – The data for the new spec.

Returns

The new spec.

Return type

AttributeConfigurationSpec

to_dict()[source]

Returns a version of this spec that conforms to the API representation.

Returns

The spec’s dict.

Return type

dict

with_attribute_role(new_attribute_role)[source]

Creates a new spec with the same properties, updating attribute role.

Parameters

new_attribute_role (str) – The new attribute role.

Returns

A new spec.

Return type

AttributeConfigurationSpec

with_similarity_function(new_similarity_function)[source]

Creates a new spec with the same properties, updating similarity function.

Parameters

new_similarity_function (str) – The new similarity function.

Returns

A new spec.

Return type

AttributeConfigurationSpec

with_enabled_for_ml(new_enabled_for_ml)[source]

Creates a new spec with the same properties, updating enabled for ML.

Parameters

new_enabled_for_ml (bool) – Whether the attribute is enabled for ML.

Returns

A new spec.

Return type

AttributeConfigurationSpec

with_tokenizer(new_tokenizer)[source]

Creates a new spec with the same properties, updating tokenizer.

Parameters

new_tokenizer (str) – The new tokenizer.

Returns

A new spec.

Return type

AttributeConfigurationSpec

with_numeric_field_resolution(new_numeric_field_resolution)[source]

Creates a new spec with the same properties, updating numeric field resolution.

Parameters

new_numeric_field_resolution (str) – The new numeric field resolution.

Returns

A new spec.

Return type

AttributeConfigurationSpec

with_attribute_name(new_attribute_name)[source]

Creates a new spec with the same properties, updating attribute name.

Parameters

new_attribute_name (str) – The new attribute name.

Returns

A new spec.

Return type

AttributeConfigurationSpec

put()[source]

Updates the attribute configuration on the server.

Returns

The modified attribute configuration.

Return type

AttributeConfiguration
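Each with_*() method returns a new spec instead of modifying the existing one, so calls can be chained before a final put(). The sketch below illustrates that pattern with a hypothetical SketchSpec class. The dictionary keys and values ("tokenizer", "enabledForMl", "DEFAULT", "BIGRAM") are assumptions for illustration and may not match the server's representation, and put() is omitted because it requires a live server.

```python
import copy


class SketchSpec:
    """Hypothetical stand-in for an immutable spec."""

    def __init__(self, data):
        self._data = data

    def from_data(self, data):
        # Same kind of spec, but carrying new data.
        return SketchSpec(data)

    def to_dict(self):
        # Return a copy so callers cannot mutate the spec's internals.
        return copy.deepcopy(self._data)

    def _with(self, key, value):
        updated = self.to_dict()
        updated[key] = value
        return self.from_data(updated)

    def with_tokenizer(self, new_tokenizer):
        return self._with("tokenizer", new_tokenizer)

    def with_enabled_for_ml(self, new_enabled_for_ml):
        return self._with("enabledForMl", new_enabled_for_ml)


original = SketchSpec({"tokenizer": "DEFAULT", "enabledForMl": True})
changed = original.with_tokenizer("BIGRAM").with_enabled_for_ml(False)
```

Because every step produces a fresh object, original can still be reused or compared against changed after the chain runs.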

Attribute Configuration Collection

class tamr_unify_client.project.attribute_configuration.collection.AttributeConfigurationCollection(client, api_path)[source]

Collection of AttributeConfiguration

Parameters
  • client (Client) – Client for API call delegation.

  • api_path (str) – API path used to access this collection. E.g. "projects/1/attributeConfigurations"

by_resource_id(resource_id)[source]

Retrieve an attribute configuration by resource ID.

Parameters

resource_id (str) – The resource ID.

Returns

The specified attribute configuration.

Return type

AttributeConfiguration

by_relative_id(relative_id)[source]

Retrieve an attribute configuration by relative ID.

Parameters

relative_id (str) – The relative ID.

Returns

The specified attribute configuration.

Return type

AttributeConfiguration

by_external_id(external_id)[source]

Retrieve an attribute configuration by external ID.

Since attributes do not have external IDs, this method is not supported and will raise a NotImplementedError .

Parameters

external_id (str) – The external ID.

Returns

The specified attribute, if found.

Return type

AttributeConfiguration

Raises
  • KeyError – If no attribute with the specified external_id is found

  • LookupError – If multiple attributes with the specified external_id are found

  • NotImplementedError – AttributeConfiguration does not support external_id

stream()[source]

Stream attribute configurations in this collection. Implicitly called when iterating over this collection.

Returns

Stream of attribute configurations.

Return type

Python generator yielding AttributeConfiguration

Usage:
>>> for attributeConfiguration in collection.stream(): # explicit
...     do_stuff(attributeConfiguration)
>>> for attributeConfiguration in collection: # implicit
...     do_stuff(attributeConfiguration)
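The equivalence between explicit and implicit iteration can be sketched by having __iter__ delegate to stream(). SketchCollection is a hypothetical stand-in backed by an in-memory list; a real collection would fetch items from the server instead.

```python
class SketchCollection:
    """Hypothetical stand-in for a collection backed by an in-memory list."""

    def __init__(self, items):
        self._items = items

    def stream(self):
        # Generator: items are produced one at a time, on demand.
        for item in self._items:
            yield item

    def __iter__(self):
        # Iterating over the collection implicitly calls stream().
        return self.stream()


collection = SketchCollection(["config-1", "config-2"])
explicit = list(collection.stream())
implicit = [item for item in collection]
```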
create(creation_spec)[source]

Create an attribute configuration in this collection.

Parameters

creation_spec (dict[str, str]) – Attribute configuration creation specification should be formatted as specified in the Public Docs for adding an AttributeConfiguration.

Returns

The created Attribute configuration

Return type

AttributeConfiguration

delete_by_resource_id(resource_id)

Deletes a resource from this collection by resource ID.

Parameters

resource_id (str) – The resource ID of the resource that will be deleted.

Returns

HTTP response from the server.

Return type

requests.Response

Attribute Mappings

Attribute Mapping

class tamr_unify_client.project.attribute_mapping.resource.AttributeMapping(client, data)[source]

An attribute mapping in a Tamr project. See https://docs.tamr.com/reference#retrieve-projects-mappings

AttributeMapping and AttributeMappingCollection do not inherit from BaseResource and BaseCollection. Those base classes require a specific URL for each individual attribute mapping (e.g. /projects/1/attributeMappings/1), but such URLs do not exist for attribute mappings.

property id
Type

str

property relative_id
Type

str

property input_attribute_id
Type

str

property relative_input_attribute_id
Type

str

property input_dataset_name
Type

str

property input_attribute_name
Type

str

property unified_attribute_id
Type

str

property relative_unified_attribute_id
Type

str

property unified_dataset_name
Type

str

property unified_attribute_name
Type

str

property resource_id
Type

str

spec()[source]

Returns a spec representation of this attribute mapping.

Returns

The attribute mapping spec.

Return type

AttributeMappingSpec

delete()[source]

Delete this attribute mapping.

Returns

HTTP response from the server

Return type

requests.Response

Attribute Mapping Spec

class tamr_unify_client.project.attribute_mapping.resource.AttributeMappingSpec(data)[source]

A representation of the server view of an attribute mapping.

static of(resource)[source]

Creates an attribute mapping spec from an attribute mapping.

Parameters

resource (AttributeMapping) – The existing attribute mapping.

Returns

The corresponding attribute mapping spec.

Return type

AttributeMappingSpec

static new()[source]

Creates a blank spec that could be used to construct a new attribute mapping.

Returns

The empty spec.

Return type

AttributeMappingSpec

to_dict()[source]

Returns a version of this spec that conforms to the API representation.

Returns

The spec’s dict.

Return type

dict

with_input_attribute_id(new_input_attribute_id)[source]

Creates a new spec with the same properties, updating the input attribute id.

Parameters

new_input_attribute_id (str) – The new input attribute id.

Returns

The new spec.

Return type

AttributeMappingSpec

with_relative_input_attribute_id(new_relative_input_attribute_id)[source]

Creates a new spec with the same properties, updating the relative input attribute id.

Parameters

new_relative_input_attribute_id (str) – The new relative input attribute id.

Returns

The new spec.

Return type

AttributeMappingSpec

with_input_dataset_name(new_input_dataset_name)[source]

Creates a new spec with the same properties, updating the input dataset name.

Parameters

new_input_dataset_name (str) – The new input dataset name.

Returns

The new spec.

Return type

AttributeMappingSpec

with_input_attribute_name(new_input_attribute_name)[source]

Creates a new spec with the same properties, updating the input attribute name.

Parameters

new_input_attribute_name (str) – The new input attribute name.

Returns

The new spec.

Return type

AttributeMappingSpec

with_unified_attribute_id(new_unified_attribute_id)[source]

Creates a new spec with the same properties, updating the unified attribute id.

Parameters

new_unified_attribute_id (str) – The new unified attribute id.

Returns

The new spec.

Return type

AttributeMappingSpec

with_relative_unified_attribute_id(new_relative_unified_attribute_id)[source]

Creates a new spec with the same properties, updating the relative unified attribute id.

Parameters

new_relative_unified_attribute_id (str) – The new relative unified attribute id.

Returns

The new spec.

Return type

AttributeMappingSpec

with_unified_dataset_name(new_unified_dataset_name)[source]

Creates a new spec with the same properties, updating the unified dataset name.

Parameters

new_unified_dataset_name (str) – The new unified dataset name.

Returns

The new spec.

Return type

AttributeMappingSpec

with_unified_attribute_name(new_unified_attribute_name)[source]

Creates a new spec with the same properties, updating the unified attribute name.

Parameters

new_unified_attribute_name (str) – The new unified attribute name.

Returns

The new spec.

Return type

AttributeMappingSpec

Attribute Mapping Collection

class tamr_unify_client.project.attribute_mapping.collection.AttributeMappingCollection(client, api_path)[source]

Collection of AttributeMapping

Parameters
  • client (Client) – Client for API call delegation.

  • api_path (str) – API path used to access this collection.

stream()[source]

Stream attribute mappings in this collection. Implicitly called when iterating over this collection.

Returns

Stream of attribute mappings.

Return type

Python generator yielding AttributeMapping

by_resource_id(resource_id)[source]

Retrieve an item in this collection by resource ID.

Parameters

resource_id (str) – The resource ID.

Returns

The specified attribute mapping.

Return type

AttributeMapping

by_relative_id(relative_id)[source]

Retrieve an item in this collection by relative ID.

Parameters

relative_id (str) – The relative ID.

Returns

The specified attribute mapping.

Return type

AttributeMapping

create(creation_spec)[source]

Create an attribute mapping in this collection.

Parameters

creation_spec (dict[str, str]) – Attribute mapping creation specification should be formatted as specified in the Public Docs for adding an AttributeMapping.

Returns

The created Attribute mapping

Return type

AttributeMapping

delete_by_resource_id(resource_id)[source]

Delete an attribute mapping using its Resource ID.

Parameters

resource_id (str) – The resource ID of the mapping to be deleted.

Returns

HTTP response from the server

Return type

requests.Response

Project

class tamr_unify_client.project.resource.Project(client, data, alias=None)[source]

A Tamr project.

property name
Type

str

property external_id
Type

str

property description
Type

str

property type

A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.

Type

str

property attributes

Attributes of this project.

Returns

Attributes of this project.

Return type

AttributeCollection

unified_dataset()[source]

Unified dataset for this project.

Returns

Unified dataset for this project.

Return type

Dataset

as_categorization()[source]

Convert this project to a CategorizationProject

Returns

This project.

Return type

CategorizationProject

Raises

TypeError – If the type of this project is not "CATEGORIZATION"

as_mastering()[source]

Convert this project to a MasteringProject

Returns

This project.

Return type

MasteringProject

Raises

TypeError – If the type of this project is not "DEDUP"
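The type check behind as_mastering() can be sketched as follows. SketchProject and SketchMasteringProject are hypothetical stand-ins, and the "type" key in the data dictionary is an assumption; only the "DEDUP" value and the TypeError come from the description above.

```python
class SketchProject:
    """Hypothetical stand-in for Project."""

    def __init__(self, data):
        self._data = data

    @property
    def type(self):
        return self._data["type"]

    def as_mastering(self):
        # Only projects whose type is "DEDUP" can be viewed as mastering projects.
        if self.type != "DEDUP":
            raise TypeError(
                f"Cannot convert project of type {self.type!r} to a mastering project"
            )
        return SketchMasteringProject(self._data)


class SketchMasteringProject(SketchProject):
    """Hypothetical stand-in for MasteringProject."""


mastering = SketchProject({"type": "DEDUP"}).as_mastering()

try:
    SketchProject({"type": "CATEGORIZATION"}).as_mastering()
    conversion_failed = False
except TypeError:
    conversion_failed = True
```

Checking the type property before converting avoids the exception when the project type is not known in advance.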

add_input_dataset(dataset)[source]

Associate a dataset with a project in Tamr.

By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.

Parameters

dataset (Dataset) – The dataset to associate with the project.

Returns

HTTP response from the server

Return type

requests.Response

remove_input_dataset(dataset)[source]

Remove a dataset from a project.

Parameters

dataset (Dataset) – The dataset to be removed from this project.

Returns

HTTP response from the server

Return type

requests.Response

input_datasets()[source]

Retrieve a collection of this project’s input datasets.

Returns

The project’s input datasets.

Return type

DatasetCollection

attribute_configurations()[source]

This project’s attribute configurations.

Returns

The configurations of the attributes of a project.

Return type

AttributeConfigurationCollection

attribute_mappings()[source]

This project’s attribute mappings.

Returns

The attribute mappings of a project.

Return type

AttributeMappingCollection

spec()[source]

Returns this project’s spec.

Returns

The spec for the project.

Return type

ProjectSpec

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Project Spec

class tamr_unify_client.project.resource.ProjectSpec(client, data, api_path)[source]

A representation of the server view of a project.

static of(resource)[source]

Creates a project spec from a project.

Parameters

resource (Project) – The existing project.

Returns

The corresponding project spec.

Return type

ProjectSpec

static new()[source]

Creates a blank spec that could be used to construct a new project.

Returns

The empty spec.

Return type

ProjectSpec

from_data(data)[source]

Creates a spec with the same client and API path as this one, but new data.

Parameters

data (dict) – The data for the new spec.

Returns

The new spec.

Return type

ProjectSpec

to_dict()[source]

Returns a version of this spec that conforms to the API representation.

Returns

The spec’s dict.

Return type

dict

with_name(new_name)[source]

Creates a new spec with the same properties, updating name.

Parameters

new_name (str) – The new name.

Returns

The new spec.

Return type

ProjectSpec

with_description(new_description)[source]

Creates a new spec with the same properties, updating description.

Parameters

new_description (str) – The new description.

Returns

The new spec.

Return type

ProjectSpec

with_type(new_type)[source]

Creates a new spec with the same properties, updating type.

Parameters

new_type (str) – The new type.

Returns

The new spec.

Return type

ProjectSpec

with_external_id(new_external_id)[source]

Creates a new spec with the same properties, updating external ID.

Parameters

new_external_id (str) – The new external ID.

Returns

The new spec.

Return type

ProjectSpec

with_unified_dataset_name(new_unified_dataset_name)[source]

Creates a new spec with the same properties, updating unified dataset name.

Parameters

new_unified_dataset_name (str) – The new unified dataset name.

Returns

The new spec.

Return type

ProjectSpec

put()[source]

Commits these changes by updating the project in Tamr.

Returns

The updated project.

Return type

Project
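The with_*() methods each return a new spec, so changes are chained and nothing reaches Tamr until put() is called. A minimal sketch of that flow, using only the methods documented above; update_project_metadata is a hypothetical helper, not part of the client.

```python
def update_project_metadata(project, name=None, description=None):
    """Update a project's name and/or description and commit the change.

    Hypothetical helper (not part of tamr_unify_client). Each with_*()
    call returns a new spec, so the result must be reassigned.
    """
    spec = project.spec()
    if name is not None:
        spec = spec.with_name(name)
    if description is not None:
        spec = spec.with_description(description)
    # put() commits the accumulated changes and returns the updated project.
    return spec.put()
```

Calling the helper with neither argument still issues a put() with the unchanged spec; callers that want to avoid that request can return early.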

Project Collection

class tamr_unify_client.project.collection.ProjectCollection(client, api_path='projects')[source]

Collection of Projects.

Parameters
  • client (Client) – Client for API call delegation.

  • api_path (str) – API path used to access this collection. Default: "projects".

by_resource_id(resource_id)[source]

Retrieve a project by resource ID.

Parameters

resource_id (str) – The resource ID. E.g. "1"

Returns

The specified project.

Return type

Project

by_relative_id(relative_id)[source]

Retrieve a project by relative ID.

Parameters

relative_id (str) – The relative ID. E.g. "projects/1"

Returns

The specified project.

Return type

Project

by_external_id(external_id)[source]

Retrieve a project by external ID.

Parameters

external_id (str) – The external ID.

Returns

The specified project, if found.

Return type

Project

Raises
  • KeyError – If no project with the specified external_id is found

  • LookupError – If multiple projects with the specified external_id are found
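The two exceptions call for different handling: a missing project is often an expected outcome, while duplicates usually indicate a data problem. Note that in Python KeyError is a subclass of LookupError, so a bare except LookupError would catch both. The sketch below treats "not found" as a default value and lets the duplicate case propagate; find_by_external_id is a hypothetical helper, not part of the client.

```python
def find_by_external_id(projects, external_id, default=None):
    """Return the project with the given external ID, or `default` if absent.

    Hypothetical helper (not part of tamr_unify_client). `projects` is
    expected to expose by_external_id() as documented above.
    """
    try:
        return projects.by_external_id(external_id)
    except KeyError:
        # No project has this external ID. Only KeyError is caught here:
        # the LookupError raised for duplicate external IDs is not a
        # KeyError, so it still propagates to the caller.
        return default
```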

stream()[source]

Stream projects in this collection. Implicitly called when iterating over this collection.

Returns

Stream of projects.

Return type

Python generator yielding Project

Usage:
>>> for project in collection.stream(): # explicit
...     do_stuff(project)
>>> for project in collection: # implicit
...     do_stuff(project)
create(creation_spec)[source]

Create a Project in Tamr.

Parameters

creation_spec (dict[str, str]) – Project creation specification, formatted as described in the Public Docs for Creating a Project.

Returns

The created Project

Return type

Project
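As an illustration of what a creation spec might look like, the sketch below builds the dict in one place. The key names ("name", "description", "type", "unifiedDatasetName") and the default unified dataset naming are assumptions that should be checked against the Public Docs for Creating a Project; build_creation_spec is a hypothetical helper, not part of the client.

```python
def build_creation_spec(name, project_type, description="", unified_dataset_name=None):
    """Build a dict[str, str] suitable for passing to create().

    Hypothetical helper (not part of tamr_unify_client). The key names
    below are assumed, not confirmed by this reference.
    """
    if unified_dataset_name is None:
        # Assumed naming convention, chosen only for illustration.
        unified_dataset_name = name + "_unified_dataset"
    return {
        "name": name,
        "description": description,
        "type": project_type,  # e.g. "DEDUP" or "CATEGORIZATION"
        "unifiedDatasetName": unified_dataset_name,
    }
```

The resulting dict would then be passed as creation_spec, for example collection.create(build_creation_spec("my_project", "DEDUP")).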

delete_by_resource_id(resource_id)

Deletes a resource from this collection by resource ID.

Parameters

resource_id (str) – The resource ID of the resource that will be deleted.

Returns

HTTP response from the server.

Return type

requests.Response

Project Step

class tamr_unify_client.project.step.ProjectStep(client, data)[source]

A step of a Tamr project. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.

See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage

Parameters
  • client (Client) – Delegate underlying API calls to this client.

  • data (dict) – The JSON body containing project step information.

property project_step_id
Type

str

property project_step_name
Type

str

property project_name
Type

str

property type

A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.

Type

str

project()[source]

Retrieves the Project this step is associated with.

Returns

This step’s project.

Return type

Project

Raises
  • KeyError – If no project with the specified name is found.

  • LookupError – If multiple projects with the specified name are found.
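Because project() makes a lookup that can raise, the project_name and project_step_name properties are often enough for a quick summary of which projects use a dataset. A minimal sketch, assuming an iterable of ProjectStep objects obtained elsewhere; step_names_by_project is a hypothetical helper, not part of the client.

```python
from collections import defaultdict


def step_names_by_project(steps):
    """Group step names by the name of the project they belong to.

    Hypothetical helper (not part of tamr_unify_client). It reads only
    the project_name and project_step_name properties documented above
    and never calls project().
    """
    grouped = defaultdict(list)
    for step in steps:
        grouped[step.project_name].append(step.project_step_name)
    # Return a plain dict so a missing key raises KeyError as usual.
    return dict(grouped)
```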