Developer Interface¶
Authentication¶
-
class
tamr_unify_client.auth.
UsernamePasswordAuth
(username, password)[source]¶ Provides username/password authentication for Tamr. Specifically, sets the Authorization HTTP header with Tamr’s custom BasicCreds format.
- Usage:
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> import tamr_unify_client as api
>>> unify = api.Client(auth)
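To make "sets the Authorization HTTP header" more concrete, here is a minimal standard-library sketch of building such a header value. It assumes the BasicCreds format is the literal prefix `BasicCreds` followed by the base64 encoding of `username:password`; that layout and the helper name `basic_creds_header` are assumptions for illustration and are not stated on this page.

```python
import base64


def basic_creds_header(username: str, password: str) -> dict:
    """Build an Authorization header using the assumed BasicCreds layout."""
    # Assumption: credentials are joined with ':' and base64-encoded,
    # similar to HTTP Basic auth but with a different scheme name.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"BasicCreds {token}"}


header = basic_creds_header("my username", "my password")
print(header["Authorization"])
```

In normal use there is no need to build this header by hand; passing a `UsernamePasswordAuth` instance to `Client` is the documented route.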
Client¶
-
class
tamr_unify_client.
Client
(auth, host='localhost', protocol='http', port=9100, base_path='/api/versioned/v1/', session=None)[source]¶ Python Client for Tamr API.
Each client is tied to a single origin (protocol, host, port).
- Parameters
auth (AuthBase) – Tamr-compatible authentication provider. Recommended: use one of the classes described in Authentication.
host (str) – Host address of remote Tamr instance (e.g. '10.0.10.0')
protocol (str) – Either 'http' or 'https'
port (int) – Tamr instance main port
base_path (str) – Base API path. Requests made by this client will be relative to this path.
session (Optional[Session]) – Session to use for API calls. If none is provided, will use a new requests.Session.
Example
>>> from tamr_unify_client import Client
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> tamr_local = Client(auth) # on http://localhost:9100
>>> tamr_remote = Client(auth, protocol='https', host='10.0.10.0') # on https://10.0.10.0:9100
-
property
origin
¶ HTTP origin, i.e. <protocol>://<host>[:<port>]. For additional information, see MDN web docs.
- Return type
-
request
(method, endpoint, **kwargs)[source]¶ Sends a request to Tamr.
The URL for the request will be <origin>/<base_path>/<endpoint>. The request is authenticated via Client.auth.
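The way an origin, base path and endpoint combine into a request URL can be sketched with the standard library alone. The helper names `build_origin` and `build_url` are hypothetical and the slash handling is an assumption; this is not the client's own code.

```python
from urllib.parse import urljoin


def build_origin(protocol: str = "http", host: str = "localhost", port: int = 9100) -> str:
    """Return <protocol>://<host>:<port> using the documented defaults."""
    return f"{protocol}://{host}:{port}"


def build_url(origin: str, base_path: str, endpoint: str) -> str:
    """Join origin, base path and endpoint without doubling slashes."""
    # Hypothetical helper: normalise slashes so the pieces join predictably.
    base = origin.rstrip("/") + "/" + base_path.strip("/") + "/"
    return urljoin(base, endpoint.lstrip("/"))


origin = build_origin(protocol="https", host="10.0.10.0")
url = build_url(origin, "/api/versioned/v1/", "projects/1")
print(url)
```

The endpoint `projects/1` is only an example path chosen for the sketch.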
-
property
projects
¶ Collection of all projects on this Tamr instance.
- Return type
- Returns
Collection of all projects.
-
property
datasets
¶ Collection of all datasets on this Tamr instance.
- Return type
- Returns
Collection of all datasets.
Attributes¶
Attribute¶
-
class
tamr_unify_client.attribute.resource.
Attribute
(client, data, alias=None)[source]¶ A Tamr Attribute.
See https://docs.tamr.com/reference#attribute-types
-
property
type
¶ - Type
-
spec
()[source]¶ Returns a spec representation of this attribute.
- Returns
The attribute spec.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
property
Attribute Spec¶
-
class
tamr_unify_client.attribute.resource.
AttributeSpec
(client, data, api_path)[source]¶ A representation of the server view of an attribute
-
static
of
(resource)[source]¶ Creates an attribute spec from an attribute.
- Parameters
resource (
Attribute
) – The existing attribute.- Returns
The corresponding attribute spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new attribute.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_name
(new_name)[source]¶ Creates a new spec with the same properties, updating name.
- Parameters
new_name (str) – The new name.
- Returns
The new spec.
- Return type
-
with_description
(new_description)[source]¶ Creates a new spec with the same properties, updating description.
- Parameters
new_description (str) – The new description.
- Returns
The new spec.
- Return type
-
with_type
(new_type)[source]¶ Creates a new spec with the same properties, updating type.
- Parameters
new_type (
AttributeTypeSpec
) – The spec of the new type.- Returns
The new spec.
- Return type
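The `with_*` methods above each return a new spec rather than modifying the existing one. A minimal sketch of that pattern follows, using a hypothetical stand-in class `SketchSpec` (not the library's `AttributeSpec`, and without a client or API path) so it runs on its own.

```python
import copy


class SketchSpec:
    """Hypothetical spec: each with_* call returns a new object."""

    def __init__(self, data: dict):
        self._data = data

    def from_data(self, data: dict) -> "SketchSpec":
        # Same kind of spec, but carrying new data.
        return SketchSpec(data)

    def with_name(self, new_name: str) -> "SketchSpec":
        return self.from_data({**self._data, "name": new_name})

    def with_description(self, new_description: str) -> "SketchSpec":
        return self.from_data({**self._data, "description": new_description})

    def to_dict(self) -> dict:
        # Deep copy so callers cannot mutate the spec through the result.
        return copy.deepcopy(self._data)


blank = SketchSpec({})
spec = blank.with_name("city").with_description("City of residence")
print(spec.to_dict())
```

Because every call produces a fresh object, calls can be chained and the starting spec (`blank` here) stays unchanged.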
-
static
Attribute Collection¶
-
class
tamr_unify_client.attribute.collection.
AttributeCollection
(client, api_path)[source]¶ Collection of
Attribute
s.- Parameters
-
by_external_id
(external_id)[source]¶ Retrieve an attribute by external ID.
Since attributes do not have external IDs, this method is not supported and will raise a NotImplementedError.
- Parameters
external_id (str) – The external ID.
- Returns
The specified attribute, if found.
- Return type
- Raises
KeyError – If no attribute with the specified external_id is found
LookupError – If multiple attributes with the specified external_id are found
-
stream
()[source]¶ Stream attributes in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of attributes.
- Return type
Python generator yielding
Attribute
- Usage:
>>> for attribute in collection.stream(): # explicit
...     do_stuff(attribute)
>>> for attribute in collection: # implicit
...     do_stuff(attribute)
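The equivalence between explicit and implicit iteration can be illustrated with a small stand-in. `SketchCollection` is hypothetical and holds plain strings; it only shows how an `__iter__` method can delegate to a generator-based `stream()`.

```python
from typing import Iterator, List


class SketchCollection:
    """Hypothetical stand-in for a collection; not the library's class."""

    def __init__(self, items: List[str]):
        self._items = items

    def stream(self) -> Iterator[str]:
        # A generator: items are produced lazily, one at a time.
        for item in self._items:
            yield item

    def __iter__(self) -> Iterator[str]:
        # Iterating over the collection delegates to stream().
        return self.stream()


collection = SketchCollection(["id", "name", "address"])
explicit = [attribute for attribute in collection.stream()]
implicit = [attribute for attribute in collection]
print(explicit == implicit)
```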
-
by_name
(attribute_name)[source]¶ Lookup a specific attribute in this collection by exact-match on name.
-
create
(creation_spec)[source]¶ Create an Attribute in this collection
- Parameters
creation_spec (dict[str, str]) – Attribute creation specification should be formatted as specified in the Public Docs for adding an Attribute.
- Returns
The created Attribute
- Return type
Attribute Type¶
-
class
tamr_unify_client.attribute.type.
AttributeType
(data)[source]¶ The type of an
Attribute
orSubAttribute
.See https://docs.tamr.com/reference#attribute-types
- Parameters
data (
dict
) – JSON data representing this type
-
property
inner_type
¶ - Type
-
property
attributes
¶ - Type
list[
SubAttribute
]
Attribute Type Spec¶
-
class
tamr_unify_client.attribute.type.
AttributeTypeSpec
(data)[source]¶ -
static
of
(resource)[source]¶ Creates an attribute type spec from an attribute type.
- Parameters
resource (
AttributeType
) – The existing attribute type.- Returns
The corresponding attribute type spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new attribute type.
- Returns
The empty spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_base_type
(new_base_type)[source]¶ Creates a new spec with the same properties, updating the base type.
- Parameters
new_base_type (str) – The new base type.
- Returns
The new spec.
- Return type
-
with_inner_type
(new_inner_type)[source]¶ Creates a new spec with the same properties, updating the inner type.
- Parameters
new_inner_type (
AttributeTypeSpec
) – The spec of the new inner type.- Returns
The new spec.
- Return type
-
with_attributes
(new_attributes)[source]¶ Creates a new spec with the same properties, updating attributes.
- Parameters
new_attributes (list[
AttributeSpec
]) – The specs of the new attributes.- Returns
The new spec.
- Return type
-
static
SubAttribute¶
-
class
tamr_unify_client.attribute.subattribute.
SubAttribute
(name, type, is_nullable, _json, description=None)[source]¶ An attribute which is itself a property of another attribute.
See https://docs.tamr.com/reference#attribute-types
- Parameters
name (
str
) – Name of sub-attributetype (
AttributeType
) – See https://docs.tamr.com/reference#attribute-typesis_nullable (
bool
) – If this sub-attribute can be null
Categorization¶
Categorization Project¶
-
class
tamr_unify_client.categorization.project.
CategorizationProject
(client, data, alias=None)[source]¶ A Categorization project in Tamr.
-
model
()[source]¶ Machine learning model for this Categorization project. Learns from verified labels and predicts categorization labels for unlabeled records.
- Returns
The machine learning model for categorization.
- Return type
-
create_taxonomy
(creation_spec)[source]¶ Creates a Taxonomy for this project. The project must not already have an associated taxonomy.
-
taxonomy
()[source]¶ Retrieves the Taxonomy associated with this project. If a taxonomy is not already associated with this project, call create_taxonomy() first.
- Returns
The project’s Taxonomy
- Return type
-
add_input_dataset
(dataset)¶ Associate a dataset with a project in Tamr.
By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project
- Parameters
dataset (
Dataset
) – The dataset to associate with the project.- Returns
HTTP response from the server
- Return type
-
as_categorization
()¶ Convert this project to a
CategorizationProject
-
as_mastering
()¶ Convert this project to a
MasteringProject
- Returns
This project.
- Return type
- Raises
-
attribute_configurations
()¶ The project’s attribute configurations.
- Returns
The configurations of the attributes of a project.
- Return type
-
attribute_mappings
()¶ The project’s attribute mappings.
- Returns
The attribute mappings of a project.
- Return type
-
property
attributes
¶ Attributes of this project.
- Returns
Attributes of this project.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
input_datasets
()¶ Retrieve a collection of this project’s input datasets.
- Returns
The project’s input datasets.
- Return type
-
remove_input_dataset
(dataset)¶ Remove a dataset from a project.
- Parameters
dataset (
Dataset
) – The dataset to be removed from this project.- Returns
HTTP response from the server
- Return type
-
spec
()¶ Returns this project’s spec.
- Returns
The spec for the project.
- Return type
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type
-
Categories¶
Category¶
-
class
tamr_unify_client.categorization.category.resource.
Category
(client, data, alias=None)[source]¶ A category of a taxonomy
-
parent
()[source]¶ Gets the parent Category of this one, or None if it is a tier 1 category
- Returns
The parent Category or None
- Return type
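Since `parent()` returns None for a tier 1 category, a full path can be recovered by following parents until None is reached. The sketch below uses a hypothetical `SketchCategory` class with a `name` attribute in place of the real `Category`, so it runs without a Tamr instance.

```python
from typing import List, Optional


class SketchCategory:
    """Hypothetical stand-in for a taxonomy category."""

    def __init__(self, name: str, parent: Optional["SketchCategory"] = None):
        self.name = name
        self._parent = parent

    def parent(self) -> Optional["SketchCategory"]:
        # Tier 1 categories have no parent.
        return self._parent


def path_from_root(category: SketchCategory) -> List[str]:
    """Walk parent() links up to tier 1 and return names root-first."""
    names = []
    current: Optional[SketchCategory] = category
    while current is not None:
        names.append(current.name)
        current = current.parent()
    return list(reversed(names))


tier1 = SketchCategory("Hardware")
tier2 = SketchCategory("Fasteners", parent=tier1)
tier3 = SketchCategory("Bolts", parent=tier2)
print(path_from_root(tier3))
```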
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Category Spec¶
-
class
tamr_unify_client.categorization.category.resource.
CategorySpec
(client, data, api_path)[source]¶ A representation of the server view of a category.
-
static
of
(resource)[source]¶ Creates a category spec from a category.
- Parameters
resource (
Category
) – The existing category.- Returns
The corresponding category spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new category.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_name
(new_name)[source]¶ Creates a new spec with the same properties, updating name.
- Parameters
new_name (str) – The new name.
- Returns
The new spec.
- Return type
-
with_description
(new_description)[source]¶ Creates a new spec with the same properties, updating description.
- Parameters
new_description (str) – The new description.
- Returns
The new spec.
- Return type
-
static
Category Collection¶
-
class
tamr_unify_client.categorization.category.collection.
CategoryCollection
(client, api_path)[source]¶ Collection of
Category
s.- Parameters
-
by_external_id
(external_id)[source]¶ Retrieve a category by external ID.
Since categories do not have external IDs, this method is not supported and will raise a NotImplementedError.
- Parameters
external_id (str) – The external ID.
- Returns
The specified category, if found.
- Return type
- Raises
KeyError – If no category with the specified external_id is found
LookupError – If multiple categories with the specified external_id are found
-
stream
()[source]¶ Stream categories in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of categories.
- Return type
Python generator yielding
Category
- Usage:
>>> for category in collection.stream(): # explicit
...     do_stuff(category)
>>> for category in collection: # implicit
...     do_stuff(category)
-
create
(creation_spec)[source]¶ Creates a new category.
- Parameters
creation_spec (dict) – Category creation specification, formatted as specified in the Public Docs for Creating a Category.
- Returns
The newly created category.
- Return type
Taxonomy¶
-
class
tamr_unify_client.categorization.taxonomy.
Taxonomy
(client, data, alias=None)[source]¶ A project’s taxonomy
-
categories
()[source]¶ Retrieves the categories of this taxonomy.
- Returns
A collection of the taxonomy categories.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Datasets¶
Dataset¶
-
class
tamr_unify_client.dataset.resource.
Dataset
(client, data, alias=None)[source]¶ A Tamr dataset.
-
property
attributes
¶ Attributes of this dataset.
- Returns
Attributes of this dataset.
- Return type
-
upsert_records
(records, primary_key_name, **json_args)[source]¶ Creates or updates the specified records.
- Parameters
records (iterable[dict]) – The records to update, as dictionaries.
primary_key_name (str) – The name of the primary key for these records, which must be a key in each record dictionary.
**json_args – Arguments to pass to the JSON dumps function, as documented here. Some of these, such as indent, may not work with Tamr.
- Returns
JSON response body from the server.
- Return type
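The two requirements above (every record contains the primary key, and extra keyword arguments are forwarded to the JSON dumps function) can be sketched as follows. `serialize_records` is a hypothetical helper, not the library's implementation, and the wire format Tamr actually expects is not shown here.

```python
import json
from typing import Iterable, List


def serialize_records(records: Iterable[dict], primary_key_name: str, **json_args) -> List[str]:
    """Validate and serialize records (hypothetical helper)."""
    lines = []
    for record in records:
        # The primary key must be a key in each record dictionary.
        if primary_key_name not in record:
            raise KeyError(f"record is missing primary key {primary_key_name!r}")
        # Extra keyword arguments are passed straight to json.dumps.
        lines.append(json.dumps(record, **json_args))
    return lines


records = [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}]
compact = serialize_records(records, "id", sort_keys=True)
indented = serialize_records(records, "id", indent=2)
print(compact[0])
```

Note that `indent` spreads each record over several lines. That may be one reason such options do not work with Tamr, though this page does not say so explicitly.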
-
delete_records_by_id
(record_ids)[source]¶ Deletes the specified records.
- Parameters
record_ids (iterable) – The IDs of the records to delete.
- Returns
JSON response body from the server.
- Return type
-
delete_all_records
()[source]¶ Removes all records from the dataset.
- Returns
HTTP response from the server
- Return type
-
refresh
(**options)[source]¶ Brings dataset up-to-date if needed, taking whatever actions are required.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The refresh operation.
- Return type
-
profile
()[source]¶ Returns profile information for a dataset.
If profile information has not been generated, call create_profile() first. If the returned profile information is out-of-date, you can call refresh() on the returned object to bring it up-to-date.
- Returns
Dataset Profile information.
- Return type
-
create_profile
(**options)[source]¶ Create a profile for this dataset.
If a profile already exists, the existing profile will be brought up to date.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The operation to create the profile.
- Return type
-
records
()[source]¶ Stream this dataset’s records as Python dictionaries.
- Returns
Stream of records.
- Return type
Python generator yielding
dict
-
status
()[source]¶ Retrieve this dataset’s streamability status.
- Returns
Dataset streamability status.
- Return type
-
usage
()[source]¶ Retrieve this dataset’s usage by recipes and downstream datasets.
- Returns
The dataset’s usage.
- Return type
-
from_geo_features
(features, geo_attr=None)[source]¶ Upsert this dataset from a geospatial FeatureCollection or iterable of Features.
features can be:
- An object that implements __geo_interface__ as a FeatureCollection (see https://gist.github.com/sgillies/2217756)
- An iterable of features, where each element is a feature dictionary or an object that implements the __geo_interface__ as a Feature
- A map where the “features” key contains an iterable of features
See: geopandas.GeoDataFrame.from_features()
If geo_attr is provided, then the named Tamr attribute will be used for the geometry. If geo_attr is not provided, then the first attribute on the dataset with geometry type will be used for the geometry.
- Parameters
features – geospatial features
geo_attr (str) – (optional) name of the Tamr attribute to use for the feature’s geometry
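The three accepted shapes for features can be illustrated with a toy object and a normalising helper. Both `PointSet` and `iter_features` are hypothetical names written for this sketch; they show one way to reduce all three shapes to a stream of feature dictionaries and are not the library's implementation.

```python
from typing import Iterator


class PointSet:
    """Toy object exposing __geo_interface__ as a FeatureCollection."""

    def __init__(self, points: dict):
        self._points = points  # maps feature id -> (x, y)

    @property
    def __geo_interface__(self) -> dict:
        return {
            "type": "FeatureCollection",
            "features": [
                {
                    "type": "Feature",
                    "id": fid,
                    "geometry": {"type": "Point", "coordinates": list(xy)},
                    "properties": {},
                }
                for fid, xy in self._points.items()
            ],
        }


def iter_features(features) -> Iterator[dict]:
    """Hypothetical normaliser for the three accepted input shapes."""
    if hasattr(features, "__geo_interface__"):
        features = features.__geo_interface__
    if isinstance(features, dict) and "features" in features:
        features = features["features"]
    for feature in features:
        # Individual elements may themselves implement __geo_interface__.
        yield getattr(feature, "__geo_interface__", feature)


from_object = list(iter_features(PointSet({"a": (0.0, 1.0)})))
from_mapping = list(iter_features({"features": from_object}))
from_iterable = list(iter_features(from_object))
print(from_object[0]["geometry"])
```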
-
upstream_datasets
()[source]¶ The Dataset’s upstream datasets.
API returns the URIs of the upstream datasets, resulting in a list of DatasetURIs, not actual Datasets.
- Returns
A list of the Dataset’s upstream datasets.
- Return type
list[
DatasetURI
]
-
delete
(cascade=False)[source]¶ Deletes this dataset, optionally deleting all derived datasets as well.
- Parameters
cascade (bool) – Whether to delete all datasets derived from this one. Optional, default is False. Do not use this option unless you are certain you need it, as it can have unintended consequences.
- Returns
HTTP response from the server
- Return type
-
itergeofeatures
(geo_attr=None)[source]¶ Returns an iterator that yields feature dictionaries that comply with __geo_interface__
See https://gist.github.com/sgillies/2217756
- Parameters
geo_attr (str) – (optional) name of the Tamr attribute to use for the feature’s geometry
- Returns
stream of features
- Return type
Python generator yielding
dict[str, object]
-
property
Dataset Spec¶
-
class
tamr_unify_client.dataset.resource.
DatasetSpec
(client, data, api_path)[source]¶ A representation of the server view of a dataset.
-
static
of
(resource)[source]¶ Creates a dataset spec from a dataset.
- Parameters
resource (
Dataset
) – The existing dataset.- Returns
The corresponding dataset spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new dataset.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_name
(new_name)[source]¶ Creates a new spec with the same properties, updating name.
- Parameters
new_name (str) – The new name.
- Returns
A new spec.
- Return type
-
with_external_id
(new_external_id)[source]¶ Creates a new spec with the same properties, updating external ID.
- Parameters
new_external_id (str) – The new external ID.
- Returns
A new spec.
- Return type
-
with_description
(new_description)[source]¶ Creates a new spec with the same properties, updating description.
- Parameters
new_description (str) – The new description.
- Returns
A new spec.
- Return type
-
with_key_attribute_names
(new_key_attribute_names)[source]¶ Creates a new spec with the same properties, updating key attribute names.
Creates a new spec with the same properties, updating tags.
- Parameters
- Returns
A new spec.
- Return type
-
static
Dataset Collection¶
-
class
tamr_unify_client.dataset.collection.
DatasetCollection
(client, api_path='datasets')[source]¶ Collection of
Dataset
s.- Parameters
-
by_external_id
(external_id)[source]¶ Retrieve a dataset by external ID.
- Parameters
external_id (str) – The external ID.
- Returns
The specified dataset, if found.
- Return type
- Raises
KeyError – If no dataset with the specified external_id is found
LookupError – If multiple datasets with the specified external_id are found
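The two documented error cases can be told apart by the caller. In Python, KeyError is a subclass of LookupError, so the KeyError handler has to come first. The sketch below mimics the documented behaviour with a hypothetical `find_by_external_id` function over plain dictionaries; the `externalId` key is an assumption made for the example.

```python
def find_by_external_id(datasets: list, external_id: str) -> dict:
    """Hypothetical lookup mirroring the documented error behaviour."""
    matches = [d for d in datasets if d.get("externalId") == external_id]
    if not matches:
        raise KeyError(f"no dataset with external ID {external_id!r}")
    if len(matches) > 1:
        raise LookupError(f"multiple datasets with external ID {external_id!r}")
    return matches[0]


def describe_lookup(datasets: list, external_id: str) -> str:
    try:
        return find_by_external_id(datasets, external_id)["name"]
    except KeyError:
        # KeyError subclasses LookupError, so it must be caught first.
        return "not found"
    except LookupError:
        return "ambiguous"


datasets = [
    {"name": "customers", "externalId": "crm-1"},
    {"name": "orders", "externalId": "dup"},
    {"name": "orders_copy", "externalId": "dup"},
]
print(describe_lookup(datasets, "crm-1"))
```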
-
stream
()[source]¶ Stream datasets in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of datasets.
- Return type
Python generator yielding
Dataset
- Usage:
>>> for dataset in collection.stream(): # explicit
...     do_stuff(dataset)
>>> for dataset in collection: # implicit
...     do_stuff(dataset)
-
delete_by_resource_id
(resource_id, cascade=False)[source]¶ Deletes a dataset from this collection by resource_id. Optionally deletes all derived datasets as well.
- Parameters
- Returns
HTTP response from the server.
- Return type
-
create
(creation_spec)[source]¶ Create a Dataset in Tamr
- Parameters
creation_spec (dict[str, str]) – Dataset creation specification should be formatted as specified in the Public Docs for Creating a Dataset.
- Returns
The created Dataset
- Return type
-
create_from_dataframe
(df, primary_key_name, dataset_name, ignore_nan=True)[source]¶ Creates a dataset in this collection with the given name, creates an attribute for each column in the df (with primary_key_name as the key attribute), and upserts a record for each row of df.
Each attribute has the default type ARRAY[STRING], besides the key attribute, which will have type STRING.
This function attempts to ensure atomicity, but it is not guaranteed. If an error occurs while creating attributes or records, an attempt will be made to delete the dataset that was created. However, if that delete request itself fails, it is not retried.
- Parameters
df (pandas.DataFrame) – The data to create the dataset with.
primary_key_name (str) – The name of the primary key of the dataset. Must be a column of df.
dataset_name (str) – What to name the dataset in Tamr. There cannot already be a dataset with this name.
ignore_nan (bool) – Whether to convert NaN values to null before upserting records to Tamr. If False and NaN is in df, this function will fail. Optional, default is True.
- Returns
The newly created dataset.
- Return type
- Raises
KeyError – If primary_key_name is not a column in df.
CreationError – If a step in creating the dataset fails.
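The ignore_nan option exists because NaN has no representation in standard JSON: Python's `json.dumps` emits the non-standard token `NaN` by default. A minimal sketch of the kind of conversion described, using plain dictionaries instead of a DataFrame, is shown below. `nan_to_none` is a hypothetical helper, not the library's code.

```python
import json
import math


def nan_to_none(record: dict) -> dict:
    """Replace float NaN values with None so they serialize as JSON null."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


row = {"id": "1", "city": "Boston", "zip": float("nan")}
cleaned = nan_to_none(row)
payload = json.dumps(cleaned)
print(payload)
```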
-
class
tamr_unify_client.dataset.collection.
CreationError
(error_message)[source]¶ An error from
create_from_dataframe()
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
Dataset Profile¶
-
class
tamr_unify_client.dataset.profile.
DatasetProfile
(client, data, alias=None)[source]¶ Profile info of a Tamr dataset.
-
property
relative_dataset_id
¶ The relative dataset ID of the associated dataset.
-
refresh
(**options)[source]¶ Updates the dataset profile if needed.
The dataset profile is updated on the server; you will need to call profile() to retrieve the updated profile.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The refresh operation.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
property
Dataset Status¶
-
class
tamr_unify_client.dataset.status.
DatasetStatus
(client, data, alias=None)[source]¶ Streamability status of a Tamr dataset.
-
property
relative_dataset_id
¶ The relative dataset ID of the associated dataset.
-
property
is_streamable
¶ Whether the associated dataset is available to be streamed.
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
property
Dataset URI¶
Dataset Usage¶
-
class
tamr_unify_client.dataset.usage.
DatasetUsage
(client, data, alias=None)[source]¶ The usage of a dataset and its downstream dependencies.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
-
property
usage
¶ - Type
-
property
dependencies
¶ - Type
list[
DatasetUse
]
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
property
Dataset Use¶
-
class
tamr_unify_client.dataset.use.
DatasetUse
(client, data)[source]¶ The use of a dataset in project steps. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
- Parameters
-
property
input_to_project_steps
¶ - Type
list[
ProjectStep
]
-
property
output_from_project_steps
¶ - Type
list[
ProjectStep
]
Machine Learning Model¶
-
class
tamr_unify_client.base_model.
MachineLearningModel
(client, data, alias=None)[source]¶ A Tamr Machine Learning model.
-
train
(**options)[source]¶ Learn from verified labels.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The resultant operation.
- Return type
-
predict
(**options)[source]¶ Suggest labels for unverified records.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The resultant operation.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Mastering¶
Binning Model¶
-
class
tamr_unify_client.mastering.binning_model.
BinningModel
(client, data, alias=None)[source]¶ A binning model object.
-
records
()[source]¶ Stream this object’s records as Python dictionaries.
- Returns
Stream of records.
- Return type
Python generator yielding
dict
-
update_records
(records)[source]¶ Send a batch of record creations/updates/deletions to this dataset.
- Parameters
records (iterable[dict]) – Each record should be formatted as specified in the Public Docs for Dataset updates.
- Returns
JSON response body from server.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Estimated Pair Counts¶
-
class
tamr_unify_client.mastering.estimated_pair_counts.
EstimatedPairCounts
(client, data, alias=None)[source]¶ Estimated Pair Counts info for Mastering Project
-
property
is_up_to_date
¶ Whether an estimate pairs job has been run since the last edit to the binning model.
- Return type
-
property
total_estimate
¶ The total number of estimated candidate pairs and generated pairs for the model across all clauses.
-
property
clause_estimates
¶ The estimated candidate pair count and generated pair count for each clause in the model.
- Returns
A dictionary containing each clause name mapped to a dictionary containing the corresponding estimated candidate and generated pair counts. For example:
{
  "Clause1": {
    "candidatePairCount": "321",
    "generatedPairCount": "123"
  },
  "Clause2": {
    "candidatePairCount": "654",
    "generatedPairCount": "456"
  }
}
- Return type
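The counts in the example dictionary are strings, so they need converting before any arithmetic. The sketch below adds up the per-clause values from that example; `total_counts` is a hypothetical helper, and a plain sum like this is not necessarily what total_estimate reports.

```python
clause_estimates = {
    "Clause1": {"candidatePairCount": "321", "generatedPairCount": "123"},
    "Clause2": {"candidatePairCount": "654", "generatedPairCount": "456"},
}


def total_counts(estimates: dict) -> dict:
    """Sum per-clause counts, converting the string values to int."""
    totals = {"candidatePairCount": 0, "generatedPairCount": 0}
    for counts in estimates.values():
        for key in totals:
            totals[key] += int(counts[key])
    return totals


totals = total_counts(clause_estimates)
print(totals)
```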
-
refresh
(**options)[source]¶ Updates the estimated pair counts if needed.
The pair count estimates are updated on the server; you will need to call estimate_pairs() to retrieve the updated estimate.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The refresh operation.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
property
Mastering Project¶
-
class
tamr_unify_client.mastering.project.
MasteringProject
(client, data, alias=None)[source]¶ A Mastering project in Tamr.
-
pairs
()[source]¶ Record pairs generated by Tamr’s binning model. Pairs are displayed on the “Pairs” page in the Tamr UI.
Call refresh() from this dataset to regenerate pairs according to the latest binning model.
- Returns
The record pairs represented as a dataset.
- Return type
-
pair_matching_model
()[source]¶ Machine learning model for pair-matching for this Mastering project. Learns from verified labels and predicts categorization labels for unlabeled pairs.
Calling predict() from this dataset will produce new (unpublished) clusters. These clusters are displayed on the “Clusters” page in the Tamr UI.
- Returns
The machine learning model for pair-matching.
- Return type
-
high_impact_pairs
()[source]¶ High-impact pairs as a dataset. Tamr labels pairs as “high-impact” if labeling these pairs would help it learn most quickly (i.e. “Active learning”).
High-impact pairs are displayed with a ⚡ lightning bolt icon on the “Pairs” page in the Tamr UI.
Call refresh() from this dataset to produce new high-impact pairs according to the latest pair-matching model.
- Returns
The high-impact pairs represented as a dataset.
- Return type
-
record_clusters
()[source]¶ Record clusters as a dataset. Tamr clusters labeled pairs using the pairs model. These clusters populate the cluster review page and get transient cluster IDs, rather than published cluster IDs (i.e., “Permanent Ids”).
Call refresh() from this dataset to generate clusters based on the latest pair-matching model.
- Returns
The record clusters represented as a dataset.
- Return type
-
published_clusters
()[source]¶ Published record clusters generated by Tamr’s pair-matching model.
- Returns
The published clusters represented as a dataset.
- Return type
-
published_clusters_configuration
()[source]¶ Retrieves published clusters configuration for this project.
- Returns
The published clusters configuration
- Return type
-
published_cluster_ids
()[source]¶ Retrieves published cluster IDs for this project.
- Returns
The published cluster ID dataset.
- Return type
-
published_cluster_stats
()[source]¶ Retrieves published cluster stats for this project.
- Returns
The published cluster stats dataset.
- Return type
-
published_cluster_versions
(cluster_ids)[source]¶ Retrieves version information for the specified published clusters. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.
- Parameters
cluster_ids (iterable[str]) – The persistent IDs of the clusters to get version information for.
- Returns
A stream of the published clusters.
- Return type
Python generator yielding
PublishedCluster
-
record_published_cluster_versions
(record_ids)[source]¶ Retrieves version information for the published clusters of the given records. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.
- Parameters
record_ids (iterable[str]) – The Tamr IDs of the records to get cluster version information for.
- Returns
A stream of the relevant published clusters.
- Return type
Python generator yielding
RecordPublishedCluster
-
estimate_pairs
()[source]¶ Returns pair estimate information for a mastering project
- Returns
Pairs Estimate information.
- Return type
-
record_clusters_with_data
()[source]¶ Project’s unified dataset with associated clusters.
- Returns
The record clusters with data represented as a dataset
- Return type
-
published_clusters_with_data
()[source]¶ Project’s unified dataset with associated clusters.
- Returns
The published clusters with data represented as a dataset
- Return type
-
binning_model
()[source]¶ Binning model for this project.
- Returns
Binning model for this project.
- Return type
-
add_input_dataset
(dataset)¶ Associate a dataset with a project in Tamr.
By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project
- Parameters
dataset (
Dataset
) – The dataset to associate with the project.- Returns
HTTP response from the server
- Return type
-
as_categorization
()¶ Convert this project to a
CategorizationProject
-
as_mastering
()¶ Convert this project to a
MasteringProject
- Returns
This project.
- Return type
- Raises
-
attribute_configurations
()¶ The project’s attribute configurations.
- Returns
The configurations of the attributes of a project.
- Return type
-
attribute_mappings
()¶ The project’s attribute mappings.
- Returns
The attribute mappings of a project.
- Return type
-
property
attributes
¶ Attributes of this project.
- Returns
Attributes of this project.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
input_datasets
()¶ Retrieve a collection of this project’s input datasets.
- Returns
The project’s input datasets.
- Return type
-
remove_input_dataset
(dataset)¶ Remove a dataset from a project.
- Parameters
dataset (
Dataset
) – The dataset to be removed from this project.- Returns
HTTP response from the server
- Return type
-
spec
()¶ Returns this project’s spec.
- Returns
The spec for the project.
- Return type
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type
-
Published Clusters¶
Metric¶
Published Cluster¶
-
class
tamr_unify_client.mastering.published_cluster.resource.
PublishedCluster
(data)[source]¶ A representation of a published cluster in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this
PublishedCluster
.
-
property
versions
¶ - Type
list[
PublishedClusterVersion
]
Published Cluster Configuration¶
-
class
tamr_unify_client.mastering.published_cluster.configuration.
PublishedClustersConfiguration
(client, data, alias=None)[source]¶ The configuration of published clusters in a project.
See https://docs.tamr.com/reference#the-published-clusters-configuration-object
-
spec
()[source]¶ Returns a spec representation of this published cluster configuration.
- Returns
The published cluster configuration spec.
- Return type
PublishedClustersConfigurationSpec
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Published Cluster Version¶
Record Published Cluster¶
-
class
tamr_unify_client.mastering.published_cluster.record.
RecordPublishedCluster
(data)[source]¶ A representation of a published cluster of a record in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this
RecordPublishedCluster
.
-
property
versions
¶ - Type
Record Published Cluster Version¶
-
class
tamr_unify_client.mastering.published_cluster.record_version.
RecordPublishedClusterVersion
(data)[source]¶ A version of a published cluster in a mastering project.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this version.
Operation¶
-
class
tamr_unify_client.operation.
Operation
(client, data, alias=None)[source]¶ A long-running operation performed by Tamr. Operations appear on the “Jobs” page of the Tamr UI.
By design, client-side operations represent server-side operations at a particular point in time (namely, when the operation was fetched from the server). In other words: Operations will not pick up on server-side changes automatically. To get an up-to-date representation, refetch the operation e.g.
op = op.poll()
.
-
classmethod
from_response
(client, response)[source]¶ Handle idiosyncrasies in constructing Operations from Tamr responses. When a Tamr API call would start an operation, but all results that would be produced by that operation are already up-to-date, Tamr returns HTTP 204 No Content.
To make it easy for client code to handle these API responses without checking the response code, this method will either construct an Operation, or a dummy NoOp operation representing the 204 Success response.
- Parameters
client (
Client
) – Delegate underlying API calls to this client.
response (
requests.Response
) – HTTP Response from the request that started the operation.
- Returns
Operation
- Return type
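The 204 handling described above can be pictured with a small stand-in. This is only a sketch: `FakeResponse`, `operation_from_response`, and the shape of the dummy no-op dict are hypothetical names chosen for illustration and are not taken from the library's internals.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in for requests.Response (status code + JSON body)."""

    status_code: int
    body: Optional[Dict[str, Any]] = None

    def json(self) -> Dict[str, Any]:
        if self.body is None:
            raise ValueError("response has no body")
        return self.body


def operation_from_response(response: FakeResponse) -> Dict[str, Any]:
    """Return operation data, substituting a dummy no-op on HTTP 204."""
    if response.status_code == 204:
        # Assumption: the dummy operation is reported as already succeeded.
        return {"id": None, "description": "no-op", "state": "SUCCEEDED"}
    return response.json()


started = operation_from_response(FakeResponse(200, {"id": "1", "state": "PENDING"}))
up_to_date = operation_from_response(FakeResponse(204))
print(started["state"], up_to_date["state"])
```

Callers can then treat both outcomes uniformly instead of branching on the status code themselves.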
-
apply_options
(asynchronous=False, **options)[source]¶ Applies operation options to this operation.
NOTE: This function should not be called directly. Rather, options should be passed in through a higher-level function e.g.
refresh()
.
-
property
state
¶ Server-side state of this operation.
Operation state can be unresolved (i.e. state is one of: 'PENDING', 'RUNNING'), or resolved (i.e. state is one of: 'CANCELED', 'SUCCEEDED', 'FAILED'). Unless opting into asynchronous mode, all exposed operations should be resolved.
Note: you only need to manually pick up server-side changes when opting into asynchronous mode when kicking off this operation.
- Usage:
>>> op.state # operation is currently 'PENDING'
'PENDING'
>>> op.wait() # continually polls until operation resolves
>>> op.state # incorrect usage; operation object state never changes.
'PENDING'
>>> op = op.poll() # correct usage; use value returned by Operation.poll or Operation.wait
>>> op.state
'SUCCEEDED'
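The point-in-time behaviour can be reproduced without a Tamr server. The sketch below uses a hypothetical `OperationSnapshot` class and an in-memory `SERVER_STATE` dict (both invented for illustration) to show why the value returned by `poll()` must be rebound.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical in-memory "server": operation id -> current state.
SERVER_STATE: Dict[str, str] = {"42": "PENDING"}


@dataclass(frozen=True)
class OperationSnapshot:
    """Immutable client-side view of an operation at fetch time."""

    op_id: str
    state: str

    def poll(self) -> "OperationSnapshot":
        # Re-read the server and return a NEW snapshot; self is unchanged.
        return OperationSnapshot(self.op_id, SERVER_STATE[self.op_id])


op = OperationSnapshot("42", SERVER_STATE["42"])
SERVER_STATE["42"] = "SUCCEEDED"  # the server finishes the job

stale = op.state  # still the value captured at fetch time
op = op.poll()    # rebind to the fresh snapshot
fresh = op.state
```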
-
poll
()[source]¶ Poll this operation for server-side updates.
Does not update the calling
Operation
object. Instead, returns a new Operation.
- Returns
Updated representation of this operation.
- Return type
-
wait
(poll_interval_seconds=3, timeout_seconds=None)[source]¶ Continuously polls for this operation’s server-side state.
- Parameters
- Raises
TimeoutError – If operation takes longer than timeout_seconds to resolve.
- Returns
Resolved operation.
- Return type
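A polling loop with an optional timeout, in the spirit of `wait`, might look like the following. It is a sketch under stated assumptions: `wait_for_state` and its `fetch_state` callback are hypothetical, and the library's actual loop may differ in how it measures elapsed time.

```python
import time
from typing import Callable, Optional

RESOLVED_STATES = {"CANCELED", "SUCCEEDED", "FAILED"}


def wait_for_state(
    fetch_state: Callable[[], str],
    poll_interval_seconds: float = 3,
    timeout_seconds: Optional[float] = None,
) -> str:
    """Poll fetch_state() until it returns a resolved state or time runs out."""
    started = time.monotonic()
    while True:
        state = fetch_state()
        if state in RESOLVED_STATES:
            return state
        elapsed = time.monotonic() - started
        if timeout_seconds is not None and elapsed > timeout_seconds:
            raise TimeoutError(f"not resolved after {timeout_seconds} seconds")
        time.sleep(poll_interval_seconds)


# Simulated server that resolves on the third poll.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
final_state = wait_for_state(lambda: next(states), poll_interval_seconds=0.01)
```

Leaving `timeout_seconds` as `None` means the loop runs until the operation resolves.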
-
succeeded
()[source]¶ Convenience method for checking if operation was successful.
- Returns
True if operation’s state is 'SUCCEEDED', False otherwise.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
Projects¶
Attribute Configurations¶
Attribute Configuration¶
-
class
tamr_unify_client.project.attribute_configuration.resource.
AttributeConfiguration
(client, data, alias=None)[source]¶ The configurations of Tamr Attributes.
See https://docs.tamr.com/reference#the-attribute-configuration-object
-
spec
()[source]¶ Returns this attribute configuration’s spec.
- Returns
The spec of this attribute configuration.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Attribute Configuration Spec¶
-
class
tamr_unify_client.project.attribute_configuration.resource.
AttributeConfigurationSpec
(client, data, api_path)[source]¶ A representation of the server view of an attribute configuration.
-
static
of
(resource)[source]¶ Creates an attribute configuration spec from an attribute configuration.
- Parameters
resource (
AttributeConfiguration
) – The existing attribute configuration.
- Returns
The corresponding attribute configuration spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new attribute configuration.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_attribute_role
(new_attribute_role)[source]¶ Creates a new spec with the same properties, updating attribute role.
- Parameters
new_attribute_role (str) – The new attribute role.
- Returns
A new spec.
- Return type
-
with_similarity_function
(new_similarity_function)[source]¶ Creates a new spec with the same properties, updating similarity function.
- Parameters
new_similarity_function (str) – The new similarity function.
- Returns
A new spec.
- Return type
-
with_enabled_for_ml
(new_enabled_for_ml)[source]¶ Creates a new spec with the same properties, updating enabled for ML.
- Parameters
new_enabled_for_ml (bool) – Whether the attribute is enabled for ML.
- Returns
A new spec.
- Return type
-
with_tokenizer
(new_tokenizer)[source]¶ Creates a new spec with the same properties, updating tokenizer.
- Parameters
new_tokenizer (str) – The new tokenizer.
- Returns
A new spec.
- Return type
-
with_numeric_field_resolution
(new_numeric_field_resolution)[source]¶ Creates a new spec with the same properties, updating numeric field resolution.
- Parameters
new_numeric_field_resolution (str) – The new numeric field resolution.
- Returns
A new spec.
- Return type
-
with_attribute_name
(new_attribute_name)[source]¶ Creates a new spec with the same properties, updating attribute name.
- Parameters
new_attribute_name (str) – The new attribute name.
- Returns
A new spec.
- Return type
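Each `with_*` method above returns a new spec rather than modifying the existing one. A minimal sketch of that copy-on-write pattern follows; `SpecSketch` is a hypothetical stand-in, and the JSON key names and the role value are assumptions made for illustration.

```python
from copy import deepcopy
from typing import Any, Dict


class SpecSketch:
    """Hypothetical stand-in showing the copy-on-write `with_*` pattern."""

    def __init__(self, data: Dict[str, Any]):
        self._data = deepcopy(data)

    def from_data(self, data: Dict[str, Any]) -> "SpecSketch":
        # New spec, new data; the original spec is left untouched.
        return SpecSketch(data)

    def to_dict(self) -> Dict[str, Any]:
        return deepcopy(self._data)

    def with_attribute_role(self, new_attribute_role: str) -> "SpecSketch":
        # The "attributeRole" key name is an assumption for illustration.
        return self.from_data({**self._data, "attributeRole": new_attribute_role})

    def with_enabled_for_ml(self, new_enabled_for_ml: bool) -> "SpecSketch":
        # The "enabledForMl" key name is an assumption for illustration.
        return self.from_data({**self._data, "enabledForMl": new_enabled_for_ml})


base = SpecSketch({"attributeName": "surname"})
# "EXAMPLE_ROLE" is a placeholder value, not a documented role.
updated = base.with_attribute_role("EXAMPLE_ROLE").with_enabled_for_ml(True)
```

Because every call returns a fresh object, calls can be chained and `base` can be reused as a template.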
Attribute Configuration Collection¶
-
class
tamr_unify_client.project.attribute_configuration.collection.
AttributeConfigurationCollection
(client, api_path)[source]¶ Collection of
AttributeConfiguration
- Parameters
-
by_resource_id
(resource_id)[source]¶ Retrieve an attribute configuration by resource ID.
- Parameters
resource_id (str) – The resource ID.
- Returns
The specified attribute configuration.
- Return type
-
by_relative_id
(relative_id)[source]¶ Retrieve an attribute configuration by relative ID.
- Parameters
relative_id (str) – The relative ID.
- Returns
The specified attribute configuration.
- Return type
-
by_external_id
(external_id)[source]¶ Retrieve an attribute configuration by external ID.
Since attributes do not have external IDs, this method is not supported and will raise a
NotImplementedError
.
- Parameters
external_id (str) – The external ID.
- Returns
The specified attribute, if found.
- Return type
- Raises
KeyError – If no attribute with the specified external_id is found
LookupError – If multiple attributes with the specified external_id are found
NotImplementedError – AttributeConfiguration does not support external_id
-
stream
()[source]¶ Stream attribute configurations in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of attribute configurations.
- Return type
Python generator yielding
AttributeConfiguration
- Usage:
>>> for attributeConfiguration in collection.stream(): # explicit
...     do_stuff(attributeConfiguration)
>>> for attributeConfiguration in collection: # implicit
...     do_stuff(attributeConfiguration)
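The equivalence of explicit and implicit iteration can be sketched with a stand-in collection whose `__iter__` delegates to `stream()`. `CollectionSketch` is hypothetical and holds its items in memory, whereas a real collection would fetch them from the server.

```python
from typing import Dict, Iterator, List


class CollectionSketch:
    """Hypothetical collection whose iteration delegates to stream()."""

    def __init__(self, items: List[Dict[str, str]]):
        self._items = items

    def stream(self) -> Iterator[Dict[str, str]]:
        # A generator: items are yielded one at a time, on demand.
        for item in self._items:
            yield item

    def __iter__(self) -> Iterator[Dict[str, str]]:
        return self.stream()


collection = CollectionSketch([{"name": "a"}, {"name": "b"}])
explicit = [item["name"] for item in collection.stream()]
implicit = [item["name"] for item in collection]
```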
-
create
(creation_spec)[source]¶ Create an attribute configuration in this collection.
- Parameters
creation_spec (dict[str, str]) – Attribute configuration creation specification, formatted as specified in the Public Docs for adding an AttributeConfiguration.
- Returns
The created attribute configuration.
- Return type
Attribute Mappings¶
Attribute Mapping¶
-
class
tamr_unify_client.project.attribute_mapping.resource.
AttributeMapping
(client, data)[source]¶ See https://docs.tamr.com/reference#retrieve-projects-mappings.
AttributeMapping and AttributeMappingCollection do not inherit from BaseResource and BaseCollection. BaseResource and BaseCollection require a specific URL for each individual attribute mapping (e.g. /projects/1/attributeMappings/1), but such URLs do not exist for attribute mappings.
-
spec
()[source]¶ Returns a spec representation of this attribute mapping.
- Returns
The attribute mapping spec.
- Return type
-
Attribute Mapping Spec¶
-
class
tamr_unify_client.project.attribute_mapping.resource.
AttributeMappingSpec
(data)[source]¶ A representation of the server view of an attribute mapping.
-
static
of
(resource)[source]¶ Creates an attribute mapping spec from an attribute mapping.
- Parameters
resource (
AttributeMapping
) – The existing attribute mapping.
- Returns
The corresponding attribute mapping spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new attribute mapping.
- Returns
The empty spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_input_attribute_id
(new_input_attribute_id)[source]¶ Creates a new spec with the same properties, updating the input attribute id.
- Parameters
new_input_attribute_id (str) – The new input attribute id.
- Returns
The new spec.
- Return type
-
with_relative_input_attribute_id
(new_relative_input_attribute_id)[source]¶ Creates a new spec with the same properties, updating the relative input attribute id.
- Parameters
new_relative_input_attribute_id (str) – The new relative input attribute id.
- Returns
The new spec.
- Return type
-
with_input_dataset_name
(new_input_dataset_name)[source]¶ Creates a new spec with the same properties, updating the input dataset name.
- Parameters
new_input_dataset_name (str) – The new input dataset name.
- Returns
The new spec.
- Return type
-
with_input_attribute_name
(new_input_attribute_name)[source]¶ Creates a new spec with the same properties, updating the input attribute name.
- Parameters
new_input_attribute_name (str) – The new input attribute name.
- Returns
The new spec.
- Return type
-
with_unified_attribute_id
(new_unified_attribute_id)[source]¶ Creates a new spec with the same properties, updating the unified attribute id.
- Parameters
new_unified_attribute_id (str) – The new unified attribute id.
- Returns
The new spec.
- Return type
-
with_relative_unified_attribute_id
(new_relative_unified_attribute_id)[source]¶ Creates a new spec with the same properties, updating the relative unified attribute id.
- Parameters
new_relative_unified_attribute_id (str) – The new relative unified attribute id.
- Returns
The new spec.
- Return type
-
with_unified_dataset_name
(new_unified_dataset_name)[source]¶ Creates a new spec with the same properties, updating the unified dataset name.
- Parameters
new_unified_dataset_name (str) – The new unified dataset name.
- Returns
The new spec.
- Return type
Attribute Mapping Collection¶
-
class
tamr_unify_client.project.attribute_mapping.collection.
AttributeMappingCollection
(client, api_path)[source]¶ Collection of
AttributeMapping
- Parameters
-
stream
()[source]¶ Stream attribute mappings in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of attribute mappings.
- Return type
Python generator yielding
AttributeMapping
-
by_resource_id
(resource_id)[source]¶ Retrieve an item in this collection by resource ID.
- Parameters
resource_id (str) – The resource ID.
- Returns
The specified attribute mapping.
- Return type
-
by_relative_id
(relative_id)[source]¶ Retrieve an item in this collection by relative ID.
- Parameters
relative_id (str) – The relative ID.
- Returns
The specified attribute mapping.
- Return type
-
create
(creation_spec)[source]¶ Create an attribute mapping in this collection.
- Parameters
creation_spec (dict[str, str]) – Attribute mapping creation specification, formatted as specified in the Public Docs for adding an AttributeMapping.
- Returns
The created attribute mapping.
- Return type
Project¶
-
class
tamr_unify_client.project.resource.
Project
(client, data, alias=None)[source]¶ A Tamr project.
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type
-
property
attributes
¶ Attributes of this project.
- Returns
Attributes of this project.
- Return type
-
unified_dataset
()[source]¶ Unified dataset for this project.
- Returns
Unified dataset for this project.
- Return type
-
as_categorization
()[source]¶ Convert this project to a
CategorizationProject
-
as_mastering
()[source]¶ Convert this project to a
MasteringProject
- Returns
This project.
- Return type
- Raises
-
add_input_dataset
(dataset)[source]¶ Associate a dataset with a project in Tamr.
By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.
- Parameters
dataset (
Dataset
) – The dataset to associate with the project.
- Returns
HTTP response from the server
- Return type
-
remove_input_dataset
(dataset)[source]¶ Remove a dataset from a project.
- Parameters
dataset (
Dataset
) – The dataset to be removed from this project.
- Returns
HTTP response from the server
- Return type
-
input_datasets
()[source]¶ Retrieve a collection of this project’s input datasets.
- Returns
The project’s input datasets.
- Return type
-
attribute_configurations
()[source]¶ Configurations of this project’s attributes.
- Returns
The configurations of the attributes of a project.
- Return type
-
attribute_mappings
()[source]¶ Mappings of this project’s attributes.
- Returns
The attribute mappings of a project.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
Project Spec¶
-
class
tamr_unify_client.project.resource.
ProjectSpec
(client, data, api_path)[source]¶ A representation of the server view of a project.
-
static
of
(resource)[source]¶ Creates a project spec from a project.
- Parameters
resource (
Project
) – The existing project.
- Returns
The corresponding project spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new project.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_name
(new_name)[source]¶ Creates a new spec with the same properties, updating name.
- Parameters
new_name (str) – The new name.
- Returns
The new spec.
- Return type
-
with_description
(new_description)[source]¶ Creates a new spec with the same properties, updating description.
- Parameters
new_description (str) – The new description.
- Returns
The new spec.
- Return type
-
with_type
(new_type)[source]¶ Creates a new spec with the same properties, updating type.
- Parameters
new_type (str) – The new type.
- Returns
The new spec.
- Return type
-
with_external_id
(new_external_id)[source]¶ Creates a new spec with the same properties, updating external ID.
- Parameters
new_external_id (str) – The new external ID.
- Returns
The new spec.
- Return type
Project Collection¶
-
class
tamr_unify_client.project.collection.
ProjectCollection
(client, api_path='projects')[source]¶ Collection of
Project
s.
- Parameters
-
by_external_id
(external_id)[source]¶ Retrieve a project by external ID.
- Parameters
external_id (str) – The external ID.
- Returns
The specified project, if found.
- Return type
- Raises
KeyError – If no project with the specified external_id is found
LookupError – If multiple projects with the specified external_id are found
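The two failure modes listed above can be sketched over a plain list of dicts. The function below is a hypothetical illustration of the documented semantics, not the library's implementation, and the `externalId` key name is an assumption.

```python
from typing import Dict, List


def by_external_id(projects: List[Dict[str, str]], external_id: str) -> Dict[str, str]:
    """Return the single project with this external ID.

    Raises KeyError if none match and LookupError if several do.
    """
    # Assumption: each project dict stores its external ID under "externalId".
    matches = [p for p in projects if p.get("externalId") == external_id]
    if not matches:
        raise KeyError(f"no project with external_id {external_id!r}")
    if len(matches) > 1:
        raise LookupError(f"multiple projects with external_id {external_id!r}")
    return matches[0]


projects = [
    {"name": "p1", "externalId": "x"},
    {"name": "p2", "externalId": "dup"},
    {"name": "p3", "externalId": "dup"},
]
found = by_external_id(projects, "x")
```

In Python, `KeyError` is a subclass of `LookupError`, so a handler that needs to tell the two cases apart should catch `KeyError` first.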
-
stream
()[source]¶ Stream projects in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of projects.
- Return type
Python generator yielding
Project
- Usage:
>>> for project in collection.stream(): # explicit
...     do_stuff(project)
>>> for project in collection: # implicit
...     do_stuff(project)
-
create
(creation_spec)[source]¶ Create a project in Tamr.
- Parameters
creation_spec (dict[str, str]) – Project creation specification, formatted as specified in the Public Docs for Creating a Project.
- Returns
The created project.
- Return type
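As a rough illustration of a `dict[str, str]` creation spec, the sketch below assembles one with a hypothetical helper. The four key names and the `"DEDUP"` type value are assumptions; the Public Docs for Creating a Project remain the authority on the required fields.

```python
from typing import Dict


def build_project_creation_spec(
    name: str, description: str, project_type: str, unified_dataset_name: str
) -> Dict[str, str]:
    """Assemble a dict[str, str] in the shape create() is documented to accept.

    All four key names are assumptions; check them against the Public Docs.
    """
    spec = {
        "name": name,
        "description": description,
        "type": project_type,
        "unifiedDatasetName": unified_dataset_name,
    }
    # create() documents dict[str, str], so reject non-string values early.
    for key, value in spec.items():
        if not isinstance(value, str):
            raise TypeError(f"{key} must be a str, got {type(value).__name__}")
    return spec


creation_spec = build_project_creation_spec(
    name="Example project",
    description="Illustrative only",
    project_type="DEDUP",  # assumed type value
    unified_dataset_name="Example project - Unified Dataset",
)
```

With a real client, a dict like this would be passed to `create` on the project collection.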
Project Step¶
-
class
tamr_unify_client.project.step.
ProjectStep
(client, data)[source]¶ A step of a Tamr project. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
- Parameters
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type