Mastering

Binning Model

class tamr_unify_client.mastering.binning_model.BinningModel(client, data, alias=None)

A binning model object.

records()

Stream this object’s records as Python dictionaries.

Returns

Stream of records.

Return type

Python generator yielding dict
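Because the documented return type is a generator, records are produced one at a time as you iterate rather than loaded all at once. A minimal sketch, assuming only that the object exposes records() as documented; the helper preview_records and the FakeBinningModel stand-in are hypothetical and exist so the snippet runs without a Tamr instance.

```python
from itertools import islice
from typing import Dict, Iterator, List


def preview_records(model, n: int = 5) -> List[Dict]:
    """Return at most n records from model.records() without draining the stream.

    `model` is anything whose records() yields dicts, e.g. a BinningModel.
    Hypothetical helper, not part of tamr_unify_client.
    """
    return list(islice(model.records(), n))


class FakeBinningModel:
    """Hypothetical stand-in that yields records lazily and counts how many it produced."""

    def __init__(self, total: int):
        self.total = total
        self.yielded = 0

    def records(self) -> Iterator[Dict]:
        for i in range(self.total):
            self.yielded += 1
            yield {"id": str(i)}


model = FakeBinningModel(total=1000)
first = preview_records(model, 3)
print(first)          # [{'id': '0'}, {'id': '1'}, {'id': '2'}]
print(model.yielded)  # 3: the remaining 997 records were never produced
```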

update_records(records)

Send a batch of record creations, updates, and deletions to this binning model.

Parameters

records (iterable[dict]) – Each record should be formatted as specified in the Public Docs for Dataset updates.

Returns

JSON response body from server.

Return type

dict
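The authoritative record format is the one in Tamr's public docs for dataset updates. The sketch below assumes the command shape commonly used there: an "action" of "CREATE" (with a "record" body) or "DELETE", plus a "recordId". Verify the field names against the docs for your Tamr version; the helpers upsert_commands and delete_commands are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def upsert_commands(records: Iterable[Dict], primary_key: str) -> Iterator[Dict]:
    """Wrap plain records as CREATE commands keyed by their primary key.

    Assumed command shape; check Tamr's dataset-update docs for your version.
    """
    for record in records:
        yield {"action": "CREATE", "recordId": record[primary_key], "record": record}


def delete_commands(record_ids: Iterable[str]) -> Iterator[Dict]:
    """Build DELETE commands for the given record IDs (assumed command shape)."""
    for record_id in record_ids:
        yield {"action": "DELETE", "recordId": record_id}


commands = list(upsert_commands([{"id": "1", "name": "Acme"}], primary_key="id"))
commands += list(delete_commands(["2"]))
# With a live BinningModel you would then send them: binning_model.update_records(commands)
print(commands)
```

Since update_records() accepts any iterable of dicts, the generators can also be passed directly without building a list first.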

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id

Type

str

property resource_id

Type

str

Estimated Pair Counts

class tamr_unify_client.mastering.estimated_pair_counts.EstimatedPairCounts(client, data, alias=None)

Estimated pair counts information for a Mastering project.

property is_up_to_date

Whether an estimate pairs job has been run since the last edit to the binning model.

Return type

bool

property total_estimate

The total number of estimated candidate pairs and generated pairs for the model across all clauses.

Returns

A dictionary mapping the estimated candidate pair count and generated pair count to their values. For example:

    {
        "candidatePairCount": "54321",
        "generatedPairCount": "12345"
    }

Return type

dict[str, str]

property clause_estimates

The estimated candidate pair count and generated pair count for each clause in the model.

Returns

A dictionary containing each clause name mapped to a dictionary containing the corresponding estimated candidate and generated pair counts. For example:

    {
        "Clause1": {
            "candidatePairCount": "321",
            "generatedPairCount": "123"
        },
        "Clause2": {
            "candidatePairCount": "654",
            "generatedPairCount": "456"
        }
    }

Return type

dict[str, dict[str, str]]
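Because the counts are returned as strings, convert them before doing arithmetic or ranking. A minimal sketch using the example dictionary shown above; the helper names are hypothetical, and in practice the input would be estimate_pairs().clause_estimates.

```python
from typing import Dict, List, Tuple


def clause_counts_as_ints(clause_estimates: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Convert every clause's string counts to integers."""
    return {
        clause: {key: int(value) for key, value in counts.items()}
        for clause, counts in clause_estimates.items()
    }


def clauses_by_candidate_count(clause_estimates: Dict[str, Dict[str, str]]) -> List[Tuple[str, int]]:
    """Rank clauses from most to fewest estimated candidate pairs."""
    counts = clause_counts_as_ints(clause_estimates)
    return sorted(
        ((clause, c["candidatePairCount"]) for clause, c in counts.items()),
        key=lambda item: item[1],
        reverse=True,
    )


example = {
    "Clause1": {"candidatePairCount": "321", "generatedPairCount": "123"},
    "Clause2": {"candidatePairCount": "654", "generatedPairCount": "456"},
}
print(clauses_by_candidate_count(example))  # [('Clause2', 654), ('Clause1', 321)]
```

The same int() conversion applies to the flat dictionary returned by total_estimate.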

refresh(**options)

Updates the estimated pair counts if needed.

The pair count estimates are updated on the server, not on this object; call MasteringProject.estimate_pairs() again to retrieve the updated estimate.

Parameters

**options – Options passed to the underlying Operation. See apply_options().

Returns

The refresh operation.

Return type

Operation
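The refresh-then-re-read pattern described above can be sketched as follows. This assumes `project` behaves like a MasteringProject and relies only on estimate_pairs(), is_up_to_date, total_estimate, and refresh() as documented here. It also assumes refresh() waits for the server operation to finish (asynchronous behavior is governed by apply_options()). The helper and the fake classes are hypothetical; the fakes snapshot server state when constructed so the stale-object behavior is visible.

```python
def current_pair_estimate(project):
    """Refresh pair estimates only when stale, then fetch the updated totals.

    `project` is assumed to behave like a MasteringProject (hypothetical helper).
    """
    counts = project.estimate_pairs()
    if not counts.is_up_to_date:
        counts.refresh()                   # updates the estimate on the server only
        counts = project.estimate_pairs()  # re-read to see the new numbers
    return counts.total_estimate


class FakeCounts:
    """Stand-in for EstimatedPairCounts: copies server state at construction time."""

    def __init__(self, server):
        self.server = server
        self.is_up_to_date = server["fresh"]
        self.total_estimate = dict(server["total"])

    def refresh(self):
        self.server["fresh"] = True
        self.server["total"] = {"candidatePairCount": "54321", "generatedPairCount": "12345"}


class FakeProject:
    """Stand-in for MasteringProject holding a fake server-side state."""

    def __init__(self):
        self.server = {"fresh": False, "total": {"candidatePairCount": "0", "generatedPairCount": "0"}}

    def estimate_pairs(self):
        return FakeCounts(self.server)


print(current_pair_estimate(FakeProject()))
# {'candidatePairCount': '54321', 'generatedPairCount': '12345'}
```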

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id

Type

str

property resource_id

Type

str

Mastering Project

class tamr_unify_client.mastering.project.MasteringProject(client, data, alias=None)

A Mastering project in Tamr.

pairs()

Record pairs generated by Tamr’s binning model. Pairs are displayed on the “Pairs” page in the Tamr UI.

Call refresh() from this dataset to regenerate pairs according to the latest binning model.

Returns

The record pairs represented as a dataset.

Return type

Dataset

pair_matching_model()

Machine learning model for pair-matching for this Mastering project. Learns from verified labels and predicts categorization labels for unlabeled pairs.

Calling predict() on this model will produce new (unpublished) clusters. These clusters are displayed on the “Clusters” page in the Tamr UI.

Returns

The machine learning model for pair-matching.

Return type

MachineLearningModel

high_impact_pairs()

High-impact pairs as a dataset. Tamr labels pairs as “high-impact” if labeling these pairs would help it learn most quickly (i.e. “Active learning”).

High-impact pairs are displayed with a ⚡ lightning bolt icon on the “Pairs” page in the Tamr UI.

Call refresh() from this dataset to produce new high-impact pairs according to the latest pair-matching model.

Returns

The high-impact pairs represented as a dataset.

Return type

Dataset

record_clusters()

Record clusters as a dataset. Tamr clusters labeled pairs using the pair-matching model. These clusters populate the cluster review page and get transient cluster IDs rather than published cluster IDs (i.e., “permanent IDs”).

Call refresh() from this dataset to generate clusters based on the latest pair-matching model.

Returns

The record clusters represented as a dataset.

Return type

Dataset
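The refresh() and predict() calls described above are typically run in dependency order: data, then pairs, then the model, then clusters. A minimal sketch, assuming `project` is a MasteringProject and that each call blocks until its server operation finishes. train() on the pair-matching model is not documented in this section, so treat it as an assumption about MachineLearningModel. The function name and the recording stand-ins are hypothetical.

```python
def run_mastering_pipeline(project):
    """Regenerate pairs and clusters in dependency order (hypothetical helper)."""
    project.unified_dataset().refresh()   # pick up changes to the input data
    project.pairs().refresh()             # regenerate pairs from the latest binning model
    model = project.pair_matching_model()
    model.train()                         # assumption: learn from the latest verified labels
    model.predict()                       # produce new (unpublished) clusters
    project.record_clusters().refresh()   # regenerate record clusters from the latest model


class Recorder:
    """Stand-in for a Dataset or model that logs each call made on it."""

    def __init__(self, log, name):
        self.log = log
        self.name = name

    def refresh(self):
        self.log.append(f"{self.name}.refresh")

    def train(self):
        self.log.append(f"{self.name}.train")

    def predict(self):
        self.log.append(f"{self.name}.predict")


class FakeProject:
    """Stand-in for MasteringProject whose accessors return Recorders."""

    def __init__(self):
        self.log = []

    def unified_dataset(self):
        return Recorder(self.log, "unified_dataset")

    def pairs(self):
        return Recorder(self.log, "pairs")

    def pair_matching_model(self):
        return Recorder(self.log, "pair_matching_model")

    def record_clusters(self):
        return Recorder(self.log, "record_clusters")


project = FakeProject()
run_mastering_pipeline(project)
print(project.log)
```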

published_clusters()

Published record clusters generated by Tamr’s pair-matching model.

Returns

The published clusters represented as a dataset.

Return type

Dataset

published_clusters_configuration()

Retrieves published clusters configuration for this project.

Returns

The published clusters configuration

Return type

PublishedClustersConfiguration

published_cluster_ids()

Retrieves published cluster IDs for this project.

Returns

The published cluster ID dataset.

Return type

Dataset

published_cluster_stats()

Retrieves published cluster stats for this project.

Returns

The published cluster stats dataset.

Return type

Dataset

published_cluster_versions(cluster_ids)

Retrieves version information for the specified published clusters. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.

Parameters

cluster_ids (iterable[str]) – The persistent IDs of the clusters to get version information for.

Returns

A stream of the published clusters.

Return type

Python generator yielding PublishedCluster

record_published_cluster_versions(record_ids)

Retrieves version information for the published clusters of the given records. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.

Parameters

record_ids (iterable[str]) – The Tamr IDs of the records to get cluster version information for.

Returns

A stream of the relevant published clusters.

Return type

Python generator yielding RecordPublishedCluster

estimate_pairs()

Returns pair estimate information for this Mastering project.

Returns

Pair estimate information.

Return type

EstimatedPairCounts

record_clusters_with_data()

Project’s unified dataset with associated record clusters.

Returns

The record clusters with data represented as a dataset.

Return type

Dataset

published_clusters_with_data()

Project’s unified dataset with associated published clusters.

Returns

The published clusters with data represented as a dataset.

Return type

Dataset

binning_model()

Binning model for this project.

Returns

Binning model for this project.

Return type

BinningModel

add_input_dataset(dataset)

Associate a dataset with a project in Tamr.

By default, datasets are not associated with any projects. They must be added as input to a project before they can be used as part of that project.

Parameters

dataset (Dataset) – The dataset to associate with the project.

Returns

HTTP response from the server

Return type

requests.Response
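Because a dataset must be added before a project can use it, setup scripts often need to add it only once. A minimal sketch of an idempotent add, assuming the collection returned by input_datasets() is iterable and that dataset names are unique; ensure_input_dataset and FakeProject are hypothetical stand-ins, not part of tamr_unify_client.

```python
from types import SimpleNamespace


def ensure_input_dataset(project, dataset):
    """Add `dataset` as an input to `project` unless one with the same name is already there.

    Returns the result of add_input_dataset(), or None if no request was needed.
    """
    existing = {d.name for d in project.input_datasets()}
    if dataset.name in existing:
        return None
    return project.add_input_dataset(dataset)


class FakeProject:
    """Stand-in for a Project that keeps its inputs in a list."""

    def __init__(self):
        self.inputs = []

    def input_datasets(self):
        return list(self.inputs)

    def add_input_dataset(self, dataset):
        self.inputs.append(dataset)
        return "added"  # the real method returns a requests.Response


project = FakeProject()
customers = SimpleNamespace(name="customers.csv")
print(ensure_input_dataset(project, customers))  # added
print(ensure_input_dataset(project, customers))  # None
```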

as_categorization()

Convert this project to a CategorizationProject.

Returns

This project.

Return type

CategorizationProject

Raises

TypeError – If the type of this project is not "CATEGORIZATION"

as_mastering()

Convert this project to a MasteringProject.

Returns

This project.

Return type

MasteringProject

Raises

TypeError – If the type of this project is not "DEDUP"
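Both conversions raise TypeError when the project's type does not match, so code that receives a generic project can branch on that exception. A minimal sketch; as_mastering_or_none is a hypothetical helper, and FakeProject mimics only the documented TypeError contract.

```python
def as_mastering_or_none(project):
    """Return the project as a MasteringProject, or None if it is not a "DEDUP" project."""
    try:
        return project.as_mastering()
    except TypeError:
        return None


class FakeProject:
    """Stand-in that follows the documented contract of as_mastering()."""

    def __init__(self, project_type):
        self.type = project_type

    def as_mastering(self):
        if self.type != "DEDUP":
            raise TypeError(f"project type is {self.type}, not DEDUP")
        return self


print(as_mastering_or_none(FakeProject("DEDUP")) is not None)  # True
print(as_mastering_or_none(FakeProject("CATEGORIZATION")))     # None
```

Checking project.type against "DEDUP" before converting is an equivalent alternative when you prefer not to rely on exceptions.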

attribute_configurations()

Configurations of this project’s attributes.

Returns

The configurations of the attributes of a project.

Return type

AttributeConfigurationCollection

attribute_mappings()

This project’s attribute mappings.

Returns

The attribute mappings of a project.

Return type

AttributeMappingCollection

property attributes

Attributes of this project.

Returns

Attributes of this project.

Return type

AttributeCollection

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property description

Type

str

property external_id

Type

str

input_datasets()

Retrieve a collection of this project’s input datasets.

Returns

The project’s input datasets.

Return type

DatasetCollection

property name

Type

str

property relative_id

Type

str

remove_input_dataset(dataset)

Remove a dataset from a project.

Parameters

dataset (Dataset) – The dataset to be removed from this project.

Returns

HTTP response from the server

Return type

requests.Response

property resource_id

Type

str

spec()

Returns this project’s spec.

Returns

The spec for the project.

Return type

ProjectSpec

property type

A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.

Type

str

unified_dataset()

Unified dataset for this project.

Returns

Unified dataset for this project.

Return type

Dataset

Published Clusters

Metric

class tamr_unify_client.mastering.published_cluster.metric.Metric(data)

A metric for a published cluster.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this metric.

property name

Type

str

property value

Type

str

Published Cluster

class tamr_unify_client.mastering.published_cluster.resource.PublishedCluster(data)

A representation of a published cluster in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this PublishedCluster.

property id

Type

str

property versions

Type

list[PublishedClusterVersion]

Published Cluster Configuration

class tamr_unify_client.mastering.published_cluster.configuration.PublishedClustersConfiguration(client, data, alias=None)

The configuration of published clusters in a project.

See https://docs.tamr.com/reference#the-published-clusters-configuration-object

property relative_id

Type

str

property versions_time_to_live

Type

str

spec()

Returns a spec representation of this published cluster configuration.

Returns

The published cluster configuration spec.

Return type

PublishedClustersConfigurationSpec

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property resource_id

Type

str

Published Cluster Version

class tamr_unify_client.mastering.published_cluster.version.PublishedClusterVersion(data)

A version of a published cluster in a mastering project.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this version.

property version

Type

str

property timestamp

Type

str

property name

Type

str

property metrics

Type

list[Metric]

property record_ids

Type

list[dict[str, str]]
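A PublishedCluster holds a list of PublishedClusterVersion objects, each carrying a list of Metric objects. A minimal sketch that flattens these into a lookup table keyed by cluster ID and version; in practice the input would be the generator returned by MasteringProject.published_cluster_versions(cluster_ids). The dataclasses are stand-ins exposing only the documented properties, and the metric name "recordCount", the IDs, and the timestamps are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeMetric:  # stand-in for Metric
    name: str
    value: str


@dataclass
class FakeVersion:  # stand-in for PublishedClusterVersion
    version: str
    timestamp: str
    metrics: List[FakeMetric] = field(default_factory=list)


@dataclass
class FakeCluster:  # stand-in for PublishedCluster
    id: str
    versions: List[FakeVersion] = field(default_factory=list)


def metrics_table(clusters) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Map (cluster id, version) to that version's metrics as {name: value}."""
    return {
        (cluster.id, v.version): {m.name: m.value for m in v.metrics}
        for cluster in clusters
        for v in cluster.versions
    }


clusters = [
    FakeCluster("c1", [
        FakeVersion("1", "2020-01-01T00:00:00Z", [FakeMetric("recordCount", "3")]),
        FakeVersion("2", "2020-02-01T00:00:00Z", [FakeMetric("recordCount", "4")]),
    ]),
]
print(metrics_table(clusters))
# {('c1', '1'): {'recordCount': '3'}, ('c1', '2'): {'recordCount': '4'}}
```

Metric values are documented as strings, so convert them explicitly if you need numbers.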

Record Published Cluster

class tamr_unify_client.mastering.published_cluster.record.RecordPublishedCluster(data)

A representation of a published cluster of a record in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this RecordPublishedCluster.

property entity_id

Type

str

property source_id

Type

str

property origin_entity_id

Type

str

property origin_source_id

Type

str

property versions

Type

list[RecordPublishedClusterVersion]

Record Published Cluster Version

class tamr_unify_client.mastering.published_cluster.record_version.RecordPublishedClusterVersion(data)

A version of a published cluster in a mastering project.

This is not a BaseResource because it does not have its own API endpoint.

Parameters

data – The JSON entity representing this version.

property version

Type

str

property timestamp

Type

str

property cluster_id

Type

str
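Because each RecordPublishedClusterVersion carries a cluster_id, the version history returned by MasteringProject.record_published_cluster_versions(record_ids) can reveal records that moved between published clusters. A minimal sketch, assuming entity_id identifies the record; the dataclasses are stand-ins exposing only documented properties, and the IDs and timestamps are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeRecordVersion:  # stand-in for RecordPublishedClusterVersion
    version: str
    timestamp: str
    cluster_id: str


@dataclass
class FakeRecordCluster:  # stand-in for RecordPublishedCluster
    entity_id: str
    source_id: str
    versions: List[FakeRecordVersion] = field(default_factory=list)


def records_that_moved(record_clusters) -> Dict[str, List[str]]:
    """Return {entity_id: sorted cluster IDs} for records seen in more than one cluster."""
    seen: Dict[str, Set[str]] = {}
    for rc in record_clusters:
        seen.setdefault(rc.entity_id, set()).update(v.cluster_id for v in rc.versions)
    return {entity: sorted(ids) for entity, ids in seen.items() if len(ids) > 1}


history = [
    FakeRecordCluster("r1", "unified", [
        FakeRecordVersion("1", "2020-01-01T00:00:00Z", "c1"),
        FakeRecordVersion("2", "2020-02-01T00:00:00Z", "c2"),
    ]),
    FakeRecordCluster("r2", "unified", [
        FakeRecordVersion("1", "2020-01-01T00:00:00Z", "c1"),
    ]),
]
print(records_that_moved(history))  # {'r1': ['c1', 'c2']}
```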