Categorization

Categorization Project

class tamr_unify_client.categorization.project.CategorizationProject(client, data, alias=None)[source]

A Categorization project in Tamr.

model()[source]

Machine learning model for this Categorization project. Learns from verified labels and predicts categorization labels for unlabeled records.

Returns

The machine learning model for categorization.

Return type

MachineLearningModel

create_taxonomy(creation_spec)[source]

Creates a Taxonomy for this project.

The project must not already have an associated taxonomy.

Parameters

creation_spec (dict) – The creation specification for the taxonomy, which can include name.

Returns

The new Taxonomy

Return type

Taxonomy
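
As a minimal sketch of the call described above, the helper below builds a creation spec that contains only a name and passes it to create_taxonomy(). The helper name create_named_taxonomy is hypothetical, and project is assumed to be a CategorizationProject obtained elsewhere that has no taxonomy yet. Spec keys other than name are not covered here.

```python
def create_named_taxonomy(project, name):
    """Create a taxonomy called `name` on `project` and return it.

    Assumption: `project` is a CategorizationProject that does not
    already have a taxonomy associated with it.
    """
    # The creation spec is a plain dict; per the docs it can include a name.
    creation_spec = {"name": name}
    return project.create_taxonomy(creation_spec)
```

The returned object is whatever create_taxonomy() returns, which the reference above documents as a Taxonomy.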

taxonomy()[source]

Retrieves the Taxonomy associated with this project. If a taxonomy is not already associated with this project, call create_taxonomy() first.

Returns

The project’s Taxonomy

Return type

Taxonomy

add_input_dataset(dataset)

Associate a dataset with a project in Tamr.

By default, datasets are not associated with any project. A dataset must be added as an input to a project before it can be used as part of that project.

Parameters

dataset (Dataset) – The dataset to associate with the project.

Returns

HTTP response from the server

Return type

requests.Response
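
Because add_input_dataset() returns a requests.Response, a caller can check each response when adding several datasets. The sketch below is one way to do that; add_input_datasets is a hypothetical helper, not part of the client, and the client may already raise on failed requests, in which case the extra check is merely redundant.

```python
def add_input_datasets(project, datasets):
    """Associate each dataset with the project, stopping at the first HTTP error.

    Assumption: `project` exposes add_input_dataset(dataset) returning a
    requests.Response, as documented above.
    """
    responses = []
    for dataset in datasets:
        response = project.add_input_dataset(dataset)
        # requests.Response.raise_for_status() raises on 4xx/5xx responses.
        response.raise_for_status()
        responses.append(response)
    return responses
```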

as_categorization()

Convert this project to a CategorizationProject.

Returns

This project.

Return type

CategorizationProject

Raises

TypeError – If the type of this project is not "CATEGORIZATION"

as_mastering()

Convert this project to a MasteringProject.

Returns

This project.

Return type

MasteringProject

Raises

TypeError – If the type of this project is not "DEDUP"
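
Both conversion methods raise TypeError when the project has a different type, so code that handles projects of unknown type can guard the call. A minimal sketch, using a hypothetical helper name:

```python
def as_categorization_or_none(project):
    """Return the project as a CategorizationProject, or None for other types."""
    try:
        return project.as_categorization()
    except TypeError:
        # Raised when the project's type is not "CATEGORIZATION".
        return None
```

The same pattern applies to as_mastering(), which raises TypeError when the type is not "DEDUP".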

attribute_configurations()

Project’s attribute configurations.

Returns

The configurations of the attributes of a project.

Return type

AttributeConfigurationCollection

attribute_mappings()

Project’s attribute mappings.

Returns

The attribute mappings of a project.

Return type

AttributeMappingCollection

property attributes

Attributes of this project.

Returns

Attributes of this project.

Return type

AttributeCollection

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property description
Type

str

property external_id
Type

str

input_datasets()

Retrieve a collection of this project’s input datasets.

Returns

The project’s input datasets.

Return type

DatasetCollection

property name
Type

str

property relative_id
Type

str

remove_input_dataset(dataset)

Remove a dataset from a project.

Parameters

dataset (Dataset) – The dataset to be removed from this project.

Returns

HTTP response from the server

Return type

requests.Response
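
input_datasets() and remove_input_dataset() can be combined to detach every input from a project. The sketch below assumes that the DatasetCollection returned by input_datasets() is iterable, as the CategoryCollection documented later in this section is; remove_all_input_datasets is a hypothetical helper name.

```python
def remove_all_input_datasets(project):
    """Remove every input dataset from the project and return how many were removed."""
    # Materialize the collection first so that removals cannot interfere
    # with iteration over the project's inputs.
    datasets = list(project.input_datasets())
    for dataset in datasets:
        response = project.remove_input_dataset(dataset)
        response.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return len(datasets)
```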

property resource_id
Type

str

spec()

Returns this project’s spec.

Returns

The spec for the project.

Return type

ProjectSpec

property type

A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.

Type

str

unified_dataset()

Unified dataset for this project.

Returns

Unified dataset for this project.

Return type

Dataset

Categories

Category

class tamr_unify_client.categorization.category.resource.Category(client, data, alias=None)[source]

A category of a taxonomy

property name
Type

str

property description
Type

str

property path
Type

list[str]

parent()[source]

Gets the parent Category of this one, or None if it is a tier 1 category.

Returns

The parent Category or None

Return type

Category
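
Since parent() returns None for a tier 1 category, repeated calls walk up the taxonomy until the top is reached. The following sketch collects that chain; ancestors is a hypothetical helper, and each parent() call may involve a request to the server.

```python
def ancestors(category):
    """Return the category's parents, nearest first, ending at its tier 1 category."""
    chain = []
    parent = category.parent()
    while parent is not None:  # None marks the top of the taxonomy
        chain.append(parent)
        parent = parent.parent()
    return chain
```

For a tier 1 category the result is an empty list.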

spec()[source]

Returns this category’s spec.

Returns

The spec for the category.

Return type

CategorySpec

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str

Category Spec

class tamr_unify_client.categorization.category.resource.CategorySpec(client, data, api_path)[source]

A representation of the server view of a category.

static of(resource)[source]

Creates a category spec from a category.

Parameters

resource (Category) – The existing category.

Returns

The corresponding category spec.

Return type

CategorySpec

static new()[source]

Creates a blank spec that could be used to construct a new category.

Returns

The empty spec.

Return type

CategorySpec

from_data(data)[source]

Creates a spec with the same client and API path as this one, but new data.

Parameters

data (dict) – The data for the new spec.

Returns

The new spec.

Return type

CategorySpec

to_dict()[source]

Returns a version of this spec that conforms to the API representation.

Returns

The spec’s dict.

Return type

dict

with_name(new_name)[source]

Creates a new spec with the same properties, updating name.

Parameters

new_name (str) – The new name.

Returns

The new spec.

Return type

CategorySpec

with_description(new_description)[source]

Creates a new spec with the same properties, updating description.

Parameters

new_description (str) – The new description.

Returns

The new spec.

Return type

CategorySpec

with_path(new_path)[source]

Creates a new spec with the same properties, updating path.

Parameters

new_path (list[str]) – The new path.

Returns

The new spec.

Return type

CategorySpec
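
Each with_* method returns a new spec rather than modifying the existing one, so calls can be chained. A minimal sketch of that pattern, with a hypothetical helper name and spec assumed to be a CategorySpec (for example, one returned by Category.spec() or CategorySpec.of()):

```python
def updated_spec(spec, name, description):
    """Return a new spec carrying `name` and `description`.

    The original `spec` is not modified, because with_name() and
    with_description() each create a new spec.
    """
    return spec.with_name(name).with_description(description)
```

Calling to_dict() on the result gives the API representation of the updated spec.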

Category Collection

class tamr_unify_client.categorization.category.collection.CategoryCollection(client, api_path)[source]

Collection of Category objects.

Parameters
  • client (Client) – Client for API call delegation.

  • api_path (str) – API path used to access this collection. E.g. "projects/1/taxonomy/categories".

by_resource_id(resource_id)[source]

Retrieve a category by resource ID.

Parameters

resource_id (str) – The resource ID. E.g. "1"

Returns

The specified category.

Return type

Category

by_relative_id(relative_id)[source]

Retrieve a category by relative ID.

Parameters

relative_id (str) – The relative ID. E.g. "projects/1/categories/1"

Returns

The specified category.

Return type

Category

by_external_id(external_id)[source]

Retrieve a category by external ID.

Since categories do not have external IDs, this method is not supported and will raise a NotImplementedError.

Parameters

external_id (str) – The external ID.

Returns

The specified category, if found.

Return type

Category

Raises
  • KeyError – If no category with the specified external_id is found

  • LookupError – If multiple categories with the specified external_id are found

stream()[source]

Stream categories in this collection. Implicitly called when iterating over this collection.

Returns

Stream of categories.

Return type

Python generator yielding Category

Usage:
>>> for category in collection.stream():  # explicit
...     do_stuff(category)
>>> for category in collection:  # implicit
...     do_stuff(category)

create(creation_spec)[source]

Creates a new category.

Parameters

creation_spec (dict) – Category creation specification, formatted as specified in the Public Docs for Creating a Category.

Returns

The newly created category.

Return type

Category

bulk_create(creation_specs)[source]

Creates new categories in bulk.

Parameters

creation_specs (iterable[dict]) – A collection of creation specifications, as detailed for create.

Returns

JSON response from the server

Return type

dict
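
The creation specs passed to create() and bulk_create() are plain dicts. The sketch below builds one spec per category path. It assumes that a spec uses "name" and "path" keys, mirroring the Category properties of the same names, and that the name is the last element of the path; check the Public Docs for the authoritative format. specs_from_paths is a hypothetical helper.

```python
def specs_from_paths(paths):
    """Build one category creation spec per path.

    Assumption: a creation spec is a dict with "name" and "path" keys,
    where the name equals the last element of the path.
    """
    return [{"name": path[-1], "path": list(path)} for path in paths]
```

The resulting list could then be passed to bulk_create(), or its elements passed one at a time to create().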

delete_by_resource_id(resource_id)

Deletes a resource from this collection by resource ID.

Parameters

resource_id (str) – The resource ID of the resource that will be deleted.

Returns

HTTP response from the server.

Return type

requests.Response

Taxonomy

class tamr_unify_client.categorization.taxonomy.Taxonomy(client, data, alias=None)[source]

A project’s taxonomy

property name
Type

str

categories()[source]

Retrieves the categories of this taxonomy.

Returns

A collection of the taxonomy categories.

Return type

CategoryCollection
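
Because a CategoryCollection can be iterated directly, the categories of a taxonomy can be indexed for quick lookup. A minimal sketch, keyed on each category's path; index_by_path is a hypothetical helper and categories is assumed to be the collection returned by categories().

```python
def index_by_path(categories):
    """Map each category's path, as a tuple, to the category itself."""
    # path is a list[str]; lists are unhashable, so use tuples as dict keys.
    return {tuple(category.path): category for category in categories}
```

A lookup then takes the form index[("Tier 1 name", "Tier 2 name")], where the tuple elements are the names along the path.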

delete()

Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.

Returns

HTTP response from the server

Return type

requests.Response

property relative_id
Type

str

property resource_id
Type

str