Tamr - Python Client¶
Example¶
from tamr_unify_client import Client
from tamr_unify_client.auth import UsernamePasswordAuth
import os
# grab credentials from environment variables
username = os.environ['TAMR_USERNAME']
password = os.environ['TAMR_PASSWORD']
auth = UsernamePasswordAuth(username, password)
host = 'localhost' # replace with your Tamr host
tamr = Client(auth, host=host)
# programmatically interact with Tamr!
# e.g. refresh your project's Unified Dataset
project = tamr.projects.by_resource_id('3')
ud = project.unified_dataset()
op = ud.refresh()
assert op.succeeded()
User Guide¶
FAQ¶
What version of the Python Client should I use?¶
The Python Client is organized around features, and will do its best to implement those features correctly, independent of the API version.
If you are starting a new project or your existing project does not yet use the Python Client, we encourage you to use the latest stable version of the Python Client.
Otherwise, check the Releases to see:
what new features and bug fixes are available in newer versions
which breaking changes (if any) will require changes in your code to get those new features and bug fixes
Note: You do not need to reason about the Tamr API version or the Tamr app/server version.
How do I call custom endpoints, e.g. endpoints outside the Tamr API?¶
To call a custom endpoint within the Tamr API, use the client.request() method, and provide an endpoint described by a path relative to base_path.
For example, if base_path is /api/versioned/v1/ (the default), and you want to get /api/versioned/v1/projects/1, you only need to provide projects/1 (the relative ID provided by the project) as the endpoint, and the Client will resolve that into /api/versioned/v1/projects/1.
There are various APIs outside the /api/versioned/v1/ prefix that are often useful or necessary to call - e.g. /api/service/health, or other un-versioned / unsupported APIs. To call a custom endpoint outside the Tamr API, use the client.request() method, and provide an endpoint described by an absolute path (a path starting with /). For example, to get /api/service/health (no matter what base_path is), call client.request() with /api/service/health as the endpoint. The Client will ignore base_path and send the request directly against the absolute path provided.
For additional detail, see Raw HTTP requests and Unversioned API Access.
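The resolution rule described above can be sketched as plain string logic (this is an illustration of the documented behavior, not the client's actual implementation):

```python
# Illustrative only: relative endpoints are resolved against base_path,
# while absolute endpoints (starting with '/') bypass base_path entirely.
def resolve(base_path: str, endpoint: str) -> str:
    if endpoint.startswith("/"):
        return endpoint
    return base_path + endpoint

print(resolve("/api/versioned/v1/", "projects/1"))           # /api/versioned/v1/projects/1
print(resolve("/api/versioned/v1/", "/api/service/health"))  # /api/service/health
```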
Installation¶
tamr-unify-client is compatible with Python 3.6 or newer.
Stable releases¶
Installation is as simple as:
pip install tamr-unify-client
Or:
poetry add tamr-unify-client
Note
If you don’t use poetry, we recommend you use a virtual environment for your project and install the Python Client into that virtual environment.
You can create a virtual environment with Python 3 via:
python3 -m venv my-venv
For more, see The Hitchhiker’s Guide to Python.
Latest (unstable)¶
Note
This project uses the new pyproject.toml file, not a setup.py file, so make sure you have the latest version of pip installed:
pip install -U pip
To install the bleeding edge:
git clone https://github.com/Datatamer/tamr-client
cd tamr-client
pip install .
Offline installs¶
First, download tamr-unify-client and its dependencies on a machine with online access to PyPI:
pip download tamr-unify-client -d tamr-unify-client-requirements
zip -r tamr-unify-client-requirements.zip tamr-unify-client-requirements
Then, ship the .zip file to the target machine where you want tamr-unify-client installed. You can do this via email, cloud drives, scp, or any other mechanism.
Finally, install tamr-unify-client from the saved dependencies:
unzip tamr-unify-client-requirements.zip
pip install --no-index --find-links=tamr-unify-client-requirements tamr-unify-client
If you are not using a virtual environment, you may need to specify the --user flag if you get permissions errors:
pip install --user --no-index --find-links=tamr-unify-client-requirements tamr-unify-client
Quickstart¶
Client configuration¶
Start by importing the Python Client and authentication provider:
from tamr_unify_client import Client
from tamr_unify_client.auth import UsernamePasswordAuth
Next, create an authentication provider and use that to create an authenticated client:
import os
username = os.environ['TAMR_USERNAME']
password = os.environ['TAMR_PASSWORD']
auth = UsernamePasswordAuth(username, password)
tamr = Client(auth)
Warning
For security, it’s best to read your credentials in from environment variables or secure files instead of hardcoding them directly into your code.
For more, see User Guide > Secure Credentials.
By default, the client tries to find the Tamr instance on localhost. To point to a different host, set the host argument when instantiating the Client.
For example, to connect to 10.20.0.1:
tamr = Client(auth, host='10.20.0.1')
Top-level collections¶
The Python Client exposes 2 top-level collections: Projects and Datasets.
You can access these collections through the client and loop over their members with simple for-loops.
E.g.:
for project in tamr.projects:
print(project.name)
for dataset in tamr.datasets:
print(dataset.name)
Fetch a specific resource¶
If you know the identifier for a specific resource, you can ask for it directly via the by_resource_id methods exposed by collections.
E.g. To fetch the project with ID '1':
project = tamr.projects.by_resource_id('1')
Similarly, if you know the name of a specific resource, you can ask for it directly via the by_name methods exposed by collections.
E.g. To fetch the project with name 'Number 1':
project = tamr.projects.by_name('Number 1')
Note
If working with projects across Tamr instances for migrations or promotions, use external IDs (via by_external_id) instead of names (via by_name).
Resource relationships¶
Related resources (like a project and its unified dataset) can be accessed through specific methods.
E.g. To access the Unified Dataset for a particular project:
ud = project.unified_dataset()
Kick-off Tamr Operations¶
Some methods on Model objects can kick-off long-running Tamr operations.
Here, kick-off a “Unified Dataset refresh” operation:
operation = project.unified_dataset().refresh()
assert operation.succeeded()
By default, the API Clients expose a synchronous interface for Tamr operations.
Secure Credentials¶
This section discusses ways to pass credentials securely to UsernamePasswordAuth. Specifically, you should not hardcode your password(s) in your source code. Instead, you should use environment variables or secure files to store your credentials and simple Python code to read your credentials.
Environment variables¶
You can use os.environ to read in your credentials from environment variables:
# my_script.py
import os
from tamr_unify_client.auth import UsernamePasswordAuth
username = os.environ['TAMR_USERNAME'] # replace with your username environment variable name
password = os.environ['TAMR_PASSWORD'] # replace with your password environment variable name
auth = UsernamePasswordAuth(username, password)
You can pass in the environment variables from the terminal by including them before your command:
TAMR_USERNAME="my Tamr username" TAMR_PASSWORD="my Tamr password" python my_script.py
You can also create an .sh file to store your environment variables and simply source that file before running your script.
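For example, such a file might look like the following (the filename credentials.sh is just an example):

```shell
# credentials.sh -- keep this file out of version control
export TAMR_USERNAME="my Tamr username"
export TAMR_PASSWORD="my Tamr password"
```

Then run source credentials.sh before running python my_script.py.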
Config files¶
You can also store your credentials in a secure credentials file:
# credentials.yaml
---
username: "my tamr username"
password: "my tamr password"
Then pip install pyyaml and read the credentials in your Python code:
# my_script.py
from tamr_unify_client.auth import UsernamePasswordAuth
import yaml
with open("path/to/credentials.yaml") as f: # replace with your credentials.yaml path
creds = yaml.safe_load(f)
auth = UsernamePasswordAuth(creds['username'], creds['password'])
As in this example, we recommend you use YAML as your format since YAML has support for comments and is more human-readable than JSON.
Important
You should not check these credentials files into your version control system (e.g. git
). Do not share this file with anyone who should not have access to the password stored in it.
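For example, with git you can list the credentials file in your .gitignore so it is never committed:

```
# .gitignore
credentials.yaml
```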
Workflows¶
Continuous Categorization¶
from tamr_unify_client import Client
from tamr_unify_client.auth import UsernamePasswordAuth
import os
username = os.environ['TAMR_USERNAME']
password = os.environ['TAMR_PASSWORD']
auth = UsernamePasswordAuth(username, password)
host = 'localhost' # replace with your host
tamr = Client(auth, host=host)
project_id = "1" # replace with your project ID
project = tamr.projects.by_resource_id(project_id)
project = project.as_categorization()
unified_dataset = project.unified_dataset()
op = unified_dataset.refresh()
assert op.succeeded()
model = project.model()
op = model.train()
assert op.succeeded()
op = model.predict()
assert op.succeeded()
Continuous Mastering¶
from tamr_unify_client import Client
from tamr_unify_client.auth import UsernamePasswordAuth
import os
username = os.environ['TAMR_USERNAME']
password = os.environ['TAMR_PASSWORD']
auth = UsernamePasswordAuth(username, password)
host = 'localhost' # replace with your host
tamr = Client(auth, host=host)
project_id = "1" # replace with your project ID
project = tamr.projects.by_resource_id(project_id)
project = project.as_mastering()
unified_dataset = project.unified_dataset()
op = unified_dataset.refresh()
assert op.succeeded()
op = project.pairs().refresh()
assert op.succeeded()
model = project.pair_matching_model()
op = model.train()
assert op.succeeded()
op = model.predict()
assert op.succeeded()
op = project.record_clusters().refresh()
assert op.succeeded()
op = project.published_clusters().refresh()
assert op.succeeded()
Creating and Modifying Resources¶
Creating resources¶
Resources, such as projects, datasets, and attribute configurations, can be created through their respective collections. Each create function takes in a dictionary that conforms to the Tamr Public Docs for creating that resource type:
spec = {
"name": "project",
"description": "Mastering Project",
"type": "DEDUP",
"unifiedDatasetName": "project_unified_dataset"
}
project = tamr.projects.create(spec)
Using specs¶
These dictionaries can also be created using spec classes.
Each Resource has a corresponding ResourceSpec which can be used to build an instance of that resource by specifying the value for each property. The spec can then be converted to a dictionary that can be passed to create.
For instance, to create a project:
from tamr_unify_client.project.resource import ProjectSpec

spec = (
ProjectSpec.new()
.with_name("Project")
.with_type("DEDUP")
.with_description("Mastering Project")
.with_unified_dataset_name("Project_unified_dataset")
.with_external_id("tamrProject1")
)
project = tamr.projects.create(spec.to_dict())
Calling with_* on a spec creates a new spec with the same properties besides the modified one. The original spec is unaltered, so it could be used multiple times:
base_spec = (
ProjectSpec.new()
.with_type("DEDUP")
.with_description("Mastering Project")
)
specs = []
for name in project_names:
spec = (
base_spec.with_name(name)
.with_unified_dataset_name(name + "_unified_dataset")
)
specs.append(spec)
projects = [tamr.projects.create(spec.to_dict()) for spec in specs]
Creating a dataset¶
Datasets can be created as described above, but the dataset’s schema and records must then be handled separately.
To combine all of these steps into one, DatasetCollection has a convenience function create_from_dataframe that takes a Pandas DataFrame.
This makes it easy to create a Tamr dataset from a CSV:
import pandas as pd
df = pd.read_csv("my_data.csv", dtype=str) # string is the recommended data type
dataset = tamr.datasets.create_from_dataframe(df, primary_key_name="primary key name", dataset_name="My Data")
This will create a dataset called “My Data” with the specified primary key, an attribute for each column of the DataFrame, and the DataFrame’s rows as records.
Modifying a resource¶
Certain resources can also be modified using specs.
After getting a spec corresponding to a resource and modifying some properties,
the updated resource can be committed to Tamr with the put function:
updated_dataset = (
dataset.spec()
.with_description("Modified description")
.put()
)
Each spec class has many properties that can be changed, but refer to the Public Docs for which properties will actually be updated in Tamr. If an immutable property is changed in the update request, the new value will simply be ignored.
Logging¶
IMPORTANT: Make sure to configure logging BEFORE importing from 3rd-party libraries. Logging will use the first configuration it finds, and if a library configures logging before you do, your configuration will be ignored.
To configure logging, simply follow the official Python logging HOWTO.
For example:
# script.py
import logging
logging.basicConfig(filename="script.log", level=logging.INFO)
# configure logging before other imports
from tamr_unify_client import Client
from tamr_unify_client.auth import UsernamePasswordAuth
auth = UsernamePasswordAuth("my username", "my password")
tamr = Client(auth, host="myhost")
for p in tamr.projects:
print(p)
for d in tamr.datasets:
print(d)
# should cause an HTTP error
tamr.get("/invalid/api/path").successful()
This will log all API requests made and print the response bodies for any requests with HTTP error codes.
If you want to only configure logging for the Tamr Client:
import logging
logger = logging.getLogger('tamr_unify_client')
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler('tamr-client.log'))
# configure logging before other imports
from tamr_unify_client import Client
from tamr_unify_client.auth import UsernamePasswordAuth
# rest of script goes here
Geospatial Data¶
What geospatial data is supported?¶
In general, the Python Geo Interface is supported; see https://gist.github.com/sgillies/2217756.
There are three layers of information, modeled after GeoJSON (see https://tools.ietf.org/html/rfc7946):
The outermost layer is a FeatureCollection
Within a FeatureCollection are Features, each of which represents one “thing”, like a building or a river. Each feature has:
type (string; required)
id (object; required)
geometry (Geometry, see below; optional)
bbox (“bounding box”, 4 doubles; optional)
properties (map[string, object]; optional)
Within a Feature is a Geometry, which represents a shape, like a point or a polygon. Each geometry has:
type (one of “Point”, “MultiPoint”, “LineString”, “MultiLineString”, “Polygon”, “MultiPolygon”; required)
coordinates (doubles; exactly how these are structured depends on the type of the geometry)
Although the Python Geo Interface is non-prescriptive when it comes to the data types of the id and properties, Tamr has a more restricted set of supported types. See https://docs.tamr.com/reference#attribute-types.
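For concreteness, the layers described above can be written out as a Python dict following the Geo Interface / GeoJSON structure (all values here are made-up examples):

```python
# A made-up Feature and FeatureCollection illustrating the structure above.
feature = {
    "type": "Feature",
    "id": "building-1",                  # required identifier
    "geometry": {                        # optional Geometry
        "type": "Point",
        "coordinates": [-71.05, 42.35],
    },
    "bbox": [-71.1, 42.3, -71.0, 42.4],  # optional bounding box
    "properties": {"name": "HQ"},        # optional key/value properties
}
feature_collection = {
    "type": "FeatureCollection",
    "features": [feature],
}
```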
The Dataset class supports the __geo_interface__ property. This will produce one FeatureCollection for the entire dataset. There is a companion iterator itergeofeatures() that returns a generator that allows you to stream the records in the dataset as Geospatial features.
To produce a GeoJSON representation of a dataset:
import json

dataset = client.datasets.by_name("my_dataset")
with open("my_dataset.json", "w") as f:
    json.dump(dataset.__geo_interface__, f)
By default, itergeofeatures() will use the first dataset attribute with geometry type to fill in the feature geometry. You can override this by specifying the geometry attribute to use in the geo_attr parameter to itergeofeatures.
Dataset can also be updated from a feature collection that supports the Python Geo Interface:
import geopandas
geodataframe = geopandas.GeoDataFrame(...)
dataset = client.datasets.by_name("my_dataset")
dataset.from_geo_features(geodataframe)
Note that there are currently some limitations to GeoPandas’ implementation of the Geo Interface. See below for more details.
By default the features’ geometries will be placed into the first dataset attribute with geometry type. You can override this by specifying the geometry attribute to use in the geo_attr parameter to from_geo_features.
Rules for converting from Tamr records to Geospatial Features¶
The record’s primary key will be used as the feature’s id. If the primary key is a single attribute, then the value of that attribute will be the value of id. If the primary key is composed of multiple attributes, then the value of the id will be an array with the values of the key attributes in order.
Tamr allows any number of geometry attributes per record; the Python Geo Interface is limited to one. When converting Tamr records to Python Geo Features, the first geometry attribute in the schema will be used as the geometry; all other geometry attributes will appear as properties with no type conversion. In the future, additional control over the handling of multiple geometries may be provided; the current set of capabilities is intended primarily to support the use case of working with FeatureCollections within Tamr, and FeatureCollection has only one geometry per feature.
An attribute is considered to have geometry type if it has type RECORD and contains an attribute named point, multiPoint, lineString, multiLineString, polygon, or multiPolygon.
If an attribute named bbox is available, it will be used as bbox. No conversion is done on the value of bbox. In the future, additional control over the handling of bbox attributes may be provided.
All other attributes will be placed in properties, with no type conversion. This includes all geometry attributes other than the first.
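A minimal sketch of these record-to-feature rules (this is not the client's implementation; the key and geometry attribute names are passed in explicitly, and geometry values are used as-is):

```python
# Sketch of the conversion rules above, under simplifying assumptions.
def record_to_feature(record, key_attrs, geo_attr):
    key_values = [record[k] for k in key_attrs]
    feature = {
        "type": "Feature",
        # single-attribute key -> scalar id; composite key -> array id
        "id": key_values[0] if len(key_values) == 1 else key_values,
        # the first geometry attribute becomes the feature geometry
        "geometry": record.get(geo_attr),
        # everything else lands in properties, with no type conversion
        "properties": {
            k: v
            for k, v in record.items()
            if k not in key_attrs and k not in (geo_attr, "bbox")
        },
    }
    if "bbox" in record:
        feature["bbox"] = record["bbox"]  # copied with no conversion
    return feature
```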
Rules for converting from Geospatial Features to Tamr records¶
The Feature’s id will be converted into the primary key for the record. If the record uses a simple key, no value translation will be done. If the record uses a composite key, then the value of the Feature’s id must be an array of values, one per attribute in the key.
If the Feature contains keys in properties that conflict with the record keys, bbox, or geometry, those keys are ignored (omitted).
If the Feature contains a bbox, it is copied to the record’s bbox.
All other keys in the Feature’s properties are propagated to the same-name attribute on the record, with no type conversion.
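The reverse direction can be sketched similarly (again, not the client's implementation; the key and geometry attribute names are hypothetical inputs):

```python
# Sketch of the feature-to-record rules above, under simplifying assumptions.
def feature_to_record(feature, key_attrs, geo_attr):
    fid = feature["id"]
    # composite keys arrive as an array of values, one per key attribute
    ids = fid if isinstance(fid, list) else [fid]
    record = dict(zip(key_attrs, ids))
    record[geo_attr] = feature.get("geometry")
    if "bbox" in feature:
        record["bbox"] = feature["bbox"]  # copied to the record's bbox
    for k, v in (feature.get("properties") or {}).items():
        if k in record or k == "bbox":
            continue  # conflicting property keys are ignored (omitted)
        record[k] = v  # propagated with no type conversion
    return record
```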
Streaming data access¶
The Dataset method itergeofeatures() returns a generator that allows you to stream the records in the dataset as Geospatial features:
my_dataset = client.datasets.by_name("my_dataset")
for feature in my_dataset.itergeofeatures():
do_something(feature)
Note that many packages that consume the Python Geo Interface will be able to consume this iterator directly. For example:
from geopandas import GeoDataFrame
df = GeoDataFrame.from_features(my_dataset.itergeofeatures())
This allows construction of a GeoDataFrame directly from the stream of records, without materializing the intermediate dataset.
Note on GeoPandas data access¶
There is a current limitation in GeoPandas that causes the feature’s ID field to be ignored in certain scenarios. The Tamr primary key is stored in this field.
The result is that when loading data and updating records through the dataset.from_geo_features() method, records will not be overwritten as anticipated.
This issue can be circumvented by re-inserting the id field into the data when loading features into GeoPandas:
import geopandas as gpd

my_dataset = client.datasets.by_name("my_dataset")
for feature in my_dataset.itergeofeatures():
    primary_key = feature['id']
    df = gpd.GeoDataFrame.from_features([feature])
    df.index = [primary_key]
    do_something(df)
    my_dataset.from_geo_features(df)
Alternatively, it is possible to load the full dataset as follows:
my_dataset = client.datasets.by_name("my_dataset")

def geopandas_dataset(dataset):
    for feature in dataset.itergeofeatures():
        feature['properties']['primary_key'] = feature['id']
        yield feature

df = gpd.GeoDataFrame.from_features(geopandas_dataset(my_dataset))
df = df.set_index('primary_key')
do_something(df)
my_dataset.from_geo_features(df)
Pandas Workflow¶
Connecting To Tamr¶
Connecting to a Tamr instance:
import os
import pandas as pd
from tamr_unify_client import Client
from tamr_unify_client.auth import UsernamePasswordAuth
username = os.environ['TAMR_USERNAME']
password = os.environ['TAMR_PASSWORD']
auth = UsernamePasswordAuth(username, password)
tamr = Client(auth)
Load dataset as Dataframe¶
Loading: In Memory¶
Loading a dataset as a pandas dataframe is possible via the from_records() method that pandas provides.
An example is shown below:
my_dataset = tamr.datasets.by_name("my_tamr_dataset")
df = pd.DataFrame.from_records(my_dataset.records())
This will construct a pandas dataframe from the records as they are streamed in. Once all records have been loaded, you will be able to interact with the dataframe normally.
Note that as values are typically represented inside arrays within Tamr, the values will be encapsulated in lists inside the dataframe. You can use traditional methods in pandas to deal with this, for example by calling .explode() or extracting specific elements.
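The .explode() approach mentioned above can be sketched as follows (the column names are made up for illustration):

```python
import pandas as pd

# Simulate a small Tamr-style dataframe where values arrive as lists.
df = pd.DataFrame({"pk": ["1", "2"], "city": [["Boston"], ["NYC"]]})

# explode() unpacks one-element lists into scalar values, one per row
df = df.explode("city")
```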
Loading: Streaming¶
When working with large datasets it is sometimes better not to load the entire dataset into memory, but to iterate through it instead. Since dataset.records() is a generator, this can easily be done as follows:
output = []
for record in dataset.records():
    single_record_df = pd.DataFrame.from_records([record])
    output.append(do_something(single_record_df))
Custom Generators¶
In order to customise the data loaded into the pandas dataframe, it is possible to customise the generator object dataset.records() by wrapping it in a different generator.
For example, it is possible to automatically flatten all lists with a length of one, and apply this to the dataset.records() generator as follows:
def unlist(lst):
    """
    If object is a list of length one, return first element.
    Otherwise, return original object.
    """
    if isinstance(lst, list) and len(lst) == 1:
        return lst[0]
    else:
        return lst

def dataset_to_pandas(dataset):
    """
    Incorporates basic unlisting for easy transfer between Tamr and Pandas.
    """
    for record in dataset.records():
        for key in record:
            record[key] = unlist(record[key])
        yield record

df = pd.DataFrame.from_records(dataset_to_pandas(my_dataset))
Similarly, it is possible to extract only certain attributes by specifying this in the generator:
def filter_dataset_to_pandas(dataset, colnames):
    """
    Filter the dataset to only the primary key and the columns specified as a list in colnames.
    """
    assert isinstance(colnames, list)
    colnames = dataset.key_attribute_names + colnames if dataset.key_attribute_names[0] not in colnames else colnames
    for record in dataset.records():
        yield {k: unlist(v) for k, v in record.items() if k in colnames}

df = pd.DataFrame.from_records(filter_dataset_to_pandas(my_dataset, ['City', 'new_attr']))
Note that upserting these records back to the original Tamr Dataset would overwrite the existing records and attributes, and cause loss of the data stored in the removed attributes.
Upload Dataframe as Dataset¶
Create New Dataset¶
To create a new dataset and upload data, the convenience function datasets.create_from_dataframe() can be used.
Note that Tamr will throw an error if columns aren’t generally formatted as strings. (The exception being geospatial columns. For those, see the geospatial examples.)
To format values as strings while preserving null information, specify dtype=object when creating a dataframe from a csv file:
df = pd.read_csv("my_file.csv", dtype=object)
Creating the dataset is as easy as calling:
tamr.datasets.create_from_dataframe(df, 'primaryKey', 'my_new_dataset')
For an already-existing dataframe, the columns can be converted to strings using:
df = df.astype(str)
Note, however, that converting this way will cause any NaN or None values to become strings like 'nan' that will persist into the created Tamr dataset.
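One way to convert an existing dataframe to strings while keeping missing values as None is the following (a sketch, not the only approach):

```python
import pandas as pd

df = pd.DataFrame({"pk": ["1", "2"], "val": [3.5, None]})

# Switch to object dtype and replace NaN with None before stringifying,
# so missing values do not become the literal string 'nan'.
df = df.astype(object).where(df.notna(), None)
df = df.apply(lambda col: col.map(lambda v: str(v) if v is not None else None))
```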
Changing Values¶
Making Changes: In Memory¶
When making changes to a dataset that was loaded as a dataframe, changes can be pushed back to Tamr using the dataset.upsert_from_dataframe() method as follows:
df = pd.DataFrame.from_records(my_dataset.records())
df['column'] = 'new_value'
my_dataset.upsert_from_dataframe(df, primary_key_name='primary_key')
Making Changes: Streaming¶
For larger datasets it might be better to stream the data and apply changes while iterating through the dataset. This way the full dataset does not need to be loaded into memory.
for record in dataset.records():
    single_record_df = pd.DataFrame.from_records([record])
    single_record_df['column_to_change'] = 'new_value'
    dataset.upsert_from_dataframe(single_record_df, primary_key_name='primary_key')
Adding Attributes¶
When making changes to dataframes, new dataframe columns are not automatically created as attributes when upserting records to Tamr. In order for these changes to be recorded, these attributes first need to be created.
One way of creating these for source datasets automatically would be as follows:
def add_missing_attributes(dataset, df):
    """
    Detects any attributes in the dataframe that aren't in the dataset and attempts to add them (as strings).
    """
    existing_attributes = [att.name for att in dataset.attributes]
    new_attributes = [att for att in df.columns.to_list() if att not in existing_attributes]
    if not new_attributes:
        return
    for new_attribute in new_attributes:
        attr_spec = {
            "name": new_attribute,
            "type": {"baseType": "ARRAY", "innerType": {"baseType": "STRING"}},
        }
        dataset.attributes.create(attr_spec)

add_missing_attributes(my_dataset, df)
Troubleshooting¶
When running into errors upon loading dataset.records() into a pandas dataframe, it is good to consider the following steps. To extract a single record, the following code can be used to provide a minimal reproducible example:
record = next(dataset.records())
print(record)
Parsing¶
Tamr allows for more variety in attribute names and contents than pandas does. In most cases pandas can load data correctly, but it is possible to modify the parsing using a custom generator as shown above. An example below changes an attribute name, and extracts only the first element:
def custom_parser(dataset):
    for record in dataset.records():
        record['pandas_column_name'] = record.pop('dataset_attribute_name')
        record['first_element_of_column'] = record['multi_value_column'][0]
        yield record

df = pd.DataFrame.from_records(custom_parser(dataset))
Advanced Usage¶
Asynchronous Operations¶
You can opt-in to an asynchronous interface via the asynchronous keyword argument for methods that kick-off Tamr operations.
E.g.:
op = project.unified_dataset().refresh(asynchronous=True)
# do asynchronous stuff here while operation is running
op = op.wait() # hangs until operation finishes
assert op.succeeded()
Raw HTTP requests and Unversioned API Access¶
We encourage you to use the high-level, object-oriented interface offered by the Python Client. If you aren’t sure whether you need to send low-level HTTP requests, you probably don’t.
But sometimes it’s useful to directly send HTTP requests to Tamr; for example, Tamr has many APIs that are not covered by the higher-level interface (most of which are neither versioned nor supported). You can still call these endpoints using the Python Client, but you’ll need to work with raw Response objects.
Custom endpoint¶
The client exposes a request method with the same interface as requests.request:
# import Python Client library and configure your client
tamr = Client(auth)
# do stuff with the `tamr` client
# now I NEED to send a request to a specific endpoint
response = tamr.request('GET', 'relative/path/to/resource')
This will send a request relative to the base_path registered with the client. If you provide an absolute path to the resource, the base_path will be ignored when composing the request:
# import Python Client library and configure your client
tamr = Client(auth)
# request a resource outside the configured base_path
response = tamr.request('GET', '/absolute/path/to/resource')
You can also use the get, post, put, and delete convenience methods:
# e.g. `get` convenience method
response = tamr.get('relative/path/to/resource')
Request headers and data can be supplied by passing dictionaries or lists with the headers and json arguments:
# e.g. `post` with headers and data
headers = {...}
body = {...}
response = tamr.post('relative/path/to/resource', headers=headers, json=body)
Custom Host / Port / Base API path¶
If you need to repeatedly send requests to another port or base API path (i.e. not /api/versioned/v1/), you can simply instantiate a different client. Then just call request as described above:
# import Python Client library and configure your client
tamr = Client(auth)
# do stuff with the `tamr` client
# now I NEED to send requests to a different host/port/base API path etc..
# NOTE: in this example, we reuse `auth` from the first client, but we could
# have made a new Authentication provider if this client needs it.
custom_client = Client(
auth,
host="10.10.0.1",
port=9090,
base_path="/api/some_service/",
)
response = custom_client.get('relative/path/to/resource')
One-off authenticated request¶
All of the Python Client Authentication providers adhere to the requests.auth.AuthBase interface. This means that you can pass in an Authentication provider directly to the requests library:
from tamr_unify_client.auth import UsernamePasswordAuth
import os
import requests
username = os.environ['TAMR_USERNAME']
password = os.environ['TAMR_PASSWORD']
auth = UsernamePasswordAuth(username, password)
response = requests.request('GET', 'some/specific/endpoint', auth=auth)
Reference¶
Attributes¶
Attribute¶
class tamr_unify_client.attribute.resource.Attribute(client, data, alias=None)
A Tamr Attribute. See https://docs.tamr.com/reference#attribute-types
property type
- Type: AttributeType
spec()
Returns a spec representation of this attribute.
- Returns: The attribute spec.
- Return type: AttributeSpec
delete()
Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns: HTTP response from the server
- Return type: requests.Response
Attribute Spec¶
class tamr_unify_client.attribute.resource.AttributeSpec(client, data, api_path)
A representation of the server view of an attribute.
static of(resource)
Creates an attribute spec from an attribute.
- Parameters: resource (Attribute) – The existing attribute.
- Returns: The corresponding attribute spec.
- Return type: AttributeSpec
static new()
Creates a blank spec that could be used to construct a new attribute.
- Returns: The empty spec.
- Return type: AttributeSpec
from_data(data)
Creates a spec with the same client and API path as this one, but new data.
- Parameters: data (dict) – The data for the new spec.
- Returns: The new spec.
- Return type: AttributeSpec
to_dict()
Returns a version of this spec that conforms to the API representation.
- Returns: The spec’s dict.
- Return type: dict
with_name(new_name)
Creates a new spec with the same properties, updating name.
- Parameters: new_name (str) – The new name.
- Returns: The new spec.
- Return type: AttributeSpec
with_description(new_description)
Creates a new spec with the same properties, updating description.
- Parameters: new_description (str) – The new description.
- Returns: The new spec.
- Return type: AttributeSpec
with_type(new_type)
Creates a new spec with the same properties, updating type.
- Parameters: new_type (AttributeTypeSpec) – The spec of the new type.
- Returns: The new spec.
- Return type: AttributeSpec
Attribute Collection¶
-
class
tamr_unify_client.attribute.collection.
AttributeCollection
(client, api_path)[source]¶ Collection of
Attribute
s.- Parameters
-
by_external_id
(external_id)[source]¶ Retrieve an attribute by external ID.
Since attributes do not have external IDs, this method is not supported and will raise a
NotImplementedError
.- Parameters
external_id (str) – The external ID.
- Returns
The specified attribute, if found.
- Return type
- Raises
KeyError – If no attribute with the specified external_id is found
LookupError – If multiple attributes with the specified external_id are found
-
stream
()[source]¶ Stream attributes in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of attributes.
- Return type
Python generator yielding
Attribute
- Usage:
>>> for attribute in collection.stream():  # explicit
>>>     do_stuff(attribute)
>>> for attribute in collection:  # implicit
>>>     do_stuff(attribute)
-
by_name
(attribute_name)[source]¶ Lookup a specific attribute in this collection by exact-match on name.
-
create
(creation_spec)[source]¶ Create an Attribute in this collection
- Parameters
creation_spec (dict[str, str]) – Attribute creation specification should be formatted as specified in the Public Docs for adding an Attribute.
- Returns
The created Attribute
- Return type
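create() takes a dict-shaped creation spec. A sketch of one possible spec for an ARRAY[STRING] attribute; the exact field names follow the Public Docs for attribute types and are an assumption here, and the `create` call itself requires a live Tamr instance:

```python
# Hypothetical creation spec for a multi-valued string attribute.
# Field names ("name", "type", "baseType", "innerType", "isNullable")
# are assumed from the public attribute-types docs.
creation_spec = {
    "name": "phone_numbers",
    "type": {
        "baseType": "ARRAY",
        "innerType": {"baseType": "STRING"},
    },
    "isNullable": True,
}

# attribute = dataset.attributes.create(creation_spec)  # needs a live Tamr instance
```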
Attribute Type¶
-
class
tamr_unify_client.attribute.type.
AttributeType
(data)[source]¶ The type of an
Attribute
orSubAttribute
.See https://docs.tamr.com/reference#attribute-types
- Parameters
data (
dict
) – JSON data representing this type
-
property
inner_type
¶ -
- Type
-
property
attributes
¶ list[
SubAttribute
]- Type
Attribute Type Spec¶
-
class
tamr_unify_client.attribute.type.
AttributeTypeSpec
(data)[source]¶ -
static
of
(resource)[source]¶ Creates an attribute type spec from an attribute type.
- Parameters
resource (
AttributeType
) – The existing attribute type.- Returns
The corresponding attribute type spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new attribute type.
- Returns
The empty spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_base_type
(new_base_type)[source]¶ Creates a new spec with the same properties, updating the base type.
- Parameters
new_base_type (str) – The new base type.
- Returns
The new spec.
- Return type
-
with_inner_type
(new_inner_type)[source]¶ Creates a new spec with the same properties, updating the inner type.
- Parameters
new_inner_type (
AttributeTypeSpec
) – The spec of the new inner type.- Returns
The new spec.
- Return type
-
with_attributes
(new_attributes)[source]¶ Creates a new spec with the same properties, updating attributes.
- Parameters
new_attributes (list[
AttributeSpec
]) – The specs of the new attributes.- Returns
The new spec.
- Return type
-
SubAttribute¶
-
class
tamr_unify_client.attribute.subattribute.
SubAttribute
(name, type, is_nullable, _json, description=None)[source]¶ An attribute which is itself a property of another attribute.
See https://docs.tamr.com/reference#attribute-types
- Parameters
name (
str
) – Name of sub-attributetype (
AttributeType
) – See https://docs.tamr.com/reference#attribute-typesis_nullable (
bool
) – If this sub-attribute can be null
Auth¶
-
class
tamr_unify_client.auth.
JwtTokenAuth
(token)[source]¶ Provides JWT authentication for Tamr. Specifically, sets the Authorization HTTP header with the Bearer format. This feature is only supported in Tamr releases beginning with v2022.010.0.
- Parameters
token – The JWT value to be used for authentication
- Usage:
>>> from tamr_unify_client.auth import JwtTokenAuth
>>> auth = JwtTokenAuth('my token')
>>> import tamr_unify_client as api
>>> unify = api.Client(auth)
-
class
tamr_unify_client.auth.
UsernamePasswordAuth
(username, password)[source]¶ Provides username/password authentication for Tamr. Specifically, sets the Authorization HTTP header with Tamr’s custom BasicCreds format.
- Usage:
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> import tamr_unify_client as api
>>> unify = api.Client(auth)
Categorization¶
Categorization Project¶
-
class
tamr_unify_client.categorization.project.
CategorizationProject
(client, data, alias=None)[source]¶ A Categorization project in Tamr.
-
model
()[source]¶ Machine learning model for this Categorization project. Learns from verified labels and predicts categorization labels for unlabeled records.
- Returns
The machine learning model for categorization.
- Return type
-
create_taxonomy
(creation_spec)[source]¶ Creates a
Taxonomy
for this project.A taxonomy cannot already be associated with this project.
-
taxonomy
()[source]¶ Retrieves the
Taxonomy
associated with this project. If a taxonomy is not already associated with this project, callcreate_taxonomy()
first.- Returns
The project’s Taxonomy
- Return type
-
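The methods above compose into the usual categorization loop: train the model on verified labels, then predict labels for the rest. A sketch, assuming `project` is a CategorizationProject whose taxonomy and unified dataset are already set up:

```python
def categorize(project):
    """End-to-end categorization sketch: train on verified labels, then predict.

    Assumes `project` is a CategorizationProject with a taxonomy already
    associated; both calls block until the server-side operation finishes.
    """
    model = project.model()
    op = model.train()    # learn from verified labels
    assert op.succeeded()
    op = model.predict()  # suggest labels for unverified records
    assert op.succeeded()
```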
add_input_dataset
(dataset)¶ Associate a dataset with a project in Tamr.
By default, datasets are not associated with any projects. They must be added as input to a project before they can be used as part of that project.
- Parameters
dataset (
Dataset
) – The dataset to associate with the project.- Returns
HTTP response from the server
- Return type
-
as_categorization
()¶ Convert this project to a
CategorizationProject
-
as_mastering
()¶ Convert this project to a
MasteringProject
- Returns
This project.
- Return type
- Raises
-
attribute_configurations
()¶ The configurations of this project’s attributes.
- Returns
The configurations of the attributes of a project.
- Return type
-
attribute_mappings
()¶ The attribute mappings of this project.
- Returns
The attribute mappings of a project.
- Return type
-
property
attributes
¶ Attributes of this project.
- Returns
Attributes of this project.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
input_datasets
()¶ Retrieve a collection of this project’s input datasets.
- Returns
The project’s input datasets.
- Return type
-
remove_input_dataset
(dataset)¶ Remove a dataset from a project.
- Parameters
dataset (
Dataset
) – The dataset to be removed from this project.- Returns
HTTP response from the server
- Return type
-
spec
()¶ Returns this project’s spec.
- Returns
The spec for the project.
- Return type
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type
-
Categories¶
Category¶
-
class
tamr_unify_client.categorization.category.resource.
Category
(client, data, alias=None)[source]¶ A category of a taxonomy
-
parent
()[source]¶ Gets the parent Category of this one, or None if it is a tier 1 category
- Returns
The parent Category or None
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Category Spec¶
-
class
tamr_unify_client.categorization.category.resource.
CategorySpec
(client, data, api_path)[source]¶ A representation of the server view of a category.
-
static
of
(resource)[source]¶ Creates a category spec from a category.
- Parameters
resource (
Category
) – The existing category.- Returns
The corresponding category spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new category.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_name
(new_name)[source]¶ Creates a new spec with the same properties, updating name.
- Parameters
new_name (str) – The new name.
- Returns
The new spec.
- Return type
-
with_description
(new_description)[source]¶ Creates a new spec with the same properties, updating description.
- Parameters
new_description (str) – The new description.
- Returns
The new spec.
- Return type
-
Category Collection¶
-
class
tamr_unify_client.categorization.category.collection.
CategoryCollection
(client, api_path)[source]¶ Collection of
Category
s.- Parameters
-
by_external_id
(external_id)[source]¶ Retrieve a category by external ID.
Since categories do not have external IDs, this method is not supported and will raise a
NotImplementedError
.- Parameters
external_id (str) – The external ID.
- Returns
The specified category, if found.
- Return type
- Raises
KeyError – If no category with the specified external_id is found
LookupError – If multiple categories with the specified external_id are found
-
stream
()[source]¶ Stream categories in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of categories.
- Return type
Python generator yielding
Category
- Usage:
>>> for category in collection.stream():  # explicit
>>>     do_stuff(category)
>>> for category in collection:  # implicit
>>>     do_stuff(category)
-
create
(creation_spec)[source]¶ Creates a new category.
- Parameters
creation_spec (dict) – Category creation specification, formatted as specified in the Public Docs for Creating a Category.
- Returns
The newly created category.
- Return type
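The creation spec is a plain dict. A sketch of a spec for a tier-2 category; the field names ("name", "path") follow the Public Docs for Creating a Category and are an assumption here, and the `create` call requires a live Tamr instance:

```python
# Hypothetical creation spec for a tier-2 category: the "path" lists
# ancestor categories first, ending with the category's own name.
creation_spec = {
    "name": "Laptops",
    "path": ["Electronics", "Laptops"],
}

# category = taxonomy.categories().create(creation_spec)  # needs a live Tamr instance
```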
Taxonomy¶
-
class
tamr_unify_client.categorization.taxonomy.
Taxonomy
(client, data, alias=None)[source]¶ A project’s taxonomy
-
categories
()[source]¶ Retrieves the categories of this taxonomy.
- Returns
A collection of the taxonomy categories.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Client¶
-
class
tamr_unify_client.
Client
(auth, host='localhost', protocol='http', port=9100, base_path='/api/versioned/v1/', session=None, store_auth_cookie=False)[source]¶ Python Client for Tamr API.
Each client is tied to a specific origin (protocol, host, port).
- Parameters
auth (
AuthBase
) –Tamr-compatible Authentication provider.
Recommended: use one of the classes described in Authentication
host (
str
) – Host address of remote Tamr instance (e.g.'10.0.10.0'
)protocol (
str
) – Either'http'
or'https'
base_path (
str
) – Base API path. Requests made by this client will be relative to this path.session (
Optional
[Session
]) – Session to use for API calls. If none is provided, will use a newrequests.Session
.
Example
>>> from tamr_unify_client import Client
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> tamr_local = Client(auth)  # on http://localhost:9100
>>> tamr_remote = Client(auth, protocol='https', host='10.0.10.0')  # on https://10.0.10.0:9100
>>> tamr_remote = Client(auth, protocol='https', host='10.0.10.0', port=None)  # on https://10.0.10.0
-
property
origin
¶ HTTP origin i.e.
<protocol>://<host>[:<port>]
.For additional information, see MDN web docs .
- Return type
-
request
(method, endpoint, **kwargs)[source]¶ Sends a request to Tamr.
The URL for the request will be
<origin>/<base_path>/<endpoint>
. The request is authenticated viaClient.auth
.
Fetch and store an auth token for the given client configuration
-
property
projects
¶ Collection of all projects on this Tamr instance.
- Return type
- Returns
Collection of all projects.
-
property
datasets
¶ Collection of all datasets on this Tamr instance.
- Return type
- Returns
Collection of all datasets.
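request() can also reach endpoints outside the versioned API. A sketch that calls the health endpoint, assuming (as the relative-vs-absolute path discussion elsewhere in these docs suggests) that an endpoint beginning with '/' is treated as absolute and bypasses base_path:

```python
def check_health(tamr):
    """Call an endpoint outside the versioned API (sketch).

    Assumes `tamr` is a Client and that a leading '/' in the endpoint
    makes the path absolute relative to the origin, bypassing base_path.
    """
    response = tamr.request("GET", "/api/service/health")
    response.raise_for_status()  # surface HTTP errors early
    return response.json()
```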
Datasets¶
Dataset¶
-
class
tamr_unify_client.dataset.resource.
Dataset
(client, data, alias=None)[source]¶ A Tamr dataset.
list[str]
- Type
-
property
attributes
¶ Attributes of this dataset.
- Returns
Attributes of this dataset.
- Return type
-
upsert_from_dataframe
(df, *, primary_key_name, ignore_nan=None)[source]¶ Upserts a record for each row of df with attributes for each column in df.
- Parameters
- Return type
- Returns
JSON response body from the server.
- Raises
KeyError – If primary_key_name is not a column in df.
-
upsert_records
(records, primary_key_name, *, ignore_nan=False)[source]¶ Creates or updates the specified records.
- Parameters
records (iterable[dict]) – The records to update, as dictionaries.
primary_key_name (str) – The name of the primary key for these records, which must be a key in each record dictionary.
ignore_nan (bool) – Whether to convert NaN values to null when upserting records. If False and NaN is found this function will fail. Deprecated.
- Returns
JSON response body from the server.
- Return type
-
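A sketch of upsert_records in use; the record shape (string key, multi-valued attributes) and the attribute names are illustrative only, and the call requires a live Tamr instance:

```python
def upsert_people(dataset):
    """Upsert two records keyed by 'id' (sketch).

    Assumes `dataset` is a tamr_unify_client Dataset whose key attribute is
    named 'id'; non-key values are lists because Tamr attributes default to
    ARRAY[STRING].
    """
    records = [
        {"id": "1", "name": ["Alice"]},
        {"id": "2", "name": ["Bob"]},
    ]
    return dataset.upsert_records(records, "id")
```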
delete_records_by_id
(record_ids)[source]¶ Deletes the specified records.
- Parameters
record_ids (iterable) – The IDs of the records to delete.
- Returns
JSON response body from the server.
- Return type
-
delete_all_records
()[source]¶ Removes all records from the dataset.
- Returns
HTTP response from the server
- Return type
-
refresh
(**options)[source]¶ Brings dataset up-to-date if needed, taking whatever actions are required.
- Parameters
**options – Options passed to underlying
Operation
. Seeapply_options()
.- Returns
The refresh operation.
- Return type
-
profile
()[source]¶ Returns profile information for a dataset.
If profile information has not been generated, call create_profile() first. If the returned profile information is out-of-date, you can call refresh() on the returned object to bring it up-to-date.
- Returns
Dataset Profile information.
- Return type
-
create_profile
(**options)[source]¶ Create a profile for this dataset.
If a profile already exists, the existing profile will be brought up to date.
- Parameters
**options – Options passed to underlying
Operation
. Seeapply_options()
.- Returns
The operation to create the profile.
- Return type
-
records
()[source]¶ Stream this dataset’s records as Python dictionaries.
- Returns
Stream of records.
- Return type
Python generator yielding
dict
-
status
()[source]¶ Retrieve this dataset’s streamability status.
- Returns
Dataset streamability status.
- Return type
-
usage
()[source]¶ Retrieve this dataset’s usage by recipes and downstream datasets.
- Returns
The dataset’s usage.
- Return type
-
from_geo_features
(features, geo_attr=None)[source]¶ Upsert this dataset from a geospatial FeatureCollection or iterable of Features.
features can be:
An object that implements
__geo_interface__
as a FeatureCollection (see https://gist.github.com/sgillies/2217756)An iterable of features, where each element is a feature dictionary or an object that implements the
__geo_interface__
as a FeatureA map where the “features” key contains an iterable of features
See: geopandas.GeoDataFrame.from_features()
If geo_attr is provided, then the named Tamr attribute will be used for the geometry. If geo_attr is not provided, then the first attribute on the dataset with geometry type will be used for the geometry.
-
upstream_datasets
()[source]¶ The Dataset’s upstream datasets.
The API returns the URIs of the upstream datasets, so this yields a list of DatasetURIs, not actual Datasets.
- Returns
A list of the Dataset’s upstream datasets.
- Return type
list[
DatasetURI
]
-
delete
(cascade=False)[source]¶ Deletes this dataset, optionally deleting all derived datasets as well.
- Parameters
cascade (bool) – Whether to delete all datasets derived from this one. Optional, default is False. Do not use this option unless you are certain you need it, as it can have unintended consequences.
- Returns
HTTP response from the server
- Return type
-
itergeofeatures
(geo_attr=None)[source]¶ Returns an iterator that yields feature dictionaries that comply with __geo_interface__
See https://gist.github.com/sgillies/2217756
- Parameters
geo_attr (str) – (optional) name of the Tamr attribute to use for the feature’s geometry
- Returns
stream of features
- Return type
Python generator yielding
dict[str, object]
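Because itergeofeatures() yields `__geo_interface__`-style feature dicts and from_geo_features() accepts an iterable of them, the two compose directly. A sketch, assuming `src` and `dst` are Datasets with compatible geometry attributes:

```python
def copy_geometry(src, dst, geo_attr=None):
    """Stream geo features out of one dataset and upsert them into another (sketch).

    Assumes `src` and `dst` are tamr_unify_client Datasets; when geo_attr is
    None, each side falls back to its first geometry-typed attribute.
    """
    dst.from_geo_features(src.itergeofeatures(geo_attr), geo_attr=geo_attr)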
Dataset Spec¶
-
class
tamr_unify_client.dataset.resource.
DatasetSpec
(client, data, api_path)[source]¶ A representation of the server view of a dataset.
-
static
of
(resource)[source]¶ Creates a dataset spec from a dataset.
- Parameters
resource (
Dataset
) – The existing dataset.- Returns
The corresponding dataset spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new dataset.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_name
(new_name)[source]¶ Creates a new spec with the same properties, updating name.
- Parameters
new_name (str) – The new name.
- Returns
A new spec.
- Return type
-
with_external_id
(new_external_id)[source]¶ Creates a new spec with the same properties, updating external ID.
- Parameters
new_external_id (str) – The new external ID.
- Returns
A new spec.
- Return type
-
with_description
(new_description)[source]¶ Creates a new spec with the same properties, updating description.
- Parameters
new_description (str) – The new description.
- Returns
A new spec.
- Return type
-
with_key_attribute_names
(new_key_attribute_names)[source]¶ Creates a new spec with the same properties, updating key attribute names.
Creates a new spec with the same properties, updating tags.
- Parameters
- Returns
A new spec.
- Return type
-
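The with_* builders above can be chained from a blank spec to assemble a dataset creation spec. A sketch (the dataset name, key, and description are illustrative, and the commented-out create call requires a live Tamr instance):

```python
def movies_creation_spec():
    """Assemble a dataset creation spec with the documented with_* builders (sketch)."""
    # Local import so the sketch stands alone; illustration only.
    from tamr_unify_client.dataset.resource import DatasetSpec

    spec = (
        DatasetSpec.new()
        .with_name("movies")
        .with_key_attribute_names(["movie_id"])
        .with_description("Movie catalog")
    )
    return spec.to_dict()

# dataset = tamr.datasets.create(movies_creation_spec())  # needs a live Tamr instance
```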
Dataset Collection¶
-
class
tamr_unify_client.dataset.collection.
DatasetCollection
(client, api_path='datasets')[source]¶ Collection of
Dataset
s.- Parameters
-
by_external_id
(external_id)[source]¶ Retrieve a dataset by external ID.
- Parameters
external_id (str) – The external ID.
- Returns
The specified dataset, if found.
- Return type
- Raises
KeyError – If no dataset with the specified external_id is found
LookupError – If multiple datasets with the specified external_id are found
-
stream
()[source]¶ Stream datasets in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of datasets.
- Return type
Python generator yielding
Dataset
- Usage:
>>> for dataset in collection.stream():  # explicit
>>>     do_stuff(dataset)
>>> for dataset in collection:  # implicit
>>>     do_stuff(dataset)
-
delete_by_resource_id
(resource_id, cascade=False)[source]¶ Deletes a dataset from this collection by resource_id. Optionally deletes all derived datasets as well.
- Parameters
- Returns
HTTP response from the server.
- Return type
-
create
(creation_spec)[source]¶ Create a Dataset in Tamr
- Parameters
creation_spec (dict[str, str]) – Dataset creation specification should be formatted as specified in the Public Docs for Creating a Dataset.
- Returns
The created Dataset
- Return type
-
create_from_dataframe
(df, primary_key_name, dataset_name, ignore_nan=None)[source]¶ Creates a dataset in this collection with the given name, creates an attribute for each column in the df (with primary_key_name as the key attribute), and upserts a record for each row of df.
Each attribute has the default type ARRAY[STRING], besides the key attribute, which will have type STRING.
This function attempts to ensure atomicity, but it is not guaranteed. If an error occurs while creating attributes or records, an attempt will be made to delete the dataset that was created. However, if this request errors, it will not try again.
- Parameters
df (
pandas.DataFrame
) – The data to create the dataset with.primary_key_name (str) – The name of the primary key of the dataset. Must be a column of df.
dataset_name (str) – What to name the dataset in Tamr. There cannot already be a dataset with this name.
ignore_nan (bool) – Legacy parameter that does nothing
- Returns
The newly created dataset.
- Return type
- Raises
KeyError – If primary_key_name is not a column in df.
CreationError – If a step in creating the dataset fails.
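A sketch of create_from_dataframe in use; the DataFrame contents and dataset name are illustrative, and the call requires pandas and a live Tamr instance:

```python
def create_movies(tamr):
    """Create a Tamr dataset from a pandas DataFrame (sketch).

    'movie_id' becomes the STRING key attribute; every other column becomes
    an ARRAY[STRING] attribute, per the default typing described above.
    """
    # Local import so the sketch stands alone; illustration only.
    import pandas as pd

    df = pd.DataFrame({"movie_id": ["1", "2"], "title": ["Alien", "Aliens"]})
    return tamr.datasets.create_from_dataframe(df, "movie_id", "movies")
```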
-
class
tamr_unify_client.dataset.collection.
CreationError
(error_message)[source]¶ An error from
create_from_dataframe()
-
with_traceback
()¶ Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
Dataset Profile¶
-
class
tamr_unify_client.dataset.profile.
DatasetProfile
(client, data, alias=None)[source]¶ Profile info of a Tamr dataset.
-
property
relative_dataset_id
¶ The relative dataset ID of the associated dataset.
-
refresh
(**options)[source]¶ Updates the dataset profile if needed.
The dataset profile is updated on the server; you will need to call
profile()
to retrieve the updated profile.- Parameters
**options – Options passed to underlying
Operation
. Seeapply_options()
.- Returns
The refresh operation.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Dataset Status¶
-
class
tamr_unify_client.dataset.status.
DatasetStatus
(client, data, alias=None)[source]¶ Streamability status of a Tamr dataset.
-
property
relative_dataset_id
¶ The relative dataset ID of the associated dataset.
-
property
is_streamable
¶ Whether the associated dataset is available to be streamed.
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Dataset URI¶
Dataset Usage¶
-
class
tamr_unify_client.dataset.usage.
DatasetUsage
(client, data, alias=None)[source]¶ The usage of a dataset and its downstream dependencies.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
-
property
usage
¶ -
- Type
-
property
dependencies
¶ list[
DatasetUse
]- Type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Dataset Use¶
-
class
tamr_unify_client.dataset.use.
DatasetUse
(client, data)[source]¶ The use of a dataset in project steps. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
- Parameters
-
property
input_to_project_steps
¶ list[
ProjectStep
]- Type
-
property
output_from_project_steps
¶ list[
ProjectStep
]- Type
Machine Learning Model¶
-
class
tamr_unify_client.base_model.
MachineLearningModel
(client, data, alias=None)[source]¶ A Tamr Machine Learning model.
-
train
(**options)[source]¶ Learn from verified labels.
- Parameters
**options – Options passed to underlying
Operation
. Seeapply_options()
.- Returns
The resultant operation.
- Return type
-
predict
(**options)[source]¶ Suggest labels for unverified records.
- Parameters
**options – Options passed to underlying
Operation
. Seeapply_options()
.- Returns
The resultant operation.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Mastering¶
Binning Model¶
-
class
tamr_unify_client.mastering.binning_model.
BinningModel
(client, data, alias=None)[source]¶ A binning model object.
-
records
()[source]¶ Stream this object’s records as Python dictionaries.
- Returns
Stream of records.
- Return type
Python generator yielding
dict
-
update_records
(records)[source]¶ Send a batch of record creations/updates/deletions to this dataset.
- Parameters
records (iterable[dict]) – Each record should be formatted as specified in the Public Docs for Dataset updates.
- Returns
JSON response body from server.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Estimated Pair Counts¶
-
class
tamr_unify_client.mastering.estimated_pair_counts.
EstimatedPairCounts
(client, data, alias=None)[source]¶ Estimated Pair Counts info for Mastering Project
-
property
is_up_to_date
¶ Whether an estimate pairs job has been run since the last edit to the binning model.
- Return type
-
property
total_estimate
¶ The total number of estimated candidate pairs and generated pairs for the model across all clauses.
-
property
clause_estimates
¶ The estimated candidate pair count and generated pair count for each clause in the model.
- Returns
A dictionary containing each clause name mapped to a dictionary containing the corresponding estimated candidate and generated pair counts. For example:
{
    "Clause1": {
        "candidatePairCount": "321",
        "generatedPairCount": "123"
    },
    "Clause2": {
        "candidatePairCount": "654",
        "generatedPairCount": "456"
    }
}
- Return type
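Note that the counts in the payload are strings, so they need converting before any arithmetic. A short sketch working with the sample values from the docstring above:

```python
# Sample clause_estimates payload, as in the docstring example above.
clause_estimates = {
    "Clause1": {"candidatePairCount": "321", "generatedPairCount": "123"},
    "Clause2": {"candidatePairCount": "654", "generatedPairCount": "456"},
}

# Counts are strings; cast to int before summing across clauses.
total_generated = sum(int(c["generatedPairCount"]) for c in clause_estimates.values())
total_candidates = sum(int(c["candidatePairCount"]) for c in clause_estimates.values())
```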
-
refresh
(**options)[source]¶ Updates the estimated pair counts if needed.
The pair count estimates are updated on the server; you will need to call
estimate_pairs()
to retrieve the updated estimate.- Parameters
**options – Options passed to underlying
Operation
. Seeapply_options()
.- Returns
The refresh operation.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Mastering Project¶
-
class
tamr_unify_client.mastering.project.
MasteringProject
(client, data, alias=None)[source]¶ A Mastering project in Tamr.
-
pairs
()[source]¶ Record pairs generated by Tamr’s binning model. Pairs are displayed on the “Pairs” page in the Tamr UI.
Call
refresh()
from this dataset to regenerate pairs according to the latest binning model.- Returns
The record pairs represented as a dataset.
- Return type
-
pair_matching_model
()[source]¶ Machine learning model for pair-matching for this Mastering project. Learns from verified labels and predicts categorization labels for unlabeled pairs.
Calling
predict()
from this dataset will produce new (unpublished) clusters. These clusters are displayed on the “Clusters” page in the Tamr UI.- Returns
The machine learning model for pair-matching.
- Return type
-
high_impact_pairs
()[source]¶ High-impact pairs as a dataset. Tamr labels pairs as “high-impact” if labeling these pairs would help it learn most quickly (i.e. “Active learning”).
High-impact pairs are displayed with a ⚡ lightning bolt icon on the “Pairs” page in the Tamr UI.
Call
refresh()
from this dataset to produce new high-impact pairs according to the latest pair-matching model.- Returns
The high-impact pairs represented as a dataset.
- Return type
-
record_clusters
()[source]¶ Record clusters as a dataset. Tamr clusters labeled pairs using the pair-matching model. These clusters populate the cluster review page and get transient cluster IDs, rather than published cluster IDs (i.e., “Permanent Ids”).
Call
refresh()
from this dataset to generate clusters based on the latest pair-matching model.- Returns
The record clusters represented as a dataset.
- Return type
-
published_clusters
()[source]¶ Published record clusters generated by Tamr’s pair-matching model.
- Returns
The published clusters represented as a dataset.
- Return type
-
published_clusters_configuration
()[source]¶ Retrieves published clusters configuration for this project.
- Returns
The published clusters configuration
- Return type
-
published_cluster_ids
()[source]¶ Retrieves published cluster IDs for this project.
- Returns
The published cluster ID dataset.
- Return type
-
published_cluster_stats
()[source]¶ Retrieves published cluster stats for this project.
- Returns
The published cluster stats dataset.
- Return type
-
published_cluster_versions
(cluster_ids)[source]¶ Retrieves version information for the specified published clusters. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.
- Parameters
cluster_ids (iterable[str]) – The persistent IDs of the clusters to get version information for.
- Returns
A stream of the published clusters.
- Return type
Python generator yielding
PublishedCluster
-
record_published_cluster_versions
(record_ids)[source]¶ Retrieves version information for the published clusters of the given records. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.
- Parameters
record_ids (iterable[str]) – The Tamr IDs of the records to get cluster version information for.
- Returns
A stream of the relevant published clusters.
- Return type
Python generator yielding
RecordPublishedCluster
-
estimate_pairs
()[source]¶ Returns pair estimate information for a mastering project
- Returns
Pairs Estimate information.
- Return type
-
record_clusters_with_data
()[source]¶ Project’s unified dataset with associated clusters.
- Returns
The record clusters with data represented as a dataset
- Return type
-
published_clusters_with_data
()[source]¶ Project’s unified dataset with associated clusters.
- Returns
The published clusters with data represented as a dataset
- Return type
-
binning_model
()[source]¶ Binning model for this project.
- Returns
Binning model for this project.
- Return type
-
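The datasets and model above chain into the standard mastering sequence: generate pairs, train on labeled pairs, predict matches, then cluster. A sketch, assuming `project` is a MasteringProject whose unified dataset is already up to date; each refresh()/train()/predict() call blocks until its server-side operation finishes:

```python
def master(project):
    """End-to-end mastering sketch: pairs -> train -> predict -> clusters.

    Assumes `project` is a MasteringProject with an up-to-date unified dataset.
    """
    assert project.pairs().refresh().succeeded()            # regenerate pairs from the binning model
    model = project.pair_matching_model()
    assert model.train().succeeded()                        # learn from verified pair labels
    assert model.predict().succeeded()                      # label the remaining pairs
    assert project.record_clusters().refresh().succeeded()  # cluster according to the latest model
```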
add_input_dataset
(dataset)¶ Associate a dataset with a project in Tamr.
By default, datasets are not associated with any projects. They must be added as input to a project before they can be used as part of that project.
- Parameters
dataset (
Dataset
) – The dataset to associate with the project.- Returns
HTTP response from the server
- Return type
-
as_categorization
()¶ Convert this project to a
CategorizationProject
-
as_mastering
()¶ Convert this project to a
MasteringProject
- Returns
This project.
- Return type
- Raises
-
attribute_configurations
()¶ The configurations of this project’s attributes.
- Returns
The configurations of the attributes of a project.
- Return type
-
attribute_mappings
()¶ The attribute mappings of this project.
- Returns
The attribute mappings of a project.
- Return type
-
property
attributes
¶ Attributes of this project.
- Returns
Attributes of this project.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
input_datasets
()¶ Retrieve a collection of this project’s input datasets.
- Returns
The project’s input datasets.
- Return type
-
remove_input_dataset
(dataset)¶ Remove a dataset from a project.
- Parameters
dataset (
Dataset
) – The dataset to be removed from this project.- Returns
HTTP response from the server
- Return type
-
spec
()¶ Returns this project’s spec.
- Returns
The spec for the project.
- Return type
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type
-
Published Clusters¶
Metric¶
Published Cluster¶
-
class
tamr_unify_client.mastering.published_cluster.resource.
PublishedCluster
(data)[source]¶ A representation of a published cluster in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this
PublishedCluster
.
-
property
versions
¶ list[
PublishedClusterVersion
]- Type
Published Cluster Configuration¶
-
class
tamr_unify_client.mastering.published_cluster.configuration.
PublishedClustersConfiguration
(client, data, alias=None)[source]¶ The configuration of published clusters in a project.
See https://docs.tamr.com/reference#the-published-clusters-configuration-object
-
spec
()[source]¶ Returns a spec representation of this published cluster configuration.
- Returns
The published cluster configuration spec.
- Return type
PublishedClustersConfigurationSpec
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Published Cluster Version¶
Record Published Cluster¶
-
class
tamr_unify_client.mastering.published_cluster.record.
RecordPublishedCluster
(data)[source]¶ A representation of a published cluster of a record in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this
RecordPublishedCluster
.
-
property
versions
¶
- Type
list[RecordPublishedClusterVersion]
Record Published Cluster Version¶
-
class
tamr_unify_client.mastering.published_cluster.record_version.
RecordPublishedClusterVersion
(data)[source]¶ A version of a published cluster in a mastering project.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this version.
Operation¶
-
class
tamr_unify_client.operation.
Operation
(client, data, alias=None)[source]¶ A long-running operation performed by Tamr. Operations appear on the “Jobs” page of the Tamr UI.
By design, client-side operations represent server-side operations at a particular point in time (namely, when the operation was fetched from the server). In other words: Operations will not pick up on server-side changes automatically. To get an up-to-date representation, refetch the operation e.g.
op = op.poll().
-
classmethod
from_response
(client, response)[source]¶ Handle idiosyncrasies in constructing Operations from Tamr responses. When a Tamr API call would start an operation, but all results that would be produced by that operation are already up-to-date, Tamr returns HTTP 204 No Content.
To make it easy for client code to handle these API responses without checking the response code, this method will either construct an Operation, or a dummy no-op Operation representing the 204 No Content response.
- Parameters
client (
Client
) – Delegate underlying API calls to this client.
response (
requests.Response
) – HTTP Response from the request that started the operation.
- Returns
Operation
- Return type
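The branching described above can be sketched, purely for illustration, as follows (this is not the library's implementation; it only shows the decision made on the status code):

```python
# Illustrative sketch of from_response's core decision: a 204 means the
# results were already up-to-date, so callers get a dummy, already-resolved
# "no-op" operation instead of a real server-side operation.
def operation_from_response(status_code):
    if status_code == 204:
        return {"state": "SUCCEEDED", "noop": True}
    # otherwise, a real server-side operation was started
    return {"state": "PENDING", "noop": False}
```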
-
apply_options
(asynchronous=False, **options)[source]¶ Applies operation options to this operation.
NOTE: This function should not be called directly. Rather, options should be passed in through a higher-level function e.g.
refresh()
.
-
property
state
¶ Server-side state of this operation.
Operation state can be unresolved (i.e. state is one of 'PENDING', 'RUNNING') or resolved (i.e. state is one of 'CANCELED', 'SUCCEEDED', 'FAILED'). Unless opting into asynchronous mode, all exposed operations should be resolved.
Note: you only need to manually pick up server-side changes when opting into asynchronous mode when kicking off this operation.
- Usage:
>>> op.state  # operation is currently 'PENDING'
'PENDING'
>>> op.wait()  # continually polls until operation resolves
>>> op.state  # incorrect usage; operation object state never changes
'PENDING'
>>> op = op.poll()  # correct usage; use value returned by Operation.poll or Operation.wait
>>> op.state
'SUCCEEDED'
-
poll
()[source]¶ Poll this operation for server-side updates.
Does not update the calling
Operation
object. Instead, returns a new
Operation
.
- Returns
Updated representation of this operation.
- Return type
-
wait
(poll_interval_seconds=3, timeout_seconds=None)[source]¶ Continuously polls for this operation’s server-side state.
- Parameters
poll_interval_seconds (int) – Time interval (in seconds) between polls. Default: 3.
timeout_seconds (int) – Time (in seconds) to wait for the operation to resolve. Default: None, meaning wait indefinitely.
- Raises
TimeoutError – If operation takes longer than timeout_seconds to resolve.
- Returns
Resolved operation.
- Return type
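The polling behavior of wait() can be sketched as a stand-alone loop (illustrative, not the library's implementation; get_state stands in for one poll() round-trip):

```python
import time

# States in which an operation is considered resolved
RESOLVED_STATES = {"CANCELED", "SUCCEEDED", "FAILED"}

def wait_for(get_state, poll_interval_seconds=3, timeout_seconds=None):
    # Poll until the state resolves, or raise TimeoutError after
    # timeout_seconds (None means wait indefinitely).
    started = time.monotonic()
    while True:
        state = get_state()  # stands in for one server round-trip
        if state in RESOLVED_STATES:
            return state
        if timeout_seconds is not None and time.monotonic() - started > timeout_seconds:
            raise TimeoutError("operation did not resolve in time")
        time.sleep(poll_interval_seconds)
```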
-
succeeded
()[source]¶ Convenience method for checking if operation was successful.
- Returns
True if operation’s state is 'SUCCEEDED', False otherwise.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Projects¶
Attribute Configurations¶
Attribute Configuration¶
-
class
tamr_unify_client.project.attribute_configuration.resource.
AttributeConfiguration
(client, data, alias=None)[source]¶ The configurations of Tamr Attributes.
See https://docs.tamr.com/reference#the-attribute-configuration-object
-
spec
()[source]¶ Returns this attribute configuration’s spec.
- Returns
The spec of this attribute configuration.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Attribute Configuration Spec¶
-
class
tamr_unify_client.project.attribute_configuration.resource.
AttributeConfigurationSpec
(client, data, api_path)[source]¶ A representation of the server view of an attribute configuration.
-
static
of
(resource)[source]¶ Creates an attribute configuration spec from an attribute configuration.
- Parameters
resource (
AttributeConfiguration
) – The existing attribute configuration.
- Returns
The corresponding attribute creation spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new attribute configuration.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_attribute_role
(new_attribute_role)[source]¶ Creates a new spec with the same properties, updating attribute role.
- Parameters
new_attribute_role (str) – The new attribute role.
- Returns
A new spec.
- Return type
-
with_similarity_function
(new_similarity_function)[source]¶ Creates a new spec with the same properties, updating similarity function.
- Parameters
new_similarity_function (str) – The new similarity function.
- Returns
A new spec.
- Return type
-
with_enabled_for_ml
(new_enabled_for_ml)[source]¶ Creates a new spec with the same properties, updating enabled for ML.
- Parameters
new_enabled_for_ml (bool) – Whether the attribute is enabled for ML.
- Returns
A new spec.
- Return type
-
with_tokenizer
(new_tokenizer)[source]¶ Creates a new spec with the same properties, updating tokenizer.
- Parameters
new_tokenizer (str) – The new tokenizer.
- Returns
A new spec.
- Return type
-
with_numeric_field_resolution
(new_numeric_field_resolution)[source]¶ Creates a new spec with the same properties, updating numeric field resolution.
- Parameters
new_numeric_field_resolution (str) – The new numeric field resolution.
- Returns
A new spec.
- Return type
-
with_attribute_name
(new_attribute_name)[source]¶ Creates a new spec with the same properties, updating the attribute name.
- Parameters
new_attribute_name (str) – The new attribute name.
- Returns
A new spec.
- Return type
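Taken together, the with_* methods let you build up a configuration one field at a time, with each call returning a new spec rather than mutating the old one. A rough, library-free sketch of that pattern (field names mirror the methods above; the exact JSON keys and the example values are assumptions, so verify them against the Public Docs):

```python
# Illustrative only: each with_* call copies the spec's data with one field
# changed; chaining them builds up the JSON body the API expects.
def with_field(data, key, value):
    new_data = dict(data)  # copy, never mutate: specs are immutable
    new_data[key] = value
    return new_data

spec = {}
spec = with_field(spec, "attributeName", "surname")
spec = with_field(spec, "similarityFunction", "COSINE")
spec = with_field(spec, "enabledForMl", True)
spec = with_field(spec, "tokenizer", "DEFAULT")
# collection.create(spec) would then POST this body (hypothetical usage)
```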
-
Attribute Configuration Collection¶
-
class
tamr_unify_client.project.attribute_configuration.collection.
AttributeConfigurationCollection
(client, api_path)[source]¶ Collection of
AttributeConfiguration
- Parameters
-
by_resource_id
(resource_id)[source]¶ Retrieve an attribute configuration by resource ID.
- Parameters
resource_id (str) – The resource ID.
- Returns
The specified attribute configuration.
- Return type
-
by_relative_id
(relative_id)[source]¶ Retrieve an attribute configuration by relative ID.
- Parameters
relative_id (str) – The relative ID.
- Returns
The specified attribute configuration.
- Return type
-
by_external_id
(external_id)[source]¶ Retrieve an attribute configuration by external ID.
Since attributes do not have external IDs, this method is not supported and will raise a
NotImplementedError
.- Parameters
external_id (str) – The external ID.
- Returns
The specified attribute, if found.
- Return type
- Raises
KeyError – If no attribute with the specified external_id is found
LookupError – If multiple attributes with the specified external_id are found
NotImplementedError – AttributeConfiguration does not support external_id
-
stream
()[source]¶ Stream attribute configurations in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of attribute configurations.
- Return type
Python generator yielding
AttributeConfiguration
- Usage:
>>> for attributeConfiguration in collection.stream():  # explicit
>>>     do_stuff(attributeConfiguration)
>>> for attributeConfiguration in collection:  # implicit
>>>     do_stuff(attributeConfiguration)
-
create
(creation_spec)[source]¶ Create an Attribute configuration in this collection
- Parameters
creation_spec (dict[str, str]) – Attribute configuration creation specification should be formatted as specified in the Public Docs for adding an AttributeConfiguration.
- Returns
The created Attribute configuration
- Return type
Attribute Mappings¶
Attribute Mapping¶
-
class
tamr_unify_client.project.attribute_mapping.resource.
AttributeMapping
(client, data)[source]¶ See https://docs.tamr.com/reference#retrieve-projects-mappings. AttributeMapping and AttributeMappingCollection do not inherit from BaseResource and BaseCollection because those base classes require a specific URL for each individual attribute mapping (e.g. /projects/1/attributeMappings/1), and no such URLs exist for attribute mappings.
-
spec
()[source]¶ Returns a spec representation of this attribute mapping.
- Returns
The attribute mapping spec.
- Return type
-
Attribute Mapping Spec¶
-
class
tamr_unify_client.project.attribute_mapping.resource.
AttributeMappingSpec
(data)[source]¶ A representation of the server view of an attribute mapping
-
static
of
(resource)[source]¶ Creates an attribute mapping spec from an attribute mapping.
- Parameters
resource (
AttributeMapping
) – The existing attribute mapping.
- Returns
The corresponding attribute mapping spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new attribute mapping.
- Returns
The empty spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_input_attribute_id
(new_input_attribute_id)[source]¶ Creates a new spec with the same properties, updating the input attribute id.
- Parameters
new_input_attribute_id (str) – The new input attribute id.
- Returns
The new spec.
- Return type
-
with_relative_input_attribute_id
(new_relative_input_attribute_id)[source]¶ Creates a new spec with the same properties, updating the relative input attribute id.
- Parameters
new_relative_input_attribute_id (str) – The new relative input attribute Id.
- Returns
The new spec.
- Return type
-
with_input_dataset_name
(new_input_dataset_name)[source]¶ Creates a new spec with the same properties, updating the input dataset name.
- Parameters
new_input_dataset_name (str) – The new input dataset name.
- Returns
The new spec.
- Return type
-
with_input_attribute_name
(new_input_attribute_name)[source]¶ Creates a new spec with the same properties, updating the input attribute name.
- Parameters
new_input_attribute_name (str) – The new input attribute name.
- Returns
The new spec.
- Return type
-
with_unified_attribute_id
(new_unified_attribute_id)[source]¶ Creates a new spec with the same properties, updating the unified attribute id.
- Parameters
new_unified_attribute_id (str) – The new unified attribute id.
- Returns
The new spec.
- Return type
-
with_relative_unified_attribute_id
(new_relative_unified_attribute_id)[source]¶ Creates a new spec with the same properties, updating the relative unified attribute id.
- Parameters
new_relative_unified_attribute_id (str) – The new relative unified attribute id.
- Returns
The new spec.
- Return type
-
with_unified_dataset_name
(new_unified_dataset_name)[source]¶ Creates a new spec with the same properties, updating the unified dataset name.
- Parameters
new_unified_dataset_name (str) – The new unified dataset name.
- Returns
The new spec.
- Return type
-
Attribute Mapping Collection¶
-
class
tamr_unify_client.project.attribute_mapping.collection.
AttributeMappingCollection
(client, api_path)[source]¶ Collection of
AttributeMapping
- Parameters
-
stream
()[source]¶ Stream attribute mappings in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of attribute mappings.
- Return type
Python generator yielding
AttributeMapping
-
by_resource_id
(resource_id)[source]¶ Retrieve an item in this collection by resource ID.
- Parameters
resource_id (str) – The resource ID.
- Returns
The specified attribute mapping.
- Return type
-
by_relative_id
(relative_id)[source]¶ Retrieve an item in this collection by relative ID.
- Parameters
relative_id (str) – The relative ID.
- Returns
The specified attribute mapping.
- Return type
-
create
(creation_spec)[source]¶ Create an Attribute mapping in this collection
- Parameters
creation_spec (dict[str, str]) – Attribute mapping creation specification should be formatted as specified in the Public Docs for adding an AttributeMapping.
- Returns
The created Attribute mapping
- Return type
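As with attribute configurations, a mapping creation specification is a plain dict. A hedged sketch, using only the fields named by the with_* methods above (the key names mirror those methods and the values are illustrative; check the Public Docs for the exact format):

```python
# Illustrative attribute mapping creation body; keys mirror the
# AttributeMappingSpec.with_* methods and values are made up for the example.
mapping_spec = {
    "inputDatasetName": "customers.csv",
    "inputAttributeName": "surname",
    "relativeInputAttributeId": "datasets/6/attributes/surname",
    "unifiedDatasetName": "Customers_unified_dataset",
}
# collection.create(mapping_spec)  # hypothetical usage against a live Tamr
```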
Project¶
-
class
tamr_unify_client.project.resource.
Project
(client, data, alias=None)[source]¶ A Tamr project.
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type
-
property
attributes
¶ Attributes of this project.
- Returns
Attributes of this project.
- Return type
-
unified_dataset
()[source]¶ Unified dataset for this project.
- Returns
Unified dataset for this project.
- Return type
-
as_categorization
()[source]¶ Convert this project to a
CategorizationProject
-
as_mastering
()[source]¶ Convert this project to a
MasteringProject
- Returns
This project.
- Return type
- Raises
-
add_input_dataset
(dataset)[source]¶ Associate a dataset with a project in Tamr.
By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.
- Parameters
dataset (
Dataset
) – The dataset to associate with the project.
- Returns
HTTP response from the server
- Return type
-
remove_input_dataset
(dataset)[source]¶ Remove a dataset from a project.
- Parameters
dataset (
Dataset
) – The dataset to be removed from this project.
- Returns
HTTP response from the server
- Return type
-
input_datasets
()[source]¶ Retrieve a collection of this project’s input datasets.
- Returns
The project’s input datasets.
- Return type
-
attribute_configurations
()[source]¶ Project’s attribute configurations.
- Returns
The configurations of the attributes of a project.
- Return type
-
attribute_mappings
()[source]¶ Project’s attribute mappings.
- Returns
The attribute mappings of a project.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Project Spec¶
-
class
tamr_unify_client.project.resource.
ProjectSpec
(client, data, api_path)[source]¶ A representation of the server view of a project.
-
static
of
(resource)[source]¶ Creates a project spec from a project.
- Parameters
resource (
Project
) – The existing project.
- Returns
The corresponding project spec.
- Return type
-
static
new
()[source]¶ Creates a blank spec that could be used to construct a new project.
- Returns
The empty spec.
- Return type
-
from_data
(data)[source]¶ Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict
()[source]¶ Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_name
(new_name)[source]¶ Creates a new spec with the same properties, updating name.
- Parameters
new_name (str) – The new name.
- Returns
The new spec.
- Return type
-
with_description
(new_description)[source]¶ Creates a new spec with the same properties, updating description.
- Parameters
new_description (str) – The new description.
- Returns
The new spec.
- Return type
-
with_type
(new_type)[source]¶ Creates a new spec with the same properties, updating type.
- Parameters
new_type (str) – The new type.
- Returns
The new spec.
- Return type
-
with_external_id
(new_external_id)[source]¶ Creates a new spec with the same properties, updating external ID.
- Parameters
new_external_id (str) – The new external ID.
- Returns
The new spec.
- Return type
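A project creation specification is ultimately a dict conforming to the Public Docs format. A hedged sketch ("DEDUP" is Tamr's mastering project type; the other values here are illustrative, and key names should be verified against the Public Docs):

```python
# Illustrative project creation body for a mastering project.
creation_spec = {
    "name": "Customer Mastering",
    "type": "DEDUP",  # mastering project type
    "unifiedDatasetName": "customer_mastering_unified_dataset",
    "description": "Deduplicate customer records",
}
# project = tamr.projects.create(creation_spec)  # hypothetical usage
```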
-
Project Collection¶
-
class
tamr_unify_client.project.collection.
ProjectCollection
(client, api_path='projects')[source]¶ Collection of
Project
s.
- Parameters
-
by_external_id
(external_id)[source]¶ Retrieve a project by external ID.
- Parameters
external_id (str) – The external ID.
- Returns
The specified project, if found.
- Return type
- Raises
KeyError – If no project with the specified external_id is found
LookupError – If multiple projects with the specified external_id are found
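The documented lookup semantics can be sketched as a small, self-contained function (illustrative only, not the library's implementation): no match raises KeyError, more than one match raises LookupError.

```python
# Illustrative sketch of by_external_id's documented error semantics,
# operating on plain dicts instead of Project resources.
def by_external_id(projects, external_id):
    matches = [p for p in projects if p.get("externalId") == external_id]
    if not matches:
        raise KeyError(f"No project with externalId {external_id!r}")
    if len(matches) > 1:
        raise LookupError(f"Multiple projects with externalId {external_id!r}")
    return matches[0]
```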
-
stream
()[source]¶ Stream projects in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of projects.
- Return type
Python generator yielding
Project
- Usage:
>>> for project in collection.stream():  # explicit
>>>     do_stuff(project)
>>> for project in collection:  # implicit
>>>     do_stuff(project)
-
by_name
(project_name)[source]¶ Get project by name
Fetches a specific project in this collection by exact-match on name.
-
create
(creation_spec)[source]¶ Create a Project in Tamr
- Parameters
creation_spec (dict[str, str]) – Project creation specification should be formatted as specified in the Public Docs for Creating a Project.
- Returns
The created Project
- Return type
Project Step¶
-
class
tamr_unify_client.project.step.
ProjectStep
(client, data)[source]¶ A step of a Tamr project. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
- Parameters
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type
Contributor Guide¶
Contributor guide¶
Thank you for learning how to contribute to Tamr’s Python Client! Your contribution will help you and many others in the Tamr community. Before you begin, make sure you are viewing the latest version of the Contributor Guide.
Feedback¶
Before submitting a new issue, search the existing issues. If the bug/feature has already been submitted, leave a like 👍 on the description of the Github Issue. Maintainers will consider the number of likes when prioritizing issues.
Bug reports¶
Submit bug reports as Github issues.
Feature requests¶
Submit feature requests as Github issues.
Documentation¶
How to write docs¶
Before you begin to add content, decide which of the three types of content you want to add:
Tutorial
How-To guide
Explanation
Note
There is a fourth type of content, known as Reference.
For the Tamr Client, you don’t need to add reference topics manually because reference documentation for the Tamr Client is generated automatically based on the source code.
For more details, see Reference description below.
For more information about each type of content, see the following descriptions. Also see Divio’s documentation system manual.
Tutorial¶
Tutorials are learning-oriented and …
Must include an end-to-end walkthrough for a specific use case, such as “Tutorial: Deduplicating buildings in Cambridge”.
Must have a clearly stated goal and allow the users to achieve it after they complete the steps in the tutorial.
Must provide the sample data and input configuration that are necessary for the user to complete the tutorial. Include this information upfront, at the start of your tutorial.
Must be self-contained, but can include links to procedures described elsewhere in this documentation.
Tutorials are useful if the use case is both simple and in high demand. Not every use case deserves a tutorial. Before writing a tutorial, think first of a use case that has a high learning value, and then prepare the assets needed to complete your tutorial, such as a sample dataset and sample configuration.
Tutorials are in high demand. If you write a good one, many users will reference it and thank you for your work!
How-To¶
How-Tos are task-oriented and …
Must include a list of numbered steps, known as a task, or a procedure, to help users complete a specific, domain-agnostic task, such as running a request, copying a file, installing, exporting, or other. For example, you can create a task titled “How to stream datasets out of Tamr”.
Must include a context paragraph, such as “It is often useful to stream datasets from Tamr, to load them into business analytics applications, such as Tableau, for analysis.” Context may also include checks that need to be in place before users start the task, and links to related concepts. Context must provide information needed to begin the task; for example, it can list the host and port URL at which the endpoint for the service is served.
Must include a stem sentence, such as: “To stream a dataset out of Tamr:” The stem sentence is followed by numbered steps.
Must include a numbered list of steps where each step must begin with an imperative verb, such as: “Run the following curl request.”, or “Save the file”. For more examples see Use Imperatives in Procedures.
Explanation¶
Explanations are understanding-oriented and …
Must explain a single concept of the Tamr Python client. If you’d like to write another concept, create it separately.
Must keep sentences short.
May include examples of code or text examples.
Reference¶
Reference is information-oriented.
It is something that users cannot remember and want to be able to refer to, often. Reference provides details, such as configuration parameters for a particular method or call. It never contains tasks, or concepts. Reference is often automatically-generated from code, to ensure it is up-to-date and accurate at all times.
Note
Our reference documentation is automatically generated by autodoc (https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html) based on type annotations and docstrings in the source code.
Code¶
Installation¶
Prerequisites¶
Install build dependencies for pyenv
Install pyenv
Install poetry
Clone + install¶
Clone your fork and
cd
into the project:
git clone https://github.com/<your-github-username>/tamr-client
cd tamr-client
Install all Python versions in .python-version:
Dev tasks will use these Python versions.
# run `pyenv install` for each line in `.python-version`
cat .python-version | xargs -L 1 pyenv install
Install project dependencies via
poetry
:
poetry install
Run dev tasks¶
This project uses nox.
Since nox
will be running inside of a poetry
environment (to guarantee you are running the same version of nox
as everyone else), we recommend adding the following alias to your .bashrc
/ .zshrc
to save you some keystrokes:
alias prn='poetry run nox'
To run all checks:
prn # with alias
poetry run nox # without alias
Formatting¶
To run formatter:
prn -s format # with alias
poetry run nox -s format # without alias
Run the formatter with the --fix
flag to autofix formatting:
prn -s format -- --fix # with alias
poetry run nox -s format -- --fix # without alias
Typechecks¶
To run typechecks:
prn -s typecheck # with alias
poetry run nox -s typecheck # without alias
Tests¶
To run all tests:
prn -s test # with alias
poetry run nox -s test # without alias
To run tests for a specific Python version e.g. 3.6:
prn -s test-3.6 # with alias
poetry run nox -s test-3.6 # without alias
See nox --list
for more details.
To run specific tests, see these pytest docs and pass pytest
args after --
e.g.:
prn -s test -- tests/unit/test_attribute.py # with alias
poetry run nox -s test -- tests/unit/test_attribute.py # without alias
Docs¶
To build the docs:
prn -s docs # with alias
poetry run nox -s docs # without alias
After docs are built, view them with:
open -a 'firefox' docs/_build/index.html # open in Firefox
open -a 'Google Chrome' docs/_build/index.html # open in Chrome
Architectural Decision Records¶
Important architectural decisions are logged as Architectural Decision Records (ADRs) and are housed here.
For more on ADRs, see:
To author new ADRs, we recommend adr-tools.
ADRs¶
Date: 2020-08-14
Accepted
We need to record the architectural decisions made on this project.
We will use Architecture Decision Records, as described by Michael Nygard.
Date: 2019-01-14
Accepted
Inconsistent code formatting slows down development and the review process.
Code should be linted for things like:
unused imports and variables
consistent import order
Code formatting should be done automatically or programmatically, taking the burden off of reviewers.
All linting and formatting are enforced programmatically.
Most linting and formatting errors can be autofixed.
Text editors and IDEs are able to integrate with our linting and formatting tools to automatically fix (most) errors on save.
Date: 2019-06-05
Accepted
Reproducing results from a program is challenging when operating systems, language versions, and dependency versions can vary.
For this codebase, we will focus on consistent Python versions and dependency versions.
Manage multiple Python versions via pyenv.
Manage dependencies via poetry.
Define tests via nox.
Run tests in automation/CI via Github Actions.
This solution lets us:
keep track of abstract and concrete versions for dependencies (think
.lock
file)locally test against multiple Python versions
run the same tests locally as we do in Continuous Integration (CI)
easily view CI test results within the review context
Date: 2019-10-03
Accepted
Documentation can take four forms:
1. Explanation
2. Tutorial
3. How-to
4. Reference
We need a way to author and host prosey documentation and generate reference docs based on source code.
Doc compilation will be done via sphinx.
Prosey documentation (1-3) via recommonmark.
Reference documentation (4) will be generated based on type annotations and docstrings via:
Automatic docs based on docstrings via sphinx-autodoc, sphinx-autodoc-typehints
Google-style docstrings via napoleon
Hosting on ReadTheDocs (RTD)
Build docs in CI and fail on errors or warnings.
Prosey documentation can be written in Markdown (.md), which is more familiar to our contributors than .rst format.
Reference doc generation makes docs more maintainable and consistent with actual code.
Google-style docstrings are easier to read than sphinx-style docstrings.
RTD natively compiles documentation using sphinx and simultaneously hosts docs at each version.
Date: 2019-11-01
Accepted
We need a reasonable tradeoff between ease-of-use and maintainability.
Specifically, we need composable, combinable units that can be improved independently.
One approach is to embrace Object-Oriented Programming (OOP) with fluent interfaces (i.e. method chaining):
project
.create(...)
.update(...)
.delete(...)
Characteristics:
Ease-of-use is maximized, but this requires each method to
return self
.Also, this approach implies that if a function can be called with X different object types, each of those object types should have a corresponding method that applies that functionality and then
return self
.
How to enforce these characteristics?
Any solution will be a tax on maintainability, as code that adheres to these characteristics will include many non-semantic lines simply going through the motions of return self
and copying function usage into dedicated methods for each class.
Another approach is to embrace a functional programming style: simple types and functions (no methods).
Usage is not as terse as for OOP:
p = tc.project.create(...)
u = tc.project.update(p, ...)
d = tc.project.delete(p, ...)
Characteristics:
Ease-of-use is not optimized, but still reasonable.
With tab-completion, ease-of-use is comparable to OOP.
Each type can be made immutable
Each function can be made pure
Functionality can be shared by calling the same function in user-land, not copying function calls in contributor-land.
Use @dataclass(frozen=True)
to model types and plain Python modules and functions to capture business logic.
Immutable types and pure functions make the code much easier to reason about, drastically cutting down the time to ramp up and debug.
Functions are easily composable without accumulating undesired side-effects, unlike methods.
Note that not all types and functions have to be immutable and pure, but immutable types and pure functions should be the default.
If there are good reasons to make exceptions, we can do so, but we should include comments to explain why that exception was made.
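A minimal sketch of this decision, with illustrative names (not taken from the actual codebase): an immutable type plus a pure function, rather than a mutating method.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Project:
    # frozen=True makes instances immutable: assignment raises an error
    id: str
    name: str

def rename(project: Project, new_name: str) -> Project:
    # pure function: returns a new value, leaving the input untouched
    return replace(project, name=new_name)
```

Because rename never mutates its argument, calls compose without hidden side effects: the original value stays valid after every call.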
Date: 2020-01-29
Accepted
Static type-checking is available for Python, making use of the type annotations already in the codebase.
Testing is still important, but type checking helps to eliminate bugs via static checking, even for parts of the code not exercised during tests.
Additionally, type-checking relies on our type annotations, ensuring that the annotations are correct and complete.
Date: 2020-04-03
Accepted
We have an existing userbase that relies on tamr_unify_client
and cannot painlessly make backwards-incompatible changes.
But, we want to rearchitect this codebase as a library of composable functions.
Implement rearchitected design as a new package named tamr_client
.
Require the TAMR_CLIENT_BETA=1
feature flag for tamr_client
package usage.
Warn users who attempt to use tamr_client
package to opt-in if they want to beta test the new design.
Continue to support tamr_unify_client
, but any new functionality:
must be included in
tamr_client
may be included in
tamr_unify_client
Users are required to explicitly opt-in to new features, preserving backward compatibility for current users.
Once we reach feature parity with tamr_unify_client
,
we can undergo a deprecation cycle and subsequently remove `tamr_unify_client`.
Date: 2020-06-01
Accepted
Python has many ways of importing:
# option 1: import module
# option 1.a
import foo.bar.bazaar as baz
baz.do_the_thing()
# option 1.b
from foo.bar import bazaar as baz
baz.do_the_thing()
# option 2: import value
from foo.bar.bazaar import do_the_thing
do_the_thing()
Not to mention that each of these styles may be done with relative imports (replacing foo.bar
with .bar
if the bar
package is a sibling).
Confusingly, Option 1.a and Option 1.b are conceptually the same, but mechanically there are subtle differences.
Imports within tamr_client
:
Must use import statements only for modules, classes, and exceptions
Must use
from foo import bar
instead of
import foo.bar as bar
Must not import functions directly. Instead import the containing module and use
module.function(...)
Must not use relative imports. Use absolute imports instead.
Standardized import style helps the linter correctly order imports.
Choosing import styles is a syntactic choice without semantic meaning. Removing this choice should speed up development and review.
Date: 2020-06-29
Accepted
Code must be organized to be compatible with:
Static type-checking via mypy
Runtime execution during normal usage and running tests via pytest
Static doc generation via sphinx-autodoc-typehints
Additionally:
Functions should be able to refer to any type
Most types depend on other types non-recursively, but some types (e.g.
SubAttribute
andAttributeType
) do depend on each other recursively / cyclically.
Put types (@dataclass(frozen=True)
) into the _types
module
and have all function modules depend on the _types
module to define their inputs and outputs.
Separating types into a _types
module (e.g. tc.Project
is an alias for tc._types.project.Project
)
and functions into namespaced modules (e.g. tc.project
is a module containing project-specific utilities)
allows all of our tooling to run successfully.
Also, splitting up types and functions means that we can author a function like tc.dataset.attributes
in the tc.dataset
module
while still having the tc.attribute
module depend on tc.Dataset
type.
Finally, for the rare cases where cyclical dependencies for types are unavoidable,
we can use typing.TYPE_CHECKING since mypy
and Python are smart enough to resolve these cyclical dependencies correctly via forward references.
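A minimal sketch of that escape hatch (module names here are illustrative, not the actual tamr_client layout): the import happens only under the type checker, and the annotation is a forward reference.

```python
from __future__ import annotations  # defer evaluation of annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only imported by mypy; never executed at runtime, so no circular
    # import occurs. (Module name is illustrative.)
    from mypkg.attribute_type import AttributeType

@dataclass(frozen=True)
class SubAttribute:
    name: str
    type: "AttributeType"  # forward reference, resolved by the type checker
```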
Date: 2021-02-04
Accepted
There are multiple, equally effective ways to implement many features. In some cases, the most straightforward implementation might involve making more API calls than are strictly necessary (e.g. tc.dataset.create makes an additional call to retrieve the created dataset from the server to construct the returned Dataset).
The simplest and most understandable implementation of a feature should be prioritized over performance or reducing the number of API calls. When real performance issues are identified, optimization should be done on an as-needed basis.
Functions will not be unnecessarily optimized at the cost of readability.
How to write tests¶
Our test suite uses pytest
.
See the pytest docs for:
how to run specific tests
how to capture print output for debugging tests
etc.
Note that you will need to pass any pytest
arguments after --
so that nox
passes the arguments correctly to pytest
:
nox -s test-3.6 -- -s tests/tamr_client/test_project.py::test_from_resource_id_mastering
Unit tests¶
Each unit test:
must be in a Python file whose name starts with test_
must be a function whose name starts with test_
should test one specific feature
should use the tests.tamr_client.fake utility to fake resources and Tamr server responses as necessary
For example, testing a simple feature that does not require communication with a Tamr server could look like:
# test_my_feature.py
import tamr_client as tc
from tests.tamr_client import fake
def test_my_feature_works():
# prerequisites
p = fake.project()
d = fake.dataset()
# test my feature
result = tc.my_feature(p, d)
assert result.is_correct()
After using the fake
utilities to set up your prerequisites,
the rest of the test code should be as representative of real user code as possible.
Test code that exercises the feature should not contain any test-specific logic.
If the tested feature requires communication with a Tamr server, you will need to fake Tamr server responses.
In general, any feature that takes a session argument will need faked responses.
You can fake responses via the @fake.json
decorator:
# test_my_feature.py
import tamr_client as tc
from tests.tamr_client import fake
@fake.json
def test_my_feature():
# prerequisites
s = fake.session()
p = fake.project()
# test my feature
result = tc.my_feature(s, p)
assert result.is_correct()
@fake.json will look for a corresponding fake JSON file within tests/tamr_client/fake_json, specifically tests/tamr_client/fake_json/<name of test file>/<name of test function>.
In the example, that would be tests/tamr_client/fake_json/test_my_feature/test_my_feature.json.
The fake JSON file should be formatted as a list of request/response pairs in order of execution.
For real examples, see the existing fake JSON files within tests/tamr_client/fake_json.
Contributing pull requests¶
RFCs¶
If the proposed changes require design input, open a Request For Comment issue.
Discuss the feature with project maintainers to be sure that your change fits with the project vision and that you won’t be wasting effort going in the wrong direction.
Once you get the green light 🟢 from maintainers, you can proceed with the PR.
Pull requests¶
Contributions / PRs should follow the Forking Workflow. In short:
Fork it:
https://github.com/[your-github-username]/tamr-client/fork
Create your feature branch:
git checkout -b my-new-feature
Commit your changes:
git commit -am 'Add some feature'
Push to the branch:
git push origin my-new-feature
Create a new Pull Request
Commits¶
Split and squash commits as necessary to create a clean git
history. Once you ask for review, only add new commits (do not change existing commits) for reviewer convenience. You may change commits in your PR only if reviewers are ok with it.
Commit messages must follow the Conventional Commits specification. CI for pull requests will enforce this and fail if commit messages are not formatted correctly.
We recommend the Commitizen CLI to make writing Conventional Commits easy, but you may write commit messages manually or use any other tool.
Also, your commit messages should explain any things that are not obvious from reading your code!
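For reference, a Conventional Commits message has the shape type(scope): description, optionally followed by a body; the scope and wording below are purely illustrative:

```
fix(attribute): raise AlreadyExists on 409 when creating an attribute

Callers previously saw a bare requests.HTTPError; raising
attribute.AlreadyExists lets them handle the conflict directly.
```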
Release process¶
Releases are automated by semantic-release.
Maintainers¶
Maintainer responsibilities:
Triage issues
Review + merge pull requests
Discuss RFCs
Publish new releases
Current maintainers:
Want to become a maintainer? Open a pull request that adds your name to the list of current maintainers!
BETA¶
WARNING: Do not rely on BETA features in production workflows. Support from Tamr may be limited.
Tutorials¶
Tutorial: Get Tamr version¶
This tutorial will cover basic Python client usage by guiding you through:
Configuring the connection to a Tamr instance
Retrieving the version of that instance
Prerequisites¶
To complete this tutorial you will need:
tamr-unify-client installed
access to a Tamr instance, specifically:
a username and password that allow you to log in to Tamr
the socket address of the instance
The socket address is composed of:
The protocol, such as "https" or "http"
The host, which may be "localhost" if the instance is deployed on the same machine from which your Python code will be run
The port at which you access the Tamr user interface, typically 9100
When you view the Tamr user interface in a browser, the URL is <protocol>://<host>:<port>. If the port is missing, the URL is simply <protocol>://<host>.
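Putting the three parts together, the address the client talks to can be sketched as (variable names here are illustrative):

```python
protocol = "http"   # or "https"
host = "localhost"  # or the host of a remote Tamr instance
port = 9100         # the port of the Tamr user interface, if any

# <protocol>://<host>[:<port>], omitting the port suffix when there is none
origin = f"{protocol}://{host}" + (f":{port}" if port is not None else "")
print(origin)  # → http://localhost:9100
```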
Steps¶
The Tamr Python client uses a Session
to persist the user’s authentication details across requests made to the server where Tamr is hosted.
A Session
carries authentication credentials derived from a username and password, and is not explicitly tied to any single Tamr instance. For more details, see the documentation for the Requests library.
Use your username and password to create an instance of tamr_client.UsernamePasswordAuth.
Use the function tamr_client.session.from_auth to create a Session.
from getpass import getpass
import tamr_client as tc
username = input("Tamr Username:")
password = getpass("Tamr Password:")
auth = tc.UsernamePasswordAuth(username, password)
session = tc.session.from_auth(auth)
An Instance
models the installation or instance of Tamr with which a user interacts via the Python client.
Create an Instance using the protocol, host, and port of your Tamr instance.
protocol = "http"
host = "localhost"
port = 9100
instance = tc.Instance(protocol=protocol, host=host, port=port)
With the Session
and Instance
defined, you can now interact with the API of the Tamr instance. One simple example is fetching the version of the Tamr software running on the server.
Use the function
tc.instance.version
and print the returned value.
print(tc.instance.version(session, instance))
All of the above steps can be combined into the following script get_tamr_version.py
:
from getpass import getpass
import tamr_client as tc
username = input("Tamr Username:")
password = getpass("Tamr Password:")
auth = tc.UsernamePasswordAuth(username, password)
session = tc.session.from_auth(auth)
protocol = "http"
host = "localhost"
port = 9100
instance = tc.Instance(protocol=protocol, host=host, port=port)
print(tc.instance.version(session, instance))
To run the script via command line:
TAMR_CLIENT_BETA=1 python get_tamr_version.py
If successful, the printed result should be similar to v2020.016.0
.
Congratulations! This is just the start of what can be done with the Tamr Python client.
To continue learning, see other tutorials and examples.
Tutorial: Continuous Mastering¶
This tutorial will cover using the Python client to keep a Mastering project up-to-date. This includes carrying new data through to the end of the project and using any new labels to update the machine-learning model.
While this is intended to propagate changes such as pair labeling that may be applied in the Tamr user interface, at no point during this tutorial is it necessary to interact with the user interface in any way.
Prerequisites¶
To complete this tutorial you will need:
tamr-unify-client installed
access to a Tamr instance, specifically:
a username and password that allow you to log in to Tamr
the socket address of the instance
an existing Mastering project in the following state:
the schema mapping between the attributes of the source datasets and the unified dataset has been defined
the blocking model has been defined
labels have been applied to pairs
It is recommended that you first complete the tutorial here. Alternatively, a different Mastering project can be used as long as the above conditions are met.
Steps¶
Use your username and password to create an instance of tamr_client.UsernamePasswordAuth.
Use the function tamr_client.session.from_auth to create a Session.
from getpass import getpass
import tamr_client as tc
username = input("Tamr Username:")
password = getpass("Tamr Password:")
auth = tc.UsernamePasswordAuth(username, password)
session = tc.session.from_auth(auth)
Create an Instance using the protocol, host, and port of your Tamr instance. Replace these with the corresponding values for your Tamr instance.
protocol = "http"
host = "localhost"
port = 9100
instance = tc.Instance(protocol=protocol, host=host, port=port)
Use the function tc.project.by_name
to retrieve the project information from the server by its name.
project = tc.project.by_name(session, instance, "MasteringTutorial")
Ensure that the retrieved project is a Mastering project by checking its type:
if not isinstance(project, tc.MasteringProject):
raise RuntimeError(f"{project.name} is not a mastering project.")
To update the unified dataset, use the function tc.mastering.update_unified_dataset
. This function:
Applies the attribute mapping configuration
Applies any transformations
Updates the unified dataset with updated source data
operation_1 = tc.mastering.update_unified_dataset(session, project)
tc.operation.check(session, operation_1)
This function and all others in this tutorial are synchronous, meaning that they will not return until the job in Tamr has resolved, either successfully or unsuccessfully. The function tc.operation.check
will raise an exception and halt the script if the job started in Tamr fails for any reason.
To generate pairs according to the configured pair filter rules, use the function tc.mastering.generate_pairs
.
operation_2 = tc.mastering.generate_pairs(session, project)
tc.operation.check(session, operation_2)
Running all of the functions in this section and in the “Apply the model” section that follows is equivalent to initiating “Apply feedback and update results” in the Tamr user interface.
To update the machine-learning model with newly-applied labels use the function tc.mastering.apply_feedback
.
operation_3 = tc.mastering.apply_feedback(session, project)
tc.operation.check(session, operation_3)
Running all of the functions in the previous “Train the model with new labels” section and in this section is equivalent to initiating “Apply feedback and update results” in the Tamr user interface.
Running the functions in this section alone is equivalent to initiating “Update results only” in the Tamr user interface.
Applying the trained machine-learning model requires three functions.
To update the pair prediction results, use the function
tc.mastering.update_pair_results
.
operation_4 = tc.mastering.update_pair_results(session, project)
tc.operation.check(session, operation_4)
To update the list of high-impact pairs, use the function
tc.mastering.update_high_impact_pairs
.
operation_5 = tc.mastering.update_high_impact_pairs(session, project)
tc.operation.check(session, operation_5)
To update the clustering results, use the function
tc.mastering.update_cluster_results
.
operation_6 = tc.mastering.update_cluster_results(session, project)
tc.operation.check(session, operation_6)
To publish the record clusters, use the function tc.mastering.publish_clusters
.
operation_7 = tc.mastering.publish_clusters(session, project)
tc.operation.check(session, operation_7)
All of the above steps can be combined into the following script continuous_mastering.py
:
from getpass import getpass
import tamr_client as tc
username = input("Tamr Username:")
password = getpass("Tamr Password:")
auth = tc.UsernamePasswordAuth(username, password)
session = tc.session.from_auth(auth)
protocol = "http"
host = "localhost"
port = 9100
instance = tc.Instance(protocol=protocol, host=host, port=port)
project = tc.project.by_name(session, instance, "MasteringTutorial")
if not isinstance(project, tc.MasteringProject):
raise RuntimeError(f"{project.name} is not a mastering project.")
operation_1 = tc.mastering.update_unified_dataset(session, project)
tc.operation.check(session, operation_1)
operation_2 = tc.mastering.generate_pairs(session, project)
tc.operation.check(session, operation_2)
operation_3 = tc.mastering.apply_feedback(session, project)
tc.operation.check(session, operation_3)
operation_4 = tc.mastering.update_pair_results(session, project)
tc.operation.check(session, operation_4)
operation_5 = tc.mastering.update_high_impact_pairs(session, project)
tc.operation.check(session, operation_5)
operation_6 = tc.mastering.update_cluster_results(session, project)
tc.operation.check(session, operation_6)
operation_7 = tc.mastering.publish_clusters(session, project)
tc.operation.check(session, operation_7)
To run the script via command line:
TAMR_CLIENT_BETA=1 python continuous_mastering.py
To continue learning, see other tutorials and examples.
Reference¶
Attribute¶
Attribute¶
-
class
tamr_client.
Attribute
(url, name, type, is_nullable, description=None)[source]¶ A Tamr Attribute.
-
tamr_client.attribute.
by_resource_id
(session, parent, id)[source]¶ Get attribute by resource ID
Fetches attribute from Tamr server
- Parameters
- Raises
attribute.NotFound – If no attribute could be found at the specified URL. Corresponds to a 404 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
- Return type
Attribute
-
tamr_client.attribute.
create
(session, parent, *, name, is_nullable, type=Array(inner_type=<PrimitiveType.STRING: 5>), description=None)[source]¶ Create an attribute
Posts a creation request to the Tamr server
- Parameters
parent (
Union
[Dataset
,CategorizationProject
,MasteringProject
,SchemaMappingProject
,GoldenRecordsProject
,UnknownProject
]) – Dataset or project that should contain the new attributename (
str
) – Name for the new attributetype (
Union
[PrimitiveType
,Array
,Map
,Record
]) – Attribute type for the new attributeis_nullable (
bool
) – Determines if the new attribute can contain NULL valuesdescription (
Optional
[str
]) – Description of the new attributeforce – If True, skips reserved attribute name check
- Return type
Attribute
- Returns
The newly created attribute
- Raises
attribute.ReservedName – If attribute name is reserved.
attribute.AlreadyExists – If an attribute already exists at the specified URL. Corresponds to a 409 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
-
tamr_client.attribute.
update
(session, attribute, *, description=None)[source]¶ Update an existing attribute
PUTS an update request to the Tamr server
- Parameters
- Return type
Attribute
- Returns
The newly updated attribute
- Raises
attribute.NotFound – If no attribute could be found at the specified URL. Corresponds to a 404 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
-
tamr_client.attribute.
delete
(session, attribute)[source]¶ Deletes an existing attribute
Sends a deletion request to the Tamr server
- Parameters
attribute (
Attribute
) – Existing attribute to delete- Raises
attribute.NotFound – If no attribute could be found at the specified URL. Corresponds to a 404 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
-
class
tamr_client.attribute.
AlreadyExists
[source]¶ Raised when trying to create an attribute that already exists on the server
-
class
tamr_client.attribute.
NotFound
[source]¶ Raised when referencing (e.g. updating or deleting) an attribute that does not exist on the server.
AttributeType¶
See https://docs.tamr.com/reference#attribute-types
-
tamr_client.attribute.type.
BOOLEAN
= <PrimitiveType.BOOLEAN: 1>¶ An enumeration.
-
tamr_client.attribute.type.
DOUBLE
= <PrimitiveType.DOUBLE: 2>¶ An enumeration.
-
tamr_client.attribute.type.
INT
= <PrimitiveType.INT: 3>¶ An enumeration.
-
tamr_client.attribute.type.
LONG
= <PrimitiveType.LONG: 4>¶ An enumeration.
-
tamr_client.attribute.type.
STRING
= <PrimitiveType.STRING: 5>¶ An enumeration.
-
tamr_client.attribute.type.
DEFAULT
= Array(inner_type=<PrimitiveType.STRING: 5>)¶
Note
sphinx_autodoc_typehints cannot handle forward references to AttributeType, so reference docs are written manually for this type
- Parameters
inner_type –
- Type
See https://docs.tamr.com/reference#attribute-types
-
tamr_client.attribute.type.
GEOSPATIAL
= Record(attributes=(SubAttribute(name='point', type=Array(inner_type=<PrimitiveType.DOUBLE: 2>), is_nullable=True), SubAttribute(name='multiPoint', type=Array(inner_type=Array(inner_type=<PrimitiveType.DOUBLE: 2>)), is_nullable=True), SubAttribute(name='lineString', type=Array(inner_type=Array(inner_type=<PrimitiveType.DOUBLE: 2>)), is_nullable=True), SubAttribute(name='multiLineString', type=Array(inner_type=Array(inner_type=Array(inner_type=<PrimitiveType.DOUBLE: 2>))), is_nullable=True), SubAttribute(name='polygon', type=Array(inner_type=Array(inner_type=Array(inner_type=<PrimitiveType.DOUBLE: 2>))), is_nullable=True), SubAttribute(name='multiPolygon', type=Array(inner_type=Array(inner_type=Array(inner_type=Array(inner_type=<PrimitiveType.DOUBLE: 2>)))), is_nullable=True)))¶
- Parameters
attributes –
- Type
See https://docs.tamr.com/reference#attribute-types
-
class
tamr_client.attribute.type.
Array
(inner_type)¶ - Parameters
inner_type (
AttributeType
) –
-
class
tamr_client.attribute.type.
Map
(inner_type)¶ - Parameters
inner_type (
AttributeType
) –
-
class
tamr_client.attribute.type.
Record
(attributes)[source]¶ See https://docs.tamr.com/reference#attribute-types
- Parameters
attributes (
Tuple
[SubAttribute
, …]) –
Auth¶
-
class
tamr_client.
JwtTokenAuth
(token)[source]¶ Provides JWT authentication for Tamr.
Specifically, sets the Authorization HTTP header with Bearer format. This feature is only supported in Tamr releases beginning with v2022.010.0.
- Parameters
token – The JWT value to be used for authentication
- Usage:
>>> import tamr_client as tc >>> auth = tc.JwtTokenAuth('my token') >>> s = tc.Session(auth)
-
class
tamr_client.
UsernamePasswordAuth
(username, password)[source]¶ Provides username/password authentication for Tamr.
Sets the Authorization HTTP header with Tamr’s custom BasicCreds format.
- Parameters
Example
>>> import tamr_client as tc >>> auth = tc.UsernamePasswordAuth('my username', 'my password') >>> s = tc.Session(auth)
Backup¶
-
class
tamr_client.
Backup
(url, path, state, error_message)[source]¶ A Tamr backup
See https://docs.tamr.com/new/docs/configuration-backup-and-restore
-
tamr_client.backup.
get_all
(session, instance)[source]¶ Get all backups that have been initiated for a Tamr instance.
- Parameters
session (
Session
) – Tamr sessioninstance (
Instance
) – Tamr instance
- Return type
List
[Backup
]- Returns
A list of Tamr backups
- Raises
backup.NotFound – If no backup found at the specified URL
-
tamr_client.backup.
by_resource_id
(session, instance, resource_id)[source]¶ Get information on a specific Tamr backup.
- Parameters
session (
Session
) – Tamr sessioninstance (
Instance
) – Tamr instanceresource_id (
str
) – Resource ID of the backup
- Return type
Backup
- Returns
A Tamr backup
- Raises
backup.NotFound – If no backup found at the specified URL
-
tamr_client.backup.
initiate
(session, instance)[source]¶ Initiate a Tamr backup.
- Parameters
session (
Session
) – Tamr sessioninstance (
Instance
) – Tamr instance
- Return type
Backup
- Returns
Initiated backup
- Raises
backup.InvalidOperation – If attempting an invalid operation
-
tamr_client.backup.
cancel
(session, backup)[source]¶ Cancel a Tamr backup.
- Parameters
session (
Session
) – Tamr sessionbackup (
Backup
) – A Tamr backup
- Return type
Backup
- Returns
Canceled backup
- Raises
backup.NotFound – If no backup found at the specified URL
backup.InvalidOperation – If attempting an invalid operation
-
tamr_client.backup.
poll
(session, backup)[source]¶ Poll this backup for server-side updates.
Does not update the
Backup
object. Instead, returns a newBackup
.- Parameters
session (
Session
) – Tamr sessionbackup (
Backup
) – Tamr backup to be polled
- Return type
Backup
- Returns
A Tamr backup
- Raises
backup.NotFound – If no backup found at the specified URL
Categorization¶
Categorization¶
-
tamr_client.categorization.
update_unified_dataset
(session, project)[source]¶ Apply changes to the unified dataset and wait for the operation to complete
- Parameters
project (
CategorizationProject
) – Tamr Categorization project- Return type
Operation
-
tamr_client.categorization.
apply_feedback
(session, project)[source]¶ Train the categorization model according to verified labels and wait for the operation to complete
- Parameters
project (
CategorizationProject
) – Tamr Categorization project- Return type
Operation
-
tamr_client.categorization.
update_results
(session, project)[source]¶ Generate classifications based on the latest categorization model and wait for the operation to complete
- Parameters
project (
CategorizationProject
) – Tamr Categorization project- Return type
Operation
-
tamr_client.categorization.
manual_labels
(session, project)[source]¶ Get manual labels from a Categorization project.
- Parameters
project (
CategorizationProject
) – Tamr project containing labels- Return type
Dataset
- Returns
Dataset containing manual labels
- Raises
dataset.NotFound – If no dataset could be found at the specified URL
dataset.Ambiguous – If multiple targets match dataset name
Categorization Project¶
-
class
tamr_client.
CategorizationProject
(url, name, description=None)[source]¶ A Tamr Categorization project
-
tamr_client.categorization.project.
create
(session, instance, name, description=None, external_id=None, unified_dataset_name=None)[source]¶ Create a Categorization project in Tamr.
- Parameters
- Return type
Union
[CategorizationProject
,MasteringProject
,SchemaMappingProject
,GoldenRecordsProject
,UnknownProject
]- Returns
Project created in Tamr
- Raises
project.AlreadyExists – If a project with these specifications already exists
requests.HTTPError – If any other HTTP error is encountered
Dataset¶
Dataset¶
-
tamr_client.dataset.
by_resource_id
(session, instance, id)[source]¶ Get dataset by resource ID
Fetches dataset from Tamr server
- Parameters
instance (
Instance
) – Tamr instance containing this datasetid (
str
) – Dataset ID
- Raises
dataset.NotFound – If no dataset could be found at the specified URL. Corresponds to a 404 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
- Return type
Dataset
-
tamr_client.dataset.
by_name
(session, instance, name)[source]¶ Get dataset by name
Fetches dataset from Tamr server
- Parameters
instance (
Instance
) – Tamr instance containing this datasetname (
str
) – Dataset name
- Raises
dataset.NotFound – If no dataset could be found with that name.
dataset.Ambiguous – If multiple targets match dataset name.
requests.HTTPError – If any other HTTP error is encountered.
- Return type
Dataset
-
tamr_client.dataset.
attributes
(session, dataset)[source]¶ Get all attributes from a dataset
- Parameters
dataset (
Dataset
) – Dataset containing the desired attributes- Return type
Tuple
[Attribute
, …]- Returns
The attributes for the specified dataset
- Raises
requests.HTTPError – If an HTTP error is encountered.
-
tamr_client.dataset.
materialize
(session, dataset)[source]¶ Materialize a dataset and wait for the operation to complete. Materializing consists of updating the dataset (including records) in persistent storage (HBase) based on upstream changes to data.
- Parameters
dataset (
Dataset
) – A Tamr dataset which will be materialized- Return type
Operation
-
tamr_client.dataset.
delete
(session, dataset, *, cascade=False)[source]¶ Deletes an existing dataset
Sends a deletion request to the Tamr server
- Parameters
dataset (
Dataset
) – Existing dataset to deletecascade (
bool
) – Whether to delete all derived datasets as well
- Raises
dataset.NotFound – If no dataset could be found at the specified URL. Corresponds to a 404 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
-
tamr_client.dataset.
get_all
(session, instance, *, filter=None)[source]¶ Get all datasets from an instance
- Parameters
- Return type
Tuple
[Dataset
, …]- Returns
The datasets retrieved from the instance
- Raises
requests.HTTPError – If an HTTP error is encountered.
-
tamr_client.dataset.
create
(session, instance, *, name, key_attribute_names, description=None, external_id=None)[source]¶ Create a dataset in Tamr.
- Parameters
- Return type
Dataset
- Returns
Dataset created in Tamr
- Raises
dataset.AlreadyExists – If a dataset with these specifications already exists.
requests.HTTPError – If any other HTTP error is encountered.
Record¶
See https://docs.tamr.com/reference/record
“The recommended approach for modifying records is to use the upsert()
and
delete()
functions for all use cases they can handle. For more advanced use cases, the
underlying _update()
function can be used directly.”
-
tamr_client.record.
upsert
(session, dataset, records, *, primary_key_name=None)¶ Create or update the specified records.
- Parameters
- Return type
- Returns
JSON response body from server
- Raises
requests.HTTPError – If an HTTP error is encountered
primary_key.NotFound – If primary_key_name does not match dataset primary key
primary_key.NotFound – If primary_key_name not in a record dictionary
-
tamr_client.record.
delete
(session, dataset, records, *, primary_key_name=None)¶ Deletes the specified records, based on primary key values. Does not check that other attribute values match.
- Parameters
- Return type
- Returns
JSON response body from server
- Raises
requests.HTTPError – If an HTTP error is encountered
primary_key.NotFound – If primary_key_name does not match dataset primary key
primary_key.NotFound – If primary_key_name not in a record dictionary
-
tamr_client.record.
_update
(session, dataset, updates)¶ Send a batch of record creations/updates/deletions to this dataset. You probably want to use
upsert()
ordelete()
instead.- Parameters
dataset (
Dataset
) – Dataset containing records to be updatedupdates (
Iterable
[Dict
]) – Each update should be formatted as specified in the Public Docs for Dataset updates.
- Return type
- Returns
JSON response body from server
- Raises
requests.HTTPError – If an HTTP error is encountered
-
tamr_client.record.
stream
(session, dataset)¶ Stream the records in this dataset as Python dictionaries.
Dataframe¶
-
tamr_client.dataframe.
upsert
(session, dataset, df, *, primary_key_name=None)¶ Upserts a record for each row of df with attributes for each column in df.
- Parameters
- Return type
- Returns
JSON response body from the server
- Raises
requests.HTTPError – If an HTTP error is encountered
primary_key.NotFound – If primary_key_name is not a column in df or the index of df
ValueError – If primary_key_name matches both a column in df and the index of df
-
tamr_client.dataframe.
create
(session, instance, df, *, name, primary_key_name=None, description=None, external_id=None)¶ Create a dataset in Tamr from the DataFrame df and creates a record from each row
All attributes other than the primary key are created as the default type array(string)
- Parameters
- Return type
Dataset
- Returns
Dataset created in Tamr
- Raises
dataset.AlreadyExists – If a dataset with these specifications already exists.
requests.HTTPError – If any other HTTP error is encountered.
primary_key.NotFound – If primary_key_name is not a column in df or the index of df
ValueError – If primary_key_name matches both a column in df and the index of df
Unified¶
-
class
tamr_client.dataset.unified.
UnifiedDataset
(url, name, key_attribute_names, description=None)[source]¶ A Tamr unified dataset
-
tamr_client.dataset.unified.
from_project
(session, project)[source]¶ Get unified dataset of a project
Fetches the unified dataset of a given project from Tamr server
- Parameters
project (
Union
[CategorizationProject
,MasteringProject
,SchemaMappingProject
,GoldenRecordsProject
,UnknownProject
]) – Tamr project of this Unified Dataset- Raises
unified.NotFound – If no unified dataset could be found at the specified URL. Corresponds to a 404 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
- Return type
UnifiedDataset
Golden Records¶
Instance¶
-
class
tamr_client.
Instance
(protocol='http', host='localhost', port=None)[source]¶ Connection parameters for a running Tamr instance
-
tamr_client.instance.
origin
(instance)[source]¶ HTTP origin i.e.
<protocol>://<host>[:<port>]
.For additional information, see MDN web docs .
- Return type
Mastering¶
Mastering¶
-
tamr_client.mastering.
update_unified_dataset
(session, project)[source]¶ Apply changes to the unified dataset and wait for the operation to complete
- Parameters
project (
MasteringProject
) – Tamr Mastering project- Return type
Operation
-
tamr_client.mastering.
estimate_pairs
(session, project)[source]¶ Update the estimated pair counts and wait for the operation to complete
- Parameters
project (
MasteringProject
) – Tamr Mastering project- Return type
Operation
-
tamr_client.mastering.
generate_pairs
(session, project)[source]¶ Generate pairs according to the binning model and wait for the operation to complete
- Parameters
project (
MasteringProject
) – Tamr Mastering project- Return type
Operation
-
tamr_client.mastering.
apply_feedback
(session, project)[source]¶ Train the pair-matching model according to verified labels and wait for the operation to complete
- Parameters
project (
MasteringProject
) – Tamr Mastering project- Return type
Operation
-
tamr_client.mastering.
update_pair_results
(session, project)[source]¶ Update record pair predictions according to the latest pair-matching model and wait for the operation to complete
- Parameters
project (
MasteringProject
) – Tamr Mastering project- Return type
Operation
-
tamr_client.mastering.
update_high_impact_pairs
(session, project)[source]¶ Produce new high-impact pairs according to the latest pair-matching model and wait for the operation to complete
- Parameters
project (
MasteringProject
) – Tamr Mastering project- Return type
Operation
Mastering Project¶
-
tamr_client.mastering.project.
create
(session, instance, name, description=None, external_id=None, unified_dataset_name=None)[source]¶ Create a Mastering project in Tamr.
- Parameters
- Return type
Union
[CategorizationProject
,MasteringProject
,SchemaMappingProject
,GoldenRecordsProject
,UnknownProject
]- Returns
Project created in Tamr
- Raises
project.AlreadyExists – If a project with these specifications already exists.
requests.HTTPError – If any other HTTP error is encountered.
Operation¶
-
class
tamr_client.
Operation
(url, type, status=None, description=None)[source]¶ A Tamr operation
See https://docs.tamr.com/new/reference/the-operation-object
-
tamr_client.operation.
check
(session, operation)[source]¶ Waits for the operation to finish and raises an exception if the operation was not successful.
- Parameters
operation (
Operation
) – Operation to be checked.- Raises
Failed – If the operation failed.
-
tamr_client.operation.
poll
(session, operation)[source]¶ Poll this operation for server-side updates.
Does not update the
Operation
object. Instead, returns a newOperation
.- Parameters
operation (
Operation
) – Operation to be polled.- Return type
Operation
-
tamr_client.operation.
wait
(session, operation, *, poll_interval_seconds=3, timeout_seconds=None)[source]¶ Continuously polls for this operation’s server-side state.
- Parameters
- Raises
TimeoutError – If operation takes longer than timeout_seconds to resolve.
- Return type
Operation
Primary Key¶
Project¶
-
class tamr_client.UnknownProject(url, name, description=None)[source]¶
A Tamr project of an unrecognized type
-
tamr_client.project.by_resource_id(session, instance, id)[source]¶
Get project by resource ID. Fetches project from Tamr server.
- Parameters
instance (Instance) – Tamr instance containing this project
id (str) – Project ID
- Raises
project.NotFound – If no project could be found at the specified URL. Corresponds to a 404 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
- Return type
Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject]
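The Union return type reflects that the server reports each project's type as a string, which the client maps to the matching class, falling back to UnknownProject for types it does not recognize. The dispatch table, the "DEDUP" type string, and the `from_json` helper below are illustrative stand-ins for this sketch, not the client's actual internals:

```python
from dataclasses import dataclass

@dataclass
class MasteringProject:
    url: str
    name: str

@dataclass
class UnknownProject:
    url: str
    name: str

# Hypothetical mapping from server type strings to project classes.
_BY_TYPE = {"DEDUP": MasteringProject}

def from_json(data):
    """Pick the project class for the reported type, defaulting to UnknownProject."""
    cls = _BY_TYPE.get(data["type"], UnknownProject)
    return cls(url=data["url"], name=data["name"])

p = from_json({"type": "DEDUP", "url": "projects/1", "name": "my_mastering"})
print(type(p).__name__)  # MasteringProject
```

The fallback is why code consuming these functions should be prepared to receive an UnknownProject rather than assuming one of the four known types.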
-
tamr_client.project.by_name(session, instance, name)[source]¶
Get project by name. Fetches project from Tamr server.
- Parameters
instance (Instance) – Tamr instance containing this project
name (str) – Project name
- Raises
project.NotFound – If no project could be found with that name.
project.Ambiguous – If multiple targets match project name.
requests.HTTPError – If any other HTTP error is encountered.
- Return type
Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject]
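The NotFound/Ambiguous contract above can be sketched with an in-memory list standing in for the Tamr server; the exception names mirror the docs, while the dict-based projects are illustrative only:

```python
class NotFound(Exception):
    """No project matches the requested name (stand-in for project.NotFound)."""

class Ambiguous(Exception):
    """More than one project matches (stand-in for project.Ambiguous)."""

def by_name(projects, name):
    """Return the single project whose name matches, per the rules above."""
    matches = [p for p in projects if p["name"] == name]
    if not matches:
        raise NotFound(name)
    if len(matches) > 1:
        raise Ambiguous(name)
    return matches[0]

projects = [
    {"id": "1", "name": "mastering"},
    {"id": "2", "name": "schema_mapping"},
    {"id": "3", "name": "mastering"},  # duplicate name makes lookup ambiguous
]
print(by_name(projects, "schema_mapping")["id"])  # 2
```

Because project names need not be unique in Tamr, `by_resource_id` is the safer lookup when you already know the ID.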
-
tamr_client.project.get_all(session, instance, *, filter=None)[source]¶
Get all projects from an instance.
- Parameters
- Return type
Tuple[Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject], …]
- Returns
The projects retrieved from the instance
- Raises
requests.HTTPError – If an HTTP error is encountered.
-
tamr_client.project.attributes(session, project)[source]¶
Get all attributes from a project.
- Parameters
project (Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject]) – Project containing the desired attributes
- Return type
Tuple[Attribute, …]
- Returns
The attributes for the specified project
- Raises
requests.HTTPError – If an HTTP error is encountered.
Restore¶
-
class tamr_client.Restore(url, backup_path, state, error_message)[source]¶
A Tamr restore.
See https://docs.tamr.com/new/docs/configuration-backup-and-restore
-
tamr_client.restore.get(session, instance)[source]¶
Get information on the latest Tamr restore, if any.
- Parameters
session (Session) – Tamr session
instance (Instance) – Tamr instance
- Return type
Restore
- Returns
Latest Tamr restore
- Raises
restore.NotFound – If no backup found at the specified URL
-
tamr_client.restore.initiate(session, instance, backup_path)[source]¶
Initiate a Tamr restore.
- Parameters
session (Session) – Tamr session
instance (Instance) – Tamr instance
backup_path (str) – Path to the backup
- Return type
Restore
- Returns
Initiated restore
- Raises
restore.InvalidOperation – If attempting an invalid operation
-
tamr_client.restore.cancel(session, restore)[source]¶
Cancel a Tamr restore.
- Parameters
session (Session) – Tamr session
restore (Restore) – A Tamr restore
- Return type
Restore
- Returns
Canceled restore
- Raises
restore.NotFound – If no backup file found at the specified path
restore.InvalidOperation – If attempting an invalid operation
Schema Mapping¶
Schema Mapping Project¶
-
class tamr_client.SchemaMappingProject(url, name, description=None)[source]¶
A Tamr Schema Mapping project
-
tamr_client.schema_mapping.project.create(session, instance, name, description=None, external_id=None, unified_dataset_name=None)[source]¶
Create a Schema Mapping project in Tamr.
- Parameters
- Return type
Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject]
- Returns
Project created in Tamr
- Raises
project.AlreadyExists – If a project with these specifications already exists.
requests.HTTPError – If any other HTTP error is encountered.
Attribute Mapping Project¶
-
class tamr_client.AttributeMapping(url, input_attribute, unified_attribute)[source]¶
A Tamr Attribute Mapping.
See https://docs.tamr.com/new/reference/retrieve-projects-mappings
- Parameters
url (URL) –
input_attribute (Attribute) –
unified_attribute (Attribute) –
-
tamr_client.schema_mapping.attribute_mapping.get_all(session, tamr_project)[source]¶
Get all attribute mappings of a Tamr project.
- Parameters
tamr_project (Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject]) – Tamr project
- Return type
List[AttributeMapping]
- Returns
The attribute mappings of the project
- Raises
requests.HTTPError – If an HTTP error is encountered.
-
tamr_client.schema_mapping.attribute_mapping.create(session, project, input_attribute, unified_attribute)[source]¶
Create a mapping in Tamr between input attributes and unified attributes of the given project.
- Parameters
project (Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject]) – Tamr project
input_attribute (Attribute) – The attribute of a source dataset to map
unified_attribute (Attribute) – The attribute of the unified dataset to map onto
- Return type
AttributeMapping
- Returns
Attribute mapping created in Tamr
- Raises
attribute_mapping.AlreadyExists – If an attribute mapping with these specifications already exists.
requests.HTTPError – If any other HTTP error is encountered.
-
tamr_client.schema_mapping.attribute_mapping.delete(session, attribute_mapping)[source]¶
Delete an existing attribute mapping.
- Parameters
attribute_mapping (AttributeMapping) – Existing attribute mapping to delete
- Raises
attribute_mapping.NotFound – If no attribute mapping could be found at the specified URL. Corresponds to a 404 HTTP error.
requests.HTTPError – If any other HTTP error is encountered.
-
class tamr_client.schema_mapping.attribute_mapping.NotFound[source]¶
Raised when referencing an attribute mapping that does not exist on the server.
Transformations¶
-
tamr_client.transformations.get_all(session, project)[source]¶
Get the transformations of a Project.
- Parameters
project (Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject]) – Project containing transformations
- Raises
requests.HTTPError – If any HTTP error is encountered.
Example
>>> import tamr_client as tc
>>> session = tc.session.from_auth('username', 'password')
>>> instance = tc.instance.Instance(host="localhost", port=9100)
>>> project1 = tc.project.by_resource_id(session, instance, id='1')
>>> print(tc.transformations.get_all(session, project1))
- Return type
Transformations
-
tamr_client.transformations.replace_all(session, project, tx)[source]¶
Replaces the transformations of a Project.
- Parameters
project (Union[CategorizationProject, MasteringProject, SchemaMappingProject, GoldenRecordsProject, UnknownProject]) – Project to place transformations within
tx (Transformations) – Transformations to put into project
- Raises
requests.HTTPError – If any HTTP error is encountered.
Example
>>> import tamr_client as tc
>>> session = tc.session.from_auth('username', 'password')
>>> instance = tc.instance.Instance(host="localhost", port=9100)
>>> project1 = tc.project.by_resource_id(session, instance, id='1')
>>> dataset3 = tc.dataset.by_resource_id(session, instance, id='3')
>>> new_input_tx = tc.InputTransformation("SELECT *, upper(name) as name;", [dataset3])
>>> all_tx = tc.Transformations(
...     input_scope=[new_input_tx],
...     unified_scope=["SELECT *, 1 as one;"]
... )
>>> tc.transformations.replace_all(session, project1, all_tx)
- Return type
Response
Response¶
Utilities for working with requests.Response.
-
tamr_client.response.successful(response)[source]¶
Ensure response does not contain an HTTP error.
Delegates to requests.Response.raise_for_status().
- Return type
Response
- Returns
The response being checked.
- Raises
requests.exceptions.HTTPError – If an HTTP error is encountered.
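The delegate-and-return-through pattern above can be sketched without a network; `FakeResponse` and `HTTPError` here are stdlib-only stand-ins for requests.Response and requests.exceptions.HTTPError:

```python
class HTTPError(Exception):
    """Stand-in for requests.exceptions.HTTPError."""

class FakeResponse:
    """Stand-in for requests.Response, for illustration only."""
    def __init__(self, status_code):
        self.status_code = status_code

    def raise_for_status(self):
        # Mirrors the 4xx/5xx behavior of requests' raise_for_status().
        if self.status_code >= 400:
            raise HTTPError(f"HTTP {self.status_code}")

def successful(response):
    """Raise on an HTTP error; otherwise hand back the response being checked."""
    response.raise_for_status()
    return response

print(successful(FakeResponse(200)).status_code)  # 200
```

Returning the response makes the check chainable, e.g. `successful(session.get(url)).json()`.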
-
tamr_client.response.ndjson(response, **kwargs)[source]¶
Stream newline-delimited JSON from the response body.
Analog to requests.Response.json(), but for an .ndjson-formatted body.
Recommended: For memory efficiency, use stream=True when sending the request corresponding to this response.
- Parameters
response (Response) – Response whose body should be streamed as newline-delimited JSON.
**kwargs – Keyword arguments passed to the underlying requests.Response.iter_lines() call.
- Returns
Each line of the response body, parsed as JSON
Example
>>> import tamr_client as tc
>>> s = tc.session.from_auth(...)
>>> r = s.get(..., stream=True)
>>> for data in tc.response.ndjson(r):
...     assert data['my key'] == 'my_value'
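What the helper does per line can be sketched with the standard library; this stand-in takes a string body and splits it, whereas the real helper streams lines from the response via requests.Response.iter_lines():

```python
import json

def ndjson(body):
    """Yield each line of an .ndjson body, parsed as its own JSON document."""
    for line in body.splitlines():
        if line.strip():  # iter_lines similarly skips blank keep-alive lines
            yield json.loads(line)

body = '{"my key": "my_value"}\n{"my key": "other"}\n'
print([d["my key"] for d in ndjson(body)])  # ['my_value', 'other']
```

Because the result is a generator, each document is parsed only as it is consumed, which is what makes the stream=True recommendation above pay off for large bodies.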
Session¶
The Session type is an alias for requests.Session.
For more information, see the official requests.Session docs.